Good-quality imagery is usually taken from planes rather than satellites. The plane has a GPS, so it knows its position. The image is usually taken straight down, so the position of the center point of each picture is also known rather well. But the images need to be stitched together: two adjacent images need to be skewed and stretched so that their edges fit.
On flat ground, that's pretty simple. The camera has a certain viewing angle, so if you know the plane's altitude, you can calculate ground distances from it. But in mountainous terrain this calculation becomes a lot harder: a high point is closer to the camera, so it appears farther from the image center than a low point at the same ground position.
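As a rough illustration of that calculation, here is a minimal sketch assuming an idealized pinhole camera pointing straight down. The function names and the numbers (1000 m altitude, 90° viewing angle, a 200 m hill) are made up for the example and do not come from any real imagery pipeline.

```python
import math

def ground_width(altitude_m, fov_deg):
    """Width of ground covered by a straight-down camera over flat terrain."""
    return 2 * altitude_m * math.tan(math.radians(fov_deg) / 2)

def ground_offset(altitude_m, angle_deg, terrain_height_m=0.0):
    """Horizontal distance from the point under the plane to where a ray,
    tilted angle_deg away from vertical, hits terrain of the given height."""
    return (altitude_m - terrain_height_m) * math.tan(math.radians(angle_deg))

# Hypothetical flight: 1000 m above the reference ground level, 90 degree lens.
print(f"{ground_width(1000, 90):.0f} m covered")  # 2000 m covered

# The same pixel (30 degrees off vertical), interpreted two ways:
flat = ground_offset(1000, 30)                         # assuming flat ground
hill = ground_offset(1000, 30, terrain_height_m=200)   # actually a 200 m hill
print(f"flat: {flat:.1f} m, hill: {hill:.1f} m, error: {flat - hill:.1f} m")
# flat: 577.4 m, hill: 461.9 m, error: 115.5 m
```

Without knowing the terrain height, the stitching software cannot tell which of the two positions is the right one.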
This gives the interesting result that higher-resolution photos (which are usually taken by flying lower) are harder to align, as the terrain effects are more severe. Lower-resolution photos are taken from higher up and are usually easier to align.
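To put rough numbers on this, the sketch below compares the positioning error for one feature photographed from different altitudes. It assumes a simple pinhole model and a feature at a fixed horizontal distance from the point directly under the camera; the function name and all values are hypothetical.

```python
def flat_ground_error(altitude_m, ground_distance_m, terrain_height_m):
    """How far a feature is misplaced if its height is ignored.

    The ray from the camera through the feature is extended down to the
    reference ground level; by similar triangles it lands at
    distance * altitude / (altitude - height), so the error is
    distance * height / (altitude - height).
    """
    return ground_distance_m * terrain_height_m / (altitude_m - terrain_height_m)

# A feature 500 m from the point under the camera, on a 100 m hill.
for altitude in (1000, 5000, 20000):
    error = flat_ground_error(altitude, 500, 100)
    print(f"altitude {altitude:>5} m: error {error:.1f} m")
# altitude  1000 m: error 55.6 m
# altitude  5000 m: error 10.2 m
# altitude 20000 m: error 2.5 m
```

Under these assumptions the same hill causes a much smaller shift when the photo is taken from higher up, at the cost of coarser pixels.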
In the given example, it appears that Bing is the lower-resolution source but has higher accuracy, while the Maxar imagery has higher resolution and lower accuracy. Note that this is not universally true: most of these imagery providers buy pre-made sets of imagery, so they are not uniform in quality.
High-resolution imagery can be aligned quite precisely too, when local measurements are used (buildings are measured on the ground and used as reference points). But this is a lot more costly, and probably only worth it in rich countries.
Sometimes you can also find the places where the images are stitched together. In the example below, the street should clearly be connected:
![Seam between two stitched images where a street is offset][1]
[1]: https://help.openstreetmap.org/upfiles/2020-04-30-171425_461x282_scrot.png
So in short, you can keep the alignment for a small group of features. But if you pan away, you will need to realign.