Structure from Motion: Revealing the 3D World

Discover how to determine the 3D structure of the world and camera motion using multiple images. This technique enables us to understand the world as it exists in three dimensions.

  • carleigh carleigh

Presentation Transcript

1. Announcements

2. Structure-from-Motion Determining the 3-D structure of the world, and/or the motion of a camera, using a sequence of images taken by a moving camera. Equivalently, we can think of the world as moving and the camera as fixed. Like stereo, but the position of the camera isn't known (and it's more natural to use many images with little motion between them, not just two with a lot of motion). We may or may not assume we know the parameters of the camera, such as its focal length.

3. Structure-from-Motion As with stereo, we can divide the problem into two parts: correspondence and reconstruction. Again, we'll talk about reconstruction first. So for the next few classes we assume that each image contains some points, and we know which points match which.

4. Structure-from-Motion

5. Movie

6. Reconstruction A lot harder than with stereo. Start with a simpler case: scaled orthographic projection (weak perspective). Recall that in this model we remove the z coordinate and scale all x and y coordinates by the same amount.

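7. First: Represent motion We'll talk about a fixed camera and a moving object. Key point: write the points as P = [x1, ..., xn; y1, ..., yn; z1, ..., zn; 1, ..., 1], some matrix as S = [s11, s12, s13, tx; s21, s22, s23, ty], and the image as I = [u1, ..., un; v1, ..., vn]. Then: I = S P.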

8. Remember what this means. We are representing moving a set of points, projecting them into the image, and scaling them. Matrix multiplication: take the inner product between each row of S and each point. The first row of S produces X coordinates, while the second row produces Y. Projection occurs because S has no third row. Translation occurs with tx and ty. Scaling can be encoded with a scale factor in S. The remaining entries of S must allow the object to rotate.

9. Examples: S = [s, 0, 0, 0; 0, s, 0, 0]; This is just projection, with scaling by s. S = [s, 0, 0, s*tx; 0, s, 0, s*ty]; This is translation by (tx,ty,something), projection, and scaling.
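
The two example matrices can be tried out in a few lines of plain Python. This is only a sketch: it assumes points are stored one per column in homogeneous coordinates (x, y, z, 1), and the helper names `matmul` and `weak_perspective` as well as the sample points are made up for illustration.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def weak_perspective(s, tx=0.0, ty=0.0):
    """2x4 matrix that translates by (tx, ty), drops z, and scales by s.

    No rotation is included, matching the two example matrices above.
    """
    return [[s, 0.0, 0.0, s * tx],
            [0.0, s, 0.0, s * ty]]


# Three made-up points, one per column: (x, y, z, 1).
P = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0],
     [5.0, 7.0, 9.0],
     [1.0, 1.0, 1.0]]

S = weak_perspective(s=2.0, tx=1.0, ty=-1.0)
image_pts = matmul(S, P)  # row 0 holds image x, row 1 holds image y
print(image_pts)
```

Because the third column of S is zero, changing the z row of P leaves `image_pts` unchanged, which is the projection at work.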

10. Structure-from-Motion S encodes: Projection, since S has only two rows. Scaling, since S can have a scale factor. Translation, by tx and ty. Rotation:

11. Rotation Represents a 3D rotation of the points in P.

12. First, look at 2D rotation (easier) Matrix R acts on points by rotating them. Also, R R^T = Identity. R^T is also a rotation matrix, in the opposite direction to R.

13. Why does multiplying points by R rotate them? Think of the rows of R as a new coordinate system. Taking inner products of each point with these rows expresses that point in that coordinate system. This means the rows of R must be orthonormal vectors (orthogonal unit vectors). Think of what happens to the points (1,0) and (0,1). They go to (cos theta, -sin theta) and (sin theta, cos theta). They remain orthonormal, and rotate clockwise by theta. Any other point (a,b) can be thought of as a(1,0) + b(0,1). R(a(1,0) + b(0,1)) = Ra(1,0) + Rb(0,1) = aR(1,0) + bR(0,1). So it's in the same position relative to the rotated coordinates that it was in before rotation relative to the x, y coordinates. That is, it's rotated.
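
The argument above can be checked numerically. The sketch below assumes R = [cos theta, sin theta; -sin theta, cos theta], which is the matrix implied by where (1,0) and (0,1) are said to land; the function names and the sample values of theta, a and b are illustrative.

```python
import math


def rotation_2d(theta):
    """2D rotation whose rows are the new coordinate axes (clockwise by theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s],
            [-s, c]]


def apply(R, p):
    """Inner product of each row of R with the point p."""
    return tuple(sum(r * x for r, x in zip(row, p)) for row in R)


theta = math.radians(30)
R = rotation_2d(theta)

e1 = apply(R, (1, 0))  # expected (cos theta, -sin theta)
e2 = apply(R, (0, 1))  # expected (sin theta, cos theta)

# Linearity: rotating a(1,0) + b(0,1) equals a*R(1,0) + b*R(0,1).
a, b = 2.0, 3.0
lhs = apply(R, (a, b))
rhs = (a * e1[0] + b * e2[0], a * e1[1] + b * e2[1])

# Angle of the rotated x axis, measured counter-clockwise; negative means clockwise.
angle_e1 = math.atan2(e1[1], e1[0])
print(e1, e2, lhs, rhs, math.degrees(angle_e1))
```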

14. Simple 3D Rotation Rotation about z axis. Rotates x,y coordinates. Leaves z coordinates fixed.

15. Full 3D Rotation Any rotation can be expressed as a combination of three rotations about three axes. Rows (and columns) of R are orthonormal vectors. R has determinant 1 (not -1).

16. Intuitively, it makes sense that 3D rotations can be expressed as 3 separate rotations about fixed axes. Rotations have 3 degrees of freedom; two describe an axis of rotation, and one the amount. Rotations preserve the length of a vector, and the angle between two vectors. Therefore, (1,0,0), (0,1,0), (0,0,1) must be orthonormal after rotation. After rotation, they are the three columns of R. So these columns must be orthonormal vectors for R to be a rotation. Similarly, if they are orthonormal vectors (with determinant 1) R will have the effect of rotating (1,0,0), (0,1,0), (0,0,1). Same reasoning as 2D tells us all other points rotate too. Note if R has determinant -1, then R is a rotation plus a reflection.
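
These properties are easy to verify on a concrete matrix. The sketch below composes three axis rotations and checks orthonormality and the determinant; the sign convention of each axis rotation follows the clockwise 2D convention used earlier and is an assumption, as are the helper names and the angles.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, -s], [0, 1, 0], [s, 0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]


def is_identity(M, tol=1e-12):
    return all(abs(M[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(3) for j in range(3))


# A general rotation built from three rotations about fixed axes.
R = matmul(rot_z(0.3), matmul(rot_y(-1.1), rot_x(0.7)))
orthonormal = is_identity(matmul(R, transpose(R)))
det_R = det3(R)

# Flipping one axis keeps rows and columns orthonormal but gives determinant -1:
# a rotation plus a reflection.
F = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]
RF = matmul(R, F)
det_RF = det3(RF)
print(orthonormal, det_R, det_RF)
```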

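17. Putting it Together S is the product of four parts: scale, projection, 3D translation, and 3D rotation. With R = [r11, r12, r13; r21, r22, r23; r31, r32, r33], this gives S = [s r11, s r12, s r13, s tx; s r21, s r22, s r23, s ty]. We can just write s tx as tx and s ty as ty.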
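
A small sketch of this composition, assuming the order rotate, then translate, then project, then scale. The names `motion_matrix` and `project`, the rotation about z, and the numbers are all illustrative choices rather than anything taken from the slides.

```python
import math


def motion_matrix(R, t, s):
    """2x4 weak-perspective motion: rotate by R, translate by t, drop z, scale by s.

    Only the first two rows of R and the first two entries of t survive
    the projection, so t[2] never appears in the result.
    """
    return [[s * R[i][0], s * R[i][1], s * R[i][2], s * t[i]] for i in range(2)]


def project(S, point):
    """Apply a 2x4 motion matrix to one 3D point (x, y, z)."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in S)


theta = math.radians(90)
c, sn = math.cos(theta), math.sin(theta)
R = [[c, sn, 0.0],
     [-sn, c, 0.0],
     [0.0, 0.0, 1.0]]  # rotation about z, sign convention assumed

S = motion_matrix(R, t=(1.0, 2.0, 50.0), s=0.5)

# (1, 0, 7) rotates to (0, -1, 7), translates to (1, 1, 57),
# projects to (1, 1), and scales to (0.5, 0.5).
u, v = project(S, (1.0, 0.0, 7.0))
print(u, v)
```

The translation along z drops out entirely, which is why depth offsets cannot be recovered under this camera model.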

18. Affine Structure from Motion

19. Affine Structure-from-Motion: Two Frames (1)

20. Affine Structure-from-Motion: Two Frames (2) To make things easy, suppose:

21. Affine Structure-from-Motion: Two Frames (3) Looking at the first four points, we get:

22. Affine Structure-from-Motion: Two Frames (4) We can solve for motion by inverting the matrix of points. Or, explicitly, we see that the first column on the left (the images of the first point) gives the translations. After solving for these, we can solve for each column of the s components of the motion using the images of each point, in turn.

23. Affine Structure-from-Motion: Two Frames (5) Once we know the motion, we can use the images of another point to solve for the structure. We have four linear equations, with three unknowns.
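
The two-frame procedure can be sketched end to end on synthetic data. Several things here are assumptions, since the slide equations did not survive: the four reference points are taken to be (0,0,0), (1,0,0), (0,1,0), (0,0,1); each observation is the stacked vector (x1, y1, x2, y2) from the two images; and the over-determined system is solved by least squares through the normal equations, which is one reasonable choice rather than necessarily the lecture's. All function names are hypothetical.

```python
def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(M, v):
    """Solve the 3x3 system M x = v by Cramer's rule."""
    D = det3(M)
    return [det3([[v[r] if c == j else M[r][c] for c in range(3)]
                  for r in range(3)]) / D
            for j in range(3)]


def recover_motion(img0, img1, img2, img3):
    """Motion from the images of the four reference points.

    The image of the origin gives the translation t; subtracting it from the
    images of the unit points gives the columns of the 4x3 linear part L.
    """
    t = list(img0)
    L = [[img1[r] - t[r], img2[r] - t[r], img3[r] - t[r]] for r in range(4)]
    return L, t


def recover_point(L, t, img_k):
    """Least-squares solution of L p + t = img_k (4 equations, 3 unknowns)."""
    rhs = [img_k[r] - t[r] for r in range(4)]
    LtL = [[sum(L[r][i] * L[r][j] for r in range(4)) for j in range(3)]
           for i in range(3)]
    Ltb = [sum(L[r][i] * rhs[r] for r in range(4)) for i in range(3)]
    return solve3(LtL, Ltb)


def observe(L, t, p):
    """Synthetic two-frame observation (x1, y1, x2, y2) of the 3D point p."""
    return [sum(L[r][j] * p[j] for j in range(3)) + t[r] for r in range(4)]


# Made-up ground truth: frame 1 is a plain projection, frame 2 is rotated and shifted.
L_true = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.8, 0.0, 0.6],
          [0.0, 1.0, 0.0]]
t_true = [0.0, 0.0, 1.0, 2.0]

refs = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
L, t = recover_motion(*(observe(L_true, t_true, p) for p in refs))

p_true = (2.0, -1.0, 3.0)
p_est = recover_point(L, t, observe(L_true, t_true, p_true))
print(p_est)
```

With noise-free observations the least-squares solution reproduces the true point up to rounding; with noisy data the fourth equation helps average out error.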

24. Affine Structure-from-Motion: Two Frames (6) Suppose we just know where the k-th point is in image 1. Then, we can use the first two equations to write a_k and b_k as linear in c_k. The final two equations lead to two linear equations in the missing values and c_k. If we eliminate c_k we get one linear equation in the missing values. This means the unknown point lies on a known line. That is, we recover the epipolar constraint. Furthermore, these lines are all parallel.
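
The elimination described above can be traced in code. This sketch assumes the stacked two-frame motion is given as a 4x3 linear part L and a translation t (rows 0-1 for image 1, rows 2-3 for image 2), and that the 2x2 block of L used to solve for a_k and b_k is invertible. The function name `epipolar_line` and the example motion are illustrative.

```python
def epipolar_line(L, t, x1, y1):
    """Line in image 2 on which the match of image-1 point (x1, y1) must lie.

    Returns (base, direction): the candidate matches are
    base + c * direction for every value of the unknown c.
    """
    def image2_at(c):
        # Rows 0-1: solve the 2x2 system for a and b with c held fixed.
        r0 = x1 - t[0] - L[0][2] * c
        r1 = y1 - t[1] - L[1][2] * c
        d = L[0][0] * L[1][1] - L[0][1] * L[1][0]  # assumed nonzero
        a = (r0 * L[1][1] - L[0][1] * r1) / d
        b = (L[0][0] * r1 - r0 * L[1][0]) / d
        # Rows 2-3: project (a, b, c) into image 2.
        return [L[r][0] * a + L[r][1] * b + L[r][2] * c + t[r] for r in (2, 3)]

    base, at_one = image2_at(0.0), image2_at(1.0)
    return base, [at_one[0] - base[0], at_one[1] - base[1]]


# Made-up motion: frame 1 is a plain projection, frame 2 is rotated and shifted.
L = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.8, 0.0, 0.6],
     [0.0, 1.0, 0.0]]
t = [0.0, 0.0, 1.0, 2.0]

base_a, dir_a = epipolar_line(L, t, 2.0, -1.0)
base_b, dir_b = epipolar_line(L, t, -4.0, 5.0)
print(dir_a, dir_b)  # the same direction for both points
```

The direction depends only on the motion, not on the image-1 point, which is why all the lines come out parallel.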

25. Affine Structure-from-Motion: Two Frames (7) But, what if the first four points aren't so simple? Then we define A so that: This is always possible as long as the points aren't coplanar.

26. Affine Structure-from-Motion: Two Frames (8) Then, given: We have: And:

27. Affine Structure-from-Motion: Two Frames (9) Given: Then we just pretend that: is our motion, and solve as before.

28. Affine Structure-from-Motion: Two Frames (10) This means that we can never determine the exact 3D structure of the scene. We can only determine it up to some transformation, A. If a structure P and motion S explain the image points as S P, then so does another pair of the form (S A^-1)(A P).

29. Affine Structure-from-Motion: Two Frames (11) Note that A has the form: A corresponds to translation of the points, plus a linear transformation.

30. For example, there is clearly a translational ambiguity in recovering the points. We can't tell the difference between two point sets that are identical up to a translation when we only see them after they undergo an unknown translation. Similarly, there's clearly a rotational ambiguity. The rest of the ambiguity is a stretching in an unknown direction.
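
The ambiguity can be demonstrated directly. The sketch below assumes A is a 4x4 matrix with a 3x3 linear block, a translation column, and a last row of (0, 0, 0, 1), which matches the description "translation of the points, plus a linear transformation". The particular A (a stretch along x plus a shift), the motion S, and the points are made up, and the inverse is written out by hand for this simple case.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# A made-up motion and structure (points are columns: x, y, z, 1).
S = [[1.0, 0.0, 0.0, 0.5],
     [0.0, 1.0, 0.0, -1.0]]
P = [[1.0, 0.0, 4.0],
     [2.0, 1.0, -2.0],
     [3.0, 5.0, 0.0],
     [1.0, 1.0, 1.0]]

# Stretch x by 2 and translate by (1, 2, 3); its inverse undoes both.
A = [[2.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 2.0],
     [0.0, 0.0, 1.0, 3.0],
     [0.0, 0.0, 0.0, 1.0]]
A_inv = [[0.5, 0.0, 0.0, -0.5],
         [0.0, 1.0, 0.0, -2.0],
         [0.0, 0.0, 1.0, -3.0],
         [0.0, 0.0, 0.0, 1.0]]

images = matmul(S, P)

# A different structure (A P) seen through a different motion (S A^-1)
# produces exactly the same image points.
S_alt = matmul(S, A_inv)
P_alt = matmul(A, P)
images_alt = matmul(S_alt, P_alt)
print(images)
print(images_alt)
```

Since the images agree, no amount of image data alone can tell the pair (S, P) from (S_alt, P_alt).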