Stereovision using more than two images

Overview

Stereovision is a technique for estimating depth by comparing two images of the same scene taken from slightly different viewpoints, similar to how human eyes perceive depth. Two cameras are placed a fixed distance apart, and each captures an image from its own angle. By identifying corresponding points in both images and calculating the horizontal shift (disparity) between them, the distance of objects from the cameras can be estimated by triangulation. The greater the disparity, the closer the object is to the cameras. Stereovision is commonly used in robotics, 3D mapping, autonomous vehicles, virtual reality, and other fields where depth perception is needed.

This project aims to extract depth information from more than two images, thereby improving the accuracy of the depth map. To do so, the depth maps obtained from all valid pairs of images are averaged, which reduces the number of unresolved pixels in the final depth map. For comparison, a ground-truth depth map was also obtained using a depth sensor.

Theory of depth estimation from multiple images

For each pair of images, we find the locations of corresponding points in the two images. Call their horizontal coordinates \(x_L\) and \(x_R\) for the left and right image, respectively. The depth is then calculated with the following formula.

\[ Z = \frac{f \cdot B}{d} \]

where \( Z \) is the depth, \( f \) is the focal length of the camera, \( B \) is the distance between the two cameras (the baseline), and \( d \) is the disparity, which is equal to \( x_L - x_R \).
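As a minimal sketch of the formula above, the snippet below converts disparities to depths with NumPy. The function name `depth_from_disparity`, the focal length of 800 pixels, and the baseline of 0.1 m are illustrative assumptions, not values from this project. Marking non-positive disparities as unresolved (NaN) is also an assumption made here to avoid dividing by zero.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Apply Z = f * B / d element-wise (hypothetical helper).

    Pixels whose disparity is not strictly positive are returned as NaN,
    i.e. treated as unresolved.
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.nan)
    valid = d > 0
    depth[valid] = focal_length_px * baseline_m / d[valid]
    return depth

# Illustrative pixel coordinates of three matched points.
x_left = np.array([420.0, 310.0, 150.0])
x_right = np.array([400.0, 300.0, 150.0])

# d = x_L - x_R, with assumed f = 800 px and B = 0.1 m.
depth = depth_from_disparity(x_left - x_right, focal_length_px=800.0, baseline_m=0.1)
print(depth)
```

The point with the larger disparity (20 px) comes out closer than the one with the smaller disparity (10 px), and the point with zero disparity is left unresolved.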

For Multiple Image Pairs

For \( N \) stereo pairs of images, calculate the depth for each pair from its own disparity \( d_i \):

\[ Z_i = \frac{f \cdot B}{d_i}, \quad \text{for } i = 1, 2, \dots, N \]
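The pairwise computation can be sketched by stacking the disparity maps of all pairs and applying the same formula to the whole stack. This sketch assumes, as the equation does, that every pair shares the same \( f \) and \( B \). The function name `pairwise_depths` and all numeric values are hypothetical.

```python
import numpy as np

def pairwise_depths(disparity_maps, focal_length_px, baseline_m):
    """Return an (N, H, W) array with Z_i = f * B / d_i for each pair i.

    Non-positive or missing disparities are marked NaN (an assumption of
    this sketch for representing unresolved pixels).
    """
    d = np.stack([np.asarray(m, dtype=float) for m in disparity_maps])
    with np.errstate(divide="ignore", invalid="ignore"):
        depths = focal_length_px * baseline_m / d
        unresolved = ~(d > 0)
    depths[unresolved] = np.nan
    return depths

# Two illustrative 2x2 disparity maps; 0 marks a pixel with no match.
d_left_center = [[16.0, 8.0], [0.0, 4.0]]
d_center_right = [[16.0, 0.0], [32.0, 4.0]]

depths = pairwise_depths([d_left_center, d_center_right],
                         focal_length_px=800.0, baseline_m=0.1)
print(depths)
```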

Depth image construction

To estimate the depth of a point in the final depth image, we take the average of the depths obtained from all stereo pairs:

\[ \bar{Z} = \frac{1}{N} \sum_{i=1}^{N} Z_i \]
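One way to sketch the averaging step is with `np.nanmean`. This assumes unresolved pixels are stored as NaN and that a pixel resolved in only some of the pairs is averaged over just those pairs. That reading is an assumption, not a confirmed detail of the project. The function name `average_depth` and the sample values are hypothetical.

```python
import warnings
import numpy as np

def average_depth(depth_maps):
    """Average depth maps pixel-wise, ignoring NaN (unresolved) entries.

    A pixel stays NaN only if it is unresolved in every input map.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in depth_maps])
    with warnings.catch_warnings():
        # nanmean warns on pixels that are NaN in all maps; that is expected here.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(stack, axis=0)

nan = np.nan
z_left_center = np.array([[5.0, 10.0], [nan, nan]])
z_center_right = np.array([[5.0, nan], [2.5, nan]])

z_avg = average_depth([z_left_center, z_center_right])
print(z_avg)
print("unresolved pixels:", int(np.isnan(z_avg).sum()))
```

In this toy example each input map has two unresolved pixels, while the averaged map has only one, which illustrates how combining pairs can reduce the number of unresolved pixels.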

Result

For demonstration, we use three images (left, center, and right) to construct the depth image of a scene.

Input

Ground truth

The ground truth for the scene was obtained using LIDAR for comparison.

Constructed depth image using stereovision

Depth map obtained from left and center image

Depth map obtained from center and right image

Average depth map

Lecture notes

See the lecture notes on computer vision and stereovision for a detailed exposition.

Author

Anurag Gupta is an M.S. graduate in Electrical and Computer Engineering from Cornell University. He also holds an M.Tech degree in Systems and Control Engineering and a B.Tech degree in Electrical Engineering from the Indian Institute of Technology, Bombay.
