Understanding the Depth: How Stereo Cameras See Our World in 3D

Vaibhav Nandwani

Shaswat Panda

Oct 1, 2023 | Data Science

In the expansive realm of visuals, there's a unique way to capture the world that adds depth and dimension: seeing in 3D using stereo cameras. But what exactly is behind this captivating technology? Let's delve into it.

Setting the Stage for 3D Vision

To bring the world to life in three dimensions, we require a few essential elements:

  • Stereo Camera Setup: Two cameras positioned side by side, each capturing images from slightly varied viewpoints.
  • Python and OpenCV: Make sure you have Python installed (preferably Python 3.x) along with the OpenCV library. You can install OpenCV using pip:
pip install opencv-python

Decoding the Depth

Just as our two eyes provide us with depth perception, two cameras can deduce the depth of objects based on their unique perspectives:

Baseline: This refers to the distance between the two cameras.

Focal Length: Cameras use the focal length to determine how an object's image will form on their sensor, similar to how our eyes focus on objects near and far.

Epipolar Geometry: When two cameras observe the same point, that point and the two camera centers define a plane that cuts each image in a line. A feature seen in one image must therefore lie along a single line in the other image, which narrows the search for matching pixels and is pivotal for determining depth.

Cameras Speak Their Own Language

To fully grasp 3D vision, we need to understand the language of cameras:

Intrinsic Parameters: These internal characteristics of a camera dictate:

  • Focal Length (f_x, f_y): The focal length expressed in pixel units along the x and y axes; these scaling factors relate real-world units (e.g., millimeters) to pixel coordinates and govern how a scene point is projected onto the image.

  • Principal Point (c_x, c_y): The coordinates of the image's principal point (usually the image center).

  • Skew Coefficient (s): Accounts for any shear between the image's x and y axes; for most modern sensors it is effectively zero.

  • Lens Distortion Coefficients (k1, k2, p1, p2, k3): Correct for lens distortion.

  • Intrinsic Matrix (K): Combines these parameters into a single 3x3 matrix:

K = | f_x   s   c_x |
    |  0   f_y  c_y |
    |  0    0    1  |

Extrinsic Parameters: These pertain to a camera's orientation and position in space, crucial when working with two cameras in tandem.

  • Rotation Matrix (R): Represents the orientation of one camera relative to the other.

  • Translation Vector (T): Describes the translation from one camera's coordinate system to the other.

  • Projection Matrix (P): Combines intrinsic and extrinsic parameters:

P = K[R|T]
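Assembled in code, this looks as follows (a minimal NumPy sketch; the focal length, principal point, and baseline values are hypothetical):

```python
import numpy as np

# Hypothetical intrinsics: focal lengths and principal point in pixels
fx, fy = 700.0, 700.0
cx, cy = 320.0, 240.0
s = 0.0  # skew; effectively zero for most modern sensors

K = np.array([[fx, s,  cx],
              [0., fy, cy],
              [0., 0., 1.]])

# Hypothetical extrinsics: identity rotation, 60 mm horizontal offset
R = np.eye(3)
T = np.array([[0.06], [0.0], [0.0]])  # meters

# Projection matrix P = K[R|T], a 3x4 matrix mapping world points to pixels
P = K @ np.hstack((R, T))
print(P.shape)  # (3, 4)
```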

Creating a 3D Masterpiece

With this knowledge, we can embark on the journey of generating 3D visuals.

Disparity Map: The difference in view between the two cameras is essential. It's a map that indicates how much each object shifts between the two images, computed from the pixel differences between the left and right images. The disparity (d) of a pixel is inversely proportional to the depth (Z) and directly proportional to the baseline (B):

d = f * B/Z

where d is the disparity value, f is the focal length, B is the baseline, and Z is the depth of the point.

Depth Calculation: Using the disparity map, we can determine how far or close each object is in the scene. To calculate the depth (Z) of a point in the 3D world given its disparity (d), we rearrange the formula:

Z = f * B/d
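As a quick sanity check, here is the formula with made-up numbers (f, B, and d below are illustrative values, not measurements):

```python
# Worked example of Z = f * B / d
f = 700.0   # focal length in pixels (illustrative)
B = 0.06    # baseline in meters (illustrative)
d = 42.0    # disparity in pixels for some matched point

Z = f * B / d
print(f"Depth: {Z:.2f} m")  # Depth: 1.00 m
```

A closer object produces a larger disparity, so as d grows, Z shrinks.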

Making It Real with Video

The magic doesn't stop with still images. When applied to video, the results are truly mesmerizing:

Camera Calibration

Camera calibration is a crucial step in stereo vision. It involves estimating the intrinsic and extrinsic parameters of each camera, which are necessary for accurate depth calculation. We'll start by calibrating each camera individually.

  1. Individual Camera Calibration

To calibrate each camera, follow these steps:

  • Capture multiple images of a chessboard pattern from different angles and positions.
  • Use OpenCV's camera calibration functions to estimate the camera's intrinsic matrix and distortion coefficients. This information is stored in a calibration file for each camera.
  2. Stereo Camera Calibration

Once both cameras are individually calibrated, we can proceed to calibrate them together as a stereo camera setup:

  • Capture a set of stereo image pairs with corresponding points visible in both images.
  • Use OpenCV's stereo calibration functions to estimate the relative transformation between the two cameras (i.e., the rotation matrix and translation vector).
  • Store the calibration results, including the rectification matrices, in a stereo calibration file.

Depth Calculation

With the stereo camera setup calibrated, we can now calculate the depth of a point in the 3D world given its coordinates in both camera frames. The process involves calculating the disparity map and then using it to compute depth.

  1. Disparity Map Calculation The disparity map represents the pixel-wise horizontal shift between the corresponding points in the left and right images. Given a rectified image pair, it can be computed with one of OpenCV's block matchers:
# Create a block matcher and compute the disparity map from the rectified pair
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)

# StereoSGBM returns fixed-point values scaled by 16; divide to get pixel units
disparity = stereo.compute(left_rectified, right_rectified) / 16.0
  2. Depth Calculation To calculate the depth of a point at pixel (x, y) in the left image, apply Z = f * B / d, using the focal length and baseline recovered from the rectified projection matrix P2:
# Get disparity value at pixel (x, y)
disparity_value = disparity[y, x]

# Calculate depth using the disparity
focal_length = P2[0, 0]              # Focal length in pixels
baseline = abs(P2[0, 3] / P2[0, 0])  # Baseline (distance between cameras)
depth = focal_length * baseline / disparity_value

Depth Calculation on Video

To calculate depth on a video in real-time, we'll follow these steps:

  1. Open Video Stream

We will capture the video stream from both cameras in real-time.

import cv2

# Open video streams from left and right cameras
left_camera = cv2.VideoCapture(0)  # Adjust the index as per your camera configuration
right_camera = cv2.VideoCapture(1)  # Adjust the index as per your camera configuration

while True:
    # Capture frames from both cameras
    ret1, left_frame = left_camera.read()
    ret2, right_frame = right_camera.read()

    if not ret1 or not ret2:
        break  # Break the loop if frames are not retrieved

    # Perform stereo rectification on the frames (similar to previous calibration steps)
    # ...

    # Calculate depth for a specific point (x, y) in both frames
    x, y = 320, 240  # Example pixel coordinates; adjust as needed

    # Calculate the disparity map (stereo is a block matcher such as
    # cv2.StereoSGBM_create, created once before the loop)
    disparity = stereo.compute(left_frame_rectified, right_frame_rectified)

    # Get disparity value at pixel (x, y); StereoSGBM values are scaled by 16
    disparity_value = disparity[y, x] / 16.0

    # Calculate depth using Z = f * B / d, with the focal length and
    # baseline taken from the rectified projection matrix P2
    focal_length = P2[0, 0]
    baseline = abs(P2[0, 3] / P2[0, 0])  # Baseline (distance between cameras)

    depth = focal_length * baseline / disparity_value

    # Display depth information on the frame
    cv2.putText(left_frame, f'Depth: {depth:.2f} meters', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Show the frames with depth information
    cv2.imshow('Left Camera', left_frame)

    # Exit the loop when the 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video streams and close all windows
left_camera.release()
right_camera.release()
cv2.destroyAllWindows()

In the above code, we continuously capture frames from both cameras, calculate the disparity map, and then compute the depth for a specific point. We display the depth information on the left camera's frame and exit the loop when the 'q' key is pressed.

Real-life Applications of 3D Vision

This technology has vast implications:

Robotics: Robots equipped with stereo cameras can understand depth, enabling them to move adeptly and perform specific tasks with precision.

Sports Analytics: By analyzing players' movements in 3D, coaches gain unparalleled insights into game dynamics and player strategies.

3D Movies & Augmented Reality: The immersive experiences we enjoy in modern entertainment lean heavily on the principles of 3D vision.

Conclusion

The art and science of stereo vision offer a thrilling glimpse into the future of visual technology. As we continue to blend the boundaries of the real and the digital, the depth and dimension brought by 3D vision will play an ever-increasing role in our experiences. And as we harness this tech, we usher in a new era of innovation and discovery.

Ready to Transform Your Vision?

Intrigued by the potential of 3D vision? At Asynq, we specialize in bringing such technological wonders to life. If you're looking to integrate stereo vision into your projects or explore the frontiers of software development, we're here to guide and assist. Embark on your next technological adventure with us!

Contact us at info@asynq.ai or fill the form here.