Simple Optical Tracking
DIY camera tracking of objects that are moving on the ground, with just a webcam and a few lines of code
The setup that will be covered here allows you to track an object that moves on a specified plane (e.g. on even ground) using just one webcam and a few lines of Python code. The camera has to be in an elevated position, observing the plane and the object you want to track.
The foundation for the tracking algorithm is the pinhole camera model.
This model allows you to get the real-world position $P_W = (X_W, Y_W, Z_W)^T$ of an object from:

- The image coordinates $(u, v)$ of the object
- The intrinsic parameters of the camera (the camera matrix $K$ and the lens distortion)
- The rotation $R$ and translation $t$ from world coordinates to camera coordinates:

$$
s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \left( R \begin{pmatrix} X_W \\ Y_W \\ Z_W \end{pmatrix} + t \right)
$$
Intrinsic Parameters
The first step of determining the image coordinates (the actual pixel positions of the object) can be done in various ways. You can use segmentation or object detection algorithms, for example. Choosing a suitable method here depends on the situation and is out of the scope of this post. For simplicity's sake, let's just assume that the image coordinates are known already.
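If you need a quick way to get those image coordinates for a first prototype, a simple color threshold is often enough. The following is a minimal sketch, assuming a single, distinctly colored object; the file name and the HSV range are placeholders and would have to be adapted to your setup:

```python
import cv2
import numpy as np

# Minimal sketch: locate a distinctly colored object by HSV thresholding.
# "frame.png" and the HSV range are placeholder values.
frame = cv2.imread("frame.png")                         # one frame from the webcam
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))   # e.g. a green object

# Take the centroid of the largest contour as the object's image coordinates (u, v)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    u, v = m["m10"] / m["m00"], m["m01"] / m["m00"]
    print(f"image coordinates: ({u:.1f}, {v:.1f})")
```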
This brings us to the next step, the intrinsic parameters of the camera. In a perfect world, the camera we use behaves like a pinhole camera. In such a scenario, you could just draw a light ray from each pixel in the image, through the pinhole, pointing exactly to the corresponding object in the real world.

This means that the image coordinates are connected directly to the real-world object by the projection rays.
The origin of the camera coordinate system is the center of projection (the pinhole). A point $P_C = (X_C, Y_C, Z_C)^T$ in camera coordinates is projected onto the image plane by dividing by its depth:

$$
x' = \frac{X_C}{Z_C}, \qquad y' = \frac{Y_C}{Z_C}
$$

These scaled-down coordinates (emphasized by the prime symbol) are called normalized image coordinates. The camera matrix $K$ maps them to the actual pixel coordinates:

$$
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}, \qquad K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
$$

with the focal lengths $f_x, f_y$ and the principal point $(c_x, c_y)$.
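As a small numerical illustration of these two steps (perspective division followed by the camera matrix), here is a sketch with a made-up camera matrix; the focal lengths and the principal point are placeholder values, not calibration results:

```python
import numpy as np

# Placeholder intrinsics: fx = fy = 800 px, principal point at (320, 240)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

P_c = np.array([0.2, -0.1, 2.0])   # a point in camera coordinates (metres)
p_norm = P_c / P_c[2]              # divide by the depth -> normalized coordinates (x', y', 1)
u, v, _ = K @ p_norm               # apply the camera matrix -> pixel coordinates
print(u, v)                        # 400.0 200.0
```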
All the equations above assume that the camera in use follows the pinhole camera model. However, this model is idealized and doesn't quite represent reality. Real cameras use lenses instead of a pinhole. Those lenses essentially do the same thing, but they add some distortion to the image, so the pixel/image coordinates don't end up exactly where they would with a pinhole.
Determining the lens distortion parameters as well as the camera matrix can be done with openly accessible libraries like OpenCV, or with tools like the MATLAB Camera Calibrator. The details of that process are well documented there and aren't repeated here.
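For reference, a typical OpenCV calibration run looks roughly like the sketch below. The chessboard pattern size and the image path are assumptions; the calibration images just need to show the pattern from different angles:

```python
import glob
import cv2
import numpy as np

# Chessboard corners in the board's own coordinate system (Z = 0 plane)
pattern = (9, 6)                                   # inner corners of the assumed chessboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calibration/*.png"):        # placeholder path to the calibration images
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# camera_matrix holds the intrinsic parameters, dist_coeffs the lens distortion
ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```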
Rotation and Translation from World to Camera Coordinates
The third thing you need to determine for the 3D reconstruction is the relation between the camera and the world coordinate systems. It is expressed using the rotation matrix $R$ and the translation vector $t$:

$$
P_C = R \, P_W + t
$$

One way to determine $R$ and $t$ is to use a set of points whose world coordinates are known and whose image coordinates can be measured.
Let's say we have the following corresponding point pairs:
| Real World Point | Image Point |
|---|---|
With those corresponding points, the camera matrix $K$, and the distortion coefficients, you can use OpenCV's solvePnP(...) function to get the rotation matrix $R$ and the translation vector $t$.
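A sketch of that step is shown below. The point pairs and the intrinsics are placeholders standing in for the measured correspondences from the table and the calibration results:

```python
import cv2
import numpy as np

# Placeholder correspondences: world points on the ground plane (Z = 0, in metres)
# and the pixel positions where they appear in the image.
world_points = np.array([[0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [1.0, 1.0, 0.0],
                         [0.0, 1.0, 0.0]], dtype=np.float32)
image_points = np.array([[421.0, 387.0],
                         [810.0, 392.0],
                         [775.0, 175.0],
                         [445.0, 168.0]], dtype=np.float32)

# Placeholder intrinsics; in practice these come from the calibration step
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

success, rvec, tvec = cv2.solvePnP(world_points, image_points, camera_matrix, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)   # solvePnP returns a rotation vector; convert it to the 3x3 matrix
```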
Putting everything together
We determined the camera matrix $K$ and the distortion coefficients with the camera calibration, and the rotation matrix $R$ and the translation vector $t$ with the solvePnP(...) function.
With all this information, it's now possible to transform any image point $(u, v)$ into its normalized image coordinates by undistorting it and applying the inverse camera matrix:

$$
p' = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
$$
The next step is scaling the normalized coordinates by the unknown factor $s$ (the depth $Z_C$) to get the actual camera coordinates:

$$
P_C = s \, p'
$$

Unfortunately, we don't know the value of $s$. We can, however, exploit the fact that the object moves on a known plane to determine it.
Let's first turn this equation around so that the world coordinates are alone on one side. The inverse rotation matrix can be obtained by transposing it, and the inverse translation is just the negative value of the vector:

$$
P_W = R^T (P_C - t)
$$
Now replace the camera coordinates with the scaled normalized camera coordinates:

$$
P_W = R^T (s \, p' - t) = s \, R^T p' - R^T t
$$
Since the object moves on a known plane, its Z-coordinate in world coordinates is already known (e.g. $Z_W = 0$ on even ground). The scaling factor $s$ can now be calculated by simplifying the equation and just looking at the Z-value of the 3D coordinates here:

$$
Z_W = s \, r_3^T p' - r_3^T t \quad \Rightarrow \quad s = \frac{Z_W + r_3^T t}{r_3^T p'}
$$

with:

- $Z_W$ (the known Z-coordinate of the plane in world coordinates)
- $r_3^T$ (the last row of the inverse rotation matrix $R^T$)
- $p'$ (the normalized image coordinates of the object)
- $t$ (the translation vector)
Now, the value of $s$ is known and can be plugged back into the equation above to get the full world coordinates $P_W$ of the tracked object.
Example Implementation
I've also created a Jupyter Notebook example that shows how to actually code everything that was described above:
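As a quick summary, here is a condensed sketch of the back-projection described above. All concrete values (intrinsics, distortion, rotation, translation, pixel position) are placeholders; in practice they come from the calibration, solvePnP, and the object detector:

```python
import cv2
import numpy as np

# Placeholder parameters; in practice they come from calibration and solvePnP
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
R = np.eye(3)                    # rotation world -> camera (from cv2.Rodrigues(rvec))
t = np.array([0.0, 0.0, 3.0])    # translation world -> camera
Z_w = 0.0                        # the object moves on the ground plane (Z_W = 0)

def image_to_world(u, v):
    # 1) undo lens distortion and the camera matrix -> normalized coordinates (x', y', 1)
    pt = cv2.undistortPoints(np.array([[[u, v]]], dtype=np.float32),
                             camera_matrix, dist_coeffs).reshape(2)
    p_norm = np.array([pt[0], pt[1], 1.0])

    # 2) solve for the scaling factor s using the known Z-coordinate of the plane
    r3 = R.T[2]                               # last row of the inverse rotation matrix
    s = (Z_w + r3 @ t) / (r3 @ p_norm)

    # 3) scale the normalized coordinates and transform them into world coordinates
    return R.T @ (s * p_norm - t)

print(image_to_world(400.0, 200.0))   # world position of the tracked pixel
```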