I wrote my Honours Thesis for the McGill Department of Mechanical Engineering, under the supervision of Prof. James R. Forbes, and in collaboration with the Dynamics, Estimation, and Controls for Aerospace and Robotics (DECAR) group.

To function properly, a robotic system must know three essential things: where it is, where it wants to go, and how to get there. These questions are answered by three fields of research: guidance, navigation, and control. Guidance plans the path to take, navigation localizes the robot as it moves along the path, and control ensures that the robot stays on the path. My research was about robotic navigation.

More specifically, I was concerned with localizing a robot in real time using only two sensors carried by the robot: a monocular camera, and an inertial measurement unit (IMU) that measures acceleration and angular velocity. This is called monocular visual-inertial navigation. Monocular cameras and IMUs are common and inexpensive sensors that can provide a wealth of information about a robot and its surroundings. This makes monocular visual-inertial navigation an ideal navigation solution in a wide range of applications, especially in environments where GPS is not available. This is the case for autonomous underwater vehicles inspecting underwater environments for marine biology, archeology, or infrastructure maintenance. Other examples include robots navigating the built environment, or augmented reality software running on mobile phones.

Going into this project, some of the key questions were: how do you use pictures taken by a robot to recover information about how it moved? How do you fuse the information obtained from the camera with that obtained from the IMU? How do you minimize the uncertainty that is inherent to any kind of measurement? And how do you do all this fast enough that the robot can keep track of itself as it moves?

The problem of robotic navigation is usually addressed by state estimation algorithms. State estimation consists of estimating the state of a system by combining multiple partial measurements, that is, measurements of quantities that are related to the state but not of the state itself. In robotic navigation, the state to be estimated is often the robot orientation and position, also known as the robot pose. There are various kinds of state estimators, but in this thesis I considered filters. Filters only use past measurements to estimate the robot state at a given time, which means they can be implemented in real time. A key benefit of filters is that, since they provide information in real time, they can feed their estimates to the control systems that stabilize the robot.
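To make this concrete, below is a minimal sketch of a linear Kalman filter, the classic filter on which the estimators in my thesis build. Everything here is illustrative: the constant-velocity model, the noise values, and the variable names are assumptions made for the example, not taken from the thesis (whose code was written in MATLAB; I use Python here for readability).

```python
import numpy as np

# Minimal linear Kalman filter sketch (illustrative, not the thesis code).
# State x = [position, velocity], tracked from noisy position measurements.

dt = 0.1
F = np.array([[1.0, dt],
              [0.0, 1.0]])          # constant-velocity motion model
H = np.array([[1.0, 0.0]])          # we only measure position
Q = 0.01 * np.eye(2)                # process noise covariance (assumed)
R = np.array([[0.25]])              # measurement noise covariance (assumed)

x = np.zeros(2)                     # state estimate
P = np.eye(2)                       # uncertainty of the estimate

def predict(x, P):
    """Propagate the estimate forward in time using the motion model."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, z):
    """Correct the estimate with a new measurement z."""
    y = z - H @ x                    # innovation: measurement minus prediction
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain: how much to trust z
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Only past measurements are used, so the loop can run in real time.
for z in [0.1, 0.22, 0.29, 0.42, 0.51]:
    x, P = predict(x, P)
    x, P = update(x, P, np.array([z]))
print(x)  # estimated [position, velocity]
```

Extended and invariant extended Kalman filters follow the same predict-and-update rhythm; what changes is how the state, the models, and the uncertainty are represented.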

In the end, I built a complete navigation solution that I called the monocular delayed-state invariant extended Kalman filter (MDS-IEKF). So far, the MDS-IEKF exists as an algorithm written in MATLAB. The input to the algorithm is simply a series of images taken by a robot and an associated series of IMU measurements; the output is an estimate of the trajectory of the robot, with information about the uncertainty in this estimate. Inside the algorithm there are two main stages: the front-end and the back-end. The front-end of the MDS-IEKF algorithm deals with the problem of monocular visual odometry: it receives images taken by a robot and uses them to measure how the robot has moved from one image to the next. The back-end of the MDS-IEKF algorithm is the state estimator: it fuses camera measurements with IMU measurements to obtain an optimal estimate of the trajectory of the robot, in real time. I tested the MDS-IEKF on a real dataset (the EuRoC dataset), and it proved to be a simple, functional, and consistent solution for monocular visual-inertial navigation. Nonetheless, the simplicity of the MDS-IEKF comes with tradeoffs, which make it an insufficient solution for robots that need to navigate autonomously for long periods of time. These shortcomings can and should be addressed in future work, to enhance the MDS-IEKF and make it more robust.
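To give a flavour of what the front-end does, here is a sketch of two-view monocular visual odometry using OpenCV. This is one common recipe for the problem, not necessarily the exact front-end of the MDS-IEKF; the camera matrix and image file names below are placeholders.

```python
import cv2
import numpy as np

# Placeholder camera intrinsic matrix; a real pipeline uses calibrated values.
K = np.array([[458.0,   0.0, 367.0],
              [  0.0, 457.0, 248.0],
              [  0.0,   0.0,   1.0]])

img1 = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)  # placeholder files
img2 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# 1. Detect and describe features in both images.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# 2. Match features across the two images.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 3. Estimate the essential matrix with RANSAC to reject bad matches,
#    then decompose it into a relative rotation R and direction t.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# With a single camera, t is known only up to scale; fusing with the IMU
# in the back-end is what makes the trajectory metrically meaningful.
print("relative rotation:\n", R)
print("translation direction:\n", t.ravel())
```

The last comment is the crux of the monocular problem: a single camera cannot observe absolute scale, which is one of the main reasons to fuse it with an IMU.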

These are some of the topics covered in my thesis:

  • monocular visual odometry, or how to use images taken by a single camera to recover information about how the camera has moved;

  • Kalman filtering, or how to combine various kinds of measurements in a way that minimizes the uncertainty in the final estimate of the state;

  • invariant extended Kalman filtering, or how to use matrix Lie group theory to model robotic systems more accurately and to overcome some shortcomings of traditional Kalman filters (a small sketch of the Lie group machinery follows this list);

  • the MDS-IEKF algorithm, which was my solution (heavily indebted to a lot of wonderful prior work) for monocular visual-inertial navigation.
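To hint at the matrix Lie group machinery behind the third bullet, below is a small illustrative sketch for the rotation group SO(3). The exponential map takes a rotation vector (an element of the Lie algebra) to a rotation matrix (an element of the group), which is the kind of object an invariant extended Kalman filter manipulates directly. The function names are my own, chosen for the example.

```python
import numpy as np

def wedge(phi):
    """Map a 3-vector to its 3x3 skew-symmetric matrix (the 'hat' operator)."""
    return np.array([[    0.0, -phi[2],  phi[1]],
                     [ phi[2],     0.0, -phi[0]],
                     [-phi[1],  phi[0],     0.0]])

def exp_so3(phi):
    """Exponential map from the Lie algebra so(3) to the group SO(3),
    computed with Rodrigues' formula. phi is a rotation vector (axis * angle)."""
    angle = np.linalg.norm(phi)
    if angle < 1e-9:
        return np.eye(3) + wedge(phi)   # first-order approximation near zero
    A = wedge(phi / angle)              # skew matrix of the unit rotation axis
    return np.eye(3) + np.sin(angle) * A + (1.0 - np.cos(angle)) * (A @ A)

# A 90-degree rotation about the z-axis, represented as a group element.
C = exp_so3(np.array([0.0, 0.0, np.pi / 2]))
print(np.round(C, 3))        # the rotation matrix
print(np.round(C @ C.T, 3))  # C @ C.T = I: C stays on the group
```

The appeal of this representation is that group elements like C satisfy their constraints exactly (here, orthogonality), whereas naively estimating the nine entries of a rotation matrix with a standard Kalman filter would not preserve that structure.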