RoMeO: Robust Metric Visual Odometry
Abstract
Visual odometry (VO) aims to estimate camera poses from visual inputs, a fundamental building block for many applications such as VR/AR and robotics. This work focuses on monocular RGB VO, where the input is a monocular RGB video without IMU or 3D sensors. Existing approaches lack robustness in this challenging setting and fail to generalize to unseen data; they also cannot recover metric-scale poses. We propose Robust Metric Visual Odometry (RoMeO), a novel method that resolves these issues by leveraging priors from pre-trained depth models. RoMeO incorporates both monocular metric depth and multi-view stereo (MVS) models to recover metric scale, simplify correspondence search, provide better initialization, and regularize optimization. RoMeO advances the state of the art by a large margin across 6 diverse datasets covering both indoor and outdoor scenes, reducing relative and absolute trajectory errors by over 50% compared to the prior state of the art.
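The abstract says that a monocular metric depth prior lets RoMeO recover metric scale, but it does not describe how. The sketch below illustrates only the general idea and is not RoMeO's procedure. It assumes two inputs: up-to-scale depths produced by a VO system at a set of tracked points, and metric depths predicted by a pre-trained model at the same points. From these it estimates one global scale factor as the median of the per-point ratios, then rescales camera translations. The function names `estimate_metric_scale` and `rescale_translations` are hypothetical.

```python
from statistics import median


def estimate_metric_scale(vo_depths, metric_depths, min_depth=1e-6):
    """Hypothetical helper: return s such that s * vo_depth approximates metric depth.

    Takes the median of per-point ratios metric/vo, which tolerates a
    minority of outlier points (e.g. bad depth predictions or mistracks).
    """
    ratios = [
        m / v
        for v, m in zip(vo_depths, metric_depths)
        if v > min_depth and m > min_depth  # skip degenerate / invalid depths
    ]
    if not ratios:
        raise ValueError("no valid depth pairs to estimate scale from")
    return median(ratios)


def rescale_translations(translations, scale):
    """Hypothetical helper: apply one global scale to up-to-scale camera translations."""
    return [tuple(scale * c for c in t) for t in translations]


if __name__ == "__main__":
    # Toy data: the VO depths are 2.5x too small, and the last point is an outlier.
    vo = [1.0, 2.0, 3.0, 4.0, 100.0]
    metric = [2.5, 5.0, 7.5, 10.0, 3.0]
    s = estimate_metric_scale(vo, metric)
    print(s)  # 2.5
    print(rescale_translations([(0.0, 0.0, 1.0)], s))  # [(0.0, 0.0, 2.5)]
```

The median is used here instead of a least-squares fit because a single grossly wrong depth would pull a least-squares estimate far off, while the median ignores it. A single global scale is the simplest possible assumption. A real system may also need to handle scale drift over time.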
Resources
arXiv: 2412.11530
Citation
@article{cheng2024romeo,
title = {{RoMeO}: Robust Metric Visual Odometry},
author = {Cheng, Junda and Cai, Zhipeng and Zhang, Zhaoxing and Yin, Wei and M{\"{u}}ller, Matthias and Paulitsch, Michael and Yang, Xin},
journal = {arXiv preprint arXiv:2412.11530},
year = {2024}
}