arXiv 2024

RoMeO: Robust Metric Visual Odometry

Junda Cheng, Zhipeng Cai, Zhaoxing Zhang, Wei Yin, Matthias Müller, Michael Paulitsch, Xin Yang

Abstract

Visual odometry (VO) aims to estimate camera poses from visual inputs, a fundamental building block for many applications such as VR/AR and robotics. This work focuses on monocular RGB VO, where the input is a monocular RGB video without IMU or 3D sensors. Existing approaches lack robustness in this challenging scenario and fail to generalize to unseen data; they also cannot recover metric-scale poses. We propose Robust Metric Visual Odometry (RoMeO), a novel method that resolves these issues by leveraging priors from pre-trained depth models. RoMeO incorporates both monocular metric depth and multi-view stereo (MVS) models to recover metric scale, simplify correspondence search, provide better initialization, and regularize optimization. RoMeO advances the state of the art by a large margin across 6 diverse datasets covering both indoor and outdoor scenes, reducing relative and absolute trajectory errors by over 50% compared to prior SOTA.
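For readers unfamiliar with the absolute trajectory error mentioned above, the following is a minimal sketch of one common way to compute it: rigidly align the estimated camera positions to the ground truth (rotation and translation only, no scale), then take the RMSE of the residuals. This is not the paper's evaluation code; the function names `align_rigid` and `ate_rmse` are hypothetical, and the scale-free alignment is an assumption chosen to illustrate why metric scale matters.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rotation + translation (no scale) mapping est onto gt.

    est, gt: (N, 3) arrays of camera positions at matching timestamps.
    Uses the Kabsch algorithm (SVD of the cross-covariance matrix).
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """RMSE of position residuals after rigid alignment (illustrative ATE)."""
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

Because the alignment in this sketch does not estimate a scale factor, a trajectory with the right shape but the wrong scale still incurs a large error, which is the situation a metric-scale method has to avoid.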

Resources

PDF

arXiv: 2412.11530

Citation

@article{cheng2024romeo,
  title     = {{RoMeO}: Robust Metric Visual Odometry},
  author    = {Cheng, Junda and Cai, Zhipeng and Zhang, Zhaoxing and Yin, Wei and M{\"{u}}ller, Matthias and Paulitsch, Michael and Yang, Xin},
  journal   = {arXiv preprint arXiv:2412.11530},
  year      = {2024}
}
