Matthias Müller

RoMeO: Robust Metric Visual Odometry

This work introduces Robust Metric Visual Odometry (RoMeO), a novel monocular RGB visual odometry (VO) method that estimates camera poses from video without requiring IMUs or 3D sensors. RoMeO improves robustness, generalization, and metric-scale pose recovery in three ways: it leverages pre-trained depth models, combines monocular depth with multi-view stereo (MVS) for accurate correspondence and optimization, and applies effective training techniques. It reduces trajectory errors by over 50% compared to state-of-the-art methods across multiple datasets. It also improves full SLAM pipelines when combined with global bundle adjustment and loop closure.

Junda Cheng, Zhipeng Cai, Zhaoxing Zhang, Wei Yin, Matthias Müller, Michael Paulitsch, Xin Yang

arXiv’24

Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation

Mesh2NeRF introduces an analytic method to derive ground-truth radiance fields directly from textured meshes, enhancing 3D generation tasks by providing precise supervision for training generative NeRFs and single scene representations. This approach addresses artifacts from traditional methods and demonstrates significant improvements in view synthesis and mesh extraction across various datasets.

Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Nießner

ECCV’24

LiSA: LiDAR Localization with Semantic Awareness

LiSA introduces semantic awareness into Scene Coordinate Regression (SCR) for LiDAR localization, enhancing robustness and accuracy by addressing challenges posed by dynamic objects and repetitive structures. Through knowledge distillation from a segmentation model, LiSA improves performance without additional computation during inference.

Bochun Yang, Zijun Li, Wen Li, Zhipeng Cai, Chenglu Wen, Yu Zang, Matthias Müller, Cheng Wang

CVPR’24

L-MAGIC: Language Model Assisted Generation of Images with Coherence

L-MAGIC introduces a method that utilizes large language models to guide the generation of coherent 360-degree panoramic scenes from a single input image. By leveraging pre-trained diffusion and language models without fine-tuning, L-MAGIC achieves zero-shot performance, producing panoramic scenes with improved layouts and rendering quality, as demonstrated by extensive experiments and human evaluations.

Zhipeng Cai, Matthias Müller, Reiner Birkl, Diana Wofk, Shao-Yen Tseng, JunDa Cheng, Gabriela Ben-Melech Stan, Vasudev Lal, Michael Paulitsch

CVPR’24

Evaluation of Test-Time Adaptation Under Computational Time Constraints

This study introduces an online evaluation protocol for Test-Time Adaptation (TTA) methods that accounts for the impact of computational speed on adaptation performance. The research simulates real-world data streams. Under this protocol, faster and simpler TTA methods can outperform more complex, slower ones once inference speed is taken into account. This highlights the need for adaptation techniques that are both efficient and effective.

Motasem Alfarra, Hani Itani, Alejandro Pardo, Shyma Alhuwaider, Merey Ramazanova, Juan C. Pérez, Zhipeng Cai, Matthias Müller, Bernard Ghanem

ICML’24

OpenBot-Fleet: A System for Collective Learning with Real Robots

OpenBot-Fleet is an open-source cloud robotics system that uses smartphones and low-cost wheeled robots for navigation. It leverages cloud storage and computation so that a fleet of robots can collect data and learn navigation policies. The learned policies achieve over 80% success in unseen environments, advancing the scalable, cost-effective deployment of learning robot fleets.

Matthias Müller, Samarth Brahmbhatt, Ankur Deka, Quentin Leboutet, David Hafner, Vladlen Koltun

ICRA’24

SimCS: Simulation for Domain Incremental Online Continual Segmentation

This work introduces SimCS, a parameter-free method that leverages simulated data to regularize continual learning in Online Domain-Incremental Continual Segmentation (ODICS). Experiments demonstrate that SimCS consistently enhances performance when integrated with various continual learning methods.

Motasem Alfarra, Zhipeng Cai, Adel Bibi, Bernard Ghanem, Matthias Müller

AAAI’24

GIM: Learning Generalizable Image Matcher from Internet Videos

This work introduces GIM, a self-training framework for generalizable image matching. Using diverse internet videos, GIM generates supervision signals that improve state-of-the-art matching architectures in zero-shot and cross-domain scenarios. The work also establishes ZEB, a zero-shot evaluation benchmark, and demonstrates significant gains in 3D reconstruction and visual localization tasks.

Xuelun Shen, Zhipeng Cai, Wei Yin, Matthias Müller, Zijun Li, Kaixuan Wang, Xiaozhi Chen, Cheng Wang

ICLR’24

E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning

E2PNet introduces a learning-based approach for event-to-point cloud registration, leveraging a novel Event-Points-to-Tensor (EP2T) network to encode event data into a 2D grid-shaped feature tensor. This facilitates the application of existing RGB-based frameworks to event data, enhancing robustness under extreme illumination and fast motion conditions.

Xiuhong Lin, Changjie Qiu, Zhipeng Cai, Siqi Shen, Yu Zang, Weiquan Liu, Xuesheng Bian, Matthias Müller, Cheng Wang

NeurIPS’23

Reaching the limit in autonomous racing: Optimal control versus reinforcement learning

A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically in a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperforms optimal control (OC) methods in this setting. We then investigate which fundamental factors contributed to the success of RL or limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective.

Yunlong Song, Angel Romero, Matthias Müller, Vladlen Koltun, Davide Scaramuzza

Science Robotics’23

Champion-level Drone Racing using Deep Reinforcement Learning

Swift is an autonomous system that can race physical vehicles at the level of the human world champions. The system combines deep reinforcement learning (RL) in simulation with data collected in the physical world. Swift competed against three human champions, including the world champions of two international leagues, in real-world head-to-head races. Swift won several races against each of the human champions and demonstrated the fastest recorded race time. This work represents a milestone for mobile robotics and machine intelligence, which may inspire the deployment of hybrid learning-based solutions in other physical systems.

Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias Müller, Vladlen Koltun, Davide Scaramuzza

Nature’23

CLNeRF: Continual Learning Meets NeRF

CLNeRF introduces continual learning to Neural Radiance Fields, enabling efficient updates and preventing catastrophic forgetting in scenes with changing appearance and geometry over time. It leverages generative replay and the Instant Neural Graphics Primitives architecture, validated on the proposed World Across Time dataset.

Zhipeng Cai, Matthias Müller

ICCV’23

LDM3D: Latent Diffusion Model for 3D

This paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both an image and a depth map from a given text prompt, so users can create RGBD images from text. The LDM3D model is fine-tuned on a dataset of tuples, each containing an RGB image, a depth map, and a caption, and is validated through extensive experiments. We also develop an application called DepthFusion. It uses the generated RGB images and depth maps to create immersive and interactive 360-degree-view experiences in TouchDesigner. Potential applications range from entertainment and gaming to architecture and design. We see LDM3D and DepthFusion as a step toward new forms of generative content creation and digital experiences.

Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Müller, Vasudev Lal

CVPRW’23

Monocular Visual-Inertial Depth Estimation

We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry to produce dense depth estimates with metric scale. Our approach performs global scale and shift alignment against sparse metric depth, followed by learning-based dense alignment. We evaluate on the TartanAir and VOID datasets. Dense scale alignment reduces inverse RMSE by up to 30% relative to global alignment alone. Our approach is especially competitive at low density: with just 150 sparse metric depth points, our dense-to-dense depth alignment method achieves over 50% lower iRMSE than sparse-to-dense depth completion with KBNet, currently the state of the art on VOID. We demonstrate successful zero-shot transfer from synthetic TartanAir to real-world VOID data and perform generalization tests on NYUv2 and VCU-RVI. Our approach is modular and compatible with a variety of monocular depth estimation models.

Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun

ICRA’23
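
The global alignment step in the abstract above amounts to a two-parameter least-squares fit. The sketch below illustrates it under stated assumptions and is not the authors' code. It fits a scale `s` and shift `b` so that `s * pred + b` matches sparse metric values sampled at the same pixels. It assumes both inputs already live in the same space, for example inverse depth. The names `fit_scale_shift` and `apply_alignment` are hypothetical. The subsequent learning-based dense alignment is not shown.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (s, b) minimizing sum((s * pred_i + b - target_i) ** 2).

    `pred` holds the relative (up-to-scale) predictions at the sparse pixels,
    `target` the metric values at the same pixels (assumed same space).
    """
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need at least two matched points")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("predictions are constant; scale is not identifiable")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    s = cov / var_p               # closed-form slope of the 1D linear fit
    b = mean_t - s * mean_p       # closed-form intercept
    return s, b


def apply_alignment(dense_pred: Sequence[float], s: float, b: float) -> List[float]:
    """Apply the globally fitted scale and shift to every dense prediction."""
    return [s * p + b for p in dense_pred]
```

The fit uses only the sparse points, for example the 150 metric points mentioned above. The resulting `s` and `b` are then applied to the whole dense map, which is why this step alone cannot correct spatially varying scale errors.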

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

This paper introduces ZoeDepth, a novel approach to single-image depth estimation that combines relative and metric depth. ZoeDepth is pre-trained on multiple datasets with relative depth and fine-tuned on specific datasets with metric depth. It achieves state-of-the-art performance and exceptional zero-shot generalization across diverse indoor and outdoor datasets.

Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller

arXiv’23

Zero-Shot Transfer of Haptics-based Object Insertion Policies

In this paper, we train a contact-exploiting manipulation policy in simulation for the contact-rich household task of loading plates into a slotted holder. The policy transfers to the real robot without any fine-tuning. We investigate various factors necessary for this zero-shot transfer, such as time delay modeling, memory representation, and domain randomization. Our policy transfers with a minimal sim-to-real gap and significantly outperforms heuristic and learned baselines. It also generalizes to plates of different sizes and weights.

Samarth Brahmbhatt, Ankur Deka, Andrew Spielberg, Matthias Müller

ICRA’23

Training Efficient Controllers via Analytic Policy Gradient

We propose an Analytic Policy Gradient (APG) method for training controllers that track trajectories accurately at low computational cost. APG exploits the availability of differentiable simulators by training a controller offline with gradient descent on the tracking error. We address the training instabilities that frequently occur with APG through curriculum learning. We experiment on a widely used controls benchmark, the CartPole, and on two common aerial robots, a quadrotor and a fixed-wing drone. Our proposed method outperforms both model-based and model-free RL methods in terms of tracking error. At the same time, it achieves performance similar to MPC while requiring more than an order of magnitude less computation time. Our work provides insights into the potential of APG as a promising control method for robotics.

Nina Wiedemann, Valentin Wüest, Antonio Loquercio, Matthias Müller, Dario Floreano, Davide Scaramuzza

ICRA’23
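
The core idea of APG is to differentiate the tracking error through the simulator itself and descend on the controller parameters. The toy problem below illustrates that idea and is not the paper's setup. It assumes a hypothetical 1D double integrator driven by a PD controller with gains `kp` and `kd`, integrated with explicit Euler steps. The derivatives of the state with respect to both gains are carried forward through every step (forward-mode sensitivities), so the gradient is exact for this simulator. The names `rollout` and `train` are hypothetical. No curriculum and no neural-network policy are included.

```python
from typing import Tuple


def rollout(kp: float, kd: float, ref: float = 1.0, dt: float = 0.05,
            steps: int = 200) -> Tuple[float, float, float]:
    """Simulate x'' = u with PD control u = kp * (ref - x) - kd * v.

    Returns (loss, dloss/dkp, dloss/dkd), where loss is the mean squared
    tracking error. Gradients are propagated analytically through each
    Euler step of the simulator.
    """
    x, v = 0.0, 0.0
    dx_dkp = dx_dkd = dv_dkp = dv_dkd = 0.0  # initial state does not depend on the gains
    loss = g_kp = g_kd = 0.0
    for _ in range(steps):
        u = kp * (ref - x) - kd * v
        # Chain rule through the controller (uses the pre-step sensitivities).
        du_dkp = (ref - x) - kp * dx_dkp - kd * dv_dkp
        du_dkd = -kp * dx_dkd - v - kd * dv_dkd
        # Explicit Euler step; the position update uses the old velocity.
        x, v = x + dt * v, v + dt * u
        dx_dkp, dv_dkp = dx_dkp + dt * dv_dkp, dv_dkp + dt * du_dkp
        dx_dkd, dv_dkd = dx_dkd + dt * dv_dkd, dv_dkd + dt * du_dkd
        err = x - ref
        loss += err * err / steps
        g_kp += 2.0 * err * dx_dkp / steps
        g_kd += 2.0 * err * dx_dkd / steps
    return loss, g_kp, g_kd


def train(kp: float = 1.0, kd: float = 0.5, lr: float = 0.1,
          iters: int = 100) -> Tuple[float, float]:
    """Plain gradient descent on the gains using the analytic gradient."""
    for _ in range(iters):
        _, g_kp, g_kd = rollout(kp, kd)
        kp -= lr * g_kp
        kd -= lr * g_kd
    return kp, kd
```

Comparing the gradients returned by `rollout` against central finite differences of its loss is a cheap way to confirm the propagated sensitivities. Because the whole rollout is differentiated, training happens offline. At run time only the cheap PD law is evaluated.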

Learning high-speed flight in the wild

We propose an end-to-end approach that can autonomously fly quadrotors through complex natural and human-made environments at high speeds with purely onboard sensing and computation. The key principle is to directly map noisy sensory observations to collision-free trajectories in a receding-horizon fashion. This direct mapping drastically reduces processing latency and increases robustness to noisy and incomplete perception. The sensorimotor mapping is performed by a convolutional network that is trained exclusively in simulation via privileged learning: imitating an expert with access to privileged information. By simulating realistic sensor noise, our approach achieves zero-shot transfer from simulation to challenging real-world environments that were never experienced during training: dense forests, snow-covered terrain, derailed trains, and collapsed buildings. Our work demonstrates that end-to-end policies trained in simulation enable high-speed autonomous flight through challenging environments, outperforming traditional obstacle avoidance pipelines.

Antonio Loquercio, Elia Kaufmann, Rene Ranftl, Matthias Müller, Vladlen Koltun, Davide Scaramuzza

Science Robotics’21

Training Graph Neural Networks with 1000 Layers

In this work, we study reversible connections, group convolutions, weight tying, and equilibrium models to advance the memory and parameter efficiency of GNNs. We find that reversible connections in combination with deep network architectures enable the training of overparameterized GNNs that significantly outperform existing methods on multiple datasets. Our models RevGNN-Deep (1001 layers with 80 channels each) and RevGNN-Wide (448 layers with 224 channels each) were both trained on a single commodity GPU and achieve an ROC-AUC of 87.74 ± 0.13 and 88.24 ± 0.15 on the ogbn-proteins dataset. To the best of our knowledge, RevGNN-Deep is the deepest GNN in the literature by one order of magnitude.

Guohao Li, Matthias Müller, Bernard Ghanem, Vladlen Koltun

ICML’21
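
Reversible connections save memory because a layer's input can be recomputed from its output during the backward pass instead of being stored. The plain-Python sketch below shows the classic two-group reversible coupling. As we understand it, RevGNN generalizes this to several feature groups, so treat this as a simplified illustration rather than the paper's architecture. The per-node scalar features, the toy mean-aggregation layer `make_layer`, and its `weight` parameter are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]                  # one scalar feature per node (toy setting)
Layer = Callable[[Vec], Vec]


def make_layer(adj: List[List[int]], weight: float) -> Layer:
    """Toy graph layer: tanh of the weighted mean over each node's neighbours."""
    def layer(h: Vec) -> Vec:
        out = []
        for nbrs in adj:
            agg = sum(h[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
            out.append(math.tanh(weight * agg))
        return out
    return layer


def rev_forward(x1: Vec, x2: Vec, f: Layer, g: Layer) -> Tuple[Vec, Vec]:
    """Reversible coupling: y1 = x1 + f(x2), then y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def rev_inverse(y1: Vec, y2: Vec, f: Layer, g: Layer) -> Tuple[Vec, Vec]:
    """Recover the inputs (up to rounding) from the outputs; nothing is cached."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Neither `f` nor `g` needs to be invertible; only the additive coupling is inverted. Stacking many such blocks therefore keeps activation memory roughly independent of depth, at the cost of recomputing each layer once during backpropagation.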

OpenBot: Turning Smartphones into Robots

Current robots are either expensive or make significant compromises on sensory richness, computational power, and communication capabilities. We propose to leverage smartphones to equip robots with extensive sensor suites, powerful computational abilities, state-of-the-art communication channels, and access to a thriving software ecosystem. We design a small electric vehicle that costs $50 and serves as a robot body for standard Android smartphones. We develop a software stack that allows smartphones to use this body for mobile operation and demonstrate that the system is sufficiently powerful to support advanced robotics workloads such as person following and real-time autonomous navigation in unstructured environments. Controlled experiments demonstrate that the presented approach is robust across different smartphones and robot bodies.

Matthias Müller, Vladlen Koltun

ICRA’21

DDA: Deep Drone Acrobatics

We propose to learn a sensorimotor policy that enables an autonomous quadrotor to fly extreme acrobatic maneuvers with only onboard sensing and computation. We train the policy entirely in simulation by leveraging demonstrations from an optimal controller that has access to privileged information. We use appropriate abstractions of the visual input to enable transfer to a real quadrotor. We show that the resulting policy can be directly deployed in the physical world without any fine-tuning on real data. Our methodology has several favorable properties: it does not require a human expert to provide demonstrations, it cannot harm the physical system during training, and it can be used to learn maneuvers that are challenging even for the best human pilots. Our approach enables a physical quadrotor to fly maneuvers such as the Power Loop, the Barrel Roll, and the Matty Flip, during which it incurs accelerations of up to 3g.

Elia Kaufmann, Antonio Loquercio, Rene Ranftl, Matthias Müller, Vladlen Koltun, Davide Scaramuzza

RSS’20

SGAS: Sequential Greedy Architecture Search

We introduce sequential greedy architecture search (SGAS), an efficient method for neural architecture search. By dividing the search procedure into sub-problems, SGAS chooses and prunes candidate operations in a greedy fashion. We apply SGAS to search architectures for Convolutional Neural Networks (CNN) and Graph Convolutional Networks (GCN). Extensive experiments show that SGAS is able to find state-of-the-art architectures for tasks such as image classification, point cloud classification and node classification in protein-protein interaction graphs with minimal computational cost.

Guohao Li, Guocheng Qian, Itzel C. Delgadillo, Matthias Müller, Ali Thabet, Bernard Ghanem

CVPR’20

SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications

We present a general framework for adversarial attacks on trained agents, which covers semantic perturbations to the environment of the agent performing the task as well as pixel-level attacks. To do this, we re-frame the adversarial attack problem as learning a distribution of parameters that always fools the agent. In the semantic case, our proposed adversary (denoted as BBGAN) is trained to sample parameters that describe the environment with which the black-box agent interacts, such that the agent performs its dedicated task poorly in this environment. We apply BBGAN on three different tasks, primarily targeting aspects of autonomous navigation: object detection, self-driving, and autonomous UAV racing. On these tasks, BBGAN can generate failure cases that consistently fool a trained agent.

Abdullah Hamdi, Matthias Müller, Bernard Ghanem

Poster at AAAI’20

DeepGCNs: Making GCNs Go as Deep as CNNs

Graph Convolutional Networks (GCNs) show promising results, but they are limited to very shallow models due to the vanishing gradient problem. As a result, most state-of-the-art GCN algorithms are no deeper than 3 or 4 layers. In this work, we present new ways to successfully train very deep GCNs. We borrow concepts from CNNs, mainly residual/dense connections and dilated convolutions, and adapt them to GCN architectures. Through extensive experiments, we show the positive effect of these deep GCN frameworks. Finally, we use these new concepts to build a very deep 56-layer GCN. It significantly boosts performance (+3.7% mIoU over the state of the art) in the task of point cloud semantic segmentation.

Guohao Li, Matthias Müller, Ali Thabet, Bernard Ghanem

Oral at ICCV’19
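
The abstract mentions dilated convolutions adapted to graphs. On point clouds, one way to realize this, and the one we understand DeepGCNs to use, is a dilated k-nearest-neighbour search. It sorts the candidates by distance and keeps every `d`-th of the `k * d` closest points. This enlarges the receptive field without increasing `k`. The sketch below is a brute-force illustration only. The name `dilated_knn` is hypothetical, and a practical implementation would batch the search on the GPU.

```python
import math
from typing import List, Sequence


def dilated_knn(points: List[Sequence[float]], query: int, k: int, d: int) -> List[int]:
    """Return k neighbour indices of points[query] with dilation rate d.

    The k * d nearest points (excluding the query itself) are found by brute
    force, then every d-th one is kept. With d == 1 this is ordinary k-NN.
    Ties are broken by index because Python's sort is stable.
    """
    if k < 1 or d < 1:
        raise ValueError("k and d must be positive")
    others = [i for i in range(len(points)) if i != query]
    others.sort(key=lambda i: math.dist(points[query], points[i]))
    return others[: k * d : d]
```

For points at x = 0, 1, ..., 10 on a line, `dilated_knn(points, 0, 3, 2)` selects indices 1, 3 and 5 instead of 1, 2 and 3. The graph convolution then aggregates over a wider neighbourhood for the same number of edges.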

OIL: Observational Imitation Learning

We propose Observational Imitation Learning (OIL), a novel imitation learning variant that supports online training and automatic selection of optimal behavior by observing multiple imperfect teachers. We apply our proposed methodology to the challenging problems of autonomous driving and UAV racing. For both tasks, we utilize the Sim4CV simulator that enables the generation of large amounts of synthetic training data and also allows for online learning and evaluation. We train a perception network to predict waypoints from raw image data and use OIL to train another network to predict controls from these waypoints. Extensive experiments demonstrate that our trained network outperforms its teachers, conventional imitation learning (IL) and reinforcement learning (RL) baselines and even humans in simulation.

Guohao Li, Matthias Müller, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem

RSS’19

Learning a Controller Fusion Network by Online Trajectory Filtering for Vision-based UAV Racing

In this paper, we propose learning an optimized controller using a DNN that fuses multiple controllers. The network learns a robust controller with online trajectory filtering, which suppresses noisy trajectories and imperfections of individual controllers. The result is a network that is able to learn a good fusion of filtered trajectories from different controllers leading to significant improvements in overall performance. We compare our trained network to controllers it has learned from, end-to-end baselines and human pilots in a realistic simulation; our network beats all baselines in extensive experiments and approaches the performance of a professional human pilot.

Matthias Müller, Guohao Li, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem

CVPRW’19 - UAVision’19

TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

We present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

Matthias Müller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, Bernard Ghanem

ECCV’18

Driving Policy Transfer via Modularity and Abstraction

We present an approach for transferring driving policies from simulation to reality via modularity and abstraction. Our approach is inspired by classic driving systems and aims to combine the benefits of modular architectures and end-to-end deep learning approaches. The key idea is to encapsulate the driving policy such that it is not directly exposed to raw perceptual input or low-level vehicle dynamics. We evaluate the presented approach in simulated urban environments and in the real world. In particular, we transfer a driving policy trained in simulation to a 1/5-scale robotic truck that is deployed in a variety of conditions, with no fine-tuning, on two continents.

Matthias Müller, Alexey Dosovitskiy, Bernard Ghanem, Vladlen Koltun

CoRL’18

Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications

We present [Sim4CV](https://sim4cv.org/), a photo-realistic training and evaluation simulator with extensive applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator integrates full-featured, physics-based cars, unmanned aerial vehicles (UAVs), and animated human actors in diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two case studies: autonomous UAV-based tracking of moving objects and autonomous driving using supervised learning. The simulator integrates several state-of-the-art tracking algorithms with a benchmark evaluation tool. It also includes a deep neural network (DNN) architecture for training vehicles to drive autonomously. It generates synthetic photo-realistic datasets with automatic ground truth annotations that can easily extend existing real-world datasets. An automatic world generation tool reconfigures synthetic worlds on the fly, which provides extensive variety in the synthetic data.

Matthias Müller, Vincent Casser, Jean Lahoud, Neil Smith, Bernard Ghanem

IJCV’18

End-to-end Driving via Conditional Imitation Learning

A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5-scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. Experimental results demonstrate that the presented approach significantly outperforms a number of baselines.

Felipe Codevilla, Matthias Müller, Alexey Dosovitskiy, Antonio López, Vladlen Koltun

ICRA’18
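
One way to condition a policy on a command is a branched output. Shared features feed one head per command, the command selects which head's prediction is used, and during training only the commanded head is updated. The plain-Python sketch below illustrates that idea with hypothetical linear steering heads and is not the paper's network. The four command names are an assumption that mirrors common navigation commands. The perception network that would produce `features` is omitted.

```python
from enum import Enum
from typing import Dict, List

Weights = List[float]


class Command(Enum):
    FOLLOW_LANE = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    GO_STRAIGHT = 3


def predict_steer(heads: Dict[Command, Weights], features: List[float],
                  command: Command) -> float:
    """Route shared features to the head selected by the navigational command."""
    w = heads[command]
    return sum(wi * fi for wi, fi in zip(w, features))


def train_step(heads: Dict[Command, Weights], features: List[float],
               command: Command, target_steer: float, lr: float = 0.1) -> float:
    """One squared-error gradient step on the commanded head only.

    Returns the loss before the update; all other heads are left untouched.
    """
    err = predict_steer(heads, features, command) - target_steer
    heads[command] = [wi - lr * 2.0 * err * fi
                      for wi, fi in zip(heads[command], features)]
    return err * err
```

Only the selected branch receives gradient for a given sample. Each head can therefore specialize in one manoeuvre while the shared perception features are trained on all of them.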

Teaching UAVs to Race: End-to-End Regression of Agile Controls in Simulation

In this paper, we train a deep neural network to predict UAV controls from raw image data for the task of autonomous UAV racing in a photo-realistic simulation. Training is done through imitation learning with data augmentation to allow for the correction of navigation mistakes. Extensive experiments demonstrate that our trained network (when sufficient data augmentation is used) outperforms state-of-the-art methods and flies more consistently than many human pilots.

Matthias Müller, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem

ECCVW’18 - UAVision’18

Context-Aware Correlation Filter Tracking

In this paper, we present a framework that allows the explicit incorporation of global context within correlation filter (CF) trackers. We reformulate the original optimization problem and provide a closed-form solution for single- and multi-dimensional features in the primal and dual domains. We demonstrate with extensive experiments that this framework can significantly improve the performance of many CF trackers with only a modest impact on their frame rate.

Matthias Müller, Neil Smith, Bernard Ghanem

Oral at CVPR’17
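
For a single feature channel in the primal domain, a closed-form solution of the kind mentioned in the abstract can be derived as follows. This is our own sketch of the derivation, not a transcription of the paper. It assumes a ridge-regression formulation in which the target patch $\mathbf{a}_0$ is regressed to the desired response $\mathbf{y}$ and $k$ context patches $\mathbf{a}_i$ are regressed to zero. Here $\mathbf{A}_i$ is the circulant data matrix generated by $\mathbf{a}_i$ and $\lambda_1, \lambda_2$ are regularization weights. Hats denote the DFT and $^*$ complex conjugation; the product and division in the last line are element-wise.

```latex
\begin{align*}
&\min_{\mathbf{w}}\;
  \|\mathbf{A}_0\mathbf{w}-\mathbf{y}\|_2^2
  + \lambda_1\|\mathbf{w}\|_2^2
  + \lambda_2\sum_{i=1}^{k}\|\mathbf{A}_i\mathbf{w}\|_2^2 \\
&\Rightarrow\;
  \Big(\mathbf{A}_0^{\top}\mathbf{A}_0 + \lambda_1\mathbf{I}
  + \lambda_2\sum_{i=1}^{k}\mathbf{A}_i^{\top}\mathbf{A}_i\Big)\mathbf{w}
  = \mathbf{A}_0^{\top}\mathbf{y}
  \quad\text{(gradient set to zero)} \\
&\Rightarrow\;
  \hat{\mathbf{w}} =
  \frac{\hat{\mathbf{a}}_0^{*}\odot\hat{\mathbf{y}}}
       {\hat{\mathbf{a}}_0^{*}\odot\hat{\mathbf{a}}_0 + \lambda_1
        + \lambda_2\sum_{i=1}^{k}\hat{\mathbf{a}}_i^{*}\odot\hat{\mathbf{a}}_i}
  \quad\text{(circulant matrices are diagonalized by the DFT)}
\end{align*}
```

Every context patch adds only one FFT and an element-wise term to the denominator. This is consistent with the abstract's observation that incorporating context has only a modest impact on frame rate.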

A Benchmark and Simulator for UAV Tracking

To evaluate object trackers for aerial tracking applications, we have generated 123 new video sequences from a UAV and annotated them with upright bounding boxes and attributes relevant for tracking. With more than 110,000 frames, this is currently by far the largest dataset for aerial tracking and the second largest dataset for generic tracking. In addition, we have developed a photo-realistic simulator within the UE4 framework that can be used for numerous vision tasks. As an example, we have integrated several state-of-the-art object trackers to control a UAV inside the simulator based on live feedback. Our simulator can also be used to generate large amounts of realistic vision data.

Matthias Müller, Neil Smith, Bernard Ghanem

Poster at ECCV’16

Persistent Aerial Tracking System for UAVs

We have developed a persistent, robust, and autonomous object tracking system for unmanned aerial vehicles (UAVs). Our computer vision and control strategy integrates multiple UAVs with a stabilized RGB camera and can be applied to a diverse set of moving objects (e.g., humans, animals, cars, and boats). A novel strategy enables tracking objects over a long period by ‘handing over the camera’ from one UAV to another. The popular object tracker Struck was optimized for both speed and performance and integrated into the proposed system.

Matthias Müller, Gopal Sharma, Neil Smith, Bernard Ghanem

Interactive Presentation at IROS’16

Matthias Müller

Robotics Team Lead