Selected Publications

We present an approach for transferring driving policies from simulation to reality via modularity and abstraction. Our approach is inspired by classic driving systems and aims to combine the benefits of modular architectures and end-to-end deep learning approaches. The key idea is to encapsulate the driving policy such that it is not directly exposed to raw perceptual input or low-level vehicle dynamics. We evaluate the presented approach in simulated urban environments and in the real world. In particular, we transfer a driving policy trained in simulation to a 1/5-scale robotic truck that is deployed in a variety of conditions, with no finetuning, on two continents.
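The abstraction described above can be illustrated with a toy pipeline. This is a hypothetical sketch, not the paper's implementation: the stage names and the waypoint representation are assumptions, chosen only to show how the policy is insulated from raw pixels on one side and from vehicle dynamics on the other.

```python
import numpy as np

rng = np.random.default_rng(1)

def perception(image):
    # Stand-in for a segmentation network: raw pixels -> per-pixel labels.
    # The policy below only ever sees this abstract representation.
    return (image > 0.5).astype(int)

def driving_policy(segmentation):
    # Stand-in policy: abstract input -> abstract output (a waypoint),
    # never raw images and never actuator commands.
    drivable_fraction = segmentation.mean()
    return np.array([1.0, drivable_fraction - 0.5])

def low_level_controller(waypoint):
    # Stand-in controller: converts the waypoint into (steering, throttle),
    # hiding the platform's dynamics from the policy.
    steering = float(np.clip(waypoint[1], -1.0, 1.0))
    return steering, 0.5

image = rng.random((8, 8))
controls = low_level_controller(driving_policy(perception(image)))
```

Because only the perception and control stages touch platform-specific details, swapping the simulated camera and car for a real one leaves the middle (learned) stage unchanged, which is what makes transfer without fine-tuning plausible.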

We present Sim4CV, a photo-realistic training and evaluation simulator with extensive applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator integrates full-featured, physics-based cars, unmanned aerial vehicles (UAVs), and animated human actors in diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two case studies: autonomous UAV-based tracking of moving objects and autonomous driving using supervised learning. The simulator integrates several state-of-the-art tracking algorithms with a benchmark evaluation tool, as well as a deep neural network (DNN) architecture for training vehicles to drive autonomously. It generates synthetic photo-realistic datasets with automatic ground-truth annotations to easily extend existing real-world datasets, and provides extensive synthetic data variety through its ability to reconfigure synthetic worlds on the fly using an automatic world generation tool.

A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5-scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. Experimental results demonstrate that the presented approach significantly outperforms a number of baselines.
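One common way to realize command conditioning is a branched architecture: a shared encoder processes the image, and the high-level command selects which output head produces the controls. The sketch below is a minimal illustration under that assumption; the command names, dimensions, and random weights are placeholders, not the paper's actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical set of high-level navigational commands.
COMMANDS = ["follow_lane", "turn_left", "turn_right", "go_straight"]

class BranchedPolicy:
    def __init__(self, feat_dim=32, img_dim=64, ctrl_dim=2):
        # Shared "perception" weights plus one linear head per command.
        # Random matrices stand in for a trained network.
        self.encoder = rng.standard_normal((feat_dim, img_dim))
        self.heads = {c: rng.standard_normal((ctrl_dim, feat_dim))
                      for c in COMMANDS}

    def act(self, image_vec, command):
        feat = np.tanh(self.encoder @ image_vec)   # shared features
        return self.heads[command] @ feat          # command-specific controls

policy = BranchedPolicy()
obs = rng.standard_normal(64)
steer_throttle = policy.act(obs, "turn_left")
```

The key property is that the same visual input yields different controls under different commands, so a navigation module (or a human passenger) can steer the learned policy through intersections.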

In this paper, we train a deep neural network to predict UAV controls from raw image data for the task of autonomous UAV racing in a photo-realistic simulation. Training is done through imitation learning with data augmentation to allow for the correction of navigation mistakes. Extensive experiments demonstrate that our trained network (when sufficient data augmentation is used) outperforms state-of-the-art methods and flies more consistently than many human pilots.

In this paper, we present a framework that allows the explicit incorporation of global context within correlation filter (CF) trackers. We reformulate the original optimization problem and provide a closed-form solution for single- and multi-dimensional features in both the primal and dual domains. Extensive experiments demonstrate that this framework can significantly improve the performance of many CF trackers with only a modest impact on their frame rate.
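The idea of incorporating context into a CF can be sketched as a regularized ridge regression in the primal domain: context patches around the target enter the objective as responses to be suppressed, and the solution stays in closed form. The code below is an illustrative toy on 1-D signals with dense circulant matrices, assuming hypothetical regularization weights `lam1` and `lam2`; it is not the paper's (FFT-based) implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def circulant(x):
    # Rows are cyclic shifts of x: a dense stand-in for the CF data matrix.
    return np.stack([np.roll(x, i) for i in range(len(x))])

def context_aware_filter(target, contexts, y, lam1=0.1, lam2=0.5):
    # Minimize ||A0 w - y||^2 + lam1 ||w||^2 + lam2 * sum_i ||Ai w||^2,
    # where A0 is built from the target patch and each Ai from a context
    # patch. Setting the gradient to zero gives a closed-form solution.
    A0 = circulant(target)
    n = len(target)
    G = A0.T @ A0 + lam1 * np.eye(n)
    for c in contexts:
        Ai = circulant(c)
        G += lam2 * (Ai.T @ Ai)
    return np.linalg.solve(G, A0.T @ y)

target = rng.standard_normal(16)
contexts = [rng.standard_normal(16) for _ in range(3)]
y = np.zeros(16)
y[0] = 1.0  # desired response: a peak at the target location
w = context_aware_filter(target, contexts, y)
```

Relative to a standard CF (`lam2 = 0`), the extra term provably reduces the filter's aggregate response energy on the context patches, which is what makes the tracker more robust to distractors in the surroundings.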
Oral at CVPR’17

In order to evaluate object trackers for aerial tracking applications, we have generated 123 new video sequences from a UAV and annotated them with upright bounding boxes and tracking-relevant attributes. With more than 110,000 frames, this is currently by far the largest dataset for aerial tracking and the second-largest dataset for generic tracking. We have also developed a photo-realistic simulator within the UE4 framework that can be used for numerous vision tasks. As an example, we have integrated several state-of-the-art object trackers to control a UAV inside the simulator based on live feedback. The simulator can furthermore be used to generate large amounts of realistic vision data.
Poster at ECCV’16

We have developed a persistent, robust, and autonomous object tracking system for unmanned aerial vehicles (UAVs). Our computer vision and control strategy integrates multiple UAVs with a stabilized RGB camera and can be applied to a diverse set of moving objects (e.g., humans, animals, cars, and boats). A novel strategy is employed to successfully track objects over a long period by 'handing over the camera' from one UAV to another. The popular object tracker Struck was optimized for both speed and performance and integrated into the proposed system.
Interactive Presentation at IROS’16

All Publications