Transporters with Visual Foresight

Rearrangement tasks have been identified as a crucial challenge for intelligent robotic manipulation, but few methods allow for precise construction of unseen structures. We propose a visual foresight model for pick-and-place rearrangement manipulation that learns efficiently. In addition, we develop a multi-modal action proposal module that builds on the Goal-Conditioned Transporter Network, a state-of-the-art imitation learning method. Our image-based task planning method, Transporters with Visual Foresight (TVF), learns from only a handful of demonstrations and generalizes to multiple unseen tasks in a zero-shot manner. TVF improves the performance of a state-of-the-art imitation learning method on unseen tasks in both simulation and real robot experiments. In particular, given only tens of expert demonstrations, the average success rate on unseen tasks improves from 55.4% to 78.5% in simulation experiments and from 30% to 63.3% in real robot experiments.


Hongtao Wu*, Jikai Ye*, Xin Meng, Chris Paxton, Gregory Chirikjian

* indicates equal contributions.

Laboratory for Computational and Sensing Robotics (LCSR), Johns Hopkins University

Department of Mechanical Engineering, National University of Singapore



Introductory Video