Transporters with Visual Foresight

Rearrangement tasks have been identified as a crucial challenge for intelligent robotic manipulation, but few methods allow for precise construction of unseen structures. We propose a visual foresight model for pick-and-place rearrangement manipulation that learns efficiently. In addition, we develop a multi-modal action proposal module that builds on the Goal-Conditioned Transporter Network, a state-of-the-art imitation learning method. Our image-based task planning method, Transporters with Visual Foresight (TVF), learns from only a handful of demonstrations and generalizes to multiple unseen tasks in a zero-shot manner. TVF improves the performance of a state-of-the-art imitation learning method on unseen tasks in both simulation and real robot experiments. In particular, given only tens of expert demonstrations, the average success rate on unseen tasks improves from 55.4% to 78.5% in simulation and from 30% to 63.3% in real robot experiments.

Authors

Hongtao Wu*, Jikai Ye*, Xin Meng, Chris Paxton, Gregory Chirikjian

* indicates equal contributions.

Laboratory for Computational and Sensing Robotics (LCSR), Johns Hopkins University

Department of Mechanical Engineering, National University of Singapore

NVIDIA

Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks
