LEGATO: Cross-Embodiment Imitation Using a Grasping Tool

Mingyo Seo1,2    H. Andy Park2    Shenli Yuan2    Yuke Zhu1†    Luis Sentis1,2†   

1The University of Texas at Austin    2The AI Institute    †Equal advising

Paper | Code | Hardware | Appendix

Cross-embodiment imitation learning enables policies trained on specific embodiments to transfer across different robots, unlocking the potential for large-scale imitation learning that is both cost-effective and highly reusable. This paper presents LEGATO, a cross-embodiment imitation learning framework for visuomotor skill transfer across varied kinematic morphologies. We introduce a handheld gripper that unifies action and observation spaces, allowing tasks to be defined consistently across robots. Using this gripper, we train visuomotor policies via imitation learning, applying a motion-invariant transformation to compute the training loss. Gripper motions are then retargeted into high-degree-of-freedom whole-body motions using inverse kinematics for deployment across diverse embodiments. Our evaluations in simulation and real-robot experiments highlight the framework’s effectiveness in learning and transferring visuomotor skills across various robots.
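
To make the motion-invariant idea concrete, the sketch below compares a predicted and a ground-truth gripper motion as relative poses expressed in the current gripper frame, so the error does not depend on any global reference frame. This is a minimal illustration under our own assumptions; the function names and the 4x4 homogeneous-matrix convention are ours, not the paper's exact formulation.

    # Minimal sketch of a motion-invariant training loss (illustrative,
    # not the paper's exact implementation). Poses are 4x4 homogeneous
    # transforms; the loss is computed on relative poses in the current
    # gripper frame, making it invariant to the global frame.
    import numpy as np

    def relative_pose(T_world_curr: np.ndarray, T_world_next: np.ndarray) -> np.ndarray:
        """Express the next gripper pose in the current gripper frame."""
        return np.linalg.inv(T_world_curr) @ T_world_next

    def motion_invariant_loss(T_pred: np.ndarray, T_true: np.ndarray,
                              w_rot: float = 1.0) -> float:
        """Translation error plus geodesic rotation error between two relative poses."""
        t_err = np.linalg.norm(T_pred[:3, 3] - T_true[:3, 3])
        R_diff = T_pred[:3, :3].T @ T_true[:3, :3]
        cos = np.clip((np.trace(R_diff) - 1.0) / 2.0, -1.0, 1.0)
        return t_err + w_rot * float(np.arccos(cos))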


Cross-Embodiment Learning Pipeline

We introduce a cross-embodiment imitation learning framework that supports human demonstrations through direct interaction or robot teleoperation. Our framework uses the LEGATO Gripper, a versatile wearable gripper, to maintain consistent physical interactions across different robots. During data collection, the gripper records its own trajectories and grasping actions, along with visual observations from its egocentric stereo camera. Policies trained on demonstrations collected by humans or teleoperated robots using the tool can be deployed across various robots equipped with the same gripper. Motion retargeting enables these trajectories to be executed on different robots without requiring robot-specific training data.
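
For concreteness, here is one plausible layout for a single recorded demonstration step, assuming the gripper logs a timestamped pose, a grasp command, and a stereo image pair. The field names are hypothetical and may differ from the released data format.

    # Hypothetical record layout for one demonstration step (illustrative
    # assumption; the released dataset may use different keys).
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class DemoStep:
        stamp: float              # time in seconds since demo start
        pose: np.ndarray          # 4x4 gripper pose in a fixed world frame
        grasp: float              # grasp command in [0, 1] (open..closed)
        left_image: np.ndarray    # HxWx3 egocentric stereo image (left)
        right_image: np.ndarray   # HxWx3 egocentric stereo image (right)

    # A demonstration is then a time-ordered list of steps:
    # demo: list[DemoStep] = [...]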


Wearable Gripper Design

The LEGATO Gripper is designed for both collecting human demonstrations and robot deployment. It features a shared actuated gripper with adaptable handles, ensuring reliable human handling and consistent grasping across robots while minimizing the number of parts.

A human demonstrator can perform tasks directly by carrying the LEGATO Gripper. The gripper installs easily on various robots, held securely by their original grippers, and is ready for immediate use.


Whole-body Motion Retargeting

Motion retargeting through IK optimization handles the kinematic differences and constraints across robot embodiments, exploiting kinematic redundancy without requiring additional robot-specific demonstrations at deployment.
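
The sketch below conveys the general flavor of such retargeting with a damped-least-squares IK loop that tracks a gripper trajectory in joint space. It uses a placeholder position-only forward-kinematics map; an actual deployment would solve for full SE(3) poses under the robot's joint limits and whole-body constraints, so treat all names here as illustrative assumptions.

    # Sketch: damped-least-squares IK retargeting of a gripper trajectory
    # to joint-space commands. fk(q) -> 3D gripper position is a
    # placeholder supplied by the caller.
    import numpy as np

    def numerical_jacobian(fk, q: np.ndarray, eps: float = 1e-6) -> np.ndarray:
        """Finite-difference Jacobian of the task-space map fk at q."""
        x0 = fk(q)
        J = np.zeros((x0.size, q.size))
        for i in range(q.size):
            dq = np.zeros_like(q)
            dq[i] = eps
            J[:, i] = (fk(q + dq) - x0) / eps
        return J

    def ik_step(fk, q: np.ndarray, x_target: np.ndarray,
                damping: float = 1e-3) -> np.ndarray:
        """One damped-least-squares update toward the target gripper position."""
        err = x_target - fk(q)
        J = numerical_jacobian(fk, q)
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(err.size), err)
        return q + dq

    def retarget(fk, q0: np.ndarray, trajectory, iters: int = 50):
        """Track a sequence of gripper targets, warm-starting each solve."""
        q, qs = q0.copy(), []
        for x_target in trajectory:
            for _ in range(iters):
                q = ik_step(fk, q, x_target)
            qs.append(q.copy())
        return qs

Warm-starting each solve from the previous joint configuration keeps the retargeted motion smooth and biases the optimizer toward a consistent use of the robot's redundant degrees of freedom.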


Real-Robot Deployment

We trained visuomotor policies on direct human demonstrations and successfully deployed them on the Panda robot system. Our method succeeded in 16 trials of the Closing the lid task, 13 trials of the Cup shelving task, and 14 trials of the Ladle reorganization task.


Simulation Evaluation

On average, LEGATO outperforms the other methods in cross-embodiment deployment by 28.9%, 10.5%, and 21.1% compared to BC-RNN, Diffusion Policy, and the self-variant of LEGATO trained only on SE(3) poses (LEGATO (SE3)), respectively. Notably, unlike the baselines, which achieved high success rates only on specific robot bodies, typically the Abstract embodiment used for training, LEGATO succeeds consistently across different embodiments.


Citation


      @misc{seo2024legato,
        title={LEGATO: Cross-Embodiment Visual Imitation Using a Grasping Tool},
        author={Seo, Mingyo and Park, H. Andy and Yuan, Shenli and Zhu, Yuke and
          Sentis, Luis},
        year={2024},
        eprint={2411.03682},
        archivePrefix={arXiv},
        primaryClass={cs.RO}
      }
    

Acknowledgement

This work was conducted during Mingyo Seo's internship at the AI Institute. We thank Rutav Shah and Minkyung Kim for providing feedback on this manuscript. We thank Dr. Osman Dogan Yirmibesoglu for designing the fin-ray-style compliant fingers and helping with hardware prototyping. We thank Mitchell Pryor and Fabian Parra for their support with the real Spot demonstration. We acknowledge the support of the AI Institute and the Office of Naval Research (N00014-22-1-2204).