LEGATO: Cross-Embodiment Imitation Using a Grasping Tool

Mingyo Seo1,2    H. Andy Park2    Shenli Yuan2    Yuke Zhu1†    Luis Sentis1,2†   

1The University of Texas at Austin    2The AI Institute    †Equal advising

IEEE Robotics and Automation Letters (RA-L), 2025

Paper | Code | Hardware | Appendix

Cross-embodiment imitation learning enables policies trained on specific embodiments to transfer across different robots, unlocking the potential for large-scale imitation learning that is both cost-effective and highly reusable. This paper presents LEGATO, a cross-embodiment imitation learning framework for visuomotor skill transfer across varied kinematic morphologies. We introduce a handheld gripper that unifies action and observation spaces, allowing tasks to be defined consistently across robots. We train visuomotor policies on task demonstrations collected with this gripper through imitation learning, applying a transformation to a motion-invariant space for computing the training loss. Gripper motions generated by the policies are retargeted into high-degree-of-freedom whole-body motions using inverse kinematics for deployment across diverse embodiments. Our evaluations in simulation and real-robot experiments highlight the framework's effectiveness in learning and transferring visuomotor skills across various robots.


Cross-Embodiment Learning Pipeline

We introduce a cross-embodiment imitation learning framework that enables human demonstrations via direct interaction or robot teleoperation. Our framework uses the LEGATO Gripper, a versatile handheld grasping tool that ensures consistent physical interactions across different embodiments. During data collection, the LEGATO Gripper records its trajectories, grasping actions, and visual observations captured by its egocentric stereo camera. Visuomotor policies trained on demonstrations by humans or teleoperated robots using the tool can be deployed across various robots equipped with the same gripper. Motion retargeting enables the execution of trajectories on different robots without requiring robot-specific training data.
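To make the idea of a shared action space concrete, here is a minimal sketch of one common way to turn a recorded tool trajectory into actions that do not depend on where the world or robot base frame sits: each target pose is expressed relative to the current gripper pose. This is an assumption for illustration, not necessarily the representation LEGATO uses. Planar poses `(x, y, theta)` are used for brevity, and the helper names `relative_action`, `apply_action`, and `transform_pose` are hypothetical.

```python
import math


def relative_action(current, target):
    """Express the target pose (x, y, theta) in the frame of the current pose."""
    cx, cy, ct = current
    tx, ty, tt = target
    dx, dy = tx - cx, ty - cy
    cos_t, sin_t = math.cos(ct), math.sin(ct)
    # Rotate the world-frame offset into the current gripper frame.
    local_x = cos_t * dx + sin_t * dy
    local_y = -sin_t * dx + cos_t * dy
    # Wrap the heading difference into (-pi, pi].
    dtheta = math.atan2(math.sin(tt - ct), math.cos(tt - ct))
    return local_x, local_y, dtheta


def apply_action(current, action):
    """Compose a gripper-frame action onto the current pose (inverse of the above)."""
    cx, cy, ct = current
    ax, ay, at = action
    cos_t, sin_t = math.cos(ct), math.sin(ct)
    return (cx + cos_t * ax - sin_t * ay,
            cy + sin_t * ax + cos_t * ay,
            ct + at)


def transform_pose(pose, offset):
    """Re-express a pose after rigidly moving the reference frame by `offset`."""
    return apply_action(offset, pose)


# Hypothetical recorded tool trajectory, and the same motion seen from another frame.
trajectory = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.2), (0.2, 0.05, 0.4)]
offset = (1.0, -2.0, 0.7)
shifted = [transform_pose(p, offset) for p in trajectory]

actions = [relative_action(a, b) for a, b in zip(trajectory, trajectory[1:])]
shifted_actions = [relative_action(a, b) for a, b in zip(shifted, shifted[1:])]
print(actions)
print(shifted_actions)
```

The two printed action lists agree up to floating-point error, because the relative pose between consecutive tool frames is unchanged by a rigid shift of the reference frame. A human demonstration and a robot rollout can therefore share the same action targets.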


Handheld Gripper Design

The LEGATO Gripper is designed for both human demonstration collection and robot deployment. It features a shared actuated gripper with adaptable handles, ensuring reliable human handling and consistent grasping across robots while minimizing components.

A human demonstrator can perform tasks directly by carrying the LEGATO Gripper in hand. The LEGATO Gripper can also be installed easily on various robots: it is held securely by their original grippers and is ready for immediate use.


Whole-body Motion Retargeting

Motion retargeting through IK optimization handles the kinematic differences and constraints across robot embodiments and exploits kinematic redundancy, so deployment requires no additional robot-specific demonstrations.
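As a toy illustration of how IK-based retargeting lets one gripper target drive robots with different kinematics, the sketch below runs damped least-squares IK on two redundant planar arms with different link counts. It is a simplified stand-in under stated assumptions, not the whole-body optimizer used by LEGATO: the function names, link lengths, damping value, and step clamp are all hypothetical, and only the tip position (not orientation or constraints) is tracked.

```python
import math


def forward_kinematics(joint_angles, link_lengths):
    """Return the (x, y) tip position of a planar serial arm."""
    x = y = 0.0
    cumulative = 0.0
    for q, length in zip(joint_angles, link_lengths):
        cumulative += q
        x += length * math.cos(cumulative)
        y += length * math.sin(cumulative)
    return x, y


def jacobian(joint_angles, link_lengths):
    """Return the 2 x n position Jacobian as two rows (d x / d q, d y / d q)."""
    n = len(joint_angles)
    cumulative, total = [], 0.0
    for q in joint_angles:
        total += q
        cumulative.append(total)
    row_x, row_y = [0.0] * n, [0.0] * n
    for j in range(n):
        for k in range(j, n):  # joint j moves every link from j outward
            row_x[j] -= link_lengths[k] * math.sin(cumulative[k])
            row_y[j] += link_lengths[k] * math.cos(cumulative[k])
    return row_x, row_y


def retarget(target, joint_angles, link_lengths,
             damping=0.1, max_step=0.05, iterations=500, tolerance=1e-6):
    """Damped least-squares IK: dq = J^T (J J^T + damping^2 I)^-1 e."""
    q = list(joint_angles)
    for _ in range(iterations):
        x, y = forward_kinematics(q, link_lengths)
        ex, ey = target[0] - x, target[1] - y
        distance = math.hypot(ex, ey)
        if distance < tolerance:
            break
        if distance > max_step:  # clamp the task-space step for stability
            ex, ey = ex * max_step / distance, ey * max_step / distance
        jx, jy = jacobian(q, link_lengths)
        # Build the symmetric 2 x 2 matrix A = J J^T + damping^2 I.
        a = sum(v * v for v in jx) + damping ** 2
        b = sum(u * v for u, v in zip(jx, jy))
        d = sum(v * v for v in jy) + damping ** 2
        det = a * d - b * b
        # Solve A w = e in closed form, then map back with J^T.
        wx = (d * ex - b * ey) / det
        wy = (a * ey - b * ex) / det
        q = [qi + jxi * wx + jyi * wy for qi, jxi, jyi in zip(q, jx, jy)]
    return q


# The same gripper target is retargeted to two hypothetical embodiments.
gripper_target = (0.5, 0.4)
for links in ([0.4, 0.3, 0.2], [0.3, 0.3, 0.2, 0.1]):
    solution = retarget(gripper_target, [0.3] * len(links), links)
    print(len(links), "joints ->", forward_kinematics(solution, links))
```

Both arms have more joints than the two task dimensions, so many joint configurations reach the same target. The damped pseudo-inverse picks a small joint update at each step, which is one simple way of using that redundancy.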


Real-Robot Deployment

We trained visuomotor policies on direct human demonstrations and successfully deployed them on the Panda robot system. Our method succeeded in 16 trials of the Closing the lid task, 13 trials of the Cup shelving task, and 14 trials of the Ladle reorganization task.


Simulation Evaluation

On average, LEGATO outperforms BC-RNN, Diffusion Policy, and LEGATO (SE3), a variant of LEGATO trained only on SE3, by 28.9%, 10.5%, and 21.1%, respectively, in cross-embodiment deployment. Notably, the baselines achieved high success rates only on specific robot bodies, typically the Abstract embodiment used for training, whereas LEGATO demonstrates consistent success across different embodiments.


Citation


      @article{seo2024legato,
        title={LEGATO: Cross-Embodiment Imitation Using a Grasping Tool},
        author={Seo, Mingyo and Park, H. Andy and Yuan, Shenli and Zhu, Yuke
          and Sentis, Luis},
        journal={IEEE Robotics and Automation Letters (RA-L)},
        year={2025}
      }
    

Acknowledgement

This work was conducted during Mingyo Seo's internship at the AI Institute. We thank Rutav Shah and Minkyung Kim for providing feedback on this manuscript. We thank Osman Dogan Yirmibesoglu for designing the fin ray style compliant fingers and helping with hardware prototyping. We thank Mitchell Pryor and Fabian Parra for their support with the real Spot demonstration. We acknowledge the support of the AI Institute and the Office of Naval Research (N00014-22-1-2204).