OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Abstract

We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O lets a human control a full-sized humanoid with dexterous hands in several ways, including real-time teleoperation through a VR headset, verbal instruction, and an RGB camera. OmniH2O also enables full autonomy, either by learning from teleoperated demonstrations or by integrating with frontier models such as GPT-4. Through teleoperation or autonomy, OmniH2O demonstrates versatility and dexterity in a range of real-world whole-body tasks, such as playing multiple sports, moving and manipulating objects, and interacting with humans. We develop an RL-based sim-to-real pipeline that involves large-scale retargeting and augmentation of human motion datasets, learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy, and reward designs that enhance robustness and stability. We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.

Dexterous Human-to-Humanoid Whole-Body Teleoperation

Verbal Instructions (MDM)

Autonomous Agent (GPT-4o)

Autonomous Agent (Diffusion Policy Learned from Teleoperation Demonstrations)

Robustness Test (the same motion tracking policy)

Outdoor Locomotion (the same motion tracking policy)

Method

OmniH2O

  1. OmniH2O retargets large-scale human motions and filters out infeasible motions for humanoids.
  2. Our sim-to-real policy is distilled through supervised learning from an RL-trained teacher policy that has access to privileged information.
  3. The universal design of OmniH2O supports versatile human control interfaces, including a VR headset and an RGB camera. The sim-to-real policy can also be driven by autonomous agents that generate motion goals, such as GPT-4 or a Diffusion Policy trained on teleoperation data.
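The distillation in step 2 can be illustrated with a toy example. The following is a minimal sketch, not the OmniH2O implementation: the teacher is a hypothetical fixed linear map standing in for the RL-trained privileged policy, the state is a made-up 4-D vector whose last entry plays the role of privileged information, and states are sampled at random rather than collected from simulator rollouts. All names (`teacher_action`, `sparse_obs`, `distill`) are illustrative assumptions.

```python
import random

# Hypothetical stand-in for the RL-trained privileged teacher: a fixed linear
# map from a 4-D full state to a 2-D action. The last state entry plays the
# role of privileged information that the student never observes.
TEACHER_W = [[0.8, -0.4, 0.2, 0.1],
             [0.1, 0.6, -0.5, 0.1]]

def teacher_action(full_state):
    return [sum(w * s for w, s in zip(row, full_state)) for row in TEACHER_W]

def sparse_obs(full_state):
    # Student input: drop the privileged last entry (an assumed split).
    return full_state[:3]

def student_action(weights, obs):
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]

def sample_state(rng):
    # Random states replace simulator rollouts in this toy setting.
    return [rng.uniform(-1.0, 1.0) for _ in range(4)]

def imitation_loss(weights, states):
    # Mean squared error between student and teacher actions.
    total = 0.0
    for s in states:
        target = teacher_action(s)
        pred = student_action(weights, sparse_obs(s))
        total += sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return total / len(states)

def distill(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * 3 for _ in range(2)]
    for _ in range(steps):
        s = sample_state(rng)
        obs = sparse_obs(s)
        target = teacher_action(s)  # supervised label from the teacher
        pred = student_action(weights, obs)
        for i in range(2):
            err = pred[i] - target[i]
            for j in range(3):
                # Gradient step on 0.5 * err**2 for a linear student.
                weights[i][j] -= lr * err * obs[j]
    return weights

eval_rng = random.Random(123)
eval_states = [sample_state(eval_rng) for _ in range(500)]
loss_before = imitation_loss([[0.0] * 3 for _ in range(2)], eval_states)
student_w = distill()
loss_after = imitation_loss(student_w, eval_states)
print(f"imitation loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Because the student cannot see the privileged entry, its loss does not reach zero; the residual reflects the part of the teacher's behavior that depends on information unavailable at deployment.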

Failure Cases

Related Work

This work builds on our previous work:


Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation
Tairan He*, Zhengyi Luo*, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, Guanya Shi

BibTeX

@article{he2024omnih2o,
  title={OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning},
  author={He, Tairan and Luo, Zhengyi and He, Xialin and Xiao, Wenli and Zhang, Chong and Zhang, Weinan and Kitani, Kris and Liu, Changliu and Shi, Guanya},
  journal={arXiv preprint arXiv:2406.08858},
  year={2024}
}