HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment

1 ShanghaiTech University, 2 Shanghai Advanced Research Institute, Chinese Academy of Sciences
* These authors contributed equally. Corresponding author

Abstract

Humans naturally interact with one another and with multiple surrounding objects while engaging in various social activities. However, recent advances in modeling human-object interactions mostly focus on perceiving isolated individuals and objects, due to fundamental data scarcity. In this paper, we introduce HOI-M3, a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects. Notably, it provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs, covering 199 sequences and 181M frames of diverse humans and objects under rich activities. With the unique HOI-M3 dataset, we introduce two novel data-driven tasks with strong companion baselines: monocular capture and unstructured generation of multiple human-object interactions. Extensive experiments demonstrate that our dataset is challenging and merits further research on multiple human-object interactions and behavior analysis.

BibTeX


@article{zhang2024hoi,
  title={HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment},
  author={Zhang, Juze and Zhang, Jingyan and Song, Zining and Shi, Zhanhe and Zhao, Chengfeng and Shi, Ye and Yu, Jingyi and Xu, Lan and Wang, Jingya},
  journal={arXiv preprint arXiv:2404.00299},
  year={2024}
}