EgoHumans: An Egocentric 3D Multi-Human Benchmark

ICCV 2023 (Oral)

Carnegie Mellon University, Meta

3D Mesh Visualization: Volleyball sequence with 4 ego views and 15 static cameras.

Overview

Dataset Overview

We present EgoHumans, a new multi-view multi-human video benchmark to advance the state of the art in egocentric human 3D pose estimation and tracking. Existing egocentric benchmarks capture either a single subject or indoor-only scenarios, which limits how well computer vision algorithms generalize to real-world applications. We propose a novel 3D capture setup to construct a comprehensive egocentric multi-human benchmark in the wild, with annotations that support diverse tasks such as human detection, tracking, 2D/3D pose estimation, and mesh recovery. We use consumer-grade wearable camera-equipped glasses for the egocentric view, which lets us capture dynamic activities such as soccer, fencing, and volleyball. Furthermore, our multi-view setup generates accurate 3D ground truth even under severe or complete occlusion. The dataset consists of more than 125k egocentric images spanning diverse scenes, with a particular focus on challenging, unchoreographed multi-human activities and fast-moving egocentric views. We rigorously evaluate existing state-of-the-art methods and highlight their limitations in the egocentric scenario, specifically on multi-human tracking. To address these limitations, we propose EgoFormer, a novel approach with a multi-stream transformer architecture and explicit 3D spatial reasoning to estimate and track human pose. EgoFormer significantly outperforms prior art by 13.6% IDF1 and 9.3% HOTA.
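
For readers unfamiliar with the tracking metrics quoted above, the following is a minimal sketch of how IDF1 is commonly defined: the F1 score over identity-level matches, 2·IDTP / (2·IDTP + IDFP + IDFN). The function name `idf1` and the counts used are hypothetical and chosen purely for illustration; they are not taken from the paper, and this is not the benchmark's evaluation code. A full evaluation would also need the identity-matching step that produces these counts, which is omitted here.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score from identity-level counts.

    idtp: detections assigned to the correct identity (true positives)
    idfp: predicted detections with no matching ground-truth identity
    idfn: ground-truth detections with no matching predicted identity
    """
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        # No ground truth and no predictions: define the score as 0.0 here.
        return 0.0
    return 2 * idtp / denom


# Hypothetical counts, for illustration only (not results from the paper).
score = idf1(idtp=800, idfp=150, idfn=250)
print(f"IDF1 = {score:.3f}")  # prints "IDF1 = 0.800"
```

Because IDF1 rewards keeping the same identity on the same person over time, it is sensitive to identity switches, which is why it is a natural choice for the multi-human tracking evaluation described above.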

In-the-Wild Sequences

Fencing

3D Mesh from the secondary views.

3D Mesh from the ego views.

3D Pose from the secondary views.

3D Pose from the ego views along with the stereo cameras.

Tagging

3D Mesh from the secondary views.

3D Mesh from the ego views.

3D Pose from the secondary views.

3D Pose from the ego views along with the stereo cameras.

Volleyball

3D Mesh from the secondary views.

3D Mesh from the ego views.

3D Pose from the secondary views.

3D Pose from the ego views along with the stereo cameras.

Lego Assembly

3D Mesh from the secondary views.

3D Mesh from the ego views.

3D Pose from the secondary views.

3D Pose from the ego views along with the stereo cameras.

Badminton

3D Mesh from the secondary views.

3D Mesh from the ego views.

3D Pose from the secondary views.

3D Pose from the ego views along with the stereo cameras.

Dodgeball

3D Mesh from the secondary views.

3D Mesh from the ego views.

3D Pose from the secondary views.

3D Pose from the ego views along with the stereo cameras.

BibTeX


@article{khirodkar2023egohumans,
  title={EgoHumans: An Egocentric 3D Multi-Human Benchmark},
  author={Khirodkar, Rawal and Bansal, Aayush and Ma, Lingni and Newcombe, Richard and Vo, Minh and Kitani, Kris},
  journal={arXiv preprint arXiv:2305.16487},
  year={2023}
}