PiGraphs Dataset

The PiGraphs dataset is composed of 30 reconstructed scenes and 63 interaction recordings.

To create the reconstructed scenes, we use a volumetric fusion framework on scans obtained using a Structure sensor. Each scene comes with a surface mesh with a labeled segmentation, and a set of labeled voxels.

The 63 recordings are videos of five subjects (4 male, 1 female) with skeletal tracking provided by Kinect v2 devices. The total recording duration is about two hours (100k frames at 15 Hz), with an average length of 2 minutes and an average of 4.9 action annotations per recording.

In total, there are 298 actions, and the average action duration is 8.4 s.
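As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The constants are the rounded values quoted in the text, not values read from the dataset files, and the variable names are chosen here only for illustration, so the results are approximate.

```python
# Rounded figures quoted in the dataset description (not read from the data).
FRAMES = 100_000        # total frames across all recordings
FPS = 15                # recording frame rate in Hz
RECORDINGS = 63         # number of interaction recordings
ACTIONS = 298           # total annotated actions
MEAN_ACTION_S = 8.4     # average action duration in seconds

# Total recorded time implied by the frame count and frame rate.
total_minutes = FRAMES / FPS / 60

# Average recording length implied by the total time.
per_recording_minutes = total_minutes / RECORDINGS

# Total time covered by action annotations.
annotated_minutes = ACTIONS * MEAN_ACTION_S / 60

print(f"total recorded time:  {total_minutes:.0f} min")
print(f"per recording:        {per_recording_minutes:.1f} min")
print(f"annotated action time: {annotated_minutes:.0f} min")
```

With these rounded inputs, the total comes to a little under two hours and the per-recording length to a little under two minutes, which agrees with the description. The annotated action time is well under the total, so a sizable share of the recorded frames may fall outside any annotated action.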

There are 43 observed verb-noun combinations, built from 13 common action verbs such as look, sit, stand, lie, grasp, and read. These verbs are associated with 19 object categories (e.g., couch, bed, keyboard, monitor).

Download Data (34 GB)

Documentation and Code

Browse Scenes

Browse Recordings