Abnormal Human Activity Recognition (Part 10 - Datasets)

A review of Abnormal Human Activity Recognition (continued).
type: insight
level: easy
guides: abnormal_activity_recognition

Datasets

New technologies have led to a significant increase in the quantity and diversity of datasets publicly available for research. Table 1 summarizes the newly developed 3D AbHAR and AAL datasets, including the methods, accuracy, precision, recall, and challenges reported in the original papers. Widely used datasets for analyzing crowd activity include the UMN dataset, the UCSD Ped1 and Ped2 datasets, the PETS dataset (Patino et al., 2016), web videos, the subway surveillance dataset, and the boat-sea, boat-river, traffic-Belleview, and airport-wrong-direction datasets. The HIT-JUT dataset (Han et al., 2016) was introduced recently for crowd activity detection; it covers fight, shoot, and escape scenarios and consists of two camera views of crowded indoor scenes with 15 people.

Saini et al. (2017) classified anomalies into Type-I and Type-II and gave mathematical definitions for each: a Type-I anomaly occurs when a target stays in one region for a long time, while a Type-II anomaly occurs when a target moves between two or more regions for a prolonged period (a minimal sketch of the two definitions follows this overview). The datasets mentioned above exhibit only Type-I anomalies. The In-House dataset proposed by Saini et al. (2017) covers both types and offers a broader range of anomalous situations. It was captured by a static camera (25 FPS) mounted on top of a building for 70 minutes to record public movement on a busy working day; the camera records in both HD and SD quality, and the frames are resized to 640 × 480.

For single-person abnormal human action recognition, by contrast, very few datasets were available between 2011 and 2014. Khan and Sohn (2011, 2013) introduced a 2D-silhouette-based abnormal-activity dataset covering six abnormal activities (forward fall, backward fall, chest pain, faint, vomit, and headache), stored in AVI format. Six persons (four males, two females) performed each activity ten times, yielding 150 silhouettes per activity and 900 silhouettes per person across the six activities. Later, in the context of ADL and elderly health care, several 3D AbHAR datasets were reported, such as CAD-60, CAD-120, the TST fall detection dataset, the UR fall detection dataset, SisFall, and the Le2i dataset; these are described below.
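To make Saini et al.'s two anomaly types concrete, here is a minimal sketch in Python. It assumes each target has already been tracked and mapped to scene regions; the (timestamp, region) track format and both thresholds are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of Saini et al.'s two anomaly types, under assumed inputs:
# `track` is a list of (timestamp_seconds, region_id) samples for one target.
# The thresholds and the region abstraction are illustrative, not from the paper.

from collections import defaultdict

def classify_anomaly(track, dwell_threshold=300.0, revisit_threshold=600.0):
    """Return 'type-1', 'type-2', or None for a single target's track."""
    if len(track) < 2:
        return None

    # Accumulate how long the target spends in each region.
    dwell = defaultdict(float)
    for (t0, r0), (t1, _) in zip(track, track[1:]):
        dwell[r0] += t1 - t0

    # Type-I: the target stays in one region for a long time.
    if max(dwell.values()) >= dwell_threshold:
        return "type-1"

    # Type-II: the target keeps moving among two or more regions
    # for a prolonged total period.
    total_time = track[-1][0] - track[0][0]
    if len(dwell) >= 2 and total_time >= revisit_threshold:
        return "type-2"

    return None
```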


Table 1: 3D AbHAR and AAL datasets summary.

Home monitoring applications need datasets that represent the activities and events of domestic settings. Two datasets created for this purpose are CAD-60 (Cippitelli et al., 2016) and CAD-120, which consist of RGB-D video sequences of high-level activities and object affordances. CAD-60 contains 60 video sequences of four subjects performing twelve activities in five environments (office, kitchen, bedroom, bathroom, and living room), while CAD-120 contains 120 video sequences of four subjects performing ten high-level activities. The CAD-120 activities include making cereal, taking medicine, stacking objects, unstacking objects, microwaving food, picking objects, cleaning objects, taking food, arranging objects, and having a meal; its object affordances include reachable, movable, pourable, pour-to, containable, drinkable, openable, placeable, closable, scrubbable, scrubber, and stationary. Both datasets provide depth frames, skeleton joints, RGB frames, and activity and affordance labels, and are described and evaluated in A Dataset for Learning and Evaluation of RGB-D Based Recognition of Human Actions and Interaction in Unconstrained Environments by Cippitelli et al. (2016).
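A common way to use the skeleton joints these datasets provide is to turn each frame into a translation-invariant descriptor. The sketch below computes pairwise inter-joint distances; the (num_joints, 3) array layout is an assumption for illustration, since the actual CAD-60/CAD-120 file format must be parsed separately.

```python
# Hedged sketch: pairwise joint distances as a simple skeleton feature for
# activity recognition on CAD-style data. The (J, 3) layout is an assumption.

import numpy as np

def pairwise_joint_distances(joints):
    """joints: (J, 3) array of 3D joint positions for one frame.
    Returns a flat vector of all J*(J-1)/2 inter-joint distances,
    which is invariant to translation of the whole skeleton."""
    diffs = joints[:, None, :] - joints[None, :, :]   # (J, J, 3)
    dists = np.linalg.norm(diffs, axis=-1)            # (J, J)
    iu = np.triu_indices(len(joints), k=1)            # upper triangle, no diagonal
    return dists[iu]

# Example: a 15-joint Kinect v1 skeleton gives a 105-dimensional descriptor.
frame = np.random.rand(15, 3)
print(pairwise_joint_distances(frame).shape)  # (105,)
```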

Another dataset designed for home monitoring applications is the TST Fall Detection dataset v2 (Gasparrini et al., 2015), which focuses on different types of falls and activities of daily living (ADL), captured with a Microsoft Kinect v2 and IMU sensors. The dataset contains 264 actions performed by 11 volunteers, each wearing two IMU devices, one on the waist and one on the right wrist. The fall actions include forward fall, backward fall, side fall, and fall ending up sitting; the ADL actions include walking back and forth, sitting on a chair, walking and grasping an object from the floor, and lying down. The dataset provides depth frames, skeleton joints in depth coordinates, timing information for synchronization, and the two raw acceleration streams from the IMU devices. It is introduced and characterized in A Novel RGB-D Video Dataset for Fall Detection by Gasparrini et al. (2015).
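Because the depth stream and the IMU streams run at different rates, the timing information is typically used to align them before fusion. Here is a minimal sketch of nearest-timestamp alignment; the variable names and the sampling rates in the example are illustrative assumptions, not TST specifics.

```python
# Hedged sketch: aligning IMU acceleration samples with depth/skeleton frames
# by nearest timestamp, using the kind of timing data the TST dataset provides.

import numpy as np

def align_imu_to_frames(frame_ts, imu_ts, imu_samples):
    """frame_ts: (F,) sorted frame timestamps in seconds.
    imu_ts: (N,) sorted IMU timestamps in seconds, with N >> F.
    imu_samples: (N, 3) raw acceleration samples.
    Returns (F, 3): the IMU sample nearest in time to each frame."""
    idx = np.searchsorted(imu_ts, frame_ts)          # insertion points
    idx = np.clip(idx, 1, len(imu_ts) - 1)
    # Pick the closer of the two neighbouring IMU samples.
    left_closer = (frame_ts - imu_ts[idx - 1]) < (imu_ts[idx] - frame_ts)
    idx = np.where(left_closer, idx - 1, idx)
    return imu_samples[idx]

# Example: a 30 fps depth stream vs. a 100 Hz IMU stream (rates assumed).
frames = np.arange(0, 10, 1 / 30)
imu_t = np.arange(0, 10, 1 / 100)
imu_a = np.random.randn(len(imu_t), 3)
print(align_imu_to_frames(frames, imu_t, imu_a).shape)  # (300, 3)
```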

The UR fall detection dataset (Kwolek, 2014) captures falls and ADL events with two Microsoft Kinect cameras and accelerometric data from PS Move and x-IMU devices. The dataset includes 70 sequences: 30 falls and 40 ADL events. Falls are recorded with both cameras plus the corresponding accelerometric data, while ADL events are recorded with only one camera and the accelerometer. The sensor data is collected using PS Move (60 Hz) and x-IMU (256 Hz) devices. The dataset also characterizes pre- and post-fall situations through a survey that asks three questions: (i) what activity was the person performing when the fall happened? (ii) what caused the fall: a slip, a faint, a trip, or something else? (iii) in which orientation did the fall happen, and which part of the body received the impact? The dataset is presented and used in Improving Fall Detection by the Use of Depth Sensor and Accelerometer by Kwolek (2014).
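On the accelerometer side, a classic baseline for data of this kind is a threshold test on the total acceleration magnitude: an impact spike followed by a near-rest period. The sketch below illustrates that idea; it is not the method of Kwolek (2014), and the thresholds are illustrative assumptions.

```python
# Hedged sketch of a classic threshold-based fall test on accelerometer data.
# The 256 Hz default matches the x-IMU stream; the impact and rest thresholds
# below are illustrative assumptions, not values from the paper.

import numpy as np

def detect_fall(acc, fs=256, impact_g=2.5, rest_g=0.4, rest_window_s=1.0):
    """acc: (N, 3) acceleration in g. Returns the sample index of a detected
    fall impact, or None. A fall is flagged when the magnitude spikes above
    `impact_g` and is followed by a near-rest window (|a| close to 1 g)."""
    mag = np.linalg.norm(acc, axis=1)
    w = int(rest_window_s * fs)
    for i in np.flatnonzero(mag > impact_g):
        post = mag[i + w : i + 2 * w]          # window shortly after the spike
        if post.size and np.all(np.abs(post - 1.0) < rest_g):
            return int(i)
    return None
```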

The SisFall dataset (Sucerquia et al., 2017) was designed on the basis of the same kind of fall survey as the UR fall detection dataset and includes 15 types of falls and 19 ADL activities. The falls include lateral, forward, and backward falls while walking, caused by a slip, a trip, or falling asleep; falls while getting up or sitting down; and a vertical fall while fainting. The ADL activities include walking slowly, fast, with turns, with stops, with changes of direction, backward, sideways, upstairs, and downstairs; sitting down on and standing up from a chair or sofa; lying down on and standing up from a bed; sitting down on and standing up from the floor; picking up an object from the floor; jumping; running; crouching; kneeling down; bending over to pick up an object from the floor; and stretching out to pick up an object from a table. The data is collected with two triaxial accelerometers and a gyroscope embedded in a single device fixed to the waist of each volunteer, sampled at 200 Hz. The dataset is described and analyzed in SisFall: A Fall and Movement Dataset by Sucerquia et al. (2017).
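SisFall distributes raw ADC counts rather than physical units, so a first preprocessing step is converting counts to g and segmenting the 200 Hz stream into windows. The sketch below follows the conversion formula given in the SisFall documentation for the ADXL345 channel (13-bit ADC, ±16 g); the column layout of the text files is an assumption that should be checked against the dataset readme.

```python
# Hedged sketch: converting raw SisFall ADC counts to g and windowing the
# 200 Hz stream for fall-vs-ADL classification. Window sizes are illustrative.

import numpy as np

def adxl345_counts_to_g(counts, range_g=16, resolution_bits=13):
    # acceleration [g] = (2 * range / 2^resolution) * ADC_count
    return counts * (2.0 * range_g / 2 ** resolution_bits)

def sliding_windows(signal, fs=200, win_s=3.0, hop_s=1.5):
    """Yield (start_index, window) pairs over an (N, 3) signal in g."""
    win, hop = int(win_s * fs), int(hop_s * fs)
    for start in range(0, len(signal) - win + 1, hop):
        yield start, signal[start : start + win]

# Example with synthetic counts standing in for one accelerometer's columns.
raw = np.random.randint(-4096, 4096, size=(2000, 3))
acc_g = adxl345_counts_to_g(raw)
for start, window in sliding_windows(acc_g):
    peak = np.linalg.norm(window, axis=1).max()  # per-window peak magnitude
```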

The Le2i robust fall detection dataset (Nguyen et al., 2016) contains 221 videos of various living environments with different daily activities and real-world challenges such as shadows, light reflection, cluttered backgrounds, and clothing variation. The videos are recorded at 320 × 240 pixels and 25 fps, with different actors wearing differently colored clothes. They include falls of several types, such as forward, backward, and sideways falls while walking, running, sitting down, getting up, standing still, fainting, jumping, crouching, kneeling down, or bending over, as well as ADL activities such as walking, running, sitting down, getting up, standing still, jumping, crouching, kneeling down, bending over, and picking up an object from the floor or a table. The dataset is introduced and used in Single camera based fall detection using motion and human shape features by Nguyen et al. (2016).
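To illustrate the kind of "human shape feature" used on single-camera videos like these, the sketch below extracts the bounding-box aspect ratio of the largest moving silhouette via background subtraction. This is a common shape cue, not the exact method of Nguyen et al. (2016), and the area threshold is an illustrative assumption.

```python
# Hedged sketch of a single-camera shape cue for fall detection: background
# subtraction, largest-silhouette bounding box, and its width/height ratio.

import cv2

def aspect_ratios(video_path, min_area=500):
    """Yield the width/height ratio of the largest moving blob per frame.
    A standing person gives a ratio well below 1; a person lying on the
    floor after a fall pushes it toward or above 1 for many frames."""
    cap = cv2.VideoCapture(video_path)
    bg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = bg.apply(frame)
        # Shadows are marked as 127 by MOG2; thresholding at 200 drops them.
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        blobs = [c for c in contours if cv2.contourArea(c) > min_area]
        if blobs:
            x, y, w, h = cv2.boundingRect(max(blobs, key=cv2.contourArea))
            yield w / h
    cap.release()
```

A downstream detector would then flag a fall when the ratio stays high for a sustained run of frames, which pairs naturally with the motion cues the cited paper combines it with.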