Abnormal Human Activity Recognition (Part 6 - Three-Dimensional AbHAR) (cont.)

A review of three-dimensional Abnormal Human Activity Recognition (3D AbHAR), continued.
type: insight | level: easy | guides: abnormal_activity_recognition

Three-Dimensional AbHAR (cont.)

Skeleton based AbHAR

The skeleton representation of the human body, which captures human posture incisively and in compact form, has effectively removed the need for an efficient segmentation technique to extract 2D silhouettes from depth maps. It also simplifies computations such as estimating the height of the centroid from depth silhouettes (Rougier et al., 2011a). This has motivated researchers to build real-time applications on the skeleton modality, making computation faster, simpler, and more efficient.

An efficient real-time intelligent monitoring system for ATMs was created by Nar et al., 2016 to identify abnormal postures that call for additional security at the ATM, such as tampering with the camera, adopting an aggressive stance, and peeping. The study used the angles between various bones as the relevant information and learned an optimal weight value for estimating the likelihood that the subject's current posture is abnormal. With 3D skeleton data, computing the angle between joints from their (x, y, z) coordinates is simpler, faster, and more precise.
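As a rough illustration of this bone-angle cue, the sketch below computes the angle at a joint from the 3D coordinates of three connected joints. The joint names, sample coordinates, and library choice are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, not Nar et al.'s implementation: the angle at a joint
# formed by the two bones meeting there, from (x, y, z) joint positions.
import numpy as np

def bone_angle(joint_a, joint_b, joint_c):
    """Angle (degrees) at joint_b between bones joint_b->joint_a and joint_b->joint_c."""
    v1 = np.asarray(joint_a, dtype=float) - np.asarray(joint_b, dtype=float)
    v2 = np.asarray(joint_c, dtype=float) - np.asarray(joint_b, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: elbow angle from hypothetical shoulder/elbow/wrist positions (metres).
shoulder, elbow, wrist = (0.1, 1.4, 2.0), (0.3, 1.2, 2.0), (0.5, 1.3, 1.9)
print(bone_angle(shoulder, elbow, wrist))  # roughly 107 degrees
```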

Hendryli and Fanany, 2016 addressed automated detection of aberrant behaviour (cheating) in the exam room, so that proctors are warned whenever suspicious behaviour is detected. To achieve this, an MCMCLDA (Multi-class Markov Chain Latent Dirichlet Allocation) framework is built that takes the arm joints and head location as interest points directly from the skeleton representation, ignoring irrelevant ones, which yields better accuracy and faster computation than the Harris3D detector.
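To make the idea of skeleton-derived interest points concrete, here is a minimal sketch that keeps only the head and arm joints of a per-frame skeleton. The joint names are assumed placeholders for whatever joint map the tracker provides; this is not the MCMCLDA pipeline itself.

```python
# Minimal sketch (assumed joint names, not the MCMCLDA pipeline): keep only
# the head and arm joints of a skeleton frame as interest points.
import numpy as np

INTEREST_JOINTS = ["head", "shoulder_left", "elbow_left", "wrist_left",
                   "shoulder_right", "elbow_right", "wrist_right"]

def interest_points(skeleton_frame):
    """skeleton_frame: dict of joint name -> (x, y, z). Returns a flat feature vector."""
    return np.concatenate([np.asarray(skeleton_frame[j], dtype=float)
                           for j in INTEREST_JOINTS])

# Example with dummy coordinates.
frame = {j: np.random.rand(3) for j in INTEREST_JOINTS}
print(interest_points(frame).shape)  # (21,) = 7 joints x 3 coordinates
```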

Human gait analysis is currently receiving increased attention from researchers and health experts (Aggarwal and Vishwakarma, 2016), because deviations of a person's gait from the normal pattern may be associated with neurological disorders and can also be used to identify distinct personal traits. Using a joint motion history feature, i.e., the 3D positions of skeletal joints and their motion signals, Chaaraoui et al., 2015 created a general machine learning framework (Bag-of-Key-Poses). In Paiement et al., 2014, low-level and high-level multi-scale motion cues are extracted to handle complex behaviours; however, skeleton data captured with a Kinect sensor is likely to contain considerable noise and outliers, particularly under partial occlusion, so the study applies diffusion maps to filter out the outliers. To turn a room into a smart living area, Jalal et al., 2014 attempted continuous monitoring and daily-activity detection in indoor spaces (i.e., smart homes, smart offices, and smart hospitals); this work also suffered from noisy skeleton joints and could not handle complex actions or partial body occlusion.
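The joint motion history idea can be sketched as stacking frame-to-frame joint displacements over a clip. The array layout below is an assumption for illustration, not Chaaraoui et al.'s implementation.

```python
# Minimal sketch (assumed array layout, not Chaaraoui et al.'s code): a joint
# motion history style descriptor built from frame-to-frame 3D joint displacements.
import numpy as np

def joint_motion_history(joint_sequence):
    """joint_sequence: array of shape (T, J, 3) -> flat descriptor of displacements."""
    seq = np.asarray(joint_sequence, dtype=float)
    displacements = np.diff(seq, axis=0)  # (T-1, J, 3) motion vectors between frames
    return displacements.reshape(-1)      # flatten into a single feature vector

# Example: a 5-frame clip of a 20-joint skeleton.
clip = np.random.rand(5, 20, 3)
print(joint_motion_history(clip).shape)   # (240,) = 4 * 20 * 3
```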

Skeleton based Fall Detection

Table-1

Table 1: Feature extraction and representation methods for detecting abnormal human behavior using three-dimensional skeletons (AbHAR).

The exponential growth of information, intelligent video systems, and communication technologies has spurred research in abnormal activity recognition, which is attracting rapidly growing attention (Khan and Hoey, 2017). The severity of a fall's medical consequences is largely determined by reaction and rescue times, so an accurate automated fall detector (Bian et al., 2012a) is a necessity for older adults in order to improve and speed up the medical care they receive. The effectiveness of skeletal representations of people is demonstrated by improvements in fall detection systems and in the detection of several other abnormal everyday activities. Various skeleton-based AbHAR strategies are described in Table 1. From this, it is clear that action descriptions based on joint motion history (JMH) and joint trajectories (Rougier et al., 2006; Bian et al., 2012b; Nizam et al., 2016) are straightforward and highly efficient, with appreciable view- and illumination-invariance properties for skeleton-based abnormal human action detection.

Fig-1

Figure 1: An example of rotation. (a) Before rotation. (b) After rotation, the head, neck and hip center joints can be extracted correctly.

However, fall detection based on the shape-deformation concept (Rougier et al., 2011a), or on the distance between the silhouette centroid and the floor, cannot reliably distinguish an intentional action from an accidental fall, e.g., falling onto a bed versus falling onto the floor, unless normal inactivity zones are defined (consider a person sleeping on a sofa or bed who then falls to the floor). By combining height above the ground, impact velocity, joint positions, and distance from the floor, 3D skeleton-based methods (Bian et al., 2012b; Nizam et al., 2016) robustly differentiate a fall from slowly lying down on the floor and other comparable scenarios. Because body posture changes substantially during a fall, joint tracking becomes unreliable. Therefore, in Bian et al., 2012c, the author first corrected the subject's trunk orientation (from the hip point to the neck) before using a fast Randomized Decision Forest (RDF) algorithm to extract the subject's skeleton, which increased the fall detection rate, as displayed in Fig. 1. By tracking the head concurrently, this work can recognize small falls, such as falling off a sofa while the lower half of the body (the legs) is still on it, which the silhouette-centre-based technique misses.

A view-independent statistical technique (Zhang et al., 2012) makes its decision based on how the person behaves in the last few frames after falling. It defines a feature set f = [f_1, f_2, f_3, f_4, f_5], where f_1 is the length of the fall, f_2 the total head drop, f_3 the maximum falling speed, f_4 the smallest head height, and f_5 the fraction of frames in which the head is lower than in the previous frame; a Bayesian network is used to integrate the five features (a minimal sketch of this feature extraction is given below, after Fig. 3). However, a few false alarms are still reported when a person lies down on the floor or jumps onto a bed. Diraco et al., 2010 validated strong performance in terms of consistency and competence on a large real dataset, using Bayesian segmentation to detect moving regions as the test bed. It also presented a prototype of the 3D Geodesic Distance Map-Based Posture Recognition (3dGDMPR) system, shown in Fig. 2. As shown in Fig. 3, postures in overlapping regions are detected and interpolated, using an occluded leg as a test case.

Fig-2

Figure 2: Four main postures examined in Diraco et al., 2010: lie, sit, stand and bend. (a) Depth map, (b) geodesic distance map, and (c) skeleton. Upper and lower nodes of the skeleton Reeb graph are encircled in red.

Fig-3

Figure 3: Detection and interpolation of overlapped regions: (a) the arm occludes a leg, (b) the Laplacian operator detects occluded boundaries, (c) level sets of the height function (depth map) detect the inbound region, (d) interpolation is performed to obtain a connected mesh, (e) body spine orientation is approximated by the orientation 𝜑 of the PQ segment in the body skeleton (Diraco et al., 2010).
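To make the five-feature set f = [f_1, ..., f_5] of Zhang et al., 2012 (described above) concrete, the sketch below computes a rough version of each feature from a sequence of head heights. The frame rate, signal layout, and the omission of the Bayesian-network fusion step are assumptions of this illustration, not the original implementation.

```python
# Minimal sketch (assumed signal layout, not Zhang et al.'s code): the five
# head-trajectory features f = [f1, ..., f5] from a sequence of head heights (metres).
import numpy as np

def fall_features(head_heights, fps=30.0):
    h = np.asarray(head_heights, dtype=float)
    drops = np.diff(h)                      # per-frame change in head height
    f1 = len(h) / fps                       # f1: duration of the fall segment (s)
    f2 = h[0] - h.min()                     # f2: total head drop
    f3 = max(0.0, -drops.min()) * fps       # f3: maximum downward speed (m/s)
    f4 = h.min()                            # f4: smallest head height
    f5 = np.mean(drops < 0)                 # f5: fraction of frames where the head
                                            #     is lower than in the previous frame
    return np.array([f1, f2, f3, f4, f5])

# Example: a head trajectory dropping from 1.7 m to 0.3 m over 20 frames.
print(fall_features(np.linspace(1.7, 0.3, 20)))
```

In the original work these features feed a Bayesian network that outputs the fall/no-fall decision; any thresholding or classifier on top of this vector is left out here.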

In the next blog, we will discuss "Deep features based action description" works in detail.