Abnormal Human Activity Recognition (Part 5 - Three-Dimensional AbHAR) (cont.)

A review of Abnormal Human Activity Recognition (3D AbHAR) (cont.).
type: insight
level: easy
guides: abnormal_activity_recognition

Three-Dimensional AbHAR (cont.)

Depth based fall detection

The length of the inactive period after a fall indicates its severity: harder falls result in longer periods of inactivity while lying on the ground. A fall cannot be detected from a single instance of data; instead, discriminative features must be examined both during and after the fall. Confirming inactivity depends heavily on context. The precise position of the individual, the time, and the length of inactivity all contribute to an informed decision. For example, lying in bed for a long time is not alarming, but remaining on the floor for a long time after a fall should raise an alarm. Contextual knowledge therefore lowers the probability of false alarms. Accordingly, Jansen and Deklerck, 2006 quantified the area of the body on the floor and the 3D direction of the fall to capture the duration of an individual's inactivity during and after a fall, and from this learned contextual elements of the fall. The human body also makes a variety of fast movements during regular activities, which can be mistaken for significant motion. To address this, Nguyen et al., 2016 used image moments and the human shape feature to compute the center of mass of a 2D human silhouette and observe the overall motion (magnitude and orientation) of the person, which lessens the effect of abrupt movements on fall detection. Additionally, the motion history image (MHI) shows the location and trajectory of the motion throughout the video sequence. To lower the false detection rate further, the 2D silhouette extraction method must be improved.
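
The centroid-from-moments idea above can be sketched in a few lines. This is a minimal illustration, not the implementation of Nguyen et al., 2016: it assumes the silhouette is already available as a binary NumPy mask, and the function names `silhouette_centroid` and `centroid_motion` are hypothetical.

```python
import numpy as np

def silhouette_centroid(mask):
    """Center of mass (x, y) of a binary silhouette from its raw image moments."""
    ys, xs = np.nonzero(mask)
    m00 = xs.size                      # zeroth moment: silhouette area in pixels
    if m00 == 0:
        return None                    # empty mask: no silhouette in this frame
    return xs.sum() / m00, ys.sum() / m00   # (m10 / m00, m01 / m00)

def centroid_motion(prev_mask, curr_mask):
    """Magnitude (pixels) and orientation (degrees) of the centroid displacement.

    Image rows grow downward, so an orientation of +90 degrees means
    the centroid moved straight down in the image.
    """
    p = silhouette_centroid(prev_mask)
    c = silhouette_centroid(curr_mask)
    if p is None or c is None:
        return None
    dx, dy = c[0] - p[0], c[1] - p[1]
    return float(np.hypot(dx, dy)), float(np.degrees(np.arctan2(dy, dx)))
```

Because the centroid averages over the whole silhouette, a quick movement of one arm shifts it only slightly, whereas a fall moves it a long way in a consistent direction.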

Researchers have found that a person's posture is an important cue for determining whether they have fallen. Yu et al., 2013 described human body posture using the centroid location of the body, the ellipse feature, and the shape structure feature. To lower the rate of false alarms, they established the following two rules:

  • The first rule computes the magnitude of motion using the area ratio (AR) between the foreground region of the current frame and the area of the motion energy image (MEI). AR is higher for a fall than for walking, sitting, or bending. As a consequence, falls with low AR values are not detected even when the system notices an atypical posture.
  • The second rule declares a fall only if an anomalous posture lasts longer than a threshold, which brings the number of false alarms (from occasional bending or tying shoelaces, for example) down to a manageable level. Despite the method's high fall detection accuracy, it still requires manual segmentation and video clip selection by a human.
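
The two rules can be sketched as a small per-frame check. This is only an illustration under stated assumptions, not the code of Yu et al., 2013: the MEI is taken as the union of the foreground masks in a window, AR is assumed to be MEI area divided by the current foreground area (so that large motion gives a large value), the way the two rules are combined is a guess, and the names and thresholds (`ar_threshold`, `min_frames`) are hypothetical.

```python
import numpy as np

def area_ratio(masks):
    """AR for the last frame of a window of binary foreground masks.

    Assumption: AR = area(MEI) / area(current foreground).
    """
    mei = np.any(np.stack(masks), axis=0)     # MEI: union of the window's foregrounds
    fg_area = np.count_nonzero(masks[-1])
    return np.count_nonzero(mei) / fg_area if fg_area else 0.0

def detect_fall(ar_values, abnormal_flags, ar_threshold=1.5, min_frames=30):
    """Return the frame index at which a fall is declared, or None.

    ar_values:      per-frame area ratios (rule 1)
    abnormal_flags: per-frame booleans from a posture classifier (rule 2)
    """
    large_motion = False
    run = 0                                   # consecutive abnormal-posture frames
    for i, (ar, abnormal) in enumerate(zip(ar_values, abnormal_flags)):
        if ar > ar_threshold:
            large_motion = True               # rule 1: significant motion observed
        if abnormal:
            run += 1
            if large_motion and run >= min_frames:
                return i                      # rule 2: posture persisted long enough
        else:
            if run > 0:
                large_motion = False          # posture recovered: forget the motion event
            run = 0
    return None
```

With this combination, a brief bend produces an abnormal posture that ends before `min_frames` is reached, and a slow descent to the floor never sets the motion flag, which mirrors the limitation noted in the first rule.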

Another method, by Rougier et al., 2011b, used the full Procrustes distance and the mean matching cost as measures of how much the human shape deforms during a fall. The authors' dataset, illustrated in Figs. 1 and 2, covers realistic scenarios such as occlusion, use of objects, taking off clothing, and various viewing angles. Yao et al., 2017 developed the Human Torso Motion Model (HTMM), which tracks changes in the centroid height and torso angle and distinguishes falls from fall-like motions such as bending and crouching down with an accuracy of 97.5 percent.
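
To make the Procrustes-based deformation measure concrete, here is a minimal sketch of the full Procrustes distance between two 2D shapes, using the standard complex-number formulation. It assumes both shapes are given as the same number of landmarks in corresponding order; establishing that correspondence between silhouette edge points is a separate step not shown here. The function name is hypothetical.

```python
import numpy as np

def full_procrustes_distance(shape_a, shape_b):
    """Full Procrustes distance between two 2D landmark sets of shape (N, 2).

    The result lies in [0, 1] and is invariant to translation, uniform
    scaling, and rotation of either shape (but not to reflection).
    """
    a = np.asarray(shape_a, dtype=float)
    b = np.asarray(shape_b, dtype=float)
    z = a[:, 0] + 1j * a[:, 1]          # landmarks as complex numbers
    w = b[:, 0] + 1j * b[:, 1]
    z = z - z.mean()                    # remove translation
    w = w - w.mean()
    z = z / np.linalg.norm(z)           # remove scale
    w = w / np.linalg.norm(w)
    inner = abs(np.vdot(z, w))          # |<z, w>| is unaffected by rotation
    return float(np.sqrt(max(0.0, 1.0 - inner ** 2)))

# Usage: a square compared with a flattened rectangle (a large shape change).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rectangle = [(0, 0), (4, 0), (4, 1), (0, 1)]
print(full_procrustes_distance(square, rectangle))
```

A distance near zero between consecutive frames means the silhouette kept its shape, while a sudden large value indicates strong deformation of the kind expected during a fall.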

Fig-1

Figure 1: A fall by the same individual, seen from various camera angles (Rougier et al., 2011b).

Fig-2

Figure 2: Further examples considered by Rougier et al., 2011b: (a) falling forward, (b) sitting, (c) kneeling, (d)-(e), (g) falling (loss of balance), (f) taking off a coat, (h) falling off a sofa, (i) housework, (j) falling backward, (k) becoming occluded, (l) carrying a package.

Because existing RGB-D action datasets such as CAD-60/120 lack fall sequences, ADL (activities of daily living) and fall sequences are recorded specifically for this research. The method, however, depends on threshold values that must be determined again each time a new dataset is used, and these values are tuned by trial and error to maximize performance. Rougier et al., 2011a calculated the height of the human centroid relative to the ground and the person's 3D velocity. The 3D velocity helps the system distinguish a fall behind a sofa from stooping down behind it. 3D velocity is preferred over 2D velocity here because the 2D velocity of ordinary walking is typically quite high close to the camera, which leads to walks being misclassified as falls. In contrast to fall detection models based on the bounding box width-to-height ratio, approaches based on height (Gasparrini et al., 2014) and height velocity (Yang et al., 2015) have been found to fail to distinguish falls from fall-like actions.
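
The centroid-height and 3D-velocity cues can be sketched as follows. This is an illustrative simplification, not the method of Rougier et al., 2011a: it assumes the ground plane has already been estimated as `n . x + d = 0` in camera coordinates (in meters), that the 3D body centroid is available for each frame, and that the thresholds are hypothetical values that would need tuning per dataset, as noted above.

```python
import numpy as np

def height_above_ground(point, plane_normal, plane_offset):
    """Signed distance from a 3D point to the ground plane n . x + d = 0."""
    n = np.asarray(plane_normal, dtype=float)
    p = np.asarray(point, dtype=float)
    return float((np.dot(n, p) + plane_offset) / np.linalg.norm(n))

def centroid_speeds(centroids, fps):
    """3D speed (units per second) between consecutive centroid positions."""
    c = np.asarray(centroids, dtype=float)
    return np.linalg.norm(np.diff(c, axis=0), axis=1) * fps

def is_fall_candidate(centroids, fps, plane_normal, plane_offset,
                      speed_threshold=1.0, height_threshold=0.4):
    """Flag a window in which the centroid moved fast and ended near the ground.

    Both thresholds are hypothetical placeholders, not published values.
    """
    speeds = centroid_speeds(centroids, fps)
    final_height = height_above_ground(centroids[-1], plane_normal, plane_offset)
    return bool(speeds.max() > speed_threshold and final_height < height_threshold)
```

Because the speed is measured in world units rather than pixels, a person walking close to the camera does not appear faster than one walking far away, which is the motivation given above for preferring 3D over 2D velocity.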

Ma et al., 2014 represented actions with a bag-of-words model (BoCSS) built from distinctive Curvature Scale Space (CSS) features of the depth silhouette for fall detection, while Akagunduz et al., 2016 combined orientation scale space (OSS) and the morphological scale space of a curve to form a robust, scale-invariant global descriptor called the Silhouette Orientation Volume (SOV).

Some researchers have built enhanced fall detection systems by fusing data from several sensors (accelerometer, floor sensor, and visual depth data), which improved fall detection performance. To distinguish a fall from an ordinary sitting motion on the floor, Toreyin et al., 2006 combined the impact sound of a falling person with the height-to-width ratio of the person's bounding box during the fall. To detect probable falls at low computational cost while combining accelerometric data and depth data, Zerrouki et al., 2016 employed a univariate statistical monitoring method, the Exponentially Weighted Moving Average (EWMA) control scheme. Even though these fusion approaches give outstanding results, detection depends on a worn sensor and its peripherals, which the user may find uncomfortable. This defeats the fall detection system's goal of being non-intrusive to the user.
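
As an illustration of the EWMA control scheme mentioned above, the sketch below applies a textbook EWMA chart to a one-dimensional signal such as accelerometer magnitude. It is not the pipeline of Zerrouki et al., 2016: the in-control mean `mu0` and standard deviation `sigma0` are assumed to have been estimated from fall-free data, and the smoothing factor `lam` and limit width `L` are common textbook defaults chosen here for illustration.

```python
import numpy as np

def ewma_alarms(x, mu0, sigma0, lam=0.2, L=3.0):
    """Return the sample indices at which the EWMA statistic leaves its control limits.

    z_t = lam * x_t + (1 - lam) * z_{t-1},  with z_0 = mu0
    limits: mu0 +/- L * sigma0 * sqrt(lam / (2 - lam) * (1 - (1 - lam)**(2 t)))
    """
    z = mu0
    alarms = []
    for t, xt in enumerate(np.asarray(x, dtype=float), start=1):
        z = lam * xt + (1 - lam) * z
        half_width = L * sigma0 * np.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        if abs(z - mu0) > half_width:
            alarms.append(t - 1)        # report zero-based sample index
    return alarms

# Usage: acceleration magnitude in g, resting near 1 g and then spiking.
signal = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
print(ewma_alarms(signal, mu0=1.0, sigma0=0.1))
```

The chart needs only one multiply-add per sample, which is why it suits a cheap first-stage screen; frames flagged this way could then be passed to a costlier depth-based check.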

In the next blog, we will discuss "Skeleton based AbHAR" works in detail.