In the previous post, we have introduced the our proposed method for Facial Expression Recognition. In this post, we will examine the effectiveness and efficiency of the proposal.
Noisy and Inconsistent Labels
We conduct experiments to study the robustness of our LDLVA on mislabelled data by adding synthetic noise to AffectNet, RAF-DB, and SFEW datasets. Specifically, we randomly flip the manual labels to one of the other categories. . We report the mean accuracy and standard error in Table 1. The results clearly show that our method consistently outperforms other approaches in all cases. We also observe that the improvements are even more apparent when the noise ratio increases, for example, the accuracy improvement on RAF-DB is 4.7\% with 10\% noise and 6.93\% with 30\% noise. The consistent results under various settings demonstrate the ability of our method to effectively deal with noisy annotation, which is crucial in the robustness against label ambiguity.
Since the annotations for large-scale FER data are commonly obtained via crowd-sourcing, this can create label inconsistency, especially between different datasets. To examine the effectiveness of our proposed methods in dealing with this problem, we also perform experiment with the cross-dataset protocol. Table 2 shows that our method achieves the best performance on all three datasets and the highest average accuracy and surpasses the current state-of-the-art methods. This confirms the advantages of our method over previous works and demonstrates the generalization ability to data with label inconsistency, which is essential for real-world FER applications.
Comparison with state of the arts
We further compare our method with several state of the arts on the original AffectNet, RAF-DB, and SFEW to evaluate the robustness of our method to the uncertainty and ambiguity that unavoidably exists in real-world FER datasets. The results are presented in Table 3. By leveraging label distribution learning on valence-arousal space, our model outperforms other methods and achieves state-of-the-art performance on AffectNet, RAF-DB, and SFEW. Although these datasets are considered to be "clean", the results suggest that they indeed suffer from uncertainty and ambiguity.
Real-world Ambiguity: To understand more about real-world ambiguous expressions, we conducted a user study in which we asked participants to choose the most clearly expressed emotion on random test images. We compare our model's predictions with the survey results in Figure 3. We can see that these images are ambiguous as they express a combination of different emotions, hence the participants do not fully agree and have different opinions about the most prominent emotion on the faces. It is further shown that our model can give consistent results and agree with the perception of humans to some degree.
Uncertainty Factor: Figure 4 shows the estimated uncertainty factors of some training images and their original labels. The uncertainty values decrease from top to bottom. Highly uncertain labels can be caused by low-quality inputs (as shown in Angry and Surprise columns) or ambiguous facial expressions. In contrast, when the emotions can be easily recognized as those in the last row, the uncertainty factors are assigned low values. This characteristic can guide the model to decide whether to put more weight on the provided label or the neighborhood information. Therefore, the model can be more robust against uncertainty and ambiguity.
We have introduced a new label distribution learning method for facial expression recognition by leveraging structure information in the valence-arousal space to recover the intensities distributed over emotion categories. The constructed label distribution provides rich information about the emotions, thus can effectively describe the ambiguity degree of the facial image. Intensive experiments on popular datasets demonstrate the effectiveness of our method over previous approaches under inconsistency and uncertainty conditions in facial expression recognition.
 Ali Mollahosseini, Behzad Hasani, and Mohammad H. Mahoor. Affectnet: A database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing, 2019
 Shan Li, Weihong Deng, and JunPing Du. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In CVPR, 2017.
 B. Gao, C. Xing, C. Xie, J. Wu, and X. Geng. Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing, 2017.