Triple loss for hard face detection

Zhenyu Fang, Jinchang Ren, Stephen Marshall, Huimin Zhao, Zheng Wang, Kaizhu Huang, Bing Xiao

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Although face detection has been well addressed over the last decades, effective detection of small, blurred and partially occluded faces in the wild remains a challenging task. Meanwhile, the trade-off between computational cost and accuracy is also an open research problem in this context. To tackle these challenges, this paper proposes a novel context-enhanced approach that combines structural optimization with loss function optimization. For loss function optimization, we introduce a hierarchical loss, referred to as "triple loss" in this paper, to optimize a face detector based on the feature pyramid network (FPN) (Lin et al., 2017). The additional layers are applied only during training, so the computational cost at inference is the same as that of FPN. For structural optimization, we propose a context-sensitive structure that increases the capacity of the prediction network and thereby improves the accuracy of the output. In detail, a feature fusion module based on a three-branch inception subnet (Szegedy et al., 2015) is employed to refine the original FPN without significantly increasing the computational cost, further improving the low-level semantic information, which is originally extracted from a single convolutional layer in the backward pathway of FPN. The proposed approach is evaluated on two publicly available face detection benchmarks, FDDB and WIDER FACE. Using a VGG-16 based detector, experimental results indicate that the proposed method achieves a good balance between the accuracy and computational cost of face detection.

Original language: English
Journal: Neurocomputing
Early online date: 21 Feb 2020
Publication status: E-pub ahead of print - 21 Feb 2020

Keywords

  • face detection
  • small face
  • face feature fusion
  • single shot detection
  • efficiency-accuracy balance
