Abstract
Accurate keypoint localization is necessary for bottom-up multi-person pose estimation methods to handle scale variation and crowding. In this paper, we present DoubleHigherNet, a novel network that learns scale-aware and precise heatmap representations for the bottom-up process using double high-resolution feature pyramids and coarse-to-fine training. The two feature pyramids in DoubleHigherNet consist of 1/4-resolution feature maps and higher-resolution (1/2) maps generated by attention fusion blocks and transposed convolutions. Benefiting from the training strategy and from multi-resolution and coarse-to-fine heatmap aggregation, the proposed approach predicts keypoints more accurately and therefore performs better on difficult crowded scenes. DoubleHigherNet-w32 achieves competitive results on CrowdPose-test, surpassing all the top-down methods and the bottom-up state-of-the-art HigherHRNet-w32 (which has a similar number of parameters to DoubleHigherNet-w32).
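As an illustration of the multi-resolution heatmap aggregation mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the two heatmaps are averaged after the 1/4-resolution map is upsampled to 1/2 resolution, and it uses nearest-neighbour upsampling for simplicity; the function names `aggregate_heatmaps` and `peak_locations` are hypothetical, and the abstract does not specify the actual interpolation or weighting.

```python
import numpy as np


def aggregate_heatmaps(hm_quarter, hm_half):
    """Average a 1/4-resolution heatmap with a 1/2-resolution heatmap.

    hm_quarter: array of shape (K, H, W), one channel per keypoint type.
    hm_half:    array of shape (K, 2H, 2W).
    Returns an array of shape (K, 2H, 2W).
    """
    # Assumption: nearest-neighbour 2x upsampling; a real model would
    # more likely use bilinear interpolation.
    upsampled = hm_quarter.repeat(2, axis=1).repeat(2, axis=2)
    if upsampled.shape != hm_half.shape:
        raise ValueError("hm_half must be exactly twice the size of hm_quarter")
    # Assumption: equal-weight averaging of the two resolutions.
    return (upsampled + hm_half) / 2.0


def peak_locations(heatmaps):
    """Return the (row, col) of the maximum response in each channel."""
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    return np.stack([rows, cols], axis=1)


if __name__ == "__main__":
    coarse = np.zeros((1, 2, 2))
    coarse[0, 1, 0] = 1.0  # coarse peak covers rows 2-3, cols 0-1 at 1/2 scale
    fine = np.zeros((1, 4, 4))
    fine[0, 3, 1] = 1.0    # finer map pins the peak down within that region
    print(peak_locations(aggregate_heatmaps(coarse, fine)))
```

In this toy case the coarse map only narrows the keypoint to a 2x2 region, and the higher-resolution map selects the exact cell within it, which is the intuition behind combining the two scales.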
Original language | English |
---|---|
Article number | 012068 |
Number of pages | 8 |
Journal | Journal of Physics: Conference Series |
Volume | 2033 |
Issue number | 1 |
Early online date | 13 Jun 2021 |
DOIs | |
Publication status | Published - 5 Oct 2021 |
Externally published | Yes |
Event | 3rd International Conference on Electrical, Communication and Computer Engineering, ICECCE 2021 - Kuala Lumpur, Virtual, Malaysia. Duration: 12 Jun 2021 → 13 Jun 2021 |
Keywords
- attention fusion block
- coarse-to-fine training
- DoubleHigherNet
- heatmap aggregation