Abstract
Accurate keypoint positioning is necessary for bottom-up multi-person pose estimation methods to handle scale variation and crowding. In this paper, we present DoubleHigherNet, a novel network that learns scale-aware and precise heatmap representations for the bottom-up process using two high-resolution feature pyramids and coarse-to-fine training. The two feature pyramids in DoubleHigherNet consist of 1/4-resolution feature maps and higher-resolution (1/2) maps generated by attention fusion blocks and transposed convolutions. Benefiting from this training strategy, together with multi-resolution and coarse-to-fine heatmap aggregation, the proposed approach predicts keypoints more accurately and therefore performs better on difficult crowded scenes. DoubleHigherNet-w32 achieves competitive results on CrowdPose-test, surpassing all top-down methods and the bottom-up state-of-the-art HigherHRNet-w32 (which has a similar number of parameters to DoubleHigherNet-w32).
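To make the idea of multi-resolution heatmap aggregation concrete, the following is a minimal sketch and not the authors' implementation. It assumes one keypoint heatmap at 1/4 resolution and one at 1/2 resolution, upsamples the coarse map by a factor of 2 with nearest-neighbour repetition, and averages the two on the finer grid. The function names (`upsample_nearest`, `aggregate_heatmaps`, `argmax_2d`), the choice of nearest-neighbour upsampling, and plain averaging are all illustrative assumptions; the abstract does not specify them.

```python
def upsample_nearest(heatmap, factor):
    """Upsample a 2D heatmap (list of rows) by an integer factor
    using nearest-neighbour repetition (an assumed, simple choice)."""
    out = []
    for row in heatmap:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out


def aggregate_heatmaps(coarse, fine):
    """Average a coarse (e.g. 1/4-resolution) heatmap with a fine
    (e.g. 1/2-resolution) heatmap on the fine grid."""
    up = upsample_nearest(coarse, 2)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine heatmap must be exactly twice the coarse size")
    return [[(a + b) / 2.0 for a, b in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


def argmax_2d(heatmap):
    """Return (row, col) of the largest value; first occurrence wins on ties."""
    best_value, best_pos = None, None
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_pos = value, (r, c)
    return best_pos


# Toy example: the coarse map only knows the keypoint is in the
# bottom-right quadrant; the fine map pins down the exact cell.
coarse = [[0.0, 0.0],
          [0.0, 0.8]]
fine = [[0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 0.0]]

aggregated = aggregate_heatmaps(coarse, fine)
keypoint = argmax_2d(aggregated)
```

In this toy case the coarse map contributes a broad response over the correct region, while the fine map sharpens the peak to a single cell, so the aggregated maximum falls at row 2, column 3 of the finer grid.
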
| Original language | English |
|---|---|
| Article number | 012068 |
| Number of pages | 8 |
| Journal | Journal of Physics: Conference Series |
| Volume | 2033 |
| Issue number | 1 |
| Early online date | 13 Jun 2021 |
| Publication status | Published - 5 Oct 2021 |
| Externally published | Yes |
| Event | 3rd International Conference on Electrical, Communication and Computer Engineering, ICECCE 2021 - Kuala Lumpur, Virtual, Malaysia. Duration: 12 Jun 2021 → 13 Jun 2021 |
Keywords
- attention fusion block
- coarse-to-fine training
- DoubleHigherNet
- heatmap aggregation