TY - JOUR
T1 - Completed part transformer for person re-identification
AU - Zhang, Zhong
AU - He, Di
AU - Liu, Shuang
AU - Xiao, Baihua
AU - Durrani, Tariq S.
N1 - © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting /republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
PY - 2024/2/2
Y1 - 2024/2/2
N2 - Recently, part information of pedestrian images has been demonstrated to be effective for person re-identification (ReID), but the part interaction is ignored when using Transformer to learn long-range dependencies. In this paper, we propose a novel transformer network named Completed Part Transformer (CPT) for person ReID, where we design the part transformer layer to learn the completed part interaction. The part transformer layer includes the intra-part layer and the part-global layer, where they consider long-range dependencies from the aspects of the intra-part interaction and the part-global interaction, simultaneously. Furthermore, in order to overcome the limitation of fixed number of the patch tokens in the transformer layer, we propose the Adaptive Refined Tokens (ART) module to focus on learning the interaction between the informative patch tokens in the pedestrian image, which improves the discrimination of the pedestrian representation. Extensive experimental results on four person ReID datasets, i.e., MSMT17, Market1501, DukeMTMC-reID and CUHK03, demonstrate that the proposed method achieves a new state-of-the-art performance, e.g., it achieves 68.0% mAP and 84.6% Rank-1 accuracy on MSMT17.
AB - Recently, part information of pedestrian images has been demonstrated to be effective for person re-identification (ReID), but the part interaction is ignored when using Transformer to learn long-range dependencies. In this paper, we propose a novel transformer network named Completed Part Transformer (CPT) for person ReID, where we design the part transformer layer to learn the completed part interaction. The part transformer layer includes the intra-part layer and the part-global layer, where they consider long-range dependencies from the aspects of the intra-part interaction and the part-global interaction, simultaneously. Furthermore, in order to overcome the limitation of fixed number of the patch tokens in the transformer layer, we propose the Adaptive Refined Tokens (ART) module to focus on learning the interaction between the informative patch tokens in the pedestrian image, which improves the discrimination of the pedestrian representation. Extensive experimental results on four person ReID datasets, i.e., MSMT17, Market1501, DukeMTMC-reID and CUHK03, demonstrate that the proposed method achieves a new state-of-the-art performance, e.g., it achieves 68.0% mAP and 84.6% Rank-1 accuracy on MSMT17.
KW - electrical and electronic engineering
KW - computer science applications
KW - media technology
KW - signal processing
U2 - 10.1109/tmm.2023.3294816
DO - 10.1109/tmm.2023.3294816
M3 - Article
SN - 1520-9210
VL - 26
SP - 2303
EP - 2313
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -