TY - JOUR
T1 - SR-POD
T2 - sample rotation based on principal-axis orientation distribution for data augmentation in deep object detection
AU - Xi, Yue
AU - Zheng, Jiangbin
AU - Li, Xiuxiu
AU - Xu, Xinying
AU - Ren, Jinchang
AU - Xie, Gang
PY - 2018/12/31
Y1 - 2018/12/31
N2 - Convolutional neural networks (CNNs) have outperformed most state-of-the-art methods in object detection. However, CNNs suffer the difficulty of detecting objects with rotation, because the dataset used to train the CCNs often does not contain sufficient samples with various angles of orientation. In this paper, we propose a novel data-augmentation approach to handle samples with rotation, which utilizes the distribution of the object's orientation without the time-consuming process of rotating the sample images. Firstly, we present an orientation descriptor, named as "principal-axis orientation" to describe the orientation of the object's principal axis in an image and estimate the distribution of objects’ principal-axis orientations (PODs) of the whole dataset. Secondly, we define a similarity metric to calculate the POD similarity between the training set and an additional dataset, which is built by randomly selecting images from the benchmark ImageNet ILSVRC2012 dataset. Finally, we optimize a cost function to obtain an optimal rotation angle, which indicates the highest POD similarity between the two aforementioned data sets. In order to evaluate our data augmentation method for object detection, experiments, conducted on the benchmark PASCAL VOC2007 dataset, show that with the training set augmented using our method, the average precision (AP) of the Faster RCNN in the TV-monitor is improved by 7.5%. In addition, our experimental results also demonstrate that new samples generated by random rotation are more likely to result in poor performance of object detection.
AB - Convolutional neural networks (CNNs) have outperformed most state-of-the-art methods in object detection. However, CNNs suffer the difficulty of detecting objects with rotation, because the dataset used to train the CCNs often does not contain sufficient samples with various angles of orientation. In this paper, we propose a novel data-augmentation approach to handle samples with rotation, which utilizes the distribution of the object's orientation without the time-consuming process of rotating the sample images. Firstly, we present an orientation descriptor, named as "principal-axis orientation" to describe the orientation of the object's principal axis in an image and estimate the distribution of objects’ principal-axis orientations (PODs) of the whole dataset. Secondly, we define a similarity metric to calculate the POD similarity between the training set and an additional dataset, which is built by randomly selecting images from the benchmark ImageNet ILSVRC2012 dataset. Finally, we optimize a cost function to obtain an optimal rotation angle, which indicates the highest POD similarity between the two aforementioned data sets. In order to evaluate our data augmentation method for object detection, experiments, conducted on the benchmark PASCAL VOC2007 dataset, show that with the training set augmented using our method, the average precision (AP) of the Faster RCNN in the TV-monitor is improved by 7.5%. In addition, our experimental results also demonstrate that new samples generated by random rotation are more likely to result in poor performance of object detection.
KW - data augmentation
KW - deep learning
KW - deep object detection
KW - object rotation
UR - http://www.scopus.com/inward/record.url?scp=85049637206&partnerID=8YFLogxK
UR - https://www.sciencedirect.com/journal/cognitive-systems-research
U2 - 10.1016/j.cogsys.2018.06.014
DO - 10.1016/j.cogsys.2018.06.014
M3 - Article
AN - SCOPUS:85049637206
VL - 52
SP - 144
EP - 154
JO - Cognitive Systems Research
JF - Cognitive Systems Research
ER -