Balancing accuracy and computational cost is a challenging task in computer vision. This is especially true for convolutional neural networks (CNNs), which required far larger scale of processing power than traditional learning algorithms. This thesis is aimed at the development of new CNN structures and loss functions to tackle the unbalanced accuracy-effciency issue in image classification and object detection, which are two fundamental yet challenging tasks of computer vision. For a CNN based object detector, the main computational cost is caused by the feature extractor (backbone), which has been originally applied to image classification.Optimising the structure of CNN applied to image classification will bring benefits when it is applied to object detection. Although the outputs of detectors may vary across detection tasks, the challenges and the design principles among detectors are similar. Therefore, this thesis will start with face detection (i.e. a single object detection task), which is a significant branch of objection detection and has been widely used in real life. After that, object detection on aerial image will be investigated, which is a more challenging detection task.Specifically, the objectives of this thesis are: 1. Optimising the CNN structures for image classification; 2. Developing a face detector which enables a trade-off between computational cost and accuracy; and 3. Proposing an object detector for aerial images, which suppresses the background noise without damaging the inference efficiency.For the first target, this thesis aims to automatically optimise the topology of CNNs to generate the structure of fixed-length models, in which unnecessary convolutional kernels are removed. Experimental results have demonstrated that the optimised model can achieve comparable accuracy to the state-of-the-art models, across a broad range of datasets, whilst significantly reducing the number of parameters.To tackle the unbalanced accuracy-effciency challenge in face detection, a novel context enhanced approach is proposed which improves the performance of the face detector in terms of both loss function and structure. For loss function optimisation, a hierarchical loss, referred to as 'triple loss' in this thesis, is introduced to optimise the feature pyramid network (FPN) based face detector. For structural optimisation, this thesis proposes a context-sensitive structure to increase the capacity of the network prediction. Experimental results indicate that the proposed method achieves a good balance between the accuracy and computational cost of face detection.To suppress the background noise in aerial image object detection, this thesis presents a two-stage detector, named as 'SAFDet'. To be more specific, a rotation anchor-free-branch (RAFB) is proposed to regress the precise rectangle boundary. Asthe RAFB is anchor free, the computational cost is negligible during training. Meanwhile,a centre prediction module (CPM) is introduced to enhance the capabilities oftarget localisation and noise suppression from the background. As the CPM is only deployed during training, it does not increase the computational cost of inference. Experimental results indicate that the proposed method achieves a good balance between the accuracy and computational cost, and it effectively suppresses the background noise at the same time.
|Date of Award||16 Jun 2020|
- University Of Strathclyde
|Sponsors||University of Strathclyde|
|Supervisor||Stephen Marshall (Supervisor) & Jinchang Ren (Supervisor)|