Dense convolutional networks for efficient video analysis

Tian Jin, Zhihao He, Amlan Basu, John Soraghan, Gaetano Di Caterina, Lykourgos Petropoulakis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book

2 Citations (Scopus)
115 Downloads (Pure)

Abstract

Over the past few years, various models based on Convolutional Neural Networks (CNNs) have exhibited a degree of human-like performance in a range of image processing problems. Video understanding, action classification, and gesture recognition have become a new stage for CNNs. The typical approach to video analysis uses a 2D CNN to extract a feature map from each single frame and a 3D CNN or LSTM to merge spatiotemporal information; some approaches add optical flow on a second branch and fuse the results post hoc. Performance is normally proportional to model complexity, so as accuracy keeps improving, the focus of the problem has shifted from accuracy alone to model size, computing speed, and model availability. In this paper, we present a lightweight network architecture framework for learning spatiotemporal features from video. Our architecture tries to merge long-term content in any network feature map, keeping the model as small and as fast as possible while maintaining accuracy. The accuracy achieved is 91.4%, along with an appreciable speed of 69.3 fps.
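
The "typical approach" described in the abstract (spatial features per frame, followed by a temporal merge) can be illustrated with a minimal sketch. This is not the architecture proposed in the paper: the function names `frame_features`, `merge_temporal`, and `clip_descriptor` are hypothetical, 2x2 average pooling stands in for a 2D CNN, and a plain average over time stands in for a 3D CNN or LSTM.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def frame_features(frame: Frame) -> List[float]:
    """Stand-in for a 2D CNN: one 2x2 average-pooling pass, flattened."""
    h, w = len(frame), len(frame[0])
    feats = []
    for r in range(0, h - 1, 2):
        for c in range(0, w - 1, 2):
            block = (frame[r][c] + frame[r][c + 1]
                     + frame[r + 1][c] + frame[r + 1][c + 1])
            feats.append(block / 4.0)
    return feats


def merge_temporal(per_frame: List[List[float]]) -> List[float]:
    """Stand-in for a 3D CNN or LSTM: average each feature over time."""
    n = len(per_frame)
    return [sum(f[i] for f in per_frame) / n for i in range(len(per_frame[0]))]


def clip_descriptor(clip: List[Frame]) -> List[float]:
    """Spatial features per frame, then one temporal merge over the clip."""
    return merge_temporal([frame_features(f) for f in clip])


# Toy clip: three 4x4 frames whose brightness rises by 1.0 per frame.
clip = [[[float(t)] * 4 for _ in range(4)] for t in range(3)]
descriptor = clip_descriptor(clip)
print(len(descriptor), descriptor[0])
```

In a real model the two stand-ins would be learned layers; the sketch only shows the order of operations, which is why the per-frame stage can be computed independently for each frame before any temporal information is combined.
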
Original language: English
Title of host publication: 2019 The 5th International Conference on Control, Automation and Robotics (ICCAR 2019)
Place of Publication: Piscataway, N.J.
Publisher: IEEE
Pages: 550-554
Number of pages: 5
ISBN (Print): 9781728133256, 9781728133263
DOIs
Publication status: Published - 29 Aug 2019
Event: 2019 The 5th International Conference on Control, Automation and Robotics - Park Plaza Beijing Science Park, 25 Zhichun Road, Haidian, Beijing, China
Duration: 19 Apr 2019 to 22 Apr 2019
http://www.iccar.org/

Conference

Conference: 2019 The 5th International Conference on Control, Automation and Robotics
Abbreviated title: ICCAR 2019
Country/Territory: China
City: Beijing
Period: 19/04/19 to 22/04/19
Internet address

Keywords

  • CNN
  • LSTM
  • 3D-Net
  • DenseNet
  • 2D layer
  • video analysis
  • convolutional network
  • convolutional neural network (CNN)
  • feature maps
