A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos

Zheng Wang, Jinchang Ren, Dong Zhang, Meijun Sun, Jianmin Jiang

Research output: Contribution to journal › Article

62 Citations (Scopus)

Abstract

Although research on saliency detection and visual attention has been active in recent years, most existing work focuses on still-image rather than video-based saliency. In this paper, a deep-learning-based hybrid spatiotemporal saliency feature extraction framework is proposed for saliency detection from video footage. The deep learning model is used to extract high-level features from raw video data, which are then integrated with other high-level features. The deep learning network is found to be more effective at extracting hidden features than conventional handcrafted methodologies. This work demonstrates the effectiveness of using hybrid high-level features for saliency detection in video. Rather than using a single static image, the proposed deep learning model takes several consecutive frames as input, so that both spatial and temporal characteristics are considered when computing saliency maps. The efficacy of the proposed hybrid feature framework is evaluated on five databases of human gaze in complex scenes. Experimental results show that the proposed model outperforms five other state-of-the-art video saliency detection approaches. In addition, the proposed framework is found useful for other video-content-based applications such as video highlight detection. As a result, a large movie-clip dataset with labeled video highlights is generated.
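The core idea in the abstract — consuming several consecutive frames so that both spatial and temporal cues contribute to the saliency map — can be illustrated with a toy NumPy sketch. This is not the paper's CNN: the `toy_spatiotemporal_saliency` function, the frame-differencing temporal cue, the global-contrast spatial cue, and the fusion weight `alpha` are all illustrative stand-ins, chosen only to show the two-stream fusion shape of the problem.

```python
import numpy as np

def toy_spatiotemporal_saliency(frames, alpha=0.5):
    """Toy fusion of a spatial and a temporal saliency cue.

    frames: array of shape (T, H, W), a short grayscale clip.
    Returns an (H, W) map in [0, 1]. Purely illustrative; the
    paper uses a learned deep model, not these handcrafted cues.
    """
    clip = np.asarray(frames, dtype=np.float64)
    assert clip.ndim == 3 and clip.shape[0] >= 2
    last = clip[-1]

    # Temporal cue: mean absolute difference between consecutive frames.
    temporal = np.abs(np.diff(clip, axis=0)).mean(axis=0)

    # Spatial cue: deviation of the last frame from its global mean
    # (a crude stand-in for centre-surround contrast).
    spatial = np.abs(last - last.mean())

    def norm(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    # Weighted fusion of the two normalised cue maps.
    return alpha * norm(spatial) + (1 - alpha) * norm(temporal)

# Example: a single bright dot moving one pixel per frame.
T, H, W = 4, 8, 8
clip = np.zeros((T, H, W))
for t in range(T):
    clip[t, 4, 2 + t] = 1.0
smap = toy_spatiotemporal_saliency(clip)
print(smap.shape)  # (8, 8)
```

In this toy example the fused map peaks at the dot's final position, because that pixel scores highly on both the spatial-contrast and the motion cue; a learned model replaces both handcrafted cues with features extracted directly from the raw frames.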
Language: English
Number of pages: 20
Journal: Neurocomputing
Early online date: 2 Feb 2018
DOI: 10.1016/j.neucom.2018.01.076
Publication status: E-pub ahead of print - 2 Feb 2018


Keywords

  • spatiotemporal saliency detection
  • human gaze
  • convolutional neural networks
  • visual dispersion
  • movie highlight extraction

Cite this

@article{00d57bc0c2754ce396956f305f5b4629,
title = "A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos",
abstract = "Although research on saliency detection and visual attention has been active in recent years, most existing work focuses on still-image rather than video-based saliency. In this paper, a deep-learning-based hybrid spatiotemporal saliency feature extraction framework is proposed for saliency detection from video footage. The deep learning model is used to extract high-level features from raw video data, which are then integrated with other high-level features. The deep learning network is found to be more effective at extracting hidden features than conventional handcrafted methodologies. This work demonstrates the effectiveness of using hybrid high-level features for saliency detection in video. Rather than using a single static image, the proposed deep learning model takes several consecutive frames as input, so that both spatial and temporal characteristics are considered when computing saliency maps. The efficacy of the proposed hybrid feature framework is evaluated on five databases of human gaze in complex scenes. Experimental results show that the proposed model outperforms five other state-of-the-art video saliency detection approaches. In addition, the proposed framework is found useful for other video-content-based applications such as video highlight detection. As a result, a large movie-clip dataset with labeled video highlights is generated.",
keywords = "spatiotemporal saliency detection, human gaze, convolutional neural networks, visual dispersion, movie highlight extraction",
author = "Zheng Wang and Jinchang Ren and Dong Zhang and Meijun Sun and Jianmin Jiang",
year = "2018",
month = "2",
day = "2",
doi = "10.1016/j.neucom.2018.01.076",
language = "English",
journal = "Neurocomputing",
issn = "0925-2312",

}


TY - JOUR

T1 - A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos

AU - Wang, Zheng

AU - Ren, Jinchang

AU - Zhang, Dong

AU - Sun, Meijun

AU - Jiang, Jianmin

PY - 2018/2/2

Y1 - 2018/2/2

N2 - Although research on saliency detection and visual attention has been active in recent years, most existing work focuses on still-image rather than video-based saliency. In this paper, a deep-learning-based hybrid spatiotemporal saliency feature extraction framework is proposed for saliency detection from video footage. The deep learning model is used to extract high-level features from raw video data, which are then integrated with other high-level features. The deep learning network is found to be more effective at extracting hidden features than conventional handcrafted methodologies. This work demonstrates the effectiveness of using hybrid high-level features for saliency detection in video. Rather than using a single static image, the proposed deep learning model takes several consecutive frames as input, so that both spatial and temporal characteristics are considered when computing saliency maps. The efficacy of the proposed hybrid feature framework is evaluated on five databases of human gaze in complex scenes. Experimental results show that the proposed model outperforms five other state-of-the-art video saliency detection approaches. In addition, the proposed framework is found useful for other video-content-based applications such as video highlight detection. As a result, a large movie-clip dataset with labeled video highlights is generated.

AB - Although research on saliency detection and visual attention has been active in recent years, most existing work focuses on still-image rather than video-based saliency. In this paper, a deep-learning-based hybrid spatiotemporal saliency feature extraction framework is proposed for saliency detection from video footage. The deep learning model is used to extract high-level features from raw video data, which are then integrated with other high-level features. The deep learning network is found to be more effective at extracting hidden features than conventional handcrafted methodologies. This work demonstrates the effectiveness of using hybrid high-level features for saliency detection in video. Rather than using a single static image, the proposed deep learning model takes several consecutive frames as input, so that both spatial and temporal characteristics are considered when computing saliency maps. The efficacy of the proposed hybrid feature framework is evaluated on five databases of human gaze in complex scenes. Experimental results show that the proposed model outperforms five other state-of-the-art video saliency detection approaches. In addition, the proposed framework is found useful for other video-content-based applications such as video highlight detection. As a result, a large movie-clip dataset with labeled video highlights is generated.

KW - spatiotemporal saliency detection

KW - human gaze

KW - convolutional neural networks

KW - visual dispersion

KW - movie highlight extraction

UR - https://www.sciencedirect.com/science/article/pii/S0925231218301097

U2 - 10.1016/j.neucom.2018.01.076

DO - 10.1016/j.neucom.2018.01.076

M3 - Article

JO - Neurocomputing

T2 - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -