A data driven approach to audiovisual speech mapping

Andrew Abel, Ricard Marxer, Jon Barker, Roger Watt, Bill Whitmer, Peter Derleth, Amir Hussain*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book


Abstract

Using visual information as part of audio speech processing has attracted significant recent interest. This paper presents a data-driven approach to estimating audio speech acoustics from temporal visual information alone, without recourse to linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are evaluated to identify the best results, showing that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
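The abstract only outlines the pipeline; the paper's actual filterbank size, number of DCT coefficients, context-window length and MLP topology are not given here. The following is a minimal sketch of this kind of visual-to-audio mapping, assuming librosa, SciPy and scikit-learn. Every parameter value (sample rate, n_mels, n_coeffs, context, hidden layer size) and the assumption of pre-extracted, frame-aligned lip regions are illustrative choices, not the authors' configuration.

    import numpy as np
    import librosa
    from scipy.fft import dctn
    from sklearn.neural_network import MLPRegressor

    # Audio features: log mel filterbank energies per frame.
    # Filterbank size and hop are illustrative, not the paper's values.
    def audio_features(wav_path, n_mels=23, hop_length=160):
        y, sr = librosa.load(wav_path, sr=16000)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                             hop_length=hop_length)
        return np.log(mel + 1e-8).T              # shape: (n_frames, n_mels)

    # Visual features: 2D-DCT of each grayscale lip ROI, keeping a low-order
    # (top-left, low-frequency) block of coefficients. ROIs are assumed to be
    # pre-extracted 2D arrays of at least 6x5 pixels.
    def visual_features(lip_rois, n_coeffs=30):
        feats = []
        for roi in lip_rois:
            coeffs = dctn(roi, norm='ortho')
            feats.append(coeffs[:6, :5].ravel()[:n_coeffs])
        return np.asarray(feats)                 # shape: (n_frames, n_coeffs)

    # Map a window of prior visual frames to the current audio frame with an
    # MLP. Assumes vis and aud are already frame-aligned (same frame rate).
    def train_mapping(vis, aud, context=5):
        X = np.stack([vis[i - context:i].ravel()
                      for i in range(context, len(aud))])
        y = aud[context:]
        mlp = MLPRegressor(hidden_layer_sizes=(300,), max_iter=500)
        mlp.fit(X, y)                            # multi-output regression
        return mlp
        # mlp.predict on a flattened window of the most recent visual frames
        # then yields an estimated log filterbank audio frame.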

Original language: English
Title of host publication: Advances in Brain Inspired Cognitive Systems
Subtitle of host publication: 8th International Conference, BICS 2016, Beijing, China, November 28-30, 2016, Proceedings
Editors: Cheng-Lin Liu, Amir Hussain, Bin Luo, Kay Chen Tan, Yi Zeng, Zhaoxiang Zhang
Place of Publication: Cham, Switzerland
Publisher: Springer-Verlag
Pages: 331-342
Number of pages: 12
ISBN (Electronic): 9783319496856
ISBN (Print): 9783319496849
DOIs
Publication status: Published - 13 Nov 2016
Event: 8th International Conference on Brain Inspired Cognitive Systems, BICS 2016 - Beijing, China
Duration: 28 Nov 2016 - 30 Nov 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10023 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Conference on Brain Inspired Cognitive Systems, BICS 2016
Country/Territory: China
City: Beijing
Period: 28/11/16 - 30/11/16

Keywords

  • ANNs
  • audiovisual
  • speech mapping
  • speech processing
