Abstract
Human speech processing is a multimodal cognitive activity in which visual information also plays a role. Many lipreading systems are trained on English speech data; however, Chinese is the most widely spoken language in the world and is of growing interest, as is the development of lightweight feature extraction that reduces training time. This paper presents an improved character-level, Gabor-based lipreading system that uses visual information for feature extraction and speech classification. We evaluate the system on a new Audiovisual Mandarin Chinese (AVMC) database composed of 4704 characters spoken by 10 volunteers. The Gabor-based lipreading system is trained on this dataset and uses the Dlib Region-of-Interest (ROI) method and Gabor filtering to extract lip features, which provides a fast, lightweight approach that requires no mouth modelling. A character-level Convolutional Neural Network (CNN) recognizes Pinyin with 64.96% accuracy and a Character Error Rate (CER) of 57.71%.
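To make the Gabor filtering step more concrete, the following is a minimal NumPy sketch of a real-valued 2-D Gabor filter bank applied to a lip region that is assumed to be already cropped (for example from Dlib facial landmarks, which is not shown). The kernel size, wavelength `lambd`, `sigma`, aspect ratio `gamma`, the four orientations, and the choice to pool each filter response into its mean magnitude are illustrative assumptions, not the parameters or feature layout used in the paper, and all function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size: int, sigma: float, theta: float, lambd: float,
                 gamma: float = 0.5, psi: float = 0.0) -> np.ndarray:
    """Real part of a 2-D Gabor kernel: Gaussian envelope times cosine carrier."""
    half = size // 2
    # y varies along rows, x along columns.
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lambd + psi)
    return envelope * carrier


def filter_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation of a grayscale image with a kernel."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def gabor_lip_features(roi: np.ndarray, n_orientations: int = 4,
                       size: int = 9, sigma: float = 2.0,
                       lambd: float = 4.0) -> np.ndarray:
    """Hypothetical per-frame feature vector: mean |response| per orientation."""
    roi = roi.astype(float)
    # Zero-mean, unit-variance normalisation to reduce illumination effects.
    roi = (roi - roi.mean()) / (roi.std() + 1e-8)
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations  # 0, 45, 90, 135 degrees for k < 4
        response = filter_valid(roi, gabor_kernel(size, sigma, theta, lambd))
        feats.append(np.abs(response).mean())
    return np.array(feats)


# Usage: a synthetic "ROI" whose intensity varies only horizontally.
roi = np.tile(np.cos(np.pi * np.arange(32) / 2), (24, 1))
print(gabor_lip_features(roi))  # the theta = 0 entry dominates
```

Because the synthetic pattern oscillates horizontally with the same wavelength as the kernels, the θ = 0 filter responds strongly while the other orientations respond weakly. In a full pipeline, per-frame vectors like these would be fed to a classifier such as the character-level CNN described above; mean-magnitude pooling is only one simple option, and a system could instead keep downsampled response maps as CNN input.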
| Original language | English |
|---|---|
| Title of host publication | Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings |
| Editors | Jinchang Ren, Amir Hussain, Huimin Zhao, Jun Cai, Rongjun Chen, Yinyin Xiao, Kaizhu Huang, Jiangbin Zheng |
| Publisher | Springer |
| Pages | 169-179 |
| Number of pages | 11 |
| Volume | 11691 |
| ISBN (Electronic) | 9783030394318 |
| ISBN (Print) | 9783030394301 |
| Publication status | Published - 1 Feb 2020 |
| Event | 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019 - Guangzhou, China (13 Jul 2019 → 14 Jul 2019) |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 11691 LNAI |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019 |
|---|---|
| Country/Territory | China |
| City | Guangzhou |
| Period | 13/07/19 → 14/07/19 |
Funding
Acknowledgments. This work was supported by XJTLU Grant RDF 16-01-35, and partially funded by the Research Institute of Big Data Analytics.
Keywords
- audiovisual
- Chinese
- Gabor transform
- speech recognition