A semi-automatic pipeline for transcribing and segmenting child speech

Polychronia Christodoulidou, James Tanner, Jane Stuart-Smith, Michael McAuliffe, Mridhula Murali, Amy Smith, Lauren Taylor, Joanne Cleland, Anja Kuschmann

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution book


Abstract

This study evaluates both automated transcription (WhisperX) and forced alignment (MFA) in developing a semi-automated pipeline for obtaining acoustic vowel measures from field recordings of 275 children speaking Scottish English, a non-standard English dialect. As expected, manually correcting the speech transcriptions before forced alignment brings the acoustic vowel measures closer to those from manually annotated data, though speech style and recording environment present some challenges for both tools. Adapting the MFA pre-trained english_us_arpa acoustic model to the children's speech also improves the quality of the acoustic measures, though increasing the training sample size did not yield further improvement.
Original language: English
Title of host publication: Proceedings of Interspeech 2025
Pages: 4278-4282
Number of pages: 5
Publication status: Published - 14 Aug 2025
Event: Interspeech - Rotterdam, Rotterdam, Netherlands
Duration: 17 Aug 2025 to 21 Aug 2025
https://www.interspeech2025.org/home

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: Interspeech
Country/Territory: Netherlands
City: Rotterdam
Period: 17/08/25 to 21/08/25

Funding

This research was supported by ESRC grant ES/W003244/1.

Keywords

  • child speech
  • automated speech processing
  • non-standard English
  • WhisperX
  • forced alignment
