Investigating variability in child’s speech (VariCS): introducing the VariCS corpus

Research output: Contribution to conference, Poster, peer-review


Background: The Variability in Child Speech (VariCS) corpus will be a collection of audio recordings from primary school children in Scotland aged five to eleven years. The corpus, collected as part of the VariCS project, is designed to chart variability in child speech systematically from a longitudinal perspective and to establish typical ranges for acoustic measures associated with the speech subsystems of respiration, phonation, resonance, and articulation. A comprehensive understanding of variability in typical speech development, and of how it changes over time, supports the identification and interpretation of developmental speech patterns that fall outside the typical ranges.
Corpus information: The VariCS corpus is being created from audio recordings of the speech of children aged five to eleven years. Speech data are being collected from three cohorts of children in Scotland (Cohort 1: 5 to 6 years old at the start of data collection; Cohort 2: 7 to 8 years old; Cohort 3: 9 to 10 years old). Each child is recorded four times at six-month intervals over the course of 1.5 years to allow for longitudinal analyses. The first data were collected in May 2023, and data collection for round 2 is underway. To date (November 2023), audio recordings of 212 children have been collected.
Speech data: The data include connected speech from a story retell, a picture description, and sentence repetition (six sentences). The connected speech data are complemented by a picture naming task of 22 single-syllable words, designed to elicit the Scottish corner vowels and to allow for analyses of liquids in word-initial and word-final position. Each word is elicited three times per data collection point to allow for variability analyses. Finally, non-speech data are collected in the form of sound prolongation and diadochokinesis (DDK) tasks. Information on the child’s age, the children’s and parents’ dialectal backgrounds, and any medical conditions is also collected to better understand factors influencing performance. The Diagnostic Evaluation of Articulation and Phonology Screen (DEAP[1]) is conducted at the first data collection time point to screen for speech sound disorders and to add to our understanding of thresholds for typical speech development.
Recordings: Data are collected in primary schools across seven councils in the Scottish belt: Glasgow City, West Dunbartonshire, North Lanarkshire, South Lanarkshire, City of Edinburgh, Fife, and Scottish Borders. The councils cover both rural and urban areas. The Scottish Index of Multiple Deprivation (SIMD) rank of each participating school is recorded as part of the metadata. Recordings are made in the schools using a dedicated iPad app. A unidirectional head-mounted condenser microphone (Shure SM35) is used to minimise background noise and to ensure a constant mouth-to-microphone distance. Recordings are sampled at 44.1 kHz.
Benefits of the resource: The corpus will provide a substantial and unique resource capturing the longitudinal speech development of primary school-aged children living in Scotland. Analyses of this rich speech data set will contribute to the field’s understanding of variability in typical as well as atypical child speech development. The data will also be a large-scale resource for sociolinguists interested in charting the development of socially motivated variation in primary school-aged children.
Original language: English
Publication status: Published - 25 Mar 2024
Event: 2024 Colloquium of the British Association of Academic Phoneticians - Cardiff, United Kingdom
Duration: 25 Mar 2024 to 27 Mar 2024


Conference: 2024 Colloquium of the British Association of Academic Phoneticians
Abbreviated title: BAAP 2024
Country/Territory: United Kingdom


  • child's speech
  • speech disorder
  • VariCS

