
Maximum Gaussianality training for deep speaker vector normalization

Yunqi Cai, Lantian Li, Andrew Abel, Xiaoyan Zhu, Dong Wang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Automatic Speaker Verification (ASV) is a critical task in pattern recognition and has been applied to various security-sensitive scenarios. The current state-of-the-art technique for ASV is based on deep embedding. However, a significant challenge with this approach is that the resulting deep speaker vectors tend to be irregularly distributed. To address this issue, this paper proposes a novel training method called Maximum Gaussianality (MG), which regulates the distribution of the speaker vectors. Compared to the conventional normalization approach based on maximum likelihood (ML), the new approach directly maximizes the Gaussianality of the latent codes, and therefore can both normalize the between-class and within-class distributions in a controlled and reliable way and eliminate the unbound likelihood problem associated with the conventional ML approach. Our experiments on several datasets demonstrate that our MG-based normalization can deliver much better performance than the baseline systems without normalization and outperform discriminative normalization flow (DNF), an ML-based normalization method, particularly when the training data is limited. In theory, the MG criterion can be applied to any task in any research domain where Gaussian distributions are needed, making the MG training a versatile tool.
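The abstract contrasts maximizing the Gaussianality of latent codes with maximum-likelihood normalization. The paper's exact MG criterion is not given here; as a hedged illustration only, one simple way to score how Gaussian a batch of speaker vectors is would be to compare their empirical moments per dimension against those of a standard Gaussian (mean 0, variance 1, skewness 0, excess kurtosis 0). The function below is a hypothetical sketch of such a moment-based Gaussianality penalty, not the authors' method:

```python
import numpy as np

def gaussianality_loss(z):
    """Illustrative (not the paper's) Gaussianality penalty: measures
    per-dimension deviation of latent codes z with shape (n, d) from a
    standard Gaussian via the first four empirical moments."""
    mu = z.mean(axis=0)                      # target: 0
    var = z.var(axis=0)                      # target: 1
    zc = (z - mu) / np.sqrt(var + 1e-8)      # standardize before higher moments
    skew = (zc ** 3).mean(axis=0)            # target: 0
    kurt = (zc ** 4).mean(axis=0) - 3.0      # excess kurtosis, target: 0
    return ((mu ** 2).mean() + ((var - 1.0) ** 2).mean()
            + (skew ** 2).mean() + (kurt ** 2).mean())

rng = np.random.default_rng(0)
gauss = rng.standard_normal((5000, 8))       # already Gaussian: low loss
skewed = rng.exponential(1.0, (5000, 8))     # heavily non-Gaussian: high loss
print(gaussianality_loss(gauss) < gaussianality_loss(skewed))
```

Minimizing such a penalty on the latent codes of a normalization flow would push their distribution toward a standard Gaussian directly, rather than indirectly through a likelihood objective; the paper's actual MG training defines and optimizes its own criterion.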
Original language: English
Article number: 109977
Number of pages: 12
Journal: Pattern Recognition
Volume: 145
Early online date: 18 Sept 2023
Publication status: Published - 31 Jan 2024

Keywords

  • speaker embedding
  • normalization flow
  • Gaussianality training
