Synthesizing mixed-type electronic health records using diffusion models

Taha Ceritli*, Ghadeer O. Ghosheh, Vinod Kumar Chauhan, Tingting Zhu, Andrew P. Creagh, David A. Clifton

*Corresponding author for this work

Research output: Working paper/Preprint

Abstract

Electronic Health Records (EHRs) contain sensitive patient information, which raises privacy concerns when such data are shared. Synthetic data generation is a promising way to mitigate these risks, and it often relies on deep generative models such as Generative Adversarial Networks (GANs). However, recent studies have shown that diffusion models offer several advantages over GANs, such as more realistic synthetic data and more stable training, across data modalities including image, text, and sound. In this work, we investigate the potential of diffusion models for generating realistic mixed-type tabular EHRs, comparing the TabDDPM model with existing methods on four datasets in terms of data quality, utility, privacy, and augmentation. Our experiments demonstrate that TabDDPM outperforms the state-of-the-art models across all evaluation metrics except privacy, which confirms the trade-off between privacy and utility.
Original language: English
Place of Publication: Ithaca, NY
Number of pages: 8
Publication status: Published - 10 Aug 2023

Keywords

  • electronic health records
  • diffusion models
