DAGAF: a directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis

Hristo Petkov*, Calum MacLellan, Feng Dong

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

Understanding the causal relationships between data variables can provide crucial insights into the construction of tabular datasets. Most existing causality learning methods focus on applying a single identifiable causal model, such as the Additive Noise Model (ANM) or the Linear non-Gaussian Acyclic Model (LiNGAM), to discover the dependencies exhibited in observational data. We improve on this approach by introducing a novel dual-step framework capable of performing both causal structure learning and tabular data synthesis under multiple causal model assumptions. Our approach uses Directed Acyclic Graphs (DAGs) to represent causal relationships among data variables. By applying various functional causal models, including ANM, LiNGAM and the Post-Nonlinear model (PNL), we implicitly learn the contents of the DAG in order to simulate the generative process of observational data, effectively replicating the real data distribution. This is supported by a theoretical analysis that explains the multiple loss terms comprising the objective function of the framework. Experimental results demonstrate that DAGAF outperforms many existing methods in structure learning, achieving significantly lower Structural Hamming Distance (SHD) scores across both real-world and benchmark datasets (improvements over the state of the art of 47% on Sachs, 11% on Child, 5% on Hailfinder and 7% on Pathfinder), while being able to produce diverse, high-quality samples.
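
For readers unfamiliar with the Structural Hamming Distance reported above, the following is a minimal illustrative sketch of how such a score can be computed from two adjacency matrices. It is not taken from the DAGAF code. The function name `structural_hamming_distance`, the adjacency-matrix encoding (entry [i, j] nonzero means an edge i -> j) and the convention that a reversed edge counts as a single error are all assumptions made for this example; some implementations count a reversal as two errors.

```python
import numpy as np


def structural_hamming_distance(true_adj, est_adj):
    """Count the node pairs whose edge status differs between two DAGs.

    Assumed convention (hypothetical, for illustration): a missing edge,
    an extra edge, and a reversed edge each contribute 1 to the score.
    """
    true_adj = (np.asarray(true_adj) != 0).astype(int)
    est_adj = (np.asarray(est_adj) != 0).astype(int)
    if true_adj.shape != est_adj.shape or true_adj.shape[0] != true_adj.shape[1]:
        raise ValueError("adjacency matrices must be square and of equal shape")

    # Entry-wise mismatches; a reversed edge appears at both (i, j) and (j, i).
    diff = np.abs(true_adj - est_adj)

    # Merge both directions so that each unordered node pair is counted once.
    pair_mismatch = np.triu((diff + diff.T) > 0, k=1)
    return int(pair_mismatch.sum())


# Ground truth: 0 -> 1 -> 2
true_graph = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]]

# Estimate: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra)
estimated_graph = [[0, 0, 1],
                   [1, 0, 1],
                   [0, 0, 0]]

shd = structural_hamming_distance(true_graph, estimated_graph)
```

In this toy case the estimate contains one reversed edge and one extra edge, so the score is 2 under the stated convention. Lower values indicate a learned graph that is closer to the ground truth.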
Original language: English
Article number: 602
Number of pages: 27
Journal: Applied Intelligence
Volume: 55
Issue number: 7
Early online date: 31 Mar 2025
Publication status: Published - 1 May 2025

Funding

The authors declare that their work has been funded by the United Kingdom Medical Research Council (Grant Reference: MR/X005925/1) throughout the duration of their associated research project (Virtual Clinical Trial Emulation with Generative AI Models, Duration: Sept 2022 - Feb 2023). The work was also funded by EPSRC (Grant Reference: EP/X029778/1) under the project Causal Counterfactual visualization for human causal decision making (2023-2025).

Keywords

  • tabular data synthesis
  • adversarial causal discovery
  • directed acyclic graph
  • additive noise model
