
Causal discovery from observational tabular data with generative adversarial learning

Student thesis: Doctoral Thesis

Abstract

Background: Causal knowledge is essential for understanding complex systems and revealing relationships between variables. It enables researchers to move beyond correlations, reason about cause and effect, and derive scientific insights. Although Randomized Controlled Trials (RCTs) remain the gold standard for causal inference, they are often infeasible due to ethical, logistical, or financial constraints and may lack real-world applicability. In contrast, observational data offer abundant, diverse samples, making them well suited to large-scale analysis. Although observational data are susceptible to confounding, advances in structure learning allow researchers to identify causal relationships without relying on randomized experiments.

Research objectives: This thesis challenges conventional maximum likelihood estimation (MLE)-based methods by exploring adversarial approaches to causal discovery. It leverages the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) framework to address four key limitations: (1) model overfitting caused by simplistic loss functions; (2) dependence on a single parametric assumption, which hinders recovery of a causal graph that reflects the true relationships in the data; (3) the high computational cost of Augmented Lagrangian optimization in the NOTEARS framework; and (4) the inability to perform causal discovery and tabular data synthesis simultaneously within a single framework.

Methods: Three models were developed using the WGAN-GP framework. The first, DAG-WGAN, integrates WGAN-GP with variational inference, leveraging hybrid losses for improved causal modeling. The second, DAG-WGAN+, enhances continuous optimization with efficient structure learning techniques. The third, DAGAF, captures variable interdependencies under various causal assumptions to generate synthetic data that preserve causal relations.

Results: All models target multivariate causal discovery and were evaluated using Structural Hamming Distance (SHD). They outperform leading methods in causal discovery in 97.47% of all test cases. In real-world experiments, the proposed models achieve superior accuracy (SHD = 8 vs. > 10 for state-of-the-art models). The findings further show that precise causal modeling enhances synthetic data quality by preserving the underlying causal mechanisms.
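For readers unfamiliar with the evaluation metric, SHD can be illustrated with a minimal sketch. The function name, the adjacency-matrix encoding, and the choice to count a reversed edge once are assumptions made here for illustration (some implementations count a reversal twice); this is not taken from the thesis.

```python
import numpy as np


def structural_hamming_distance(true_adj, pred_adj):
    """Count the node pairs whose edge status differs between two graphs.

    Both arguments are square adjacency matrices in which a non-zero
    entry [i, j] means a directed edge i -> j. Missing, extra, and
    reversed edges each contribute 1 (an assumed convention).
    """
    true_adj = (np.asarray(true_adj) != 0).astype(int)
    pred_adj = (np.asarray(pred_adj) != 0).astype(int)
    if true_adj.shape != pred_adj.shape or true_adj.shape[0] != true_adj.shape[1]:
        raise ValueError("adjacency matrices must be square and of equal shape")

    # Entry-wise mismatches; a reversed edge appears at both [i, j] and [j, i].
    diff = np.abs(true_adj - pred_adj)
    # Merge both directions of each node pair, then cap at 1 so that a
    # reversal is counted once rather than twice.
    diff = diff + diff.T
    diff[diff > 1] = 1
    # Each unordered pair is now represented twice; keep the upper triangle.
    return int(np.triu(diff).sum())


# True graph: 0 -> 1, 1 -> 2
true_graph = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]]
# Predicted graph: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra)
pred_graph = [[0, 0, 1],
              [1, 0, 1],
              [0, 0, 0]]

print(structural_hamming_distance(true_graph, pred_graph))  # 2
```

A lower SHD means the recovered graph is closer to the reference graph, and 0 means the two graphs are identical.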
Date of Award: 3 Dec 2025
Original language: English
Awarding Institution
  • University of Strathclyde
Sponsors: University of Strathclyde
Supervisors: Feng Dong & Roma Maguire
