Background
Causal knowledge is essential for understanding complex systems and revealing relationships between variables. It enables researchers to move beyond correlations,
reason about cause and effect, and derive scientific insights. Although Randomized
Controlled Trials (RCTs) remain the gold standard for causal inference, they are often
infeasible due to ethical, logistical, or financial constraints and may lack real-world
applicability. In contrast, observational data offer abundant, diverse samples, making them well suited to large-scale analysis. Although such data are susceptible to confounding,
advances in structure learning from observations allow researchers to identify causal
relationships without relying on randomized experiments.
Research objectives
This thesis challenges conventional maximum likelihood estimation (MLE)-based methods by exploring adversarial causal discovery approaches. It leverages the Wasserstein
Generative Adversarial Network with Gradient Penalty (WGAN-GP) framework to
address four key limitations: (1) model overfitting from simplistic loss functions; (2) dependence on a single parametric assumption, which hinders recovery of causal graphs that
reflect the true relationships in the data; (3) high computational cost from Augmented Lagrangian optimization in the NOTEARS framework; and (4) inability to perform causal
discovery and tabular data synthesis simultaneously under a single framework.
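
For context on limitation (3), NOTEARS enforces acyclicity through the smooth constraint h(W) = tr(exp(W ∘ W)) − d = 0, where W is the d × d weighted adjacency matrix and ∘ is the elementwise product; the Augmented Lagrangian repeatedly re-solves the problem while tightening this constraint. The following is a minimal sketch of h(W) only, assuming a truncated Taylor series in plain Python in place of a library matrix exponential. The names `matmul`, `acyclicity`, and `terms` are illustrative and are not taken from the thesis.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of two square matrices of the same size."""
    d = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def acyclicity(weights: Matrix, terms: int = 20) -> float:
    """Approximate h(W) = tr(exp(W o W)) - d with a truncated Taylor series.

    Assumption: the series sum_{k>=1} tr(A^k) / k! is cut off after `terms`
    terms, where A = W o W is the elementwise square of W. Every entry of A
    is non-negative, so tr(A^k) > 0 exactly when the graph contains a closed
    walk of length k. For an acyclic graph every term is zero.
    """
    d = len(weights)
    squared = [[w * w for w in row] for row in weights]
    power = [[float(i == j) for j in range(d)] for i in range(d)]  # A^0 = I
    h = 0.0
    factorial = 1.0
    for k in range(1, terms + 1):
        power = matmul(power, squared)  # now holds A^k
        factorial *= k                  # now holds k!
        h += sum(power[i][i] for i in range(d)) / factorial
    return h


dag = [[0.0, 1.5], [0.0, 0.0]]     # single edge 0 -> 1, no cycle
cyclic = [[0.0, 1.0], [1.0, 0.0]]  # edges 0 -> 1 and 1 -> 0 form a cycle
h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
```

Here `h_dag` is zero and `h_cyclic` is strictly positive, which is the property a continuous optimizer exploits when it pushes h(W) toward zero.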
Methods
Three models were developed using the WGAN-GP framework. The first, DAG-WGAN,
integrates WGAN-GP with variational inference, leveraging hybrid losses for improved
causal modeling. The second, DAG-WGAN+, enhances continuous optimization with
efficient structure learning techniques. The third, DAGAF, captures variable interdependencies under various causal assumptions to generate synthetic data that preserve
causal relations.
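
As background for the shared framework, the standard WGAN-GP critic objective can be written as below. This is the generic published form, not the hybrid loss of any of the three models, which the abstract does not specify. The symbols are the usual ones: D is the critic, P_r the real data distribution, P_g the generator distribution, P_x̂ the distribution of points sampled uniformly along straight lines between real and generated samples, and λ the penalty coefficient.

```latex
L_{\text{critic}} =
  \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
    \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
```

The last term is the gradient penalty: it encourages the critic's gradient norm to stay close to 1, a soft way of imposing the Lipschitz constraint that the Wasserstein distance requires.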
Results
All models target multivariate causal discovery and were rigorously evaluated using
Structural Hamming Distance (SHD). Results show they outperform leading methods
in causal discovery in 97.47% of all test cases. In real-world experiments, the
proposed models achieve superior accuracy, where a lower SHD is better (SHD = 8 vs. > 10 for state-of-the-art
models). Findings further reveal that precise causal modeling enhances synthetic data
quality by preserving underlying causal mechanisms.
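
To make the metric concrete, SHD counts the edge edits needed to turn an estimated graph into the reference graph, so 0 means a perfect match. The sketch below is one common way to compute it for directed graphs given as 0/1 adjacency matrices. It assumes a reversed edge counts as a single error (some papers count it as two), and the function and variable names are illustrative rather than taken from the thesis.

```python
from typing import List

Adjacency = List[List[int]]  # adj[i][j] == 1 means a directed edge i -> j


def structural_hamming_distance(true_adj: Adjacency, est_adj: Adjacency) -> int:
    """Count node pairs whose edge status differs between two directed graphs.

    Assumption: a missing, extra, or reversed edge each counts as one error.
    """
    d = len(true_adj)
    if len(est_adj) != d or any(len(row) != d for row in true_adj + est_adj):
        raise ValueError("both adjacency matrices must be square and the same size")

    shd = 0
    # Visit each unordered node pair once, so a reversal is counted once.
    for i in range(d):
        for j in range(i + 1, d):
            true_pair = (true_adj[i][j], true_adj[j][i])
            est_pair = (est_adj[i][j], est_adj[j][i])
            if true_pair != est_pair:
                shd += 1
    return shd


# Reference graph: 0 -> 1 -> 2
true_adj = [[0, 1, 0],
            [0, 0, 1],
            [0, 0, 0]]
# Estimate: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra)
est_adj = [[0, 0, 1],
           [1, 0, 1],
           [0, 0, 0]]
shd_example = structural_hamming_distance(true_adj, est_adj)
```

In this toy case `shd_example` is 2: one reversed edge plus one extra edge.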
| Date of Award | 3 Dec 2025 |
|---|---|
| Original language | English |
| Awarding Institution | University of Strathclyde |
| Sponsors | University of Strathclyde |
| Supervisor | Feng Dong (Supervisor) & Roma Maguire (Supervisor) |