Causal Counterfactual visualisation for human causal decision making – A case study in healthcare

Project: Research

Project Details

Description

This project aims to deliver a robust, fast-paced proof-of-concept that unlocks the potential of AI in biomedical and health research. It will apply newly emerging generative AI technology to transform biomedical and health research by enabling virtual clinical trial emulation with synthetic data. The research outcome will address key limitations in both Randomised Controlled Trials (RCTs) and observational studies.

Layman's description

Health data contain important knowledge that enables clinical research to assess treatment effects in real-world settings. However, such data have significant limitations: they are typically imbalanced across populations, diseases and interventions; they contain bias, noise and missing measurements; and removing patient-identifiable information can take significant time and effort, with the risk of deleting valuable information from the original data.
This project investigates an alternative approach to supporting clinical research through the use of synthetic data. We will study the feasibility of using the latest AI technology, namely generative AI, to create synthetic health data that preserve the same research value as real data. To answer clinical questions about treatment effect, our clinical trial emulation will run a “virtual trial” on the synthetic data.
The goal of this project is to study the feasibility of this new approach through a specific use case in Type 2 diabetes mellitus (T2DM). By training the AI model on the SCI Diabetes data on Safe Haven, we aim to create a synthetic version of these data and then carry out a virtual trial to assess the effect of a target medicine. We will assess the new approach by comparing the outcomes of the trial emulation with the real ones.

The potential benefits of the new approach with synthetic health data include the following aspects:
- Quality of the data: This approach can generate synthetic data to address the problems that lie within the real data, including bias, data imbalance, noise and missing measurements.
- Research agenda: Trial emulation can be tailored to create virtual populations that address target clinical questions which would otherwise be impossible to address in real-world trials. Examples include clinical questions concerning underrepresented populations such as children, older adults, and patients with multi-morbidities and polypharmacy, who are commonly excluded from clinical trials. In fact, Randomised Controlled Trials (RCTs) are far from being able to answer all clinical questions. In many situations, conducting RCTs with real patients is logistically challenging, or unethical because of the potential for harm. This leaves a significant knowledge gap. For example, a major drawback of current clinical guidelines is that most address only single diseases, with very few recommendations for multi-morbidity management despite its high prevalence.
- Privacy protection: Compared with anonymised real data (which contain reduced information about real patients), this new approach can generate synthetic data in unlimited volume that contain no identifiable information about real individuals. It is therefore much better placed to overcome legal barriers in data protection and sharing.
Overall, this research will assess the potential of a new way to provide real-world evidence to support future clinical research with better quality and privacy protection. This will open doors for further research in this direction, which could ultimately change the landscape of future biomedical and health research by broadening its research agenda, easing its restrictions, and saving cost and time. Research in this direction could shorten timelines for treatment discovery, address the increasingly complex healthcare landscape of elderly populations and multi-morbidity, and potentially transform regulatory and policy-making processes. This is fully aligned with the remit of Safe Haven: using health data to better understand the causes of disease, the effectiveness of drugs, or the impact of health services.
Note that we do not claim that this single study will provide all the solutions and answers for the synthetic data approach or deliver impact on its own. Rather, it is a feasibility study that will provide a first set of evidence on whether this can be achieved by leveraging the latest AI technology; see the Methodology for more details.
Acronym: MRC-GAN
Status: Active
Effective start/end date: 1/07/23 to 31/12/25
