Development of a novel imputation framework for PM2.5 particle data in Pakistani cities using machine learning and statistical techniques

Muhammad Asad Khan*, Jiazhu Pan, Amani Alshatti, Ahmad Alsaber*, Alison Gray

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Missing PM2.5 observations in environmental monitoring systems, caused by sensor malfunctions, communication failures, maintenance issues, and coverage gaps, compromise public health assessments and evidence-based air quality policymaking. Reliable imputation strategies are therefore essential to preserve data integrity and analytical validity.

Methods: This study evaluated five imputation techniques: Bayesian Regression (BR), K-Nearest Neighbors (KNN), missForest, Predictive Mean Matching (PMM), and Random Forest (RF), using daily PM2.5 measurements collected between May 2019 and December 2024 from monitoring stations in Islamabad, Karachi, Lahore, and Peshawar, Pakistan. Three missing data mechanisms, MCAR, MAR, and MNAR, were simulated at missing rates ranging from 5% to 25%. Model performance was assessed using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
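
The evaluation design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code and makes several assumptions: the series is synthetic, `covariate` is a hypothetical fully observed variable used to drive MAR missingness, the rate grid (5%, 15%, 25%) is only an example within the stated range, and mean imputation stands in for the five methods actually compared. The helper names (`mask_weighted`, `ranks`, `rmse`, `mae`) are likewise illustrative.

```python
import numpy as np

def mask_weighted(values, rate, weights, rng):
    """Return a copy of `values` with a fraction `rate` set to NaN.

    Entries are drawn without replacement with probability proportional
    to `weights`, so the choice of weights determines the mechanism.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_missing = int(round(rate * values.size))
    idx = rng.choice(values.size, size=n_missing, replace=False,
                     p=weights / weights.sum())
    masked = values.copy()
    masked[idx] = np.nan
    return masked

def ranks(x):
    """Ranks 1..n, used as strictly positive sampling weights."""
    return np.argsort(np.argsort(x)) + 1.0

def rmse(truth, imputed, missing):
    """Root Mean Square Error over the artificially removed entries."""
    err = truth[missing] - imputed[missing]
    return float(np.sqrt(np.mean(err ** 2)))

def mae(truth, imputed, missing):
    """Mean Absolute Error over the artificially removed entries."""
    return float(np.mean(np.abs(truth[missing] - imputed[missing])))

rng = np.random.default_rng(0)
n = 1000
pm25 = rng.gamma(shape=4.0, scale=15.0, size=n)  # synthetic series, not study data
covariate = rng.normal(size=n)                   # hypothetical observed variable

# Assumption: each mechanism is expressed as a set of sampling weights.
scenarios = {
    "MCAR": np.ones(n),        # missingness independent of all data
    "MAR": ranks(covariate),   # depends only on an observed covariate
    "MNAR": ranks(pm25),       # depends on the unobserved value itself
}

results = {}
for name, weights in scenarios.items():
    for rate in (0.05, 0.15, 0.25):
        masked = mask_weighted(pm25, rate, weights, rng)
        missing = np.isnan(masked)
        # Placeholder imputer: fill gaps with the mean of the observed values.
        imputed = np.where(missing, np.nanmean(masked), masked)
        results[(name, rate)] = (rmse(pm25, imputed, missing),
                                 mae(pm25, imputed, missing))

for (name, rate), (r, m) in results.items():
    print(f"{name:5s} rate={rate:.2f}  RMSE={r:7.2f}  MAE={m:7.2f}")
```

Because the true values are known before masking, both error metrics are computed only on the entries that were removed. Swapping the placeholder imputer for another method leaves the rest of the loop unchanged.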

Results: Imputation under the MAR mechanism consistently yielded lower error values as missingness increased. Across all mechanisms and missing rates, missForest and KNN demonstrated superior performance. Notably, missForest achieved the lowest RMSE and MAE values overall and effectively preserved the temporal structure, range, and variability of the PM2.5 series.

Discussion: The findings suggest that machine-learning-based approaches, particularly missForest, provide robust and reliable imputation for PM2.5 datasets with varying missingness patterns. These results support the use of missForest as a preferred method for handling incomplete air quality data in similar monitoring contexts, thereby strengthening the reliability of environmental health analyses and air quality policy development.

Original language: English
Article number: 1775982
Number of pages: 15
Journal: Frontiers in Environmental Science
Volume: 14
Publication status: Published - 19 Feb 2026

Keywords

  • air quality monitoring
  • machine learning
  • missForest
  • Pakistan
  • PM2.5 missing data imputation
