
Child Mortality Prediction in Bangladesh: A Decade-Long Validation Study

MMM Fahim, MR Karim

arXiv preprint · 2026 · arXiv:2602.03957

TL;DR

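The model is trained on 2011–2014 data and tested on 2022 data, eight years later. This temporal validation avoids the look-ahead bias that standard cross-validation introduces, a bias that can inflate published health ML results. At a 10% screening level, the model flags roughly 1,300 more at-risk children per year than gradient boosting. A fairness audit shows where it performs well and where it falls short.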
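A minimal sketch of the temporal split, assuming each survey record is a dictionary with a numeric `year` field. This is a hypothetical schema, not the authors' data layout. Every training record predates every validation and test record. That ordering removes the look-ahead that random k-fold splits allow. The `fit_standardizer` helper is a hypothetical illustration of a related safeguard that the source does not describe: preprocessing statistics are estimated on the training years only and then applied unchanged to later years.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]  # hypothetical record layout: {"year": ..., feature: ...}


def temporal_split(
    records: Sequence[Record],
    train_years: Tuple[int, int] = (2011, 2014),
    val_year: int = 2017,
    test_year: int = 2022,
    year_key: str = "year",
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split survey records by year so that training strictly precedes evaluation."""
    first_train, last_train = train_years
    # Refuse any configuration in which evaluation years overlap or precede training.
    if not (first_train <= last_train < val_year < test_year):
        raise ValueError("splits must be ordered in time: train < validation < test")
    train = [r for r in records if first_train <= r[year_key] <= last_train]
    val = [r for r in records if r[year_key] == val_year]
    test = [r for r in records if r[year_key] == test_year]
    # Records from years outside these windows are dropped.
    return train, val, test


def fit_standardizer(train: Sequence[Record], feature: str) -> Callable[[Record], float]:
    """Estimate mean/std on the training years only, then reuse them on later years."""
    values = [float(r[feature]) for r in train]
    mean = sum(values) / len(values)
    # Fall back to 1.0 if the feature is constant in the training years.
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
    return lambda r: (float(r[feature]) - mean) / std


if __name__ == "__main__":
    records = [
        {"year": 2011, "x": 1.0},
        {"year": 2014, "x": 3.0},
        {"year": 2017, "x": 5.0},
        {"year": 2022, "x": 7.0},
    ]
    train, val, test = temporal_split(records)
    scale = fit_standardizer(train, "x")  # statistics come from 2011-2014 only
    print(len(train), len(val), len(test), [scale(r) for r in test])  # prints: 2 1 1 [5.0]
```

The `ValueError` guard makes the split fail loudly if someone later widens the training window into the evaluation period. That is the kind of silent leak that temporal validation is meant to prevent.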

Abstract

Predictive machine learning models for child mortality often perform worse on future populations than their reported estimates suggest. Standard cross-validation mixes records across survey years, which introduces look-ahead bias. Using Demographic and Health Surveys (DHS) data from Bangladesh for 2011–2022 (n = 33,962), we train on 2011–2014 data, validate on 2017, and test on 2022, eight years after model training. A genetic-algorithm-based neural architecture search (NAS) identified a single-layer neural network (64 units) that outperformed XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). A fairness audit revealed a 'Socioeconomic Predictive Gradient'. The model performed best in the least affluent divisions (AUROC 0.74) and declined to AUROC 0.66 in the wealthiest, suggesting it discriminates best where need is greatest. The model was interpreted with SHAP values and calibrated with Platt scaling. At the 10% screening level, it identifies approximately 1,300 more at-risk children annually than gradient boosting.
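The fairness audit and the screening comparison rest on two quantities. The first is AUROC computed separately within each subgroup, such as an administrative division. The second is the number of true positives flagged when only the top 10% of risk scores are screened. A minimal, dependency-free sketch follows. The helper names are hypothetical and are not the authors' code. The pairwise AUROC costs O(P × N) comparisons, so at DHS scale a rank-based routine such as scikit-learn's `roc_auc_score` would be the practical choice. The sketch stops at a test-set count. Turning a difference in flagged cases into an annual national figure like ~1,300 children would also need survey weights and birth-cohort sizes, which are not given here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def auroc_by_group(
    labels: Sequence[int], scores: Sequence[float], groups: Sequence[str]
) -> Dict[str, float]:
    """Per-subgroup AUROC (e.g. per division); groups lacking both classes are skipped."""
    buckets: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {
        g: auroc(ys, ss)
        for g, (ys, ss) in buckets.items()
        if 0 < sum(ys) < len(ys)
    }


def positives_captured_at_top(
    labels: Sequence[int], scores: Sequence[float], fraction: float = 0.10
) -> int:
    """True positives among the highest-scoring `fraction` of cases (a screening budget)."""
    k = round(fraction * len(scores))  # number of cases the budget allows us to flag
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k])


if __name__ == "__main__":
    # Toy cohort of 20 children (hypothetical data): two deaths, at indices 0 and 19.
    labels = [1] + [0] * 18 + [1]
    model = [0.9] + [0.1] * 18 + [0.8]
    baseline = [0.9] + [0.5] * 18 + [0.05]
    extra = positives_captured_at_top(labels, model) - positives_captured_at_top(labels, baseline)
    print(auroc(labels, model), auroc(labels, baseline), extra)  # prints: 1.0 0.5 1
```

Both models flag 2 of the 20 children at the 10% budget. The first model's flags contain both deaths, while the baseline's contain only one. The per-group variant applies the same AUROC definition within each division, so a gap like 0.74 versus 0.66 can be read directly as a difference in within-division ranking quality.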

Machine Learning · Public Health · Bangladesh · Child Mortality · Neural Architecture Search · Fairness · DHS

BibTeX

@article{fahim2026child,
  title   = {Child Mortality Prediction in Bangladesh: A Decade-Long Validation Study},
  author  = {MMM Fahim and MR Karim},
  year    = {2026},
  journal = {arXiv preprint},
  eprint  = {2602.03957},
  archivePrefix = {arXiv},
  url     = {https://arxiv.org/abs/2602.03957},
}