Temporal Validation Changes the Apparent Public-Health Utility of Under-Five Mortality Prediction in Bangladesh: A Four-Round DHS Machine-Learning Study

Md Muhtasim Munif Fahim, M. Monimul Huq, M. Sabiruzzaman, Md. Rezaul Karim

BMC Public Health · 2026 · submitted · arXiv:2602.03957

TL;DR

The revised arXiv preprint is now on v2 and reflects the current BMC Public Health submission. In a four-round BDHS benchmark, validation-regime choice changed screening workload estimates more than architecture choice, making temporal splits and capacity-based metrics essential before programmatic use.

Abstract

Bangladesh has reduced under-five mortality substantially, but preventable deaths remain unevenly distributed across households and divisions. Prediction models based on Demographic and Health Survey (DHS) data could help planners prioritise follow-up, referral, and resource allocation, but only if reported performance reflects future public-health use. We analysed four Bangladesh DHS rounds (2011, 2014, 2017, and 2022; 33,962 children; 1,290 under-five deaths), evaluating identical 26-feature pipelines and three model classes under four validation regimes: pooled random 80/20, matched-size pooled random, 2022-only random, and cross-survey temporal validation (train on 2011+2014, validate/calibrate on 2017, test on 2022). A 32-unit ELU multilayer perceptron selected by genetic-algorithm neural architecture search was compared with XGBoost and logistic regression. Validation regime changed public-health interpretation more than model class: AUROC ranged from 0.669 under 2022-only random validation to 0.775 under pooled random validation, with a temporal estimate of 0.730. At the top-10% temporal screening threshold, the model identified 152 of 355 observed 2022 deaths (sensitivity 42.8%, positive predictive value [PPV] 13.2%, number needed to screen [NNS] 7.6). Across validation designs, the same model implied NNS values from 5.6 to 11.0, which changes the expected follow-up workload substantially. Cross-round temporal validation gives planners a more defensible basis for estimating community-health-worker follow-up, referral demand, and budget scenarios than random-split AUROC alone.
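
The temporal split and the top-10% screening metrics described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `temporal_split` and `screening_metrics`, the tie-handling rule, and the toy data are assumptions made for illustration, and NNS is taken here as 1/PPV, which is consistent with the reported PPV of 13.2% and NNS of 7.6.

```python
import numpy as np


def temporal_split(survey_year):
    """Boolean masks for the cross-survey design: train 2011+2014, validate 2017, test 2022."""
    year = np.asarray(survey_year)
    train = np.isin(year, [2011, 2014])
    valid = year == 2017
    test = year == 2022
    return train, valid, test


def screening_metrics(y_true, scores, top_frac=0.10):
    """Sensitivity, PPV and NNS when the top `top_frac` of risk scores is flagged.

    Assumption: NNS is computed as 1 / PPV (children screened per death identified).
    """
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    n_flagged = max(1, int(round(top_frac * len(scores))))
    # Highest scores first; a stable sort keeps tie order deterministic.
    flagged = np.argsort(-scores, kind="stable")[:n_flagged]
    true_pos = int(y_true[flagged].sum())
    total_pos = int(y_true.sum())
    sensitivity = true_pos / total_pos if total_pos else float("nan")
    ppv = true_pos / n_flagged
    nns = 1.0 / ppv if ppv > 0 else float("inf")
    return {
        "n_flagged": n_flagged,
        "true_pos": true_pos,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "nns": nns,
    }


# Toy data (hypothetical): 20 children ranked by score, 4 deaths, 1 of them in the top 10%.
toy_scores = np.linspace(1.0, 0.05, 20)
toy_deaths = np.zeros(20, dtype=int)
toy_deaths[[1, 5, 9, 15]] = 1
toy_metrics = screening_metrics(toy_deaths, toy_scores, top_frac=0.10)

# Arithmetic check against the figures reported in the abstract.
reported_sensitivity_pct = round(100 * 152 / 355, 1)  # deaths identified / observed 2022 deaths
reported_nns = round(1 / 0.132, 1)                    # 1 / PPV
```

Because NNS is the reciprocal of PPV under this assumption, a small shift in PPV between validation regimes translates directly into a different follow-up workload per death identified, which is the planning quantity the abstract emphasises.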

Machine Learning · Public Health · Bangladesh · Under-Five Mortality · Temporal Validation · BDHS · Neural Architecture Search · Fairness

BibTeX

@article{fahim2026temporal,
  title   = {Temporal Validation Changes the Apparent Public-Health Utility of Under-Five Mortality Prediction in Bangladesh: A Four-Round DHS Machine-Learning Study},
  author  = {Md Muhtasim Munif Fahim and M. Monimul Huq and M. Sabiruzzaman and Md. Rezaul Karim},
  year    = {2026},
  journal = {BMC Public Health},
  eprint  = {2602.03957},
  archivePrefix = {arXiv},
  url     = {https://arxiv.org/abs/2602.03957},
}