Publications

7 papers · See Google Scholar for up-to-date metrics

Temporal validation over eight years removes the look-ahead bias that inflates many published health-ML results. The model surfaces ~1,300 more at-risk children annually than gradient boosting, with fairness audits that reveal where the model succeeds and where it fails.

2026 · arXiv:2602.03957
Machine Learning · Public Health · Bangladesh · Child Mortality
Abstract

Predictive machine learning models for child mortality tend to degrade when applied to future populations because standard cross-validation introduces look-ahead bias. Using Demographic and Health Surveys (DHS) data from Bangladesh for 2011–2022 (n = 33,962), we train on 2011–2014 data, validate on 2017, and test on 2022, eight years after model training. A genetic-algorithm-based neural architecture search identified a single-layer architecture (64 units) that outperformed XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). A fairness audit revealed a 'Socioeconomic Predictive Gradient': the model performed best in the least affluent divisions (AUC 0.74) and lowest in the wealthiest (AUC 0.66), indicating it identifies risk most reliably in the areas of greatest need. Validated with SHAP values and Platt calibration, the model identifies approximately 1,300 additional at-risk children annually at the 10% screening level compared to gradient boosting.
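The temporal protocol (train on 2011–2014, validate on 2017, test on 2022) amounts to a year-keyed split with no shuffling. A minimal sketch, assuming a hypothetical record schema rather than the study's actual DHS pipeline:

```python
# Year-based temporal split to avoid look-ahead bias: the test cohort
# is surveyed years after the training cohort, never interleaved.
def temporal_split(records, train_years, val_years, test_years):
    """Partition survey records by wave year (no shuffling)."""
    train = [r for r in records if r["year"] in train_years]
    val = [r for r in records if r["year"] in val_years]
    test = [r for r in records if r["year"] in test_years]
    return train, val, test

# Toy rows standing in for DHS survey records (hypothetical schema).
records = [{"year": y, "id": i} for i, y in enumerate(
    [2011, 2012, 2014, 2014, 2017, 2017, 2022, 2022, 2022])]

train, val, test = temporal_split(
    records, {2011, 2012, 2014}, {2017}, {2022})
print(len(train), len(val), len(test))  # 4 2 3
```

Unlike k-fold cross-validation, no 2022 record can influence model selection, which is the point of the eight-year gap.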

Cite / details →

High digital engagement makes students more vulnerable to infrastructure failures, not less — the 'Dependency Divide'. Targeted reliability improvements for heavy users yield 2× the return of blanket interventions.

2026 · arXiv:2601.01231
Machine Learning · Education · Bangladesh · Interpretability
Abstract

While digital access has expanded rapidly in resource-constrained contexts, satisfaction with digital learning platforms varies significantly among students with seemingly equal connectivity. This study introduces the 'Dependency Divide', a novel framework proposing that highly engaged students become conditionally vulnerable to infrastructure failures, challenging assumptions that engagement uniformly benefits learners in post-access environments. Using a cross-sectional study of 396 university students in Bangladesh, we apply K-prototypes clustering, profile-specific Random Forest models with SHAP and ALE analysis, and formal interaction analysis with propensity score matching. Three profiles emerged: Casually Engaged (58%), Efficient Learners (35%), and Hyper-Engaged (7%). A significant interaction between educational device time and internet reliability (β = 0.033, p = 0.028) confirmed the Dependency Divide: engagement increased satisfaction only when infrastructure remained reliable. Policy simulations demonstrated that targeted reliability improvements for high-dependency users yielded 2.06× greater returns than uniform interventions.
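The interaction finding can be read off the fitted model's marginal effect of engagement. A toy illustration in which only the interaction coefficient (β = 0.033) comes from the study; every other coefficient and the reliability scale are invented for the example:

```python
# Interaction model: satisfaction = b0 + b1*engagement + b2*reliability
#                                   + b3*engagement*reliability
# Only b3 = 0.033 is from the paper; b0, b1, b2 are made-up values
# chosen to illustrate the sign flip.
b0, b1, b2, b3 = 3.0, -0.10, 0.25, 0.033

def marginal_effect_of_engagement(reliability):
    # d(satisfaction)/d(engagement) = b1 + b3 * reliability
    return b1 + b3 * reliability

low = marginal_effect_of_engagement(1)   # unreliable infrastructure
high = marginal_effect_of_engagement(5)  # reliable infrastructure
print(round(low, 3), round(high, 3))  # -0.067 0.065
```

With these toy numbers the marginal effect is negative under unreliable infrastructure and positive under reliable infrastructure, which is exactly the conditional-vulnerability pattern the Dependency Divide describes.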

Cite / details →

Width should grow 2.8× faster than depth in transformers — validated across 30 architectures up to 7B parameters. Past a critical depth, adding layers actively hurts, even though parameter count increases.

2026 · arXiv:2601.20994
Transformers · Scaling Laws · Deep Learning · Architecture
Abstract

Neural scaling laws describe how language model loss decreases with parameters and data, but treat architecture as interchangeable. We propose architecture-conditioned scaling laws decomposing depth-width dependence, finding that optimal depth scales as D* ~ C^0.12 while optimal width scales as W* ~ C^0.34, meaning width should grow 2.8× faster than depth. We discover a critical depth phenomenon: beyond D_crit ~ W^0.44 (sublinear in W), adding layers increases loss despite adding parameters—the Depth Delusion. Validated across 30 transformer architectures spanning 17M to 7B parameters (R² = 0.922), our central finding is that at 7B scale a 64-layer model (6.38B params) underperforms a 32-layer model (6.86B params) by 0.12 nats, despite being significantly deeper—demonstrating that optimal depth-width tradeoffs persist at production scale.
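The headline ratios follow directly from the fitted exponents. A quick check, assuming only the exponents quoted above:

```python
# Fitted exponents from the abstract: D* ~ C^0.12, W* ~ C^0.34,
# and critical depth D_crit ~ W^0.44.
alpha_depth, alpha_width, beta_crit = 0.12, 0.34, 0.44

# Width's compute exponent relative to depth's: the "2.8x faster" claim.
print(round(alpha_width / alpha_depth, 2))  # 2.83

# Sublinearity of D_crit in W: doubling width multiplies the
# critical depth by 2^0.44, not by 2.
print(round(2 ** beta_crit, 2))  # 1.36
```

So as compute grows, optimal width outpaces optimal depth, and the safe depth budget grows much more slowly than width, which is why very deep configurations eventually cross D_crit and lose.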

Cite / details →

MMM Fahim, MJH Imran, L Debnath, T Shill, MN Molla, EB Pranto, MSS Saad, MR Karim

First complete causal map of SDG interdependencies across 168 countries — no single 'hub' goal exists. Education → Inequality is the strongest direct link, but its effect size varies 10× by national income level.

2026 · arXiv:2601.20875
Causal Inference · SDGs · Panel VAR · Statistics
Abstract

Achievement of the 2030 Sustainable Development Goals depends on strategic resource distribution. We propose a causal discovery framework using Panel Vector Autoregression with country-specific fixed effects and PCMCI+ conditional independence testing on 168 countries (2000–2025) to develop the first complete causal architecture of SDG dependencies. Analyzing 8 strategically chosen SDGs, we identify a distributed causal network (no single 'hub' SDG) with 10 statistically significant Granger-causal relationships, comprising 11 unique direct effects. Education → Inequality is the most statistically significant direct relationship (r = −0.599; p < 0.05), with effect magnitude varying substantially by income level (high-income: r = −0.65; lower-middle-income: r = −0.06, non-significant). We propose a tiered priority framework identifying upstream drivers (Education, Growth), enabling goals (Institutions, Energy), and downstream outcomes (Poverty, Health), concluding that effective SDG acceleration requires coordinated multi-dimensional interventions rather than single-goal sequential strategies.
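The Granger-causal step can be illustrated with a toy lag-1 comparison on synthetic series: does adding x_{t-1} to a regression of y_t on y_{t-1} reduce residual error? This is a minimal sketch, not the paper's PCMCI+/Panel-VAR pipeline:

```python
# Toy lag-1 Granger-style comparison on two deterministic series
# where y is genuinely driven by lagged x.
import math

T = 200
x = [math.sin(0.7 * t) for t in range(T)]
y = [0.0] * T
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.05 * math.cos(3 * t)

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

yt = center(y[1:])    # y_t
yl = center(y[:-1])   # y_{t-1}
xl = center(x[:-1])   # x_{t-1}

def rss1(target, u):
    """RSS of OLS with one centered regressor."""
    b = sum(a * c for a, c in zip(u, target)) / sum(a * a for a in u)
    return sum((c - b * a) ** 2 for a, c in zip(u, target))

def rss2(target, u, v):
    """RSS of OLS with two centered regressors (2x2 normal equations)."""
    suu = sum(a * a for a in u); svv = sum(a * a for a in v)
    suv = sum(a * b for a, b in zip(u, v))
    sut = sum(a * c for a, c in zip(u, target))
    svt = sum(a * c for a, c in zip(v, target))
    det = suu * svv - suv * suv
    bu = (svv * sut - suv * svt) / det
    bv = (suu * svt - suv * sut) / det
    return sum((c - bu * a - bv * b) ** 2
               for a, b, c in zip(u, v, target))

rss_restricted = rss1(yt, yl)      # y_t ~ y_{t-1}
rss_full = rss2(yt, yl, xl)        # y_t ~ y_{t-1} + x_{t-1}
print(rss_full < rss_restricted)   # True: lagged x explains y
```

A formal test would turn the RSS drop into an F-statistic; the paper additionally conditions on fixed effects and uses PCMCI+ to rule out spurious links.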

Cite / details →

Multilingual sentiment classification of government mobile banking app reviews (English + Bangla). Benchmarks several architectures for monitoring public service quality through NLP.

Preprint · 2026
NLP · Sentiment Analysis · Bangla · Mobile Banking
Abstract

This study presents a multi-model approach for sentiment classification of user reviews of government mobile banking applications in Bangladesh, handling both English and Bangla language inputs. We benchmark several classification architectures on a curated review dataset and evaluate their performance across sentiment categories, with implications for public service improvement and digital governance monitoring.

Cite / details →

MMM Fahim, SH Yesmin, S Islam, MPB Faruque, MA Salam, MM Uddin, S Islam, T Ahmed, M Binyamin, MR Karim

239× fewer parameters than GraphCast at near-identical accuracy — principled multi-objective NAS can find truly deployable models. Transfer learning adds ~5% accuracy gains when historical data is scarce.

2026 · arXiv:2602.00240
Neural Architecture Search · Weather Forecasting · Edge Computing · Green AI
Abstract

We introduce Green-NAS, a multi-objective neural architecture search (NAS) framework designed for low-resource environments using weather forecasting as a case study. Adhering to Green AI principles, the framework explicitly minimizes computational energy costs and carbon footprints, prioritizing sustainable deployment over raw computational scale. The search simultaneously optimizes model accuracy and efficiency to find lightweight architectures with very few parameters. Our best-performing model, Green-NAS-A, achieved an RMSE of 0.0988 (within 1.4% of a manually tuned baseline) using only 153k parameters—239 times fewer than globally deployed models such as GraphCast. Transfer learning improves forecasting accuracy by approximately 5.2% compared to training a new model per city when historical data is limited.
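The multi-objective selection step amounts to keeping only non-dominated (accuracy, size) trade-offs. A generic Pareto-filter sketch, not the paper's actual search loop; candidate A mirrors the reported Green-NAS-A numbers, while B and C are invented:

```python
# Generic Pareto filter over candidate architectures scored by
# (rmse, parameter count): keep only non-dominated trade-offs.
def pareto_front(candidates):
    """candidates: list of (name, rmse, params); lower is better on both."""
    front = []
    for name, rmse, params in candidates:
        dominated = any(
            (r <= rmse and p <= params) and (r < rmse or p < params)
            for _, r, p in candidates
        )
        if not dominated:
            front.append(name)
    return front

candidates = [
    ("A", 0.0988, 153_000),     # small and accurate -> kept
    ("B", 0.0975, 36_700_000),  # slightly more accurate but huge -> kept
    ("C", 0.1200, 400_000),     # worse than A on both axes -> dropped
]
print(pareto_front(candidates))  # ['A', 'B']
```

An energy-aware search like Green-NAS would add compute/carbon cost as a further objective, but the dominance logic is the same.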

Cite / details →

Pre-trained on 357K children across 44 countries, this encoder solves the cold-start problem: with only 50 samples it beats gradient boosting by 8–12%. Zero-shot deployment to unseen countries reaches AUC up to 0.84.

2026 · arXiv:2601.20987
Transfer Learning · Child Development · Global Health · SDGs
Abstract

A large number of children experience preventable developmental delays each year, yet deployment of machine learning in new countries is stymied by a data bottleneck: reliable models require thousands of samples, while new programs begin with fewer than 100. We introduce the first pre-trained encoder for global child development, trained on 357,709 children across 44 countries using UNICEF survey data. With only 50 training samples, the pre-trained encoder achieves an average AUC of 0.65 (95% CI: 0.56–0.72), outperforming cold-start gradient boosting by 8–12% across regions. At N = 500, the encoder achieves AUC of 0.73. Zero-shot deployment to unseen countries achieves AUCs up to 0.84. We apply a transfer learning bound to explain why pre-training diversity enables few-shot generalization, establishing that pre-trained encoders can transform the feasibility of ML for SDG 4.2.1 monitoring in resource-constrained settings.

Cite / details →
All papers are arXiv preprints. For the most current citation counts, see Google Scholar.