Publications

7 papers · See Google Scholar for up-to-date metrics

Temporal validation over eight years removes the look-ahead bias that inflates many published health-ML results. The model surfaces ~1,300 more at-risk children annually than gradient boosting, with fairness audits that reveal where the model succeeds and where it fails.

2026 · arXiv:2602.03957
Machine Learning · Public Health · Bangladesh · Child Mortality
Abstract

Predictive machine learning models for child mortality tend to degrade when applied to future populations because standard cross-validation introduces look-ahead bias. Using Demographic and Health Surveys (DHS) data from Bangladesh for 2011–2022 (n = 33,962), we train on 2011–2014 data, validate on 2017, and test on 2022, eight years after model training. A genetic-algorithm-based neural architecture search identified a single-layer architecture (64 units) that outperformed XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). A fairness audit revealed a 'Socioeconomic Predictive Gradient': the model performed best in the least affluent divisions (AUC 0.74) and lowest in the wealthiest (AUC 0.66), indicating it identifies risk most reliably in the areas of greatest need. Validated with SHAP values and Platt calibration, the model identifies approximately 1,300 additional at-risk children annually at the 10% screening level compared to gradient boosting.
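The temporal protocol (train on 2011–2014, validate on 2017, test on 2022) amounts to a year-keyed split with no shuffling. A minimal sketch, assuming a hypothetical record schema rather than the study's actual DHS pipeline:

```python
# Year-based temporal split to avoid look-ahead bias: the test cohort
# is surveyed years after the training cohort, never interleaved.
def temporal_split(records, train_years, val_years, test_years):
    """Partition survey records by wave year (no shuffling)."""
    train = [r for r in records if r["year"] in train_years]
    val = [r for r in records if r["year"] in val_years]
    test = [r for r in records if r["year"] in test_years]
    return train, val, test

# Toy rows standing in for DHS survey records (hypothetical schema).
records = [{"year": y, "id": i} for i, y in enumerate(
    [2011, 2012, 2014, 2014, 2017, 2017, 2022, 2022, 2022])]

train, val, test = temporal_split(
    records, {2011, 2012, 2014}, {2017}, {2022})
print(len(train), len(val), len(test))  # 4 2 3
```

Unlike k-fold cross-validation, no 2022 record can influence model selection, which is the point of the eight-year gap.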

Cite / details →

High digital engagement makes students more vulnerable to infrastructure failures, not less — the 'Dependency Divide'. Targeted reliability improvements for heavy users yield 2× the return of blanket interventions.

2026 · arXiv:2601.01231
Machine Learning · Education · Bangladesh · Interpretability
Abstract

While digital access has expanded rapidly in resource-constrained contexts, satisfaction with digital learning platforms varies significantly among students with seemingly equal connectivity. This study introduces the 'Dependency Divide', a novel framework proposing that highly engaged students become conditionally vulnerable to infrastructure failures, challenging assumptions that engagement uniformly benefits learners in post-access environments. Using a cross-sectional study of 396 university students in Bangladesh, we apply K-prototypes clustering, profile-specific Random Forest models with SHAP and ALE analysis, and formal interaction analysis with propensity score matching. Three profiles emerged: Casually Engaged (58%), Efficient Learners (35%), and Hyper-Engaged (7%). A significant interaction between educational device time and internet reliability (β = 0.033, p = 0.028) confirmed the Dependency Divide: engagement increased satisfaction only when infrastructure remained reliable. Policy simulations demonstrated that targeted reliability improvements for high-dependency users yielded 2.06× greater returns than uniform interventions.
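The interaction finding can be read off the fitted model's marginal effect of engagement. A toy illustration in which only the interaction coefficient (β = 0.033) comes from the study; every other coefficient and the reliability scale are invented for the example:

```python
# Interaction model: satisfaction = b0 + b1*engagement + b2*reliability
#                                   + b3*engagement*reliability
# Only b3 = 0.033 is from the paper; b0, b1, b2 are made-up values
# chosen to illustrate the sign flip.
b0, b1, b2, b3 = 3.0, -0.10, 0.25, 0.033

def marginal_effect_of_engagement(reliability):
    # d(satisfaction)/d(engagement) = b1 + b3 * reliability
    return b1 + b3 * reliability

low = marginal_effect_of_engagement(1)   # unreliable infrastructure
high = marginal_effect_of_engagement(5)  # reliable infrastructure
print(round(low, 3), round(high, 3))  # -0.067 0.065
```

With these toy numbers the marginal effect is negative under unreliable infrastructure and positive under reliable infrastructure, which is exactly the conditional-vulnerability pattern the Dependency Divide describes.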

Cite / details →

Width should grow 2.8× faster than depth in transformers — validated across 30 architectures up to 7B parameters. Past a critical depth, adding layers actively hurts, even though parameter count increases.

2026 · arXiv:2601.20994
Transformers · Scaling Laws · Deep Learning · Architecture
Abstract

Neural scaling laws describe how language model loss decreases with parameters and data, but treat architecture as interchangeable. We propose architecture-conditioned scaling laws decomposing depth-width dependence, finding that optimal depth scales as D* ~ C^0.12 while optimal width scales as W* ~ C^0.34, meaning width should grow 2.8× faster than depth. We discover a critical depth phenomenon: beyond D_crit ~ W^0.44 (sublinear in W), adding layers increases loss despite adding parameters—the Depth Delusion. Validated across 30 transformer architectures spanning 17M to 7B parameters (R² = 0.922), our central finding is that at 7B scale a 64-layer model (6.38B params) underperforms a 32-layer model (6.86B params) by 0.12 nats, despite being significantly deeper—demonstrating that optimal depth-width tradeoffs persist at production scale.
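The headline ratios follow directly from the fitted exponents. A quick check, assuming only the exponents quoted above:

```python
# Fitted exponents from the abstract: D* ~ C^0.12, W* ~ C^0.34,
# and critical depth D_crit ~ W^0.44.
alpha_depth, alpha_width, beta_crit = 0.12, 0.34, 0.44

# Width's compute exponent relative to depth's: the "2.8x faster" claim.
print(round(alpha_width / alpha_depth, 2))  # 2.83

# Sublinearity of D_crit in W: doubling width multiplies the
# critical depth by 2^0.44, not by 2.
print(round(2 ** beta_crit, 2))  # 1.36
```

So as compute grows, optimal width outpaces optimal depth, and the safe depth budget grows much more slowly than width, which is why very deep configurations eventually cross D_crit and lose.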

Cite / details →

MMM Fahim, MJH Imran, L Debnath, T Shill, MN Molla, EB Pranto, MSS Saad, MR Karim

First complete causal map of SDG interdependencies across 168 countries — no single 'hub' goal exists. Education → Inequality is the strongest direct link, but its effect size varies 10× by national income level.

2026 · arXiv:2601.20875
Causal Inference · SDGs · Panel VAR · Statistics
Abstract

Achievement of the 2030 Sustainable Development Goals depends on strategic resource distribution. We propose a causal discovery framework using Panel Vector Autoregression with country-specific fixed effects and PCMCI+ conditional independence testing on 168 countries (2000–2025) to develop the first complete causal architecture of SDG dependencies. Analyzing 8 strategically chosen SDGs, we identify a distributed causal network (no single 'hub' SDG) with 10 statistically significant Granger-causal relationships, comprising 11 unique direct effects. Education → Inequality is the most statistically significant direct relationship (r = −0.599; p < 0.05), with effect magnitude varying substantially by income level (high-income: r = −0.65; lower-middle-income: r = −0.06, non-significant). We propose a tiered priority framework identifying upstream drivers (Education, Growth), enabling goals (Institutions, Energy), and downstream outcomes (Poverty, Health), concluding that effective SDG acceleration requires coordinated multi-dimensional interventions rather than single-goal sequential strategies.
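The Granger-causal step can be illustrated with a toy lag-1 comparison on synthetic series: does adding x_{t-1} to a regression of y_t on y_{t-1} reduce residual error? This is a minimal sketch, not the paper's PCMCI+/Panel-VAR pipeline:

```python
# Toy lag-1 Granger-style comparison on two deterministic series
# where y is genuinely driven by lagged x.
import math

T = 200
x = [math.sin(0.7 * t) for t in range(T)]
y = [0.0] * T
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.05 * math.cos(3 * t)

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

yt = center(y[1:])    # y_t
yl = center(y[:-1])   # y_{t-1}
xl = center(x[:-1])   # x_{t-1}

def rss1(target, u):
    """RSS of OLS with one centered regressor."""
    b = sum(a * c for a, c in zip(u, target)) / sum(a * a for a in u)
    return sum((c - b * a) ** 2 for a, c in zip(u, target))

def rss2(target, u, v):
    """RSS of OLS with two centered regressors (2x2 normal equations)."""
    suu = sum(a * a for a in u); svv = sum(a * a for a in v)
    suv = sum(a * b for a, b in zip(u, v))
    sut = sum(a * c for a, c in zip(u, target))
    svt = sum(a * c for a, c in zip(v, target))
    det = suu * svv - suv * suv
    bu = (svv * sut - suv * svt) / det
    bv = (suu * svt - suv * sut) / det
    return sum((c - bu * a - bv * b) ** 2
               for a, b, c in zip(u, v, target))

rss_restricted = rss1(yt, yl)      # y_t ~ y_{t-1}
rss_full = rss2(yt, yl, xl)        # y_t ~ y_{t-1} + x_{t-1}
print(rss_full < rss_restricted)   # True: lagged x explains y
```

A formal test would turn the RSS drop into an F-statistic; the paper additionally conditions on fixed effects and uses PCMCI+ to rule out spurious links.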

Cite / details →

Multilingual sentiment classification of government mobile banking app reviews (English + Bangla). Benchmarks several architectures for monitoring public service quality through NLP.

Preprint · 2026
NLP · Sentiment Analysis · Bangla · Mobile Banking
Abstract

This study presents a multi-model approach for sentiment classification of user reviews of government mobile banking applications in Bangladesh, handling both English and Bangla language inputs. We benchmark several classification architectures on a curated review dataset and evaluate their performance across sentiment categories, with implications for public service improvement and digital governance monitoring.

Cite / details →

MMM Fahim, SH Yesmin, S Islam, MPB Faruque, MA Salam, MM Uddin, S Islam, T Ahmed, M Binyamin, MR Karim

239× fewer parameters than GraphCast at near-identical accuracy — principled multi-objective NAS can find truly deployable models. Transfer learning adds ~5% accuracy gains when historical data is scarce.

2026 · arXiv:2602.00240
Neural Architecture Search · Weather Forecasting · Edge Computing · Green AI
Abstract

We introduce Green-NAS, a multi-objective neural architecture search (NAS) framework designed for low-resource environments using weather forecasting as a case study. Adhering to Green AI principles, the framework explicitly minimizes computational energy costs and carbon footprints, prioritizing sustainable deployment over raw computational scale. The search simultaneously optimizes model accuracy and efficiency to find lightweight architectures with very few parameters. Our best-performing model, Green-NAS-A, achieved an RMSE of 0.0988 (within 1.4% of a manually tuned baseline) using only 153k parameters—239 times fewer than globally deployed models such as GraphCast. Transfer learning improves forecasting accuracy by approximately 5.2% compared to training a new model per city when historical data is limited.
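The multi-objective selection step amounts to keeping only non-dominated (accuracy, size) trade-offs. A generic Pareto-filter sketch, not the paper's actual search loop; candidate A mirrors the reported Green-NAS-A numbers, while B and C are invented:

```python
# Generic Pareto filter over candidate architectures scored by
# (rmse, parameter count): keep only non-dominated trade-offs.
def pareto_front(candidates):
    """candidates: list of (name, rmse, params); lower is better on both."""
    front = []
    for name, rmse, params in candidates:
        dominated = any(
            (r <= rmse and p <= params) and (r < rmse or p < params)
            for _, r, p in candidates
        )
        if not dominated:
            front.append(name)
    return front

candidates = [
    ("A", 0.0988, 153_000),     # small and accurate -> kept
    ("B", 0.0975, 36_700_000),  # slightly more accurate but huge -> kept
    ("C", 0.1200, 400_000),     # worse than A on both axes -> dropped
]
print(pareto_front(candidates))  # ['A', 'B']
```

An energy-aware search like Green-NAS would add compute/carbon cost as a further objective, but the dominance logic is the same.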

Cite / details →

Pre-trained on 357K children across 44 countries, this encoder solves the cold-start problem: with only 50 samples it beats gradient boosting by 8–12%. Zero-shot deployment to unseen countries reaches AUC up to 0.84.

2026 · arXiv:2601.20987
Transfer Learning · Child Development · Global Health · SDGs
Abstract

A large number of children experience preventable developmental delays each year, yet deployment of machine learning in new countries is stymied by a data bottleneck: reliable models require thousands of samples, while new programs begin with fewer than 100. We introduce the first pre-trained encoder for global child development, trained on 357,709 children across 44 countries using UNICEF survey data. With only 50 training samples, the pre-trained encoder achieves an average AUC of 0.65 (95% CI: 0.56–0.72), outperforming cold-start gradient boosting by 8–12% across regions. At N = 500, the encoder achieves AUC of 0.73. Zero-shot deployment to unseen countries achieves AUCs up to 0.84. We apply a transfer learning bound to explain why pre-training diversity enables few-shot generalization, establishing that pre-trained encoders can transform the feasibility of ML for SDG 4.2.1 monitoring in resource-constrained settings.

Cite / details →
All papers are arXiv preprints. For the most current citation counts, see Google Scholar.