AI Researcher · BSc & MSc in Statistics · University of Rajshahi · Bangladesh
Statistics taught me to ask the right questions. Machine learning gave me sharper tools.
My work connects both: building AI systems efficient enough for edge hardware, rigorous
enough for public health decisions, and honest enough about their own failure modes.
Based in Bangladesh, I currently work in the Data Science Research Lab of the
Department of Statistics at the University of Rajshahi, under the supervision of Prof. Dr. M. Rezaul Karim.
I am now seeking a PhD position in artificial intelligence, machine learning, or a related field.
Width should grow 2.8× faster than depth in transformers, validated across 30 architectures up to 7B parameters. Past a critical depth, adding layers actively hurts, even though parameter count increases.
Neural scaling laws describe how language model loss decreases with parameters and data, but they treat architecture as interchangeable. We propose architecture-conditioned scaling laws that decompose the dependence on depth and width, finding that optimal depth scales as D* ~ C^0.12 while optimal width scales as W* ~ C^0.34, meaning width should grow 2.8× faster than depth. We also discover a critical depth phenomenon: beyond D_crit ~ W^0.44 (sublinear in W), adding layers increases loss despite adding parameters, an effect we call the Depth Delusion. The laws are validated across 30 transformer architectures spanning 17M to 7B parameters (R² = 0.922). Our central finding is that at 7B scale a 64-layer model (6.38B params) underperforms a 32-layer model (6.86B params) by 0.12 nats, despite being twice as deep. This demonstrates that optimal depth-width tradeoffs persist at production scale.
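The arithmetic behind these exponents can be sketched in a few lines. This is only an illustration: the abstract gives the exponents but not the proportionality constants, so the code computes relative growth only, and it reads C as the compute budget, which is an assumption. The function name growth_factor is hypothetical.

```python
# Exponents as stated in the abstract. The proportionality constants are not
# given there, so only relative growth factors can be computed.
DEPTH_EXP = 0.12  # D* ~ C^0.12
WIDTH_EXP = 0.34  # W* ~ C^0.34
CRIT_EXP = 0.44   # D_crit ~ W^0.44


def growth_factor(multiplier: float, exponent: float) -> float:
    """Factor by which a power-law quantity grows when its input is multiplied."""
    return multiplier ** exponent


# The "2.8x faster" figure is the ratio of the two exponents, i.e. how much
# faster width grows than depth on a log scale.
exponent_ratio = WIDTH_EXP / DEPTH_EXP

# Effect of a 10x increase in C (assumed here to mean compute).
depth_growth = growth_factor(10, DEPTH_EXP)
width_growth = growth_factor(10, WIDTH_EXP)

# Doubling the width raises the critical depth by much less than 2x,
# which is what "sublinear in W" means.
crit_depth_growth = growth_factor(2, CRIT_EXP)

print(f"exponent ratio: {exponent_ratio:.2f}")
print(f"10x C -> depth x{depth_growth:.2f}, width x{width_growth:.2f}")
print(f"2x width -> critical depth x{crit_depth_growth:.2f}")
```

Under these exponents, a tenfold increase in C would call for roughly 2.2 times the width but only about 1.3 times the depth.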
239× fewer parameters than GraphCast at near-identical accuracy. Principled multi-objective NAS can find truly deployable models, and transfer learning adds ~5% accuracy gains when historical data is scarce.
2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN) · 2026 · published · arXiv:2602.00240
Neural Architecture Search · Weather Forecasting · Edge Computing · Green AI
Abstract
We introduce Green-NAS, a multi-objective neural architecture search (NAS) framework designed for low-resource environments using weather forecasting as a case study. Adhering to Green AI principles, the framework explicitly minimizes computational energy costs and carbon footprints, prioritizing sustainable deployment over raw computational scale. The search simultaneously optimizes model accuracy and efficiency to find lightweight architectures with very few parameters. Our best-performing model, Green-NAS-A, achieved an RMSE of 0.0988 (within 1.4% of a manually tuned baseline) using only 153k parameters, 239 times fewer than globally deployed models such as GraphCast. Transfer learning improves forecasting accuracy by approximately 5.2% compared to training a new model per city when historical data is limited.
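One common way to frame a search that trades accuracy against model size is Pareto filtering: keep only candidates that no other candidate beats on both objectives. The sketch below illustrates that general idea and is not the Green-NAS search procedure, which the abstract does not detail. The Candidate type, the helper names, and all numbers are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical architecture with two objectives, both to be minimized."""
    name: str
    rmse: float
    params: int


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.rmse <= b.rmse and a.params <= b.params
    strictly_better = a.rmse < b.rmse or a.params < b.params
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up values for illustration only; these are not results from the paper.
candidates = [
    Candidate("a", 0.10, 150_000),
    Candidate("b", 0.09, 900_000),
    Candidate("c", 0.12, 400_000),  # worse than "a" on both objectives
    Candidate("d", 0.11, 80_000),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Candidate "c" drops out because "a" is both more accurate and smaller; the remaining three each represent a different accuracy/size tradeoff.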
Pre-trained on 357K children across 44 countries, this encoder solves the cold-start problem: with only 50 samples it beats gradient boosting by 8–12%. Zero-shot to unseen countries still reaches AUC 0.84.
Transfer Learning · Child Development · Global Health · SDGs
Abstract
A large number of children experience preventable developmental delays each year, yet deployment of machine learning in new countries is stymied by a data bottleneck: reliable models require thousands of samples, while new programs begin with fewer than 100. We introduce the first pre-trained encoder for global child development, trained on 357,709 children across 44 countries using UNICEF survey data. With only 50 training samples, the pre-trained encoder achieves an average AUC of 0.65 (95% CI: 0.56–0.72), outperforming cold-start gradient boosting by 8–12% across regions. At N = 500, the encoder achieves AUC of 0.73. Zero-shot deployment to unseen countries achieves AUCs up to 0.84. We apply a transfer learning bound to explain why pre-training diversity enables few-shot generalization, establishing that pre-trained encoders can transform the feasibility of ML for SDG 4.2.1 monitoring in resource-constrained settings.
An LLM agent that iteratively self-corrects its outputs
A Python framework for building LLM agents with built-in self-correction loops, enabling more reliable and accurate outputs through iterative refinement.
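A self-correction loop of this kind can be sketched as a generate, critique, revise cycle. The code below is a minimal illustration under stated assumptions, not the framework's actual API: the function self_correct, its parameters, and the toy generate and critique stubs are all hypothetical, and no real LLM is called.

```python
from typing import Callable, Optional

# Hypothetical signatures: generate(task, previous_draft, feedback) -> draft,
# critique(task, draft) -> feedback string, or None if the draft is acceptable.
Generate = Callable[[str, Optional[str], Optional[str]], str]
Critique = Callable[[str, str], Optional[str]]


def self_correct(task: str, generate: Generate, critique: Critique,
                 max_rounds: int = 3) -> str:
    """Produce a draft, then revise it until the critic accepts or rounds run out."""
    draft = generate(task, None, None)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:  # critic is satisfied, stop early
            break
        draft = generate(task, draft, feedback)
    return draft


# Toy stand-ins for model calls, used only to make the sketch runnable.
def toy_generate(task: str, previous: Optional[str], feedback: Optional[str]) -> str:
    # The first attempt is deliberately wrong; any feedback triggers the fix.
    return "4" if feedback else "5"


def toy_critique(task: str, draft: str) -> Optional[str]:
    return None if draft == "4" else "The sum is incorrect; recompute 2 + 2."


result = self_correct("What is 2 + 2?", toy_generate, toy_critique)
print(result)
```

Capping the loop with max_rounds keeps cost bounded when the critic never accepts a draft.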
Multi-objective NAS for efficiency under real deployment constraints
A research project on multi-objective neural architecture search (NAS) that explicitly targets efficiency (parameters/energy) while keeping forecasting accuracy competitive.
Legacy Fiverr track: 200+ projects, 4.8+ rating, Level Two Seller
WordPress, Shopify, and Wix development for businesses worldwide. This earlier freelance work on Fiverr established a delivery track record before the AI and data portfolio.
Data Science Research Lab · University of Rajshahi, Department of Statistics
Working with Prof. M. Rezaul Karim on machine learning applications for public health, neural architecture search, and causal inference for sustainable development goals. Author of 6 arXiv preprints covering ML, transformers, NAS, and statistical modelling.
Level Two Seller, Web Development, Data & Machine Learning
200+ successful projects in WordPress, Shopify, Wix, data analysis, and machine learning support. Maintained a 4.8+ rating across 4+ years, spanning custom themes, e-commerce, CMS migration, responsive design, and applied data/ML workflows.
Managed operations for the university's largest volunteer organization (SDG Goal 4). Oversaw volunteer development workshops and skill-building events.
Program Intern
Sep 2022 – Oct 2022
Real Star Society · Rajshahi, Bangladesh
Contributed across community development areas: event management, health advocacy campaigns, environmental conservation, and visual content for outreach.
Helped underprivileged children with education support, school supplies, and clothing as part of the largest volunteer organization at the University of Rajshahi.