Muhtasim Munif Fahim

AI Researcher · BSc & MSc in Statistics · University of Rajshahi · Bangladesh

Statistics taught me to ask the right questions. Machine learning gave me sharper tools. My work connects both: building AI systems efficient enough for edge hardware, rigorous enough for public health decisions, and honest enough about their own failure modes.

I work in the Data Science Research Lab of the Department of Statistics at the University of Rajshahi, Bangladesh, under the supervision of Prof. Dr. M. Rezaul Karim. I am seeking a PhD position in artificial intelligence, machine learning, or a related field.

Email · Scholar · LinkedIn · GitHub · ResearchGate · ORCID

Selected Publications

The Depth Delusion: Why Transformers Should Be Wider, Not Deeper

MMM Fahim, MR Karim

Width should grow 2.8× faster than depth in transformers, a result validated across 30 architectures up to 7B parameters. Past a critical depth, adding layers increases loss even though the parameter count grows.

2026 · arXiv:2601.20994

Transformers · Scaling Laws · Deep Learning · Architecture

Abstract

Neural scaling laws describe how language model loss decreases with parameters and data, but treat architecture as interchangeable. We propose architecture-conditioned scaling laws decomposing depth-width dependence, finding that optimal depth scales as D* ~ C^0.12 while optimal width scales as W* ~ C^0.34, meaning width should grow 2.8× faster than depth. We discover a critical depth phenomenon: beyond D_crit ~ W^0.44 (sublinear in W), adding layers increases loss despite adding parameters—the Depth Delusion. Validated across 30 transformer architectures spanning 17M to 7B parameters (R² = 0.922), our central finding is that at 7B scale a 64-layer model (6.38B params) underperforms a 32-layer model (6.86B params) by 0.12 nats, despite being significantly deeper—demonstrating that optimal depth-width tradeoffs persist at production scale.
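The reported exponents lend themselves to a quick back-of-the-envelope check. The sketch below uses only the three exponents quoted in the abstract. The helper name `power_law_factor` and the 10× compute and 2× width scenarios are illustrative assumptions, and because the abstract gives no proportionality constants, only relative growth is computed.

```python
# Exponents reported in the abstract: D* ~ C^0.12, W* ~ C^0.34, D_crit ~ W^0.44.
DEPTH_EXPONENT = 0.12
WIDTH_EXPONENT = 0.34
CRITICAL_DEPTH_EXPONENT = 0.44


def power_law_factor(multiplier: float, exponent: float) -> float:
    """Relative growth of a quantity y ~ x^exponent when x grows by `multiplier`."""
    return multiplier ** exponent


# Ratio of exponents: how much faster width grows than depth in log space.
exponent_ratio = WIDTH_EXPONENT / DEPTH_EXPONENT

# Hypothetical scenario: the compute budget C grows 10x.
depth_factor = power_law_factor(10.0, DEPTH_EXPONENT)
width_factor = power_law_factor(10.0, WIDTH_EXPONENT)

# Hypothetical scenario: width W doubles; how far does the critical depth move?
critical_depth_factor = power_law_factor(2.0, CRITICAL_DEPTH_EXPONENT)

print(f"width/depth exponent ratio: {exponent_ratio:.2f}")
print(f"10x compute: depth x{depth_factor:.2f}, width x{width_factor:.2f}")
print(f"2x width: critical depth x{critical_depth_factor:.2f}")
```

Under these exponents, a 10× larger compute budget calls for roughly 1.3× the depth but about 2.2× the width, and doubling the width raises the critical depth by only about 1.36×.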

Green-NAS: A Global-Scale Multi-Objective Neural Architecture Search for Robust and Efficient Edge-Native Weather Forecasting

MMM Fahim, SH Yesmin, S Islam, MPB Faruque, MA Salam, MM Uddin, S Islam, T Ahmed, M Binyamin, MR Karim

239× fewer parameters than GraphCast, with RMSE within 1.4% of a manually tuned baseline: principled multi-objective NAS can find truly deployable models. Transfer learning adds about 5% accuracy when historical data is scarce.

2026 · arXiv:2602.00240

Neural Architecture Search · Weather Forecasting · Edge Computing · Green AI

Abstract

We introduce Green-NAS, a multi-objective neural architecture search (NAS) framework designed for low-resource environments using weather forecasting as a case study. Adhering to Green AI principles, the framework explicitly minimizes computational energy costs and carbon footprints, prioritizing sustainable deployment over raw computational scale. The search simultaneously optimizes model accuracy and efficiency to find lightweight architectures with very few parameters. Our best-performing model, Green-NAS-A, achieved an RMSE of 0.0988 (within 1.4% of a manually tuned baseline) using only 153k parameters—239 times fewer than globally deployed models such as GraphCast. Transfer learning improves forecasting accuracy by approximately 5.2% compared to training a new model per city when historical data is limited.
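To put the parameter figures in perspective, the short sketch below works through the arithmetic implied by the abstract. The comparison-model size is derived here from the two quoted numbers (153k parameters, 239× fewer) rather than stated in the source, and the 4-bytes-per-parameter figure is an assumption (float32 storage) used only to illustrate the footprint.

```python
# Figures quoted in the abstract.
GREEN_NAS_PARAMS = 153_000   # parameters in Green-NAS-A
REDUCTION_FACTOR = 239       # "239 times fewer" than the comparison model

# Size of the comparison model implied by those two numbers (derived, not quoted).
implied_baseline_params = GREEN_NAS_PARAMS * REDUCTION_FACTOR

# Assumption: weights stored as float32 (4 bytes each), no other overhead.
BYTES_PER_PARAM = 4
green_nas_kb = GREEN_NAS_PARAMS * BYTES_PER_PARAM / 1_000
baseline_mb = implied_baseline_params * BYTES_PER_PARAM / 1_000_000

print(f"implied comparison model: {implied_baseline_params / 1e6:.1f}M parameters")
print(f"Green-NAS-A weights: ~{green_nas_kb:.0f} KB (float32)")
print(f"comparison model weights: ~{baseline_mb:.0f} MB (float32)")
```

Under the float32 assumption the searched model's weights fit in well under a megabyte, which is the scale at which edge deployment becomes practical.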

Pre-trained Encoders for Global Child Development: Transfer Learning Enables Deployment in Data-Scarce Settings

MMM Fahim, MR Karim

Pre-trained on 357K children across 44 countries, this encoder addresses the cold-start problem: with only 50 samples it beats cold-start gradient boosting by 8–12%. Zero-shot deployment to unseen countries reaches AUCs up to 0.84.

2026 · arXiv:2601.20987

Transfer Learning · Child Development · Global Health · SDGs

Abstract

A large number of children experience preventable developmental delays each year, yet deployment of machine learning in new countries is stymied by a data bottleneck: reliable models require thousands of samples, while new programs begin with fewer than 100. We introduce the first pre-trained encoder for global child development, trained on 357,709 children across 44 countries using UNICEF survey data. With only 50 training samples, the pre-trained encoder achieves an average AUC of 0.65 (95% CI: 0.56–0.72), outperforming cold-start gradient boosting by 8–12% across regions. At N = 500, the encoder achieves AUC of 0.73. Zero-shot deployment to unseen countries achieves AUCs up to 0.84. We apply a transfer learning bound to explain why pre-training diversity enables few-shot generalization, establishing that pre-trained encoders can transform the feasibility of ML for SDG 4.2.1 monitoring in resource-constrained settings.

Talks & Conferences

IEEE QPAIN 2026 · Presenter

IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking

Apr 2026 · CUET, Chattogram, Bangladesh

“Green-NAS: A Global-Scale Multi-Objective Neural Architecture Search for Robust and Efficient Edge-Native Weather Forecasting”

DAAS 2026 · Presenter

Regional Conference on Role of Statistics in Strengthening Sustainable Agriculture and Public Health

Apr 2026 · Bangladesh Agricultural University, Mymensingh

“Green-NAS: Multi-Objective Neural Architecture Search for Sustainable and Efficient Weather Forecasting in Data-Scarce Agricultural Regions”

ICRAST 2025 · Presenter

2nd International Conference on Recent Advances in Science and Technology

Nov 2025 · University of Rajshahi, Bangladesh

“Dependency Divide: An Interpretable ML Framework for Profiling Student Digital Satisfaction”

ICASDS 2025 · Presenter

International Conference on Applied Statistics and Data Science

Dec 2025 · University of Dhaka, Bangladesh

“Targeting Renewable Energy Investments to Maximize Emissions Reductions: Precision Climate Finance”

RUEC 2025 · Co-Author

RUEC 1st International Research Conference

Sep 2025 · University of Rajshahi, Bangladesh

“Antibiotic Misuse and Antimicrobial Resistance Knowledge Among University Students in Bangladesh: Current Trends and Future Prospects”

ICRSDS4IR 2024 · Volunteer

8th International Conference on The Role of Statistics and Data Science in 4IR

Dec 2024 · University of Rajshahi, Bangladesh

Designed the conference logo, banner, souvenir, and kit, alongside technical contributions.

Projects

Self-Correct Agent

research

An LLM agent that iteratively self-corrects its outputs

A Python framework for building LLM agents with built-in self-correction loops, enabling more reliable and accurate outputs through iterative refinement.

Python · LLM · Agents · AI

Green-NAS (Edge-native weather forecasting)

research

Multi-objective NAS for efficiency under real deployment constraints

A research project on multi-objective neural architecture search (NAS) that explicitly targets efficiency (parameters/energy) while keeping forecasting accuracy competitive.

Neural Architecture Search · Green AI · Edge Computing · Weather Forecasting

The Depth Delusion (architecture-conditioned scaling)

research

Evidence that width should grow faster than depth for transformers

A research project studying architecture-conditioned scaling laws and identifying a critical-depth regime where adding layers can hurt performance.

Transformers · Scaling Laws · Deep Learning · NLP & LLMs

Freelance Web Development

freelance

200+ projects · 4.8+ rating · Level Two Seller on Fiverr

WordPress, Shopify, and Wix development for businesses worldwide — custom themes, e-commerce stores, CMS migrations, and responsive design.

WordPress · Shopify · Wix · E-commerce

Experience

Research Associate
Apr 2025 – Present

Data Science Research Lab, University of Rajshahi · Rajshahi, Bangladesh

Building multimodal ML pipelines integrating text, image, and tabular data. Exploring NLP & LLM applications for small business data solutions.

Level Two Seller, Web Development
Feb 2021 – Jul 2025

Fiverr

200+ successful projects in WordPress, Shopify, and Wix development, covering custom themes, e-commerce, CMS migration, and responsive design. Maintained a 4.8+ rating across 4+ years. Additional expertise: graphic design (Adobe Illustrator, Photoshop, InDesign, Lightroom) and video editing (Premiere Pro, After Effects).

Joint Secretary
Nov 2022 – Dec 2023

Nobojagoron Foundation · Rajshahi, Bangladesh

Managed operations for the university's largest volunteer organization (SDG 4). Oversaw volunteer development workshops and skill-building events.

Program Intern
Sep 2022 – Oct 2022

Real Star Society · Rajshahi, Bangladesh

Contributed across community development areas: event management, health advocacy campaigns, environmental conservation, and visual content for outreach.

Design Team Lead
Jul 2020 – Oct 2022

Nobojagoron Foundation · Rajshahi, Bangladesh

Led all visual communications and campaign design for the foundation's outreach and community programs.

Head of Research & Development
Jan 2020 – Jun 2020

Nobojagoron Foundation · Rajshahi, Bangladesh

Volunteer
Jan 2019 – Dec 2019

Nobojagoron Foundation · Rajshahi, Bangladesh

Helped underprivileged children with education support, school supplies, and clothing as part of the largest volunteer organization at the University of Rajshahi.

Blog

May 4, 2026

Scaling Laws for LLMs: From Chinchilla to 2026

The most expensive equations in AI determine how labs spend billions. Here's what they actually say — and where they're being rewritten. From Kaplan to Chinchilla to inference-time scaling.

Machine Learning · LLM · Scaling Laws · Research

Apr 28, 2026

LLM Quantization Demystified: GGUF vs GPTQ vs AWQ

Your 7B model has 7 billion weights, about 14 GB at 16-bit precision. Here's exactly how to shrink them, and what you lose in the process. A practitioner's guide to choosing GGUF, GPTQ, or AWQ.
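The memory arithmetic behind quantization is easy to sketch: weight storage is roughly parameter count times bits per weight. The snippet below is a minimal illustration under that assumption. The function name `weight_memory_gb` is hypothetical, the bit widths are generic examples rather than settings tied to GGUF, GPTQ, or AWQ, and the estimate ignores per-group scales, zero-points, activations, and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Ignores quantization metadata (scales, zero-points) and runtime memory.
    """
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_7B = 7e9  # a nominal 7-billion-parameter model

# Generic precisions for illustration: 16-bit, 8-bit, and 4-bit weights.
memory_by_bits = {bits: weight_memory_gb(PARAMS_7B, bits) for bits in (16, 8, 4)}

for bits, gigabytes in memory_by_bits.items():
    print(f"{bits:>2}-bit weights: ~{gigabytes:.1f} GB")
```

Halving the bits per weight halves the storage, so the 14 GB of 16-bit weights drops to about 3.5 GB at 4 bits before any format-specific overhead.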

Machine Learning · LLM · Quantization · Edge AI