Neural Architecture Search for the Real World: Lessons from Edge Deployment
January 20, 2026
Most NAS research is benchmarked on ImageNet, measured in GPU-hours, and deployed to data centers. That is not the world most people live in.
When we started the Green-NAS project — searching for architectures that could run weather forecasting on edge nodes in South Asia — the standard NAS literature was nearly useless. Here is what we learned.
Constraint-First Design
The typical NAS workflow is: define a search space, define a proxy task, search, evaluate. The edge deployment context inverts this. Constraints come first:
- Power budget: ≤ 5W continuous draw on solar-charged nodes
- Memory: 512MB RAM, no GPU
- Latency: inference must complete in under 2 seconds for real-time alerting
- Reliability: models must degrade gracefully on incomplete sensor readings
Only after specifying these constraints did we define our search space. Any architecture that could not meet the power, memory, and latency budgets was excluded from the start, rather than pruned after the search.
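As a rough illustration of what "constraints first" can look like in code, here is a minimal sketch that filters candidates against the budgets listed above before any search step runs. The `CandidateEstimate` type, its field names, and the example numbers are hypothetical and not taken from the Green-NAS codebase. It also treats the full 512MB as the memory ceiling, which ignores the headroom a real node needs for the OS, and it does not model the reliability requirement.

```python
from dataclasses import dataclass

# Budget values taken from the constraint list above.
MAX_POWER_W = 5.0
MAX_MEMORY_MB = 512.0   # assumption: whole-device RAM used as the ceiling
MAX_LATENCY_S = 2.0


@dataclass(frozen=True)
class CandidateEstimate:
    """Hypothetical per-architecture estimates; names are illustrative."""
    name: str
    power_w: float      # estimated continuous draw during inference
    memory_mb: float    # estimated peak resident memory
    latency_s: float    # estimated CPU-only inference time


def within_budget(c: CandidateEstimate) -> bool:
    """Return True only if every hard constraint is met."""
    return (
        c.power_w <= MAX_POWER_W
        and c.memory_mb <= MAX_MEMORY_MB
        and c.latency_s < MAX_LATENCY_S   # "under 2 seconds" is strict
    )


def feasible_space(candidates):
    """Drop infeasible candidates before the search ever sees them."""
    return [c for c in candidates if within_budget(c)]


# Made-up estimates, for illustration only.
candidates = [
    CandidateEstimate("wide-shallow", power_w=3.8, memory_mb=410.0, latency_s=1.2),
    CandidateEstimate("deep-narrow", power_w=4.6, memory_mb=640.0, latency_s=1.9),
    CandidateEstimate("tiny", power_w=2.1, memory_mb=120.0, latency_s=0.4),
]
print([c.name for c in feasible_space(candidates)])  # ['wide-shallow', 'tiny']
```

In practice the estimates would come from profiling on the target hardware or from a cost model. The point of the sketch is only that the filter runs before the search, not after it.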
Multi-Objective Pareto Search
Single-objective NAS (minimize validation loss) produced models that were accurate but undeployable. We switched to multi-objective search with three objectives:
- Forecast error (minimize)
- FLOPs per inference (minimize)
- Parameter count (minimize)
The Pareto frontier revealed an important structure: forecast error rose slowly as the compute and parameter budgets shrank, until it hit a sharp cliff around 8M parameters. The location and steepness of this cliff are architecture-dependent: some families degrade gracefully below it, others collapse.
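For readers unfamiliar with Pareto filtering, the sketch below shows one simple way to extract the non-dominated set when all three objectives are minimized. It is a plain quadratic scan, not necessarily the search algorithm we used, and the `Scores` type and the example values are made up for illustration.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    """One candidate's objectives; all three are minimized."""
    forecast_error: float
    flops: float
    params: float


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the non-dominated points in input order (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative values only, not measured results.
points = [
    Scores(0.10, 900e6, 20e6),
    Scores(0.11, 400e6, 9e6),
    Scores(0.25, 350e6, 7e6),
    Scores(0.12, 500e6, 10e6),  # dominated by the second entry
]
front = pareto_front(points)
print(len(front))  # 3
```

Plotting forecast error against parameter count for the points on the front is what makes a cliff like the one described above visible.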
What Surprised Us
The best edge architectures were wider and shallower than their server-side counterparts. This is consistent with the theoretical intuition in our transformer scaling paper — depth helps when compute is abundant, but width is more efficient when you are parameter-constrained.
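To make the width versus depth trade-off concrete, here is a back-of-the-envelope sketch under a strong simplifying assumption: the model is a stack of square fully connected layers, so each layer costs `width * width + width` parameters. That is not the Green-NAS search space, and the 8M budget is borrowed from the cliff mentioned earlier purely as an illustrative number. The sketch shows only how much width a fixed parameter budget buys at different depths. It says nothing about which shape forecasts better.

```python
import math


def dense_stack_params(width: int, depth: int) -> int:
    """Parameters in `depth` square dense layers (weights plus biases)."""
    return depth * (width * width + width)


def max_width(budget: int, depth: int) -> int:
    """Largest width whose dense stack fits within `budget` parameters."""
    per_layer = budget // depth
    # Largest integer w with w^2 + w <= per_layer, from the quadratic formula.
    return (math.isqrt(4 * per_layer + 1) - 1) // 2


BUDGET = 8_000_000  # illustrative budget, not a recommendation
for depth in (32, 8):
    w = max_width(BUDGET, depth)
    print(depth, w, dense_stack_params(w, depth))
```

Under this toy model, cutting depth from 32 to 8 layers roughly doubles the affordable width (499 versus 999 units per layer) at the same parameter count.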
The result feels obvious in retrospect. It rarely is before you run the experiments.