Neural Architecture Search for the Real World: Lessons from Edge Deployment

Most neural architecture search (NAS) research benchmarks on ImageNet, measures cost in GPU-hours, and targets data-center deployment. That is not the world most people live in.

When we started the Green-NAS project — searching for architectures that could run weather forecasting on edge nodes in South Asia — the standard NAS literature was nearly useless. Here is what we learned.

Constraint-First Design

The typical NAS workflow is: define a search space, define a proxy task, search, evaluate. The edge deployment context inverts this. Constraints come first:

Power budget: ≤ 5 W continuous draw on solar-charged nodes
Memory: 512 MB RAM, no GPU
Latency: inference must complete in under 2 seconds for real-time alerting
Reliability: models must degrade gracefully on incomplete sensor readings

Only after specifying these constraints did we define our search space. Any architecture that could not satisfy them was excluded from the start, so no post-hoc pruning was needed.
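To make the idea concrete, here is a minimal sketch of a constraint-first filter. It is not the Green-NAS code. The names EdgeBudget, CandidateEstimate, is_feasible, and feasible_space are hypothetical, and the sketch assumes you already have per-candidate estimates of power, peak memory, and latency (from on-device profiling or a cost model). The reliability constraint is not covered here, since it needs an evaluation on incomplete inputs rather than a simple threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EdgeBudget:
    """Hard limits taken from the constraint list above."""
    max_power_w: float = 5.0      # continuous draw, watts
    max_memory_mb: float = 512.0  # total RAM on the node; real headroom will be lower
    max_latency_s: float = 2.0    # inference must finish in under this many seconds


@dataclass(frozen=True)
class CandidateEstimate:
    """Hypothetical record of estimated costs for one candidate architecture."""
    name: str
    power_w: float
    memory_mb: float
    latency_s: float


def is_feasible(c: CandidateEstimate, budget: EdgeBudget) -> bool:
    """A candidate is feasible only if it meets every hard limit."""
    return (
        c.power_w <= budget.max_power_w
        and c.memory_mb <= budget.max_memory_mb
        and c.latency_s < budget.max_latency_s  # "under 2 seconds" is strict
    )


def feasible_space(
    candidates: Iterable[CandidateEstimate],
    budget: EdgeBudget = EdgeBudget(),
) -> List[CandidateEstimate]:
    """Drop infeasible candidates before any search is run."""
    return [c for c in candidates if is_feasible(c, budget)]
```

The point of the design is that the filter runs before the search, so the search never spends evaluations on architectures that could not be deployed.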

Multi-Objective Pareto Search

Single-objective NAS (minimize validation loss) produced models that were accurate but undeployable. We switched to multi-objective search with three objectives:

Forecast error (minimize)
FLOPs per inference (minimize)
Parameter count (minimize)
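With all three objectives minimized, the set of best trade-offs is the set of non-dominated candidates. The sketch below shows one straightforward way to compute it. The names Scores, dominates, and pareto_front are illustrative, and the quadratic-time filter is an assumption chosen for clarity, not necessarily what our search used.

```python
from typing import List, NamedTuple


class Scores(NamedTuple):
    """Objective values for one candidate; lower is better for all three."""
    forecast_error: float
    flops: float
    params: float


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Scores]) -> List[Scores]:
    """Return the non-dominated points, preserving input order (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Candidates with identical scores do not dominate each other, so duplicates all stay on the front.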

The Pareto frontier revealed an important structure: accuracy degraded slowly as compute decreased, until a sharp cliff around 8M parameters. The cliff is architecture-dependent: some families degrade gracefully as they shrink, while others collapse.

What Surprised Us

The best edge architectures were wider and shallower than their server-side counterparts. This is consistent with the theoretical intuition in our transformer scaling paper — depth helps when compute is abundant, but width is more efficient when you are parameter-constrained.
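The width-versus-depth trade-off under a fixed parameter budget is easy to see with simple arithmetic. The sketch below uses a plain fully connected network as a stand-in, which is an assumption for illustration only and not one of the architectures we searched. The function names are hypothetical. It counts parameters only and says nothing about accuracy.

```python
def mlp_param_count(input_dim: int, width: int, depth: int, output_dim: int) -> int:
    """Weights plus biases for an MLP with `depth` hidden layers of equal `width`."""
    if width < 1 or depth < 1:
        raise ValueError("width and depth must be at least 1")
    first = input_dim * width + width                # input -> first hidden layer
    hidden = (depth - 1) * (width * width + width)   # hidden -> hidden layers
    last = width * output_dim + output_dim           # last hidden -> output
    return first + hidden + last


def max_width_for_budget(budget: int, depth: int, input_dim: int, output_dim: int) -> int:
    """Largest hidden width whose parameter count stays within `budget` (0 if none fits)."""
    width = 0
    while mlp_param_count(input_dim, width + 1, depth, output_dim) <= budget:
        width += 1
    return width
```

Because the hidden-to-hidden term grows with depth times width squared, each layer removed frees room for a noticeably wider network at the same parameter count. Calling max_width_for_budget with the same budget and decreasing depth shows the affordable width rising.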

The result feels obvious in retrospect. It rarely is before you run the experiments.