Llama 4 arrived on Hugging Face as two open-weight-ish giants: Maverick and Scout. Hugging Face describes both as Mixture-of-Experts models with 17B active parameters, native multimodality, and model weights available on the Hub under Meta's Llama 4 Community License Agreement.

Maverick is the larger model, around 400B total parameters with 128 experts. Scout is around 109B total parameters with 16 experts. Both are large enough to make your laptop look away politely, but the active-parameter design is the important part: per-token compute tracks the 17B active parameters, while memory still has to hold the full total.
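That split is easy to make concrete with quick arithmetic. The figures below come from the parameter counts above; this is a rough sketch of the ratio, not an official sizing.

```python
# Back-of-envelope: what fraction of each model's weights actually
# run for a given token, using the totals quoted above.
models = {
    "Maverick": {"total_b": 400, "active_b": 17},
    "Scout": {"total_b": 109, "active_b": 17},
}

for name, p in models.items():
    frac = p["active_b"] / p["total_b"]
    print(f"{name}: {frac:.1%} of weights active per token")
```

The punchline is that Maverick runs roughly 4% of its weights per token and Scout roughly 16%, which is why "400B model" and "17B-class inference compute" can both be true at once.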

Source: Hugging Face's Llama 4 announcement.

The context windows are the fireworks

Hugging Face says the Llama 4 models were pretrained with 256K context. The instruct versions go much further: 1M context for Maverick and 10M for Scout. Yes, million with an m. Yes, you should be skeptical in the useful way, not the performative way.

Huge context only matters if retrieval, attention behavior, cost, and latency hold up on real workloads. But if Scout's long-context design proves usable, it changes how people think about local and private document workflows.
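One concrete reason for the skepticism is KV-cache memory, which grows linearly with context length. A minimal estimator is below; every model dimension in it (layer count, KV heads, head size, BF16 cache) is an illustrative placeholder, not Llama 4's published configuration.

```python
def kv_cache_bytes(context_len, n_layers=48, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Estimate KV-cache size: one K and one V vector per layer, per token.

    All default dimensions are made-up placeholders for illustration,
    NOT Llama 4's actual architecture.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len

for ctx in (256_000, 1_000_000, 10_000_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>10,} tokens -> ~{gib:,.0f} GiB of KV cache")
```

Even with these modest placeholder dimensions, a 10M-token cache lands in the terabyte range, which is why usable long context depends on cache compression, attention tricks, and serving economics, not just the advertised number.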

  • Maverick: roughly 400B total parameters, 17B active, 128 experts
  • Scout: roughly 109B total parameters, 17B active, 16 experts
  • Both models process text and image inputs through native multimodality
  • Hugging Face integration includes transformers and Text Generation Inference support
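The expert counts in the list above matter because a learned router picks only a few experts per token. A toy sketch of top-k gating follows; the scores are random stand-ins, and real routers are learned, per-layer, and far more involved.

```python
import random

def route_top_k(gate_scores, k=1):
    """Pick the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]

random.seed(0)
n_experts = 16  # Scout's expert count; everything else here is made up
scores = [random.random() for _ in range(n_experts)]
active = route_top_k(scores, k=1)
print(f"token routed to expert(s) {active}; "
      f"{len(active)}/{n_experts} expert FFNs run for this token")
```

The design choice this illustrates: adding experts grows capacity (and memory) without growing the per-token forward pass, since the non-selected experts simply never execute.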

The deployment notes are more useful than the hype. Scout is described as fitting on a single server-grade GPU with on-the-fly 4-bit or 8-bit quantization. Maverick is available in BF16 and FP8 formats. Hugging Face also notes automatic device mapping, tensor parallel support, and quantization support in the ecosystem.
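A quick weight-only estimate shows why the single-GPU claim for Scout is plausible at 4-bit. This sketch counts only the weights themselves and ignores activations, KV cache, and quantization overhead, so treat it as a floor, not a budget.

```python
def weight_gib(total_params_b, bits_per_weight):
    """Approximate weight-only memory (GiB) at a given precision.

    Ignores activations, KV cache, and quantization bookkeeping overhead.
    """
    bytes_total = total_params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

for name, total in (("Scout", 109), ("Maverick", 400)):
    for label, bits in (("BF16", 16), ("FP8", 8), ("INT4", 4)):
        print(f"{name:8s} {label}: ~{weight_gib(total, bits):,.0f} GiB")
```

At 4-bit, Scout's 109B weights come out around 50 GiB, comfortably inside an 80 GB server-grade GPU, while BF16 Maverick sits near 745 GiB and clearly wants a multi-GPU, tensor-parallel setup.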

That is the open-model story in miniature: the weights are only the beginning. The tooling layer decides whether the community can actually run, test, quantize, and improve the thing.

The right posture is excitement with a torque wrench. Llama 4 gives open-model builders a serious multimodal, long-context platform. Now the job is to find where the context length is genuinely useful, where it is just brochure gravity, and how much hardware the promise wants to eat.

In short

Llama 4 Maverick and Scout bring MoE architecture, native multimodality, and huge advertised context windows to the Hugging Face ecosystem. The promise is big; the local deployment details are where builders should look first.