
Llama 4 Scout

LLM Provider / Model · On-prem 10M-token context analysis, doc/codebase RAG without external chunking

At a glance

Input price: $0.20
Output price: $0.60
Cache price: $0.04
Price tier: Budget
Context window: 10M
Max output: 8K
Context tier: 500K+
Speed tier: Fast
Latency: Low
Knowledge cutoff: Aug 2025
Modality: Multimodal (vision)
Model ID: llama-4-scout
Provider: Meta
HumanEval: 88%
MMLU: 82%
SWE-Bench: 52%
Benchmark: Best long-context open-weight model at this size
Released: Apr 2025
Hosting: Open/Self-host
Capabilities: Vision, Tool use, MoE, Function calling, 10M context, Single-GPU friendly
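
The listed prices can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the listing does not state a billing unit, so it assumes the prices are USD per 1M tokens, and it assumes cached input tokens are billed at the cache price instead of the input price. The function name `estimate_cost` and its parameters are hypothetical, not part of any provider API.

```python
# Prices copied from the "At a glance" listing.
INPUT_PRICE = 0.20
OUTPUT_PRICE = 0.60
CACHE_PRICE = 0.04

# ASSUMPTION: the listing gives no unit; USD per 1M tokens is assumed here.
TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    cached_tokens is the portion of input_tokens assumed to be billed
    at the cache price rather than the full input price.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PRICE
        + cached_tokens * CACHE_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # A 1M-token prompt with the maximum 8K-token output, nothing cached.
    print(f"{estimate_cost(1_000_000, 8_000):.4f}")  # prints 0.2048
```

Under these assumptions, a prompt that fills the full 10M-token window with nothing cached would cost about $2.00 in input alone, so caching a stable document prefix dominates the bill for repeated queries.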

What Llama 4 Scout does

Vision, Tool use, MoE, Function calling, 10M context, Single-GPU friendly

Best for

On-prem 10M-token context analysis, doc/codebase RAG without external chunking

Works well with

Conflicts & caveats

Build a full stack around Llama 4 Scout. Flowpicker shows compatibility warnings before you commit.
