Nemotron 3 Nano Omni

LLM Provider / Model · Local omni-modal workflows (video, audio, document), edge deployment on consumer GPUs, multimodal RAG, on-device assistants

At a glance

Input price: Free (self-hosted)
Output price: Free (self-hosted)
Price tier: Free
Context window: 256K
Max output: 16K
Context tier: 128K-500K
Speed tier: Fast
Latency: local-bound
Knowledge cutoff: Jan 2026
Modality: Multimodal (vision + audio)
Model ID: nemotron-3-nano-omni-30b-a3b
Provider: NVIDIA
HumanEval: 78%
MMLU: 78%
Benchmark: Omni-modal: high
Released: Apr 2026
Hosting: Open-weights
Capabilities: Vision, Audio, Video, Tool use, Streaming, Structured output, MoE (30B / 3B active), 9x throughput on video/document workloads, Runs on 25GB RAM, Self-hostable
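
To see how the context window and max output figures interact, here is a minimal Python sketch of a prompt budget check. It assumes that "256K" and "16K" mean 262,144 and 16,384 tokens, and that generated tokens count against the same context window; neither is stated in the listing. The names `input_budget` and `fits` are hypothetical helpers, not part of any NVIDIA API.

```python
# Assumption: "256K" and "16K" are binary thousands (x 1024).
CONTEXT_WINDOW = 256 * 1024  # 262,144 tokens
MAX_OUTPUT = 16 * 1024       # 16,384 tokens


def input_budget(reserved_output: int = MAX_OUTPUT,
                 context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt after reserving room for the reply.

    Assumes output tokens share the context window with the input.
    """
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 1 and MAX_OUTPUT")
    return context_window - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if a prompt of this size leaves room for the reserved output."""
    return prompt_tokens <= input_budget(reserved_output)
```

Under these assumptions, reserving the full 16K output leaves 245,760 tokens for video, audio, and document input. Reserving less output frees a matching amount of prompt space.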

What Nemotron 3 Nano Omni does

Vision, Audio, Video, Tool use, Streaming, Structured output, MoE (30B / 3B active), 9x throughput on video/document workloads, Runs on 25GB RAM, Self-hostable
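
The "MoE (30B / 3B active)" and "Runs on 25GB RAM" entries can be related with some rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes that all 30B parameters must be resident in memory (the 3B active figure reduces compute per token, not weight storage), that 1 GB is 10^9 bytes, and it ignores the KV cache, activations, and any vision or audio encoder overhead. `weight_gb` is a hypothetical helper.

```python
TOTAL_PARAMS = 30e9   # total parameters, from "30B"
ACTIVE_PARAMS = 3e9   # parameters used per token, from "3B active"


def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) at a given precision."""
    return params * bits_per_param / 8 / 1e9


# Weight-only footprint at common precisions (assumed, not from the listing).
estimates = {bits: weight_gb(TOTAL_PARAMS, bits) for bits in (16, 8, 4)}

# Fraction of the weights touched per token under the MoE routing.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
```

On these assumptions the weights alone take about 60 GB at 16 bits, 30 GB at 8 bits, and 15 GB at 4 bits, so the 25GB figure would imply a quantized build below 8 bits per parameter. The listing does not say which precision that figure refers to.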

Best for

Local omni-modal workflows (video, audio, document), edge deployment on consumer GPUs, multimodal RAG, on-device assistants

Works well with

Conflicts & caveats
