LLM Provider / Model for AI coding

Every LLM provider / model option Flowpicker tracks, with pricing, setup notes, and compatibility for each.

Claude Sonnet 4.6 — Day-to-day coding, fast agentic loops, balanced cost/quality
Claude Opus 4.7 — Complex refactors, agentic coding, hard debugging, deep reasoning
GPT-4o — Multimodal tasks, fast chat, broad general use
Gemini 2.x — Huge documents, video/audio understanding, long-context retrieval
Llama 3 (Ollama/Groq) — Local/offline use, privacy-sensitive work, no-cost experimentation
DeepSeek — Cheap high-quality coding, bulk classification, self-host for privacy
Claude Haiku 4.5 — High-volume quick tasks, cost-sensitive agentic loops, inline completions
DeepSeek V4 Flash — Ultra-cheap high-quality coding, bulk classification, context-heavy tasks
DeepSeek V4 Pro — Complex reasoning, agentic coding, hard debugging with long context
Grok 4.3 — Fast general-purpose coding with native web and X search agent capabilities
Grok 4.20 — Deep reasoning, multi-step agentic coding, massive context tasks
Grok 4.1 Fast — Ultra-cheap fast reasoning for bulk agentic coding and large context retrieval
Mistral Large 3 — Top open-weight multipurpose model, multilingual coding, self-hosting with frontier quality
Codestral — Specialized code completion and generation, FIM-aware coding, fast IDE completions
Devstral 2 — Open-source coding agent, SWE-bench tasks, complex multi-file refactors
Ministral 3 14B — Local coding assistant on a laptop, edge deployments, quick completions
Mistral Medium 3.5 — Frontier-class agentic coding and reasoning at lower cost than Opus/GPT top tiers
Qwen 3 (235B) — Powerful self-hosted reasoning, multilingual coding, agentic workflows with MCP
Qwen 3.6 — Agentic coding with sustained multi-turn reasoning, frontend generation, local development
Qwen 3 Coder — Open-source coding specialist, long-context code generation, FIM completions on local hardware
Qwen 3 Coder Next — Next-gen open-source coding model optimized for agentic coding and local dev workflows
Gemma 4 31B — Frontier open-weight model on a workstation, agentic coding, reasoning, local multimodal tasks
Gemini 3 Flash — Fast frontier intelligence, near real-time coding assistance, multimodal agentic loops
Command R+ — Enterprise RAG, multilingual coding, document understanding with tool use
Command R — Cost-effective enterprise RAG, multilingual understanding at budget pricing
Llama 4 Maverick — Latest open-weight model from Meta, large context, self-hosted coding with vision
Amazon Nova Pro — AWS-integrated coding, enterprise Bedrock deployments, multimodal tasks
Amazon Nova Lite — Cheapest AWS-native model for high-volume coding tasks, Bedrock deployments
Amazon Nova Micro — Ultra-low-cost AWS-native model for simple coding, routing, high-throughput tasks
Phi-4 — Extremely small self-hosted coding model, edge devices, resource-constrained environments
o3 — Hardest coding problems, complex multi-step reasoning, advanced debugging
o4-mini — Cheap fast reasoning, agentic coding loops, high-volume tasks
GPT-4.1 — Long-context coding, instruction-following, broad general use at mid price
Gemini 2.5 Pro — Advanced reasoning, multimodal workflows, massive context tasks, agentic coding
Gemini 2.5 Flash — Fast cost-efficient inference with huge context, real-time applications
GPT-OSS 120B — Best open-weight model for production coding, configurable reasoning for quality/speed trade-offs, fine-tunable on a single H100, agentic workflows
GPT-OSS 20B — Local development, consumer hardware, fast reasoning loops, cost-effective agentic coding, laptop-friendly open-weight model
GLM 5.1 — Top-tier agentic engineering, complex multi-step workflows, sustained long-horizon coding, self-improving agent loops
Laguna XS.2 — Local agentic coding on Mac/laptop (runs on 36GB), SWE-bench tasks, long-horizon autonomous coding, Zed/JetBrains integration via ACP
MiniMax M2.7 — Professional software engineering, SRE incident response, multi-agent collaboration, self-improving coding workflows, 100+ round autonomous optimization
MiMo V2.5 Pro — Highest open-weight coding performance, 1M context agentic tasks, complex multi-step engineering, long-context reasoning
Ling 2.6 (1T) — Open-source SOTA on execution-heavy tasks, enterprise agent workflows, production coding with optimized token efficiency, AIME-level reasoning
Granite 4.1 30B — Enterprise coding with tool calling, RAG workflows, multilingual development, FIM code completions, IBM ecosystem, governed deployments
GPT-5.5 — Frontier reasoning, agentic coding, long-context refactors, multimodal analysis, replaces GPT-5.4 as default flagship
GPT-5.5 Pro — Hardest reasoning problems, math olympiad, research-grade analysis, mission-critical coding tasks where cost is no object
GPT-5.4 — Production agentic coding, multi-step tool use, balanced cost/quality on long-context tasks, GPT-5.5 alternative at lower cost
Kimi K2.6 — Long-horizon coding, UI/UX generation from prompts, multi-agent orchestration, cost-optimized frontier-class workloads (~80% cheaper than GPT-5.5)
Doubao Seed 2.0 Pro — Cost-effective frontier coding, Codeforces-level competitive programming (3020 rating), AIME math (98.3%), production agentic workflows
Doubao Seed 2.0 Code — Cheapest frontier-class coding model on the market, high-throughput code completion, CI-driven agent loops, bulk refactors at minimal cost
Command A — Enterprise RAG, citation-grounded responses, multilingual support, agentic workflows with strict tool-use accuracy, regulated industries
Hy3 Preview — Agent-led tasks in development environments, CodeBuddy/WorkBuddy workflows, low-latency production coding (54% TTFT reduction), open-weight frontier alternative
Step 3.5 Flash — High-throughput low-cost reasoning, real-time agent loops, budget-tier production deployments, fastest open-weight reasoning model in its price class
Nemotron 3 Super — Top-tier open-weight agentic coding, 1M-context refactors, GPU-rich self-hosted deployments, NVIDIA ecosystem (NIM/NeMo), governed enterprise environments
Nemotron 3 Nano Omni — Local omni-modal workflows (video, audio, document), edge deployment on consumer GPUs, multimodal RAG, on-device assistants
Qwen 3.5 397B-A17B — Frontier open-weight reasoning, native multimodal tasks (text/image/video co-trained), commercial use under Apache 2.0, Alibaba Cloud Model Studio (1M ctx hosted)
Qwen 3.6 27B — Single-GPU agentic coding (fits on 1x H100), workstation deployment, beats much larger MoE models on agentic tasks, Apache 2.0 commercial use
DeepSeek R2 — Hard reasoning at 1/20th the cost of o3, math (AIME 83.2%, MATH 98.1%), competitive programming, research-grade analysis on a budget
Jamba Mini 2 — Cost-efficient long-context workflows, document Q&A, summarization at 256K, enterprise on-prem deployment, 2.5x throughput vs Transformer-only
Jamba Large 2 — Enterprise long-document workflows, regulated industries needing on-prem deployment, secure RAG at 256K, financial/legal analysis, beats Llama 3.1 405B on Arena Hard
Gemini 3 Pro — Long-horizon agentic tasks, generative UI, multimodal reasoning, Antigravity-driven workflows
Gemini 3 Deep Think — Hardest reasoning, research analysis, math olympiad and competitive programming
GPT-5.1 — Default daily-driver coding agent with adaptive reasoning and warmer chat tone
GPT-5.1 Codex — Codex CLI and long-horizon coding agents; engineered for terminal-driven workflows
GPT-5.1 Codex Max — Multi-hour, cross-file engineering tasks where context compaction matters; enterprise Codex CLI use
Claude Haiku 4.5 (Fast) — High-throughput coding sidekick, autocomplete, batch jobs, and low-latency Claude Code use
Grok 5 — Real-time web/X context, long-context analysis, and agentic browsing
Grok Code Fast 2 — High-volume agentic coding where latency and cost trump max intelligence
DeepSeek V4 — Low-cost coding LLM with self-host option; strong English + Chinese coding capabilities
Qwen 3 Max — Long-context coding, multilingual codebases, China-region deployments
Llama 4 Behemoth — Self-hosted frontier reasoning, complex agentic coding, multimodal analysis
Llama 4 Scout — On-prem 10M-token context analysis, doc/codebase RAG without external chunking
Mistral Large 4 — EU-residency coding workloads, multilingual engineering, GDPR-sensitive deployments
Codestral 2 — In-IDE completions, fill-in-the-middle, code refactoring, BYO-model integrations
Kimi K3 — Agentic coding at low cost, ultra-long context, China-region deployments
GLM 5 Air — Cheap self-hosted coding agent, autocomplete, batch inference at scale
Hermes 4 — Self-hosted assistants with steerable persona and explicit chain-of-thought support
Magistral 2 — Reasoning workloads where chain-of-thought transparency and EU residency matter
Command R 2 — RAG and tool-using agents in regulated enterprises with on-prem deployment options
Jamba 1.7 — Long-context RAG with low memory; cost-conscious enterprise deployments
Phi-5 — On-device AI, edge inference, mobile/embedded coding sidekicks
Amazon Nova Premier 2 — AWS-native enterprise agents, Bedrock integrations, long-context analysis
Yi-3 Lightning — Cheap, fast bilingual coding tasks and high-volume Chinese-English mixed workloads
Inflection 3 — Enterprise deployments requiring on-prem AI with productivity tuning
