vLLM

Other tools · Production self-hosted inference for open-weight LLMs at scale

At a glance

Pricing	Free
Setup effort	Medium
Released	2023
Open source	Yes
Interface	CLI / API
Languages	All (REST API)
Hosting	Self-hosted
Category	Inference server
Capabilities	High-throughput inference, PagedAttention, Continuous batching, OpenAI-compatible API, Tensor parallelism

High-throughput inference, PagedAttention, Continuous batching, OpenAI-compatible API, Tensor parallelism

Production self-hosted inference for open-weight LLMs at scale

No known compatibility conflicts detected.

Build a full stack around vLLM — Flowpicker shows compatibility warnings before you commit.