Key features
- Inference server
- PagedAttention
- Continuous batching
- OpenAI-compatible API
- Distributed serving
vLLM is an open-source inference and serving engine for large language models. It uses PagedAttention for efficient KV-cache memory management and continuous batching for high throughput, exposes an OpenAI-compatible API, supports a broad range of models, and can serve models across multiple GPUs and nodes.
High-throughput, memory-efficient open-source inference and serving engine for LLMs.
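As a minimal sketch of the OpenAI-compatible API: assuming a vLLM server is running locally on the default port 8000 (for example, started with `vllm serve facebook/opt-125m`), any OpenAI client can query it. The model name and prompt below are illustrative.

```python
# Assumes a vLLM server was started separately, e.g.:
#   vllm serve facebook/opt-125m
# vLLM serves an OpenAI-compatible endpoint at /v1 (default port 8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.completions.create(
    model="facebook/opt-125m",  # illustrative model choice
    prompt="San Francisco is a",
    max_tokens=32,
)
print(response.choices[0].text)
```

Because the endpoint follows the OpenAI API shape, existing OpenAI-based tooling can typically point at a vLLM server by changing only the base URL and API key.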
Pricing
Free
Primary category
AI Tool
Publisher
vLLM Project
Verification
Verified listing
vLLM is commonly used for model serving, open-model deployment, and high-throughput inference.
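For high-throughput batch workloads, vLLM also offers an offline Python API alongside the server. Below is a minimal sketch using its `LLM` and `SamplingParams` classes; the model name is illustrative.

```python
from vllm import LLM, SamplingParams

# Offline batch inference: vLLM schedules all prompts together
# (continuous batching), so throughput improves with larger batches.
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# Model name is illustrative; for distributed serving, pass
# tensor_parallel_size=N to shard the model across N GPUs.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```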
vLLM is listed as free to use.