Key features
- C/C++ inference
- GGUF model format
- Quantization (see the sketch after this list)
- Local server
- CPU/GPU backends
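As a rough illustration of the quantization and CLI features above, here is a minimal Python sketch. It assumes the llama-quantize and llama-cli binaries from a llama.cpp build are on your PATH; the file names, quantization type, and prompt are placeholders, not values from this listing.

```python
import subprocess

# Shrink an f16 GGUF model to 4-bit (Q4_K_M) to reduce its memory footprint.
# Placeholder file names: swap in your own GGUF paths.
subprocess.run(
    ["llama-quantize", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"],
    check=True,
)

# Run a one-off prompt against the quantized model with the bundled CLI,
# capping generation at 64 tokens.
subprocess.run(
    ["llama-cli", "-m", "model-q4_k_m.gguf", "-p", "Hello!", "-n", "64"],
    check=True,
)
```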
llama.cpp enables local and cloud LLM inference with minimal setup, offering quantization, CPU/GPU backends, a CLI, and an OpenAI-compatible server.
C/C++ inference engine for running LLMs locally and in the cloud, with broad CPU and GPU backend support and the GGUF model format.
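As a sketch of the OpenAI-compatible server mentioned above, the following Python snippet sends a chat request to a locally running llama-server, assumed to have been started with something like `llama-server -m model.gguf --port 8080`; the host, port, and prompt are placeholder assumptions.

```python
import json
import urllib.request

# Assumed local endpoint; llama-server exposes an OpenAI-compatible API.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    # llama-server serves the single model it was launched with, so the
    # "model" field is largely informational here.
    "model": "local",
    "messages": [{"role": "user", "content": "Summarize GGUF in one sentence."}],
    "temperature": 0.7,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

# Standard OpenAI chat-completion response shape.
print(reply["choices"][0]["message"]["content"])
```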
Pricing
Free
Primary category
AI Tool
Publisher
ggml.org
Verification
Verified listing
Published by ggml.org
llama.cpp is commonly used for local inference, edge AI, and open model serving.
llama.cpp is listed as free to use.