Key features
- C/C++ inference
- GGUF model format
- Quantization (see the sketch after this list)
- Local server
- CPU/GPU backends
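As a rough illustration of the quantization and CLI features above, here is a minimal Python sketch. It assumes the llama-quantize and llama-cli binaries from a llama.cpp build are on your PATH; the file names, quantization type, and prompt are placeholders, not values from this listing.

```python
import subprocess

# Shrink an f16 GGUF model to 4-bit (Q4_K_M) to reduce its memory footprint.
# Placeholder file names: swap in your own GGUF paths.
subprocess.run(
    ["llama-quantize", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"],
    check=True,
)

# Run a one-off prompt against the quantized model with the bundled CLI,
# capping generation at 64 tokens.
subprocess.run(
    ["llama-cli", "-m", "model-q4_k_m.gguf", "-p", "Hello!", "-n", "64"],
    check=True,
)
```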
llama.cpp enables local and cloud LLM inference with minimal setup, offering quantization, CPU/GPU backends, a CLI, and an OpenAI-compatible server.
C/C++ inference engine for running LLMs locally and in the cloud, with broad CPU and GPU backend support and the GGUF model format.
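As a sketch of the OpenAI-compatible server mentioned above, the following Python snippet sends a chat request to a locally running llama-server, assumed to have been started with something like `llama-server -m model.gguf --port 8080`; the host, port, and prompt are placeholder assumptions.

```python
import json
import urllib.request

# Assumed local endpoint; llama-server exposes an OpenAI-compatible API.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    # llama-server serves the single model it was launched with, so the
    # "model" field is largely informational here.
    "model": "local",
    "messages": [{"role": "user", "content": "Summarize GGUF in one sentence."}],
    "temperature": 0.7,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

# Standard OpenAI chat-completion response shape.
print(reply["choices"][0]["message"]["content"])
```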
Pricing
Free
Primary category
AI Tool
Publisher
ggml.org
Verification
Verified listing
Published by ggml.org
llama.cpp is commonly used for local inference, edge AI, and open model serving.
llama.cpp is listed as free to use.