Key features
- Unified inference engine supporting NPU, GPU, and CPU across devices
- Compatibility with GGUF, Apple MLX, and .nexa model formats
- OpenAI-compatible API server for easy integration with existing apps
- Cross-platform deployment support for Windows, Linux, Android, and iOS
- NexaQuant compression to optimize frontier models for mobile/edge RAM
- Hardware acceleration for Qualcomm, Intel, AMD, and Apple NPUs
