oMLX is a high-performance local LLM inference server optimized specifically for Apple Silicon Macs. Built on the MLX framework, it supports text LLMs, VLMs, OCR, and embedding models. It features a unique Tiered KV Cache system (Hot RAM + Cold SSD) that persists context across requests and server restarts, making it ideal for coding agents like Claude Code. oMLX provides a native macOS menubar app and a web dashboard, supporting continuous batching, automated memory management via LRU eviction, and seamless MCP integration.
One API is an open-source LLM management and distribution system that provides a unified access layer for dozens of AI models (OpenAI, Anthropic, Google Gemini, DeepSeek, etc.) using the standard OpenAI API format. It features high-performance load balancing, streaming support, comprehensive token/quota management, and flexible model mapping. With support for Docker deployment and multi-node scaling, it acts as a robust API gateway for developers to centralize and manage AI resource consumption efficiently.