oMLX

Developed by jundot

Open Source, Python, Global, Free
#apple-silicon #inference-server #llm #macos #mlx

ABOUT

oMLX is a local LLM inference server for Apple Silicon Macs. Built on the MLX framework, it serves text LLMs, vision-language models (VLMs), OCR models, and embedding models. Its Tiered KV Cache keeps hot entries in RAM and cold entries on SSD, so cached context persists across requests and server restarts, which makes it well suited to coding agents such as Claude Code. oMLX ships with a native macOS menubar app and a web dashboard, and it supports continuous batching, automatic memory management through LRU eviction, and MCP integration.
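The hot/cold idea can be illustrated with a small sketch. This is not oMLX's implementation: the `TieredCache` class, its `hot_capacity` parameter, and the pickle-file layout are hypothetical names chosen for illustration, and keys are assumed to be filesystem-safe strings. It shows a bounded in-memory tier that spills its least recently used entry to disk, and a lookup that falls back to disk and promotes the entry back into memory.

```python
import pickle
from collections import OrderedDict
from pathlib import Path


class TieredCache:
    """Toy two-tier cache (hypothetical, not oMLX's code).

    Hot tier: bounded in-memory OrderedDict kept in least-to-most recent order.
    Cold tier: one pickle file per key in a directory on disk.
    """

    def __init__(self, cold_dir, hot_capacity=2):
        self.hot = OrderedDict()
        self.hot_capacity = hot_capacity
        self.cold_dir = Path(cold_dir)
        self.cold_dir.mkdir(parents=True, exist_ok=True)

    def _cold_path(self, key):
        # Assumption: keys are short strings that are safe to use as file names.
        return self.cold_dir / f"{key}.pkl"

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        # LRU eviction: spill the least recently used entries to the cold tier.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self._cold_path(old_key).write_bytes(pickle.dumps(old_value))

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        path = self._cold_path(key)
        if path.exists():
            value = pickle.loads(path.read_bytes())
            self.put(key, value)  # promote back into the hot tier
            return value
        return None
```

Because the cold tier lives on disk, a new `TieredCache` pointed at the same directory can still find entries that an earlier instance spilled, which is the property that lets a cache of this shape survive a restart. In this sketch only evicted entries reach disk; a real server would also need to flush the hot tier on shutdown.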

CAPABILITIES

Tiered KV Cache (Hot/Cold)
Continuous Batching
Multi-model Serving & LRU Eviction
Native macOS Menubar & Dashboard
Claude Code Optimization
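
Continuous batching, as the term is generally used, means that a request joins the running batch as soon as a slot frees up rather than waiting for the whole batch to finish. The simulation below is a minimal sketch of that scheduling idea only; the function name `continuous_batching` and its parameters are hypothetical, and it does not reflect how oMLX schedules work internally.

```python
from collections import deque


def continuous_batching(requests, max_batch_size):
    """Simulate decode steps under continuous batching.

    `requests` maps a request id to the number of tokens it needs to generate.
    Returns a dict mapping each request id to the step at which it finished.
    """
    waiting = deque(requests.items())
    running = {}       # request id -> tokens still to generate
    finished_at = {}
    step = 0
    while waiting or running:
        # Before each step, admit waiting requests into any free batch slots.
        while waiting and len(running) < max_batch_size:
            req_id, n_tokens = waiting.popleft()
            running[req_id] = n_tokens
        step += 1
        # One decode step produces one token for every running request.
        for req_id in list(running):
            running[req_id] -= 1
            if running[req_id] <= 0:
                del running[req_id]
                finished_at[req_id] = step
    return finished_at
```

With `continuous_batching({"a": 1, "b": 3, "c": 2}, max_batch_size=2)`, request `c` takes over the slot that `a` frees after the first step, so all three requests are done by step 3. A static batch of `a` and `b` would hold `c` back until step 3 had completed.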

SUPPORTED PLATFORMS

Desktop, Web

EXTERNAL RESOURCES

Website
GitHub Repository