
omlx

Developed by jundot
Open Source | Python | Global | Free
Tags: #apple-silicon #inference-server #llm #macos #mlx

oMLX is a high-performance local LLM inference server optimized for Apple Silicon Macs. Built on Apple's MLX framework, it serves text LLMs, vision-language models (VLMs), OCR models, and embedding models. Its tiered KV cache keeps recently used context in RAM (the hot tier) and offloads older entries to SSD (the cold tier), so cached context persists across requests and even across server restarts. This makes it well suited to coding agents such as Claude Code, which resend long, largely unchanged prompts on every turn. oMLX ships a native macOS menubar app and a web dashboard, and supports continuous batching, automatic memory management through LRU (least-recently-used) eviction, and MCP (Model Context Protocol) integration.
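
The hot/cold idea can be illustrated with a small two-tier LRU cache. This is a minimal sketch under stated assumptions, not oMLX's implementation: `TieredCache`, `hot_capacity`, and `cold_dir` are hypothetical names, string keys stand in for whatever identifies a cached prompt prefix, and `pickle` stands in for a real tensor serialization format. Entries evicted from the in-memory tier are written to disk instead of being discarded, so a new instance pointed at the same directory can still find them, which mirrors the "survives restarts" behavior described above.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class TieredCache:
    """Hypothetical two-tier cache: an in-memory "hot" LRU tier that spills
    evicted entries to an on-disk "cold" tier instead of dropping them."""

    def __init__(self, hot_capacity, cold_dir):
        self.hot_capacity = hot_capacity
        self.cold_dir = cold_dir
        self.hot = OrderedDict()  # ordered from least to most recently used
        os.makedirs(cold_dir, exist_ok=True)

    def _cold_path(self, key):
        # Assumption: keys are filesystem-safe strings (e.g. prefix hashes).
        return os.path.join(self.cold_dir, f"{key}.pkl")

    def put(self, key, value):
        cold = self._cold_path(key)
        if os.path.exists(cold):
            os.remove(cold)  # the hot copy is now authoritative
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)  # evict LRU entry
            with open(self._cold_path(old_key), "wb") as f:
                pickle.dump(old_value, f)  # spill to the cold tier

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # refresh recency on a hot hit
            return self.hot[key]
        cold = self._cold_path(key)
        if os.path.exists(cold):
            with open(cold, "rb") as f:
                value = pickle.load(f)
            self.put(key, value)  # promote back to RAM (may spill another entry)
            return value
        return None  # miss in both tiers


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = TieredCache(hot_capacity=2, cold_dir=d)
        for k, v in [("a", [1, 2]), ("b", [3]), ("c", [4])]:
            cache.put(k, v)
        print(list(cache.hot))  # ['b', 'c']; "a" now lives on disk
        print(cache.get("a"))  # [1, 2], promoted back to RAM
```

Promoting a cold hit back into RAM can push another entry out to disk, so the two tiers always hold the full working set between them while RAM use stays bounded.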

  • Tiered KV Cache (Hot/Cold)
  • Continuous Batching
  • Multi-model Serving & LRU Eviction
  • Native macOS Menubar & Dashboard
  • Claude Code Optimization
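
Continuous batching is a general serving technique rather than something specific to oMLX: instead of waiting for a whole batch to finish, the scheduler refills free slots between decode steps. The toy scheduler below sketches that idea with no model attached. `Request`, `max_tokens`, `max_batch`, and `continuous_batching` are hypothetical names, and appending a placeholder token stands in for one batched forward pass; it is not oMLX's scheduler.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: str
    max_tokens: int  # hypothetical: number of tokens this request will generate
    generated: list = field(default_factory=list)


def continuous_batching(requests, max_batch):
    """Toy scheduler: before every decode step, fill free batch slots from the
    waiting queue, so finished requests free their slot immediately."""
    waiting = deque(requests)
    active = []
    finished_at = {}  # request id -> step at which it completed
    step = 0
    while waiting or active:
        # Admit waiting requests into any free slots before this step.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode step for every active request (stand-in for a single
        # batched forward pass of the model).
        for req in active:
            req.generated.append(f"tok{len(req.generated)}")
        step += 1
        # Retire finished requests so their slots are free for the next step.
        still_running = []
        for req in active:
            if len(req.generated) >= req.max_tokens:
                finished_at[req.rid] = step
            else:
                still_running.append(req)
        active = still_running
    return finished_at


if __name__ == "__main__":
    reqs = [Request("A", 1), Request("B", 3), Request("C", 2)]
    print(continuous_batching(reqs, max_batch=2))  # {'A': 1, 'B': 3, 'C': 3}
```

With static batching and `max_batch=2`, request C would wait until both A and B finished (step 3) and complete only at step 5; here it takes A's freed slot at step 2 and finishes at step 3.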
Platforms: desktop, web