VoxCPM is a tokenizer-free text-to-speech (TTS) system that generates continuous speech representations directly, using an end-to-end diffusion-autoregressive architecture to produce highly natural and expressive speech. VoxCPM2, the latest model, has 2B parameters and is trained on over 2 million hours of multilingual speech data. It supports 30 languages, Voice Design, Controllable Voice Cloning, and 48 kHz studio-quality audio output with built-in super-resolution.
MiniCPM-V is a series of multimodal LLMs developed by OpenBMB and optimized for efficient deployment on edge devices. It excels at image and video understanding, using intra-ViT early compression and LLaVA-UHD v4 to reduce visual encoding costs by over 50%. With only 1.3B parameters, MiniCPM-V 4.6 outperforms larger models while maintaining high token throughput. It can be deployed on iOS, Android, and HarmonyOS, enabling real-time multimodal interaction on mobile platforms.