Meta unveiled four custom Broadcom-built AI inference chips (MTIA 300/400/450/500) designed for ranking, recommendation, and generative AI workloads, with plans to deploy multiple gigawatts of capacity starting in 2027. The chips use a modular chiplet architecture with RISC-V cores and HBM stacks, and successive generations claim performance competitive with, or superior to, commercial alternatives such as Nvidia's GPUs.
RunAnywhere released MetalRT, a Metal-optimized GPU inference engine for Apple Silicon that achieves 1.67x faster LLM decode than llama.cpp and 4.6x faster speech-to-text than mlx-whisper through custom GPU shaders and zero-allocation inference. The company also open-sourced RCLI, a fully on-device voice AI pipeline that chains STT, LLM, and TTS with sub-600ms end-to-end latency.
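A voice pipeline of this shape is essentially three stages run back to back, with the end-to-end latency being the sum of the stage latencies. The following minimal Python sketch shows that structure with stub components; the function names (`transcribe`, `generate`, `synthesize`, `voice_turn`) are illustrative placeholders, not RCLI's actual API.

```python
import time

def transcribe(audio: bytes) -> str:
    # Stub speech-to-text stage; a real engine would run a Whisper-class model.
    return "what time is it"

def generate(prompt: str) -> str:
    # Stub LLM stage; a real engine would decode tokens on the GPU.
    return f"You asked: {prompt}"

def synthesize(text: str) -> bytes:
    # Stub text-to-speech stage; a real engine would emit audio samples.
    return text.encode("utf-8")

def voice_turn(audio: bytes) -> tuple[bytes, float]:
    """Run one STT -> LLM -> TTS turn and report end-to-end latency in ms."""
    start = time.perf_counter()
    reply_audio = synthesize(generate(transcribe(audio)))
    latency_ms = (time.perf_counter() - start) * 1000.0
    return reply_audio, latency_ms

reply, latency = voice_turn(b"\x00" * 1600)
print(f"end-to-end latency: {latency:.1f} ms")
```

The sub-600ms figure implies each stage must stay well under ~200ms on average, which is why keeping the entire chain on one device (no network round trips) and avoiding per-inference allocations matters for the total budget.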