ai-inference

2 articles

Meta unveiled four custom Broadcom-built AI inference chips (MTIA 300/400/450/500) designed for ranking, recommendation, and generative AI workloads, with plans to deploy multiple gigawatts of capacity starting in 2027. The chips use a modular chiplet architecture with RISC-V cores and HBM stacks, with successive generations claiming performance competitive with or superior to commercial alternatives such as Nvidia's GPUs.

Meta Broadcom MTIA 300 MTIA 400 MTIA 450 MTIA 500
theregister.com · giuliomagnifico · 2 days ago · details · hn

RunAnywhere released MetalRT, a Metal GPU-optimized inference engine for Apple Silicon that achieves 1.67x faster LLM decode than llama.cpp and 4.6x faster speech-to-text than mlx-whisper via custom GPU shaders and zero-allocation inference. They also open-sourced RCLI, a voice AI pipeline combining STT, LLM, and TTS that runs entirely on-device with sub-600ms end-to-end latency.

RunAnywhere MetalRT RCLI YC W26 Sanchit Shubham llama.cpp Apple MLX Ollama sherpa-onnx mlx-whisper Qwen3 LFM2.5
github.com · sanchitmonga22 · 3 days ago · details · hn