Cumulus Labs has launched IonRouter, a low-cost inference API optimized for open-source and fine-tuned models. It is backed by IonAttention, a custom C++ inference runtime built specifically for NVIDIA's GH200 hardware architecture, which reaches 588 tokens/s on multimodal workloads through novel optimizations around cache coherence, KV block writeback, and attention scheduling.
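The KV block writeback optimization mentioned above can be illustrated with a toy model: rather than writing each newly computed key/value block to the cache immediately (many small, scattered writes), blocks are staged and flushed in contiguous batches, which reduces cache-line traffic. This is a minimal sketch under assumed semantics; the class name, `flush_threshold` parameter, and data layout are all illustrative, not IonAttention's actual API.

```python
from collections import deque


class KVWritebackScheduler:
    """Toy model of deferred, batched KV block writeback.

    Staged blocks are flushed to the cache in contiguous runs once a
    threshold is reached, instead of one small write per decode step.
    Names and parameters are hypothetical, not IonAttention's API.
    """

    def __init__(self, flush_threshold=4):
        self.flush_threshold = flush_threshold  # blocks staged before a flush
        self.staged = deque()                   # pending writebacks
        self.cache = []                         # stands in for the paged KV cache

    def stage(self, block):
        """Queue a block; flush automatically when the batch is full."""
        self.staged.append(block)
        if len(self.staged) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """One contiguous writeback instead of many scattered ones."""
        while self.staged:
            self.cache.append(self.staged.popleft())


# Simulate a 10-step decode: writebacks land in batches of 4, 4, then 2.
sched = KVWritebackScheduler(flush_threshold=4)
for step in range(10):
    sched.stage({"step": step, "kv": f"block-{step}"})
sched.flush()  # drain the remainder at end of decode
print(len(sched.cache))  # → 10
```

The batching here is the whole point: on real hardware the flush would map to a coalesced memory transaction, which is where the cache-coherence benefit would come from.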
The Nervous Machine framework aggregates distributed ML experiment findings from Karpathy's autoresearch repository across heterogeneous hardware (H100, Blackwell, GH200). A graph-based certainty scoring system distinguishes universal optimization strategies from hardware-specific architectural artifacts, letting isolated forks collaboratively build shared knowledge.
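The core idea of the certainty scoring above can be sketched as a bipartite graph between findings and the hardware types that reproduce them: the more distinct hardware types confirm a finding, the more likely it is universal rather than an artifact of one architecture. This is a hedged sketch; the report schema, the `universal_threshold` parameter, and the finding names below are assumptions for illustration, not Nervous Machine's actual data model.

```python
from collections import defaultdict


def score_findings(reports, universal_threshold=2):
    """Toy certainty scoring over a finding↔hardware bipartite graph.

    Each report links one finding to one hardware type. A finding
    reproduced on >= universal_threshold distinct hardware types is
    labeled 'universal'; otherwise 'hardware-specific'.
    """
    # Adjacency of the bipartite graph: finding -> set of hardware types.
    hardware_by_finding = defaultdict(set)
    for r in reports:
        hardware_by_finding[r["finding"]].add(r["hardware"])

    scored = {}
    for finding, hw in hardware_by_finding.items():
        scored[finding] = {
            "confirmations": len(hw),  # distinct hardware types confirming it
            "scope": "universal" if len(hw) >= universal_threshold
                     else "hardware-specific",
        }
    return scored


# Hypothetical reports from three forks running on different hardware.
reports = [
    {"finding": "fuse-qkv", "hardware": "H100"},
    {"finding": "fuse-qkv", "hardware": "GH200"},
    {"finding": "nvlink-c2c-pin", "hardware": "GH200"},
]
print(score_findings(reports))
# "fuse-qkv" scores as universal (2 hardware types);
# "nvlink-c2c-pin" stays hardware-specific (GH200 only).
```

A real system would presumably weight confirmations by experiment quality and propagate scores through the graph; counting distinct confirming hardware types is the simplest version of that idea.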