NVIDIA announces a suite of open datasets and training frameworks spanning multiple AI domains, including robotics, autonomous vehicles, synthetic personas, protein modeling, and language model pre-training, with over 2 petabytes of data across 180+ datasets designed to reduce AI development bottlenecks.
NVIDIA announced a $26 billion investment over five years to develop open-weight AI models, positioning itself as a competitor to frontier AI labs. The investment includes the release of Nemotron 3 Super (128B parameters) and aims to establish US-made alternatives to increasingly popular Chinese open-source models while strengthening NVIDIA's position as the dominant AI chip manufacturer.
A detailed account of troubleshooting open-source ML infrastructure while post-training the 1T-parameter Kimi-K2-Thinking model, exposing undocumented bugs and inefficiencies in HuggingFace Transformers and quantization libraries that can hide several layers deep in the dependency stack.