benchmark-improvement


NVIDIA researchers present a concept-driven synthetic data generation workflow that creates 15 million Python programming problems from a hierarchical taxonomy of programming concepts. Including this data in Nemotron-Nano-v3 pretraining yields a 6-point improvement on the HumanEval benchmark. The method enables targeted LLM training: combining and distilling specific programming concepts gives control over the difficulty, diversity, and conceptual balance of the generated data.
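
To make the idea of combining concepts from a hierarchical taxonomy concrete, here is a minimal Python sketch. It is an illustration only, not NVIDIA's pipeline: the toy `TAXONOMY`, the helpers `leaf_paths`, `sample_concepts`, and `build_prompt`, and the prompt wording are all hypothetical, and the step that sends the prompt to a generator model is left out.

```python
import random

# Hypothetical toy taxonomy; the real taxonomy is not shown in the summary.
TAXONOMY = {
    "data structures": {
        "lists": ["slicing", "list comprehensions"],
        "dictionaries": ["default values", "iteration"],
    },
    "algorithms": {
        "sorting": ["custom key functions"],
        "recursion": ["memoization"],
    },
    "strings": {
        "formatting": ["f-strings"],
        "parsing": ["splitting", "regular expressions"],
    },
}


def leaf_paths(tree, prefix=()):
    """Flatten a nested taxonomy into root-to-leaf tuples of concept names."""
    if isinstance(tree, list):
        return [prefix + (leaf,) for leaf in tree]
    paths = []
    for name, subtree in tree.items():
        paths.extend(leaf_paths(subtree, prefix + (name,)))
    return paths


def sample_concepts(taxonomy, k, rng):
    """Pick k leaf concepts, each from a different top-level branch."""
    branches = rng.sample(sorted(taxonomy), k)
    return [rng.choice(leaf_paths(taxonomy[b], (b,))) for b in branches]


def build_prompt(concepts):
    """Turn the sampled concept paths into an instruction for a generator model."""
    lines = ["Write one Python programming problem that combines these concepts:"]
    lines += ["- " + " > ".join(path) for path in concepts]
    return "\n".join(lines)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
concepts = sample_concepts(TAXONOMY, k=2, rng=rng)
prompt = build_prompt(concepts)
print(prompt)
```

In this sketch, drawing each concept from a different top-level branch is one simple way to encourage diversity, and raising `k` is one crude lever on difficulty. How the actual workflow balances these is not described in the summary.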

NVIDIA · Joseph Jennings · Brandon Norick · Nemotron-Pretraining-Code-Concepts · Nemotron-Nano-v3 · HumanEval · GPT-OSS 120B
huggingface.co · ibobev · 6 days ago · details · hn