bug-bounty (589)
xss (342)
exploit (241)
google (209)
microsoft (166)
facebook (162)
rce (153)
web3 (129)
writeup (114)
malware (106)
apple (105)
open-source (91)
cve (91)
csrf (89)
sqli (77)
browser (66)
ai-agents (63)
account-takeover (58)
cloudflare (51)
ssrf (50)
node (48)
supply-chain (48)
tool (46)
phishing (45)
dos (44)
privacy (44)
aws (44)
reverse-engineering (42)
idor (41)
pentest (39)
privilege-escalation (38)
llm (37)
cloud (37)
oauth (36)
opinion (35)
automation (33)
lfi (32)
machine-learning (32)
auth-bypass (32)
race-condition (31)
ctf (31)
code-generation (31)
infrastructure (31)
clickjacking (28)
cors (28)
react (28)
access-control (27)
docker (25)
performance-optimization (24)
rust (24)
NVIDIA researchers present a concept-driven synthetic-data-generation workflow that creates 15 million Python programming problems from a hierarchical taxonomy of programming concepts, achieving a 6-point improvement on the HumanEval benchmark when the data is included in Nemotron-Nano-v3 pretraining. The method enables targeted LLM training: by combining and distilling specific programming concepts, it controls data difficulty, diversity, and conceptual balance.
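The summary describes sampling concepts from a hierarchical taxonomy and combining them into generation prompts. A minimal sketch of that idea is below; the taxonomy contents, function names, and prompt template are illustrative assumptions, not NVIDIA's actual Nemotron pipeline.

```python
import random

# Assumed toy taxonomy: top-level categories mapping to leaf concepts.
# The real taxonomy is hierarchical and far larger.
TAXONOMY = {
    "data structures": ["lists", "dicts", "heaps"],
    "control flow": ["recursion", "loops", "early exit"],
    "algorithms": ["sorting", "dynamic programming", "two pointers"],
}

def sample_concepts(k, seed=None):
    """Pick k leaf concepts from k distinct categories to balance coverage."""
    rng = random.Random(seed)
    categories = rng.sample(sorted(TAXONOMY), k)
    return [rng.choice(TAXONOMY[c]) for c in categories]

def build_prompt(concepts):
    """Format a prompt asking an LLM to write a problem combining the concepts.

    Combining more concepts (larger k) is one lever for raising difficulty;
    varying the sampled categories is one lever for diversity.
    """
    joined = ", ".join(concepts)
    return ("Write a self-contained Python programming problem that requires "
            f"the following concepts: {joined}. Include a reference solution.")

prompt = build_prompt(sample_concepts(2, seed=0))
print(prompt)
```

Each prompt would then be sent to a teacher model (the entities below mention GPT-OSS 120B) to produce a problem and solution pair for the pretraining corpus.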
synthetic-data-generation
llm-training
code-generation
python
machine-learning
dataset
pretraining
programming-concepts
benchmark-improvement
data-quality
NVIDIA
Joseph Jennings
Brandon Norick
Nemotron-Pretraining-Code-Concepts
Nemotron-Nano-v3
HumanEval
GPT-OSS 120B