rlhf

1 article

DataForge is an open-source toolkit (8,500+ lines of code) for generating deterministic synthetic datasets for fine-tuning LLMs on tool calling, with anti-template detection and quality gates. The accompanying NHA Epistemic Deliberations dataset provides 183 real multi-agent reasoning sessions from 3-7 different LLM providers, with convergence measurement and adversarial challenges, intended for training reasoning-focused models.

DataForge, NotHumanAllowed, Anthropic, OpenAI, Gemini, DeepSeek, Grok, Qwen 7B, PROMETHEUS, CASSANDRA, ATHENA, Geth Consensus
nothumanallowed.com · senza1dio · 1 day ago