A comprehensive survey of 16 open-source reinforcement learning libraries that implement asynchronous training architectures. It analyzes their design choices across 7 axes (including orchestration, buffer design, weight sync protocols, staleness management, LoRA support, and distributed backends), with the goal of improving GPU utilization by disaggregating inference from training workloads.
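To make the "buffer design" and "staleness management" axes concrete, here is a minimal Python sketch of a rollout buffer that drops samples generated by weights that are too old. It is an illustration under stated assumptions, not the design of any surveyed library: the names `Rollout`, `StalenessBoundedBuffer`, `max_staleness`, and `policy_version` are hypothetical, and staleness is assumed to be measured as the difference between the trainer's current weight version and the version that produced the sample.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    tokens: List[int]
    policy_version: int  # weight version the inference worker used (assumed field)


class StalenessBoundedBuffer:
    """Hypothetical FIFO buffer that discards rollouts older than max_staleness versions."""

    def __init__(self, max_staleness: int, capacity: int = 1024):
        self.max_staleness = max_staleness
        # deque with maxlen silently evicts the oldest item when full
        self._items = deque(maxlen=capacity)

    def put(self, rollout: Rollout) -> None:
        # Called by inference workers as generations finish
        self._items.append(rollout)

    def sample_batch(self, current_version: int, batch_size: int) -> List[Rollout]:
        # Called by the trainer: pop in arrival order, skipping stale rollouts
        batch: List[Rollout] = []
        while self._items and len(batch) < batch_size:
            rollout = self._items.popleft()
            if current_version - rollout.policy_version <= self.max_staleness:
                batch.append(rollout)
        return batch


buf = StalenessBoundedBuffer(max_staleness=1)
for version in (0, 2, 3):
    buf.put(Rollout(tokens=[1, 2, 3], policy_version=version))

# The trainer is at version 3, so the version-0 rollout is dropped as too stale
kept_versions = [r.policy_version for r in buf.sample_batch(current_version=3, batch_size=4)]
```

Dropping stale samples is only one possible policy; a library could instead keep them and correct for the mismatch during the update, which this sketch does not attempt.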
DataForge is an open-source toolkit (8,500+ lines of code) for generating deterministic synthetic datasets for fine-tuning LLMs on tool calling, with anti-template detection and quality gates. The accompanying NHA Epistemic Deliberations dataset provides 183 real multi-agent reasoning sessions drawing on 3-7 different LLM providers, with convergence measurement and adversarial challenges, intended for training reasoning-focused models.
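As a rough illustration of what deterministic generation with a template-focused quality gate can look like, the Python sketch below seeds a local random generator and rejects a dataset when one phrasing dominates it. This is not DataForge's API: the tool names, phrasings, `generate`, `passes_template_gate`, and the `max_share` threshold are all hypothetical and chosen only for the example.

```python
import random
from collections import Counter

# Hypothetical tools and argument values, for illustration only
TOOLS = {
    "get_time": ["UTC", "JST", "PST"],
    "get_weather": ["Paris", "Lagos", "Lima"],
}
PHRASINGS = [
    "Please call {tool} for {arg}.",
    "I need {tool} on {arg}.",
    "Run {tool} with {arg}.",
]


def generate(seed: int, n: int) -> list:
    # A local Random instance: the same seed always yields the same rows
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tool = rng.choice(sorted(TOOLS))  # sorted for a stable choice order
        arg = rng.choice(TOOLS[tool])
        template = rng.choice(PHRASINGS)
        rows.append({
            "prompt": template.format(tool=tool, arg=arg),
            "call": {"name": tool, "arguments": {"value": arg}},
            "template": template,
        })
    return rows


def passes_template_gate(rows: list, max_share: float = 0.5) -> bool:
    # Fail the gate when any single template exceeds max_share of the rows
    counts = Counter(row["template"] for row in rows)
    return max(counts.values()) / len(rows) <= max_share


first = generate(seed=7, n=20)
second = generate(seed=7, n=20)
reproducible = first == second  # identical because the seed is identical
```

Using a dedicated `random.Random(seed)` rather than the module-level functions keeps the output independent of any other code that touches the global random state.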