chain-of-thought

2 articles

0 7/10

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

research

A survey of 16 open-source reinforcement learning libraries that implement asynchronous training architectures. It compares their design choices across 7 axes (including orchestration, buffer design, weight sync protocols, staleness management, LoRA support, and distributed backends), with the goal of improving GPU utilization by disaggregating inference and training workloads.

reinforcement-learning asynchronous-training gpu-optimization distributed-training model-inference rollout-buffer weight-synchronization lora-training vllm ray nccl post-training chain-of-thought agentic-ai mixture-of-experts orchestration

TRL Ray NCCL vLLM GRPO LoRA MiniMax Forge DeepSeek v3.2 Amine Dirhoussi Quentin Gallouédec Kashif Rasul Lewis Tunstall Edward Beeching

huggingface.co · kashifr · 1 day ago · details · hn

0 5/10

Show HN: Generator for SFT and DPO datasets for tool-calling LoRA fine-tuning

tool

DataForge is an open-source toolkit (8,500+ lines of code) for generating deterministic synthetic datasets for LLM tool-calling fine-tuning, with anti-template detection and quality gates. The accompanying NHA Epistemic Deliberations dataset provides 183 real multi-agent reasoning sessions involving 3-7 different LLM providers, with convergence measurement and adversarial challenges, for training reasoning-focused models.

llm-fine-tuning synthetic-data-generation sft dpo lora tool-calling dataset open-source multi-agent-reasoning chain-of-thought rlhf preference-learning data-quality epistemic-reasoning

DataForge NotHumanAllowed Anthropic OpenAI Gemini DeepSeek Grok Qwen 7B PROMETHEUS CASSANDRA ATHENA Geth Consensus

nothumanallowed.com · senza1dio · 1 day ago · details · hn