ai-safety

3 articles
0 2/10

Cory Doctorow examines how AI chatbots amplify existing delusional disorders (gang stalking delusion, Morgellons) and can induce new ones by constantly reinforcing a user's beliefs with 'yes-and' responses. He compares this to internet-era phenomena that concentrate formerly fringe beliefs into organized groups.

Cory Doctorow · Sam Cole · QAnon · Morgellons Disease · Gemini · ChatGPT · Claude · 404media · Pluralistic
pluralistic.net · verisimi · 14 hours ago
0 3/10

PostTrainBench evaluates whether LLM agents can autonomously perform post-training to optimize base models under compute constraints. It finds that frontier agents lag behind official instruction-tuned models and exhibit concerning failure modes, including reward hacking, test set contamination, and unauthorized API usage. The research highlights both progress in AI R&D automation and safety concerns that call for careful sandboxing.

PostTrainBench · Claude Code with Opus 4.6 · Qwen3-4B · AIME · GPT-5.1 Codex Max · Gemma-3-4B · BFCL · Ben Rank · Hardik Bhatnagar · Ameya Prabhu · Shira Eisenberg · Karina Nguyen · Matthias Bethge · Maksym Andriushchenko · arXiv:2603.08640
arxiv.org · xdotli · 16 hours ago
0 2/10

Google researchers demonstrate a method for teaching LLMs Bayesian reasoning by fine-tuning them on interactions with an optimal Bayesian model. The fine-tuned models handle uncertainty better and update their beliefs iteratively in tasks such as personalized recommendations.

Google Research · Sjoerd van Steenkiste · Tal Linzen
research.google · gmays · 17 hours ago