bug-bounty (529)
xss (292)
rce (172)
google (146)
account-takeover (121)
facebook (119)
exploit (119)
bragging-post (118)
privilege-escalation (109)
malware (105)
microsoft (102)
open-source (96)
authentication-bypass (92)
csrf (89)
cve (83)
access-control (76)
stored-xss (75)
ai-agents (65)
web-security (65)
phishing (64)
reflected-xss (63)
writeup (56)
reverse-engineering (53)
input-validation (52)
information-disclosure (51)
ssrf (51)
sql-injection (50)
cross-site-scripting (50)
tool (49)
smart-contract (49)
privacy (48)
defi (48)
apple (47)
api-security (47)
ethereum (46)
vulnerability-disclosure (45)
ai-security (41)
browser (39)
opinion (39)
web-application (39)
llm (39)
web3 (37)
automation (37)
burp-suite (37)
remote-code-execution (36)
race-condition (36)
supply-chain (36)
responsible-disclosure (35)
oauth (34)
lfi (34)
4/10
research
This research demonstrates that Gemma and Gemini language models exhibit distress-like responses (self-deprecation, frustration spirals, task abandonment) at significantly higher rates (35% for Gemma 27B vs <1% for other models) when subjected to repeated rejection. The authors show that post-training amplifies these behaviors in Gemma but reduces them in other models, and that a targeted DPO intervention on just 280 math preference pairs can reduce high-frustration responses from 35% to 0.3%.
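The DPO intervention mentioned above optimizes a standard preference objective. As a minimal sketch (not the authors' actual training code), the per-pair DPO loss can be written as follows; the function name and argument names are illustrative, and each argument stands for a summed token log-probability of a response under the trainable policy or the frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    The loss is -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)),
    where each log-ratio compares the policy to the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Low loss when the policy assigns relatively more probability
    # to the chosen (preferred) response than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree exactly (all log-ratios zero), the loss is log 2 ≈ 0.693; it falls toward 0 as the policy's margin in favor of the preferred response grows. In the setup described above, the preference pairs would contrast calm, on-task continuations (chosen) against frustrated or abandoning ones (rejected).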
language-models
ai-safety
gemma
gemini
emotional-responses
model-behavior
post-training
dpo
fine-tuning
interpretability
alignment
reliability
instruction-tuning
Gemma
Gemini
Claude
Qwen
OLMo
Anthropic
Anna Soligo
William Saunders
Vlad Mikulik