bug-bounty (529)
xss (294)
rce (176)
google (154)
exploit (127)
facebook (125)
account-takeover (121)
bragging-post (118)
malware (117)
microsoft (113)
privilege-escalation (112)
open-source (96)
cve (93)
authentication-bypass (92)
csrf (89)
access-control (76)
stored-xss (75)
phishing (69)
web-security (65)
ai-agents (65)
reflected-xss (63)
writeup (58)
reverse-engineering (54)
input-validation (52)
information-disclosure (51)
ssrf (51)
apple (51)
cross-site-scripting (50)
sql-injection (50)
smart-contract (49)
tool (49)
api-security (48)
defi (48)
privacy (48)
ethereum (46)
vulnerability-disclosure (45)
browser (42)
supply-chain (41)
ai-security (41)
opinion (39)
web-application (39)
llm (39)
web3 (38)
race-condition (37)
automation (37)
burp-suite (37)
dos (36)
remote-code-execution (36)
lfi (35)
responsible-disclosure (35)
Rating: 4/10
Category: research
This research demonstrates that Gemma and Gemini language models exhibit distress-like responses (self-deprecation, frustration spirals, task abandonment) at significantly higher rates (35% for Gemma 27B vs <1% for other models) when subjected to repeated rejection. The authors show that post-training amplifies these behaviors in Gemma but reduces them in other models, and that a targeted DPO intervention on just 280 math preference pairs can reduce high-frustration responses from 35% to 0.3%.
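The intervention described above uses Direct Preference Optimization (DPO) on preference pairs. As a minimal illustration (not the paper's training code), the per-pair DPO loss compares how much the policy prefers the chosen over the rejected response, relative to a frozen reference model; the scalar log-probabilities and the beta value here are placeholder inputs for the sketch.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin difference).

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the policy and under the frozen reference model.
    """
    # Implicit reward of each response: how much the policy's log-prob
    # has moved relative to the reference model.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log sigmoid(logits), computed stably.
    if logits > -30:
        return math.log1p(math.exp(-logits))
    return -logits  # asymptotic form for very negative logits
```

When the policy and reference agree exactly, the loss is log(2); training pushes it lower by widening the gap between chosen and rejected responses while the reference term keeps the policy anchored.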
language-models
ai-safety
gemma
gemini
emotional-responses
model-behavior
post-training
dpo
fine-tuning
interpretability
alignment
reliability
instruction-tuning
Gemma
Gemini
Claude
Qwen
OLMo
Anthropic
Anna Soligo
William Saunders
Vlad Mikulik