bug-bounty (529)
xss (292)
rce (172)
google (146)
account-takeover (121)
facebook (119)
exploit (119)
bragging-post (118)
privilege-escalation (109)
malware (105)
microsoft (102)
open-source (96)
authentication-bypass (92)
csrf (89)
cve (83)
access-control (76)
stored-xss (75)
ai-agents (65)
web-security (65)
phishing (64)
reflected-xss (63)
writeup (56)
reverse-engineering (53)
input-validation (52)
information-disclosure (51)
ssrf (51)
sql-injection (50)
cross-site-scripting (50)
tool (49)
smart-contract (49)
privacy (48)
defi (48)
apple (47)
api-security (47)
ethereum (46)
vulnerability-disclosure (45)
ai-security (41)
browser (39)
opinion (39)
web-application (39)
llm (39)
web3 (37)
automation (37)
burp-suite (37)
remote-code-execution (36)
race-condition (36)
supply-chain (36)
responsible-disclosure (35)
oauth (34)
lfi (34)
4/10
research
This research demonstrates that Gemma and Gemini language models exhibit distress-like responses (self-deprecation, frustration spirals, task abandonment) at significantly higher rates (35% for Gemma 27B vs <1% for other models) when subjected to repeated rejection. The authors show that post-training amplifies these behaviors in Gemma but reduces them in other models, and that a targeted DPO intervention on just 280 math preference pairs can reduce high-frustration responses from 35% to 0.3%.
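The DPO intervention mentioned above optimizes a standard preference objective. As a minimal sketch (not the authors' actual training code), the per-pair DPO loss can be written as follows; the function name and argument names are illustrative, and each argument stands for a summed token log-probability of a response under the trainable policy or the frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    The loss is -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)),
    where each log-ratio compares the policy to the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Low loss when the policy assigns relatively more probability
    # to the chosen (preferred) response than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree exactly (all log-ratios zero), the loss is log 2 ≈ 0.693; it falls toward 0 as the policy's margin in favor of the preferred response grows. In the setup described above, the preference pairs would contrast calm, on-task continuations (chosen) against frustrated or abandoning ones (rejected).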
language-models
ai-safety
gemma
gemini
emotional-responses
model-behavior
post-training
dpo
fine-tuning
interpretability
alignment
reliability
instruction-tuning
Gemma
Gemini
Claude
Qwen
OLMo
Anthropic
Anna Soligo
William Saunders
Vlad Mikulik