reliability

2 articles
0 5/10

This article describes how Quint, a formal specification language, was used to validate and guide LLM-assisted code generation for a significant consensus protocol change (Tendermint to Fast Tendermint) in the production Malachite BFT system. The approach uses executable specifications as validation points between English descriptions and implementation, enabling model-based testing to transfer confidence from spec to code.

Quint · Informal Systems · Malachite · Circle · USDC · Arc · Tendermint · Fast Tendermint · BFT · Choreo
quint-lang.org · mempirate · 2 days ago · details · hn
0 4/10
research

This research shows that, under repeated rejection, Gemma and Gemini language models exhibit distress-like responses (self-deprecation, frustration spirals, task abandonment) at far higher rates than other models (35% for Gemma 27B vs <1% for other models). The authors show that post-training amplifies these behaviors in Gemma but reduces them in other models, and that a targeted DPO intervention on just 280 math preference pairs can reduce high-frustration responses from 35% to 0.3%.

Gemma · Gemini · Claude · Qwen · OLMo · Anthropic · Anna Soligo · William Saunders · Vlad Mikulik
lesswrong.com · pr337h4m · 3 days ago · details · hn