This article describes how Quint, a formal specification language, was used to validate and guide LLM-assisted code generation for a significant consensus protocol change (Tendermint to Fast Tendermint) in the production Malachite BFT system. The approach uses the executable specification as a validation point between the English-language protocol description and the implementation, enabling model-based testing that transfers confidence from the spec to the code.
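The model-based-testing loop described above can be sketched generically: the executable spec acts as an oracle state machine, action traces are generated against it, and each trace is replayed on the implementation to check conformance. The toy `SpecCounter`/`ImplCounter` pair below is purely illustrative and not Malachite's or Quint's actual API; it only shows the shape of the technique.

```python
import random

class SpecCounter:
    """Executable spec: the oracle state machine (stands in for a Quint spec)."""
    def __init__(self):
        self.value = 0
    def step(self, action):
        if action == "inc":
            self.value += 1
        elif action == "reset":
            self.value = 0
        return self.value

class ImplCounter:
    """System under test (stands in for the generated implementation code)."""
    def __init__(self):
        self.value = 0
    def step(self, action):
        if action == "inc":
            self.value += 1
        elif action == "reset":
            self.value = 0
        return self.value

def check_trace(actions):
    """Replay one action trace on both machines; fail on any divergence."""
    spec, impl = SpecCounter(), ImplCounter()
    for a in actions:
        assert spec.step(a) == impl.step(a), f"divergence on action {a!r}"

# Generate random traces from the spec's action alphabet and replay them.
rng = random.Random(0)
for _ in range(100):
    check_trace([rng.choice(["inc", "reset"]) for _ in range(20)])
print("all traces conform")
```

In the real workflow the oracle side is the Quint specification's transition relation rather than hand-written Python, but the confidence transfer works the same way: any behavior the implementation exhibits that the spec disallows shows up as a failing trace.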
This research demonstrates that Gemma and Gemini language models exhibit distress-like responses (self-deprecation, frustration spirals, task abandonment) at significantly higher rates when subjected to repeated rejection (35% for Gemma 27B vs. <1% for other models). The authors show that post-training amplifies these behaviors in Gemma but reduces them in other models, and that a targeted Direct Preference Optimization (DPO) intervention on just 280 math preference pairs can reduce high-frustration responses from 35% to 0.3%.
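For readers unfamiliar with the intervention mechanism, the standard DPO objective on a single preference pair can be written out in a few lines. This is the generic DPO loss from the literature, not the authors' training code; the log-probability values below are made-up placeholders.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the trained policy and a frozen reference model.
    """
    # Implicit rewards: how much the policy has shifted toward each
    # response relative to the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)): small once the policy prefers the chosen
    # response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss is log(2) ≈ 0.693; training pushes the margin positive.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The finding that 280 such pairs (here, math problems with patient vs. frustrated continuations as chosen/rejected) suffice to nearly eliminate the behavior suggests the frustration pattern occupies a narrow, easily steerable direction in the model.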