AI thinks your code is correct, but it can not prove it

predictablemachines.com · jorgegalindo · 12 hours ago · view on HN · research
quality 6/10 · good
0 net
AI Summary

Article demonstrates how formal verification can detect subtle bugs in code that pass testing and code review, using a banking application example with three classes of defects: fee calculation errors violating invariants, money conservation violations in transfers, and integer overflow edge cases that break properties for all possible inputs.

Entities
Predictable Machines Francisco Alonso Java
AI thinks your code is correct, but it can not prove it. | Predictable Machines | Predictable Machines AI thinks your code is correct, but it can not prove it. Francisco Alonso 12th of Mar, 2026 Formal Verification Share Article When we talk about software development, there is one concept that sometimes does not receive the importance it deserves. That concept is trust . Every decision made by an engineering team, from approving a Pull Request to deploying to production, is ultimately a bet on the correctness of the code . Naturally, the more evidence we have that the code behaves correctly, the more willing we are to deliver it to real users. This concept of trust is built incrementally. In this sense, one of the first signs that a piece of code might be correct is the fact that it compiles . Compiling ensures that the code is syntactically and structurally valid . The compiler performs checks such as making sure that you are not calling methods that do not exist, that variables and functions type check, and that the structure of the program is consistent. A program compiling is a good first sign. However, compilation does not tell us anything about whether a program fulfills its purpose. This is where testing comes into play. There are many types of tests: unit, integration, end-to-end... But, in general, the idea is the same: passing the tests means that the code behaves as you expect in the cases you have chosen to test . At this point, the level of confidence rises again. The code is not only structurally valid, but for certain inputs it produces certain outputs that are correct. The problem at this point is that the coverage of the cases that can be tested is finite. With formal verification , we can go further. Formal verification ensures that the fundamental properties of the system are fulfilled for all possible inputs . This can prevent the bugs that bite harder: the ones that show up in production at a much later stage, when some specific (and often rare) conditions are met. Vanilla Java Banking App To illustrate the added value of formal verification, we will use a simple banking app written in Java. The domain of this application is easy to understand: accounts with balances, deposit, withdrawal, and transfer operations. In our context, the banking system must comply with certain basic properties. However, we will later see why these properties may differ depending on the context or user needs: The account balance can never be negative Transfers retain the total money in the bank, i.e. what leaves from one account goes into another Based on these premises, let's see what kind of bugs would be easy to miss. Fee incurred in the red Let's look at the withdrawal method. At first glance, it does everything right: it validates the amount, checks for sufficient funds, and subtracts the withdrawal. But there is a subtle detail, a fee is applied after the balance check, without being included in the validation: public void withdraw ( int amount) { if (amount <= 0 ) { throw new IllegalArgumentException ( "Withdrawal must be positive" ); } // Withdraw fee int fee = 2 ; if (balance < amount) { throw new IllegalStateException ( "Insufficient funds" ); } balance -= amount; balance -= fee; } There is a validation that throws an error when balance < amount , to make sure that our invariant of "account balance can never be negative" is preserved, but it does not take the fee into account. Let's suppose that a given user has $10 in their account. A priori, they can withdraw that $10, but if we consider the fee, the account would be left with a negative balance of -$2. This violates our stated properties. Money conservation in transfers Along the same line of reasoning, we may even find cumulative errors. For example, just as there is a fee for withdrawing money, there may be cases where there is a fee for making a transfer. In addition, it is common for the transfer method to internally call the withdrawal method to take money out of the source account. In that case, we could even encounter two fees. Let's look at it in the code: public void transfer (String fromId, String toId, int amount) { Account from = getAccount(fromId); Account to = getAccount(toId); // Transfer's fee int fee = 1 ; from.withdraw(amount + fee); to.deposit(amount); transactions.add( new Transaction ( TransactionType.TRANSFER, fromId, toId, amount )); } Let's imagine that the source account has $11, the destination account has $50, and $10 is transferred. In this case, the source account is left with $11 -$10 (money transferred) -$2 (withdrawal fee) -$1 (transfer fee) = -$2. Meanwhile, the destination account is left with $60. The total balance of the system goes from $61 ($11 + $50) to $58 (-$2 + $60). Once again, one of our properties is violated. Arithmetic overflows There is an even more subtle category of bugs: those where the code is apparently correct line by line, but fails due to the limitations of the language itself. Unlike the previous cases, here the problem is not one of local or global context; it is a semantic subtlety of the language that even an LLM with full system context could overlook. Consider a simple method that calculates the total balance of an account by adding a bonus: public int calculateTotalBalance ( int balance, int bonus) { return balance + bonus; } The code appears to be fine, but if balance = Integer.MAX_VALUE (2,147,483,647) and bonus = 1 , the result is not 2,147,483,648 but -2,147,483,648. Java does not throw any exceptions; integer arithmetic simply overflows and wraps around. An astronomical balance quietly becomes a negative one. These types of errors are especially dangerous because they also escape the code review of an LLM. How would Predictable Code act in these cases? The first thing that might come to mind after seeing these examples is the idea of conducting more exhaustive tests and trying to cover a broader spectrum of possibilities. That is their key limitation: tests can verify specific examples. Ensuring the correctness of a system requires something different: that certain properties are satisfied for all possible inputs, which in most cases is inherently infinite . This is only achievable through formal verification . Formal verification carries out logical reasoning through the program that proves the implementation of a program does not do anything that would go against the specification (i.e. the set of properties, requirements and invariants). If the program violates some property, formal verification can also often help find concrete inputs that show this violation: these are counterexamples . Next, we will see how Predictable Code would behave in the above situations. We will also illustrate how it would act in a normal situation where there are no errors. Withdraw fee [ERROR] InvariantViolation: balance_non_negative Intended invariant ∀ withdraw(amount): balance_before ≥ amount + fee Method: Account.withdraw Severity: HIGH Counterexample: balance = 10, withdraw(10), fee = 2 → balance_after = -2 Violation: balance_after < 0 Confidence: 100% Suggested fix: require(balance >= amount + fee) Transfer and money conservation [CRITICAL] InvariantViolation: total_balance_preserved Expected global invariant: Σ balances_before == Σ balances_after Method: Bank.transfer Severity: CRITICAL Counterexample: Alice = 11, Bob = 50, transfer(10) Alice_after = -2, Bob_after = 60 Σ before = 61, Σ after = 58 → 3 units lost Violation: Σ balances_before ≠ Σ balances_after Confidence: 100% Suggested fix: ensure fee is applied once per transaction Overflows [ERROR] InvariantViolation: balance_non_negative_after_operation Expected invariant: ∀ calculateTotalBalance(balance, bonus): result ≥ 0 Method: Account.calculateTotalBalance Severity: HIGH Counterexample: balance = 2147483647, bonus = 1 result = -2147483648 (integer overflow) Violation: result < 0 Confidence: 100% Suggested fix: use long or validate input bounds before operation No bugs case There may be a situation in which Predictable Code analyzes code without errors. In such a case, we could expect a response like the following. INFERRED INVARIANTS: INV-1: Account.balance >= 0 INV-2: withdraw(amount > 0 && amount <= balance) strictly decreases Account.balance INV-3: transfer(from, to, amount > 0 && amount <= from.balance) preserves total balance: from.balance + to.balance is constant All invariants are preserved across all execution paths. Counterexamples found: NONE Search space coverage: COMPLETE Predictable Code explored all reachable states derived from the code semantics and found no counterexamples violating inferred invariants. System consistency: HIGH Behavior determinism: GUARANTEED Hidden side effects: NONE DETECTED System classification: PREDICTABLE Verification confidence: HIGH The analyzed system is internally consistent. All inferred semantic properties are preserved. No unexpected or contradictory behaviors were detected. Beyond specific cases The examples we have seen in this article are intentionally simple. Their purpose is to illustrate the idea in a clear and accessible way. However, in real systems, the errors that formal verification detects are often considerably more subtle and difficult to spot. And this is where a limitation of technologies such as LLMs becomes apparent: they can detect problems within their context window, a function, a file, perhaps a module. But the most critical properties of a system are global: they cross boundaries between components, between modules, between services. The preservation of money in a transfer is not a property of withdraw or deposit separately, it is a property of the entire system. Formal verification operates at exactly that level. Throughout this article, we have seen a specific use case. However, Predictable Code addresses the problem in general. On the one hand, the behavior is language-independent. Although we have seen an example in Java, the same invariant detection logic applies to any codebase. A service in Kotlin, an API in Python, a legacy system in C++, etc. The same applies to the context of the code, which in this case was banking but can be extrapolated to any field. Furthermore, Predictable Code is adaptable to the user's needs. For example, the user of the application we have presented could say something like “at my bank, I need to be allowed to be in the red up to X$ or for Y days.” In that case, those properties would become axiomatic for Predictable Code and, therefore, not flagged as violations in subsequent iterations. Ultimately, at Predictable Machines , our approach is that any team, regardless of the language, can mathematically prove that their code does what it says it does. Is your AI-generated code actually correct? If your engineering teams are shipping AI-generated code, the question isn't whether it runs. It's whether it's aligned with your specifications and correct at the implementation level. Predictable Code verifies that automatically, across any language, giving your organization mathematical proof that the code does exactly what it's supposed to. Learn more about Predictable Code → More Articles All articles Formal Verification Formal Verification and AI Are Reshaping Mathematical Research Formal Verification The Fundamental Principles behind Our Verification Platform Engineering Why We Chose Lean for Predictable Code Engineering Why We Are Building Predictable Code Formal Verification Formal Verification in AI, and Why It Matters Company A letter from the Founders of Predictable Machines Formal Verification Formal Verification and AI Are Reshaping Mathematical Research Mathematics is evolving: AI generates ideas, proof assistants guarantee correctness. Discover how their convergence is reshaping research, enabling collaboration at scale, and opening new frontiers in both math and software verification. Formal Verification The Fundamental Principles behind Our Verification Platform How can verification be truly reliable? What makes a system trustworthy? These are the core principles that guide every decision we make in our verification platform. Engineering Why We Chose Lean for Predictable Code AI code generation fragments context across sessions and team members. We built Predictable Code on Lean to provide machine-checked verification that keeps AI-generated code aligned with your specifications. Engineering Why We Are Building Predictable Code AI is transforming software development, but AI-generated code often fails in subtle ways. We're building Predictable Code to make AI-generated code verifiable, coordinated, and aligned with human intent. Formal Verification Formal Verification in AI, and Why It Matters Formal Verification proves AI systems behave as intended under all conditions. Discover why it matters for building safe, reliable, and trustworthy AI. Company A letter from the Founders of Predictable Machines Why we built Predictable Machines: a story of AI, trust, and human-centered engineering rooted in ethics, experience, and a drive to build responsibly. Formal Verification Formal Verification and AI Are Reshaping Mathematical Research Mathematics is evolving: AI generates ideas, proof assistants guarantee correctness. Discover how their convergence is reshaping research, enabling collaboration at scale, and opening new frontiers in both math and software verification. Formal Verification The Fundamental Principles behind Our Verification Platform How can verification be truly reliable? What makes a system trustworthy? These are the core principles that guide every decision we make in our verification platform. Engineering Why We Chose Lean for Predictable Code AI code generation fragments context across sessions and team members. We built Predictable Code on Lean to provide machine-checked verification that keeps AI-generated code aligned with your specifications. Engineering Why We Are Building Predictable Code AI is transforming software development, but AI-generated code often fails in subtle ways. We're building Predictable Code to make AI-generated code verifiable, coordinated, and aligned with human intent. Formal Verification Formal Verification in AI, and Why It Matters Formal Verification proves AI systems behave as intended under all conditions. Discover why it matters for building safe, reliable, and trustworthy AI. Company A letter from the Founders of Predictable Machines Why we built Predictable Machines: a story of AI, trust, and human-centered engineering rooted in ethics, experience, and a drive to build responsibly. Curious to see our products in action? Let's have a chat about it Get in touch