AI is getting scary good at finding hidden software bugs

zdnet.com · CrankyBear · vulnerability
But AI also creates bugs - about 1.7 times as many as humans, including critical and major issues.

Written by Steven Vaughan-Nichols, Senior Contributing Editor
March 10, 2026 at 8:28 a.m. PT

ZDNET's key takeaways

- AI is proving better than expected at finding old, obscure bugs.
- Unfortunately, AI is also good at finding bugs for hackers to exploit.
- In short, AI still isn't ready to replace programmers or security pros.

In a recent LinkedIn post, Microsoft Azure CTO Mark Russinovich said he used Anthropic's new AI model Claude Opus 4.6 to read and analyze assembly code he'd written in 1986 for the Apple II's 6502 processor.

Claude didn't just explain the code; it performed what he called a "security audit," surfacing subtle logic errors, including one case where a routine failed to check the carry flag after an arithmetic operation. That's a classic bug that had been hiding, dormant, for decades.

The good news and the bad news

Russinovich's experiment is striking because the code predates today's languages, frameworks, and security checklists. Even so, the AI was able to reason about low-level control flow and CPU flags to point out real defects. For veteran developers, it's a reminder that long-lived codebases may still harbor bugs that conventional tools and developers have learned to live with.

Yet despite the progress, some experts believe this experiment raises concerns.
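To see why a missed carry check stays hidden for decades, here is a minimal sketch (in Python rather than 6502 assembly, and not Russinovich's actual code) of the classic pattern: adding two 16-bit values byte by byte, where the buggy version forgets to propagate the carry from the low byte into the high byte. The function names are illustrative, not from the article.

```python
def add16_buggy(a, b):
    """Add two 16-bit values byte by byte, forgetting the carry flag."""
    lo = (a & 0xFF) + (b & 0xFF)                 # add the low bytes
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)   # BUG: ignores carry out of lo
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

def add16_correct(a, b):
    """Same addition, but propagating the carry as the hardware flag would."""
    lo = (a & 0xFF) + (b & 0xFF)
    carry = lo >> 8                              # the carry the buggy version drops
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

# The bug is invisible until the low bytes overflow:
print(hex(add16_buggy(0x00F0, 0x0001)))    # 0xf1 - agrees with the correct version
print(hex(add16_buggy(0x00FF, 0x0001)))    # 0x0  - should be 0x100
print(hex(add16_correct(0x00FF, 0x0001)))  # 0x100
```

Because the results agree whenever the low bytes don't overflow, ordinary testing rarely trips the bug, which is exactly why it can sit dormant in shipped code for forty years.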
As Matthew Trifiro, a veteran go-to-market engineer, put it: "Oh, my, am I seeing this right? The attack surface just expanded to include every compiled binary ever shipped. When AI can reverse-engineer 40-year-old, obscure architectures this well, current obfuscation and security-through-obscurity approaches are essentially worthless."

Trifiro has a point. On the one hand, AI will help us find bugs so we can fix them. That's the good news. On the other hand, and here's the bad news, AI can also break into programs still in use that are no longer being patched or supported.

As Adedeji Olowe, founder of Lendsqr, pointed out: "This is scarier than we're letting on. Billions of legacy microcontrollers exist globally, many likely running fragile or poorly audited firmware like this."

He continued: "The real implication is that bad actors can send models like Opus after them to systematically find vulnerabilities and exploit them, while many of these systems are effectively unpatchable."

LLMs complementing detector tools

Traditional static analysis tools such as SpotBugs, CodeQL, and Snyk Code scan source code for patterns associated with bugs and vulnerabilities. These tools excel at catching well-understood issues, such as null-pointer dereferences, common injection patterns, and API misuse, and they do so at scale across large Java and other-language codebases.

Now it has become clear that large language models (LLMs) can complement those detector tools. In a 2025 head-to-head study, LLMs like GPT-4.1, Mistral Large, and DeepSeek V3 were as good as industry-standard static analyzers at finding bugs across multiple open-source projects.

How do these models do it?
Instead of asking, "Does this line violate rule X?", the LLM is effectively asking, "Given what this system is supposed to do, where are the failure modes and attack paths?" Combined, the two approaches make a powerful pairing.

For example, Anthropic's Claude Opus 4.6 is helping clean up Firefox's open-source code. According to Mozilla, Anthropic's Frontier Red Team found more high-severity bugs in Firefox in just two weeks than people typically report in two months. Mozilla proclaimed, "This is clear evidence that large-scale, AI-assisted analysis is a powerful new addition to security engineers' toolbox."

Anthropic isn't the only organization using AI engines to find bugs in code. Black Duck's Signal product, for instance, combines multiple LLMs, Model Context Protocol (MCP) servers, and AI agents to autonomously analyze code in real time, detect vulnerabilities, and propose fixes.

Meanwhile, security consultancies such as NCC Group are experimenting with LLM-powered plugins for software reverse-engineering tools like Ghidra to help discover security problems, including potential buffer overflows and other memory-safety issues that can be hard for people to spot.

Passing security checks to AI

These successes don't mean we're ready to hand our security checks over to AI. Far from it.

Researchers have found that LLM-driven bug finding is not a drop-in replacement for mature static analysis pipelines. Studies comparing AI coding agents to human developers show that while AI can be prolific, it also introduces security flaws at higher rates, including unsafe password handling and insecure object references.

CodeRabbit found "that there are some bugs that humans create more often and some that AI creates more often. For example, humans create more typos and difficult-to-test code than AI.
But overall, AI created 1.7 times as many bugs as humans. Code generation tools promise speed but get tripped up by the errors they introduce. It's not just little bugs: AI created 1.3-1.7 times more critical and major issues."

You can also ask Daniel Stenberg, creator of the popular open-source data transfer program cURL. He has loudly, and legitimately, complained that his project has been flooded with bogus, AI-written security reports that drown maintainers in pointless busywork.

The moral of the story

In the right hands, AI makes a great assistant, but it's not ready to be a top programmer or security checker. Maybe someday, but not today. So use AI carefully alongside your existing tools, and your programs will be far more secure than they are now.

As for old code, well, that's a real worry. I foresee people replacing firmware-powered devices out of realistic fears that they'll soon be compromised.