LLMs audit code from the same blind spot they wrote it from. Here's the fix
Key findings:

- Confidence-Coverage Divergence (CCD): repeating the same probe axis decreases output entropy (rising false certainty) while bug-class coverage stays flat.
- P2 floor: when the false-positive rate crosses ~40% on two consecutive fresh-axis waves with zero new critical bugs, the surface is clean. The FP rate acts as an entropy meter.
- Rotation > diversity: rotating a single model across three orthogonal axes outperformed using three different models on the same axis.
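As a concrete reading of the P2-floor stopping rule, here is a minimal Python sketch. The names (`Wave`, `p2_floor_reached`) and the exact predicate are my own illustration, not code from the paper:

```python
# Hypothetical sketch of the P2-floor stopping rule: halt auditing once two
# consecutive fresh-axis waves each exceed ~40% false positives while
# surfacing zero new critical (P0/P1) bugs.
from dataclasses import dataclass

@dataclass
class Wave:
    axis: str           # probe axis used for this audit wave
    findings: int       # total findings flagged by the model
    false_positives: int
    new_critical: int   # previously unseen critical bugs

def fp_rate(w: Wave) -> float:
    return w.false_positives / w.findings if w.findings else 0.0

def p2_floor_reached(waves: list[Wave], threshold: float = 0.40) -> bool:
    """True when the last two waves used fresh (different) axes, both
    crossed the FP threshold, and neither found a new critical bug."""
    if len(waves) < 2:
        return False
    prev, last = waves[-2], waves[-1]
    return (last.axis != prev.axis
            and fp_rate(last) > threshold and fp_rate(prev) > threshold
            and last.new_critical == 0 and prev.new_critical == 0)
```

The FP rate serves as the entropy meter here: a rising share of hallucinated findings on fresh axes, with nothing real left to find, is the stop signal.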
Scale of the test: earlier this week I ran a 36-hour marathon audit across 150+ surfaces, taking the full 350K-line codebase to its systemic P2 floor. Yield: 60+ P0 bugs fixed and ~150 P1 bugs catalogued (e.g., OAuth sentinel bypasses, silent cache-invalidation race conditions), each invisible to the other probe axes. Same-axis repetition plateaued at ~20% bug-class discovery yield, while orthogonal rotation reached ~80%, a 4–5× advantage. The app is perceptibly faster as a result.

I wrote a short paper formalizing the method and the supporting topological observations. To verify this wasn't just a prompting trick, I ran persistent homology (Vietoris-Rips on Gemini semantic embeddings of 58 production bug classes). It revealed 20 significant β₁ interior loops: evidence that the bug classes form geometric structure in semantic space that same-axis probing structurally cannot exhaust.

Preprint (Zenodo): https://doi.org/10.5281/zenodo.19223166

Caveats: this is a single real-world codebase, not a controlled experiment. The survival curves are strong evidence, not final proof.

What I'm genuinely curious about:
- Has anyone else seen meaningfully better LLM bug detection by rotating audit axes?
- Does Confidence-Coverage Divergence (CCD) appear in LLM evaluation loops (RLHF, Constitutional AI)?
- What does the survival curve look like on a codebase you didn't build yourself?
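For anyone who wants a feel for what the β₁ loops mean without running a full Vietoris-Rips filtration: the toy Python sketch below (my own stand-in, not from the paper) counts independent loops in the 1-skeleton of an embedding graph at a single distance threshold, using the cycle rank E - V + C. This overcounts true β₁, since it ignores the triangles that fill loops in at higher dimensions, but it shows the kind of structure being measured:

```python
# Toy loop-counting on a point cloud: build the graph of points within
# distance eps, then compute its cycle rank (edges - vertices + components),
# the first Betti number of the graph (an upper bound on Rips-complex beta_1).
import math

def cycle_rank(points, eps):
    n = len(points)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) <= eps]
    # Union-find to count connected components.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    components = len({find(i) for i in range(n)})
    return len(edges) - n + components
```

Four points on a unit square form one loop at threshold 1.0 and none at 0.5. The real analysis sweeps the threshold over a full filtration (e.g., with a library like `ripser`) and keeps only loops that persist across scales.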
(19-year Ontario teacher | M.A., B.A. Philosophy · B.Sc. Physics. Built this for real families.)