eval-methodology


Cursor describes CursorBench, an internal evaluation suite for measuring AI coding agent quality using real developer workflows rather than public benchmarks. The approach combines offline evals on private task data with online metrics to better distinguish between models and align with actual developer experience.
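The linked article does not publish CursorBench code; the sketch below is only a rough illustration of the pattern the summary describes, blending an offline pass rate on a private task suite with an online signal such as how often developers keep the agent's edits. Every name, field, and the 0.6/0.4 weighting here is an assumption for illustration, not CursorBench's actual design.

```python
# Hypothetical sketch of the offline-plus-online scoring pattern described
# above. None of these names or weights come from CursorBench itself.
from dataclasses import dataclass


@dataclass
class EvalResult:
    model: str
    offline_pass_rate: float   # fraction of private offline tasks solved
    online_acceptance: float   # fraction of agent edits developers kept


def blended_score(r: EvalResult, offline_weight: float = 0.6) -> float:
    """Combine offline and online signals into one comparable number.

    The 0.6/0.4 split is an illustrative assumption; the article's point
    is that offline evals alone separate models poorly, so an online
    signal tied to real developer experience is blended in.
    """
    return offline_weight * r.offline_pass_rate + (1 - offline_weight) * r.online_acceptance


if __name__ == "__main__":
    results = [
        EvalResult("model-a", offline_pass_rate=0.72, online_acceptance=0.64),
        EvalResult("model-b", offline_pass_rate=0.70, online_acceptance=0.71),
    ]
    for r in sorted(results, key=blended_score, reverse=True):
        print(f"{r.model}: blended={blended_score(r):.3f}")
```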

Cursor · CursorBench · SWE-bench Verified · OpenAI · GPT-5 · Haiku · Cursor Blame
cursor.com · ingve · 1 day ago · hn