Tags and counts:
bug-bounty (448)
google (356)
microsoft (314)
facebook (264)
xss (238)
apple (180)
malware (175)
rce (149)
exploit (127)
bragging-post (101)
cve (99)
account-takeover (93)
phishing (83)
csrf (79)
privilege-escalation (77)
stored-xss (65)
supply-chain (65)
authentication-bypass (63)
dos (60)
reflected-xss (57)
browser (57)
react (50)
cloudflare (49)
input-validation (48)
cross-site-scripting (48)
reverse-engineering (48)
access-control (47)
docker (46)
aws (45)
smart-contract (45)
node (44)
ethereum (43)
web3 (43)
defi (42)
web-security (42)
sql-injection (42)
web-application (41)
ssrf (38)
burp-suite (35)
idor (34)
vulnerability-disclosure (34)
info-disclosure (34)
race-condition (33)
html-injection (33)
buffer-overflow (33)
writeup (32)
cloud (32)
oauth (32)
smart-contract-vulnerability (32)
information-disclosure (30)

Rating: 2/10 · Category: research
Cursor describes CursorBench, its internal benchmark suite for evaluating AI coding agents on real developer tasks. By drawing tasks from actual user sessions and scoring multi-dimensional agent behavior rather than simple correctness, it discriminates between models and tracks developer experience better than public benchmarks such as SWE-bench (a sketch of that scoring idea follows the entity list below).
Tags: ai-benchmarking, code-generation, model-evaluation, llm-performance, ai-agents, software-engineering, testing-methodology
Entities: Cursor, CursorBench, SWE-bench, Terminal-Bench, OpenAI, Haiku, GPT-5
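
The discrimination claim turns on scoring more than one dimension per task. Below is a minimal Python sketch of that idea; the AgentSession fields, the three dimensions (correctness, edit efficiency, instruction adherence), and the weights are hypothetical illustrations, not CursorBench's actual schema or formula.

```python
from dataclasses import dataclass


@dataclass
class AgentSession:
    """Hypothetical record of one agent run; CursorBench's real schema is not public."""
    tests_passed: int            # tests passing after the agent's edits
    tests_total: int
    edits_made: int              # file edits the agent actually performed
    edits_needed: int            # edits a reference solution would need
    followed_instructions: bool  # did the agent stay within the task's constraints


def composite_score(s: AgentSession) -> float:
    """Blend several behavioral dimensions into one score in [0, 1].

    A single pass/fail bit (as in many public benchmarks) saturates once
    strong models all pass; weighting partial signals lets nearby models
    separate. Dimensions and weights here are illustrative only.
    """
    correctness = s.tests_passed / s.tests_total if s.tests_total else 0.0
    # Penalize thrashing: performing far more edits than the reference needs.
    efficiency = min(1.0, s.edits_needed / s.edits_made) if s.edits_made else 0.0
    adherence = 1.0 if s.followed_instructions else 0.0
    return 0.6 * correctness + 0.25 * efficiency + 0.15 * adherence


if __name__ == "__main__":
    # Two agents that both "pass" on a binary metric, separated here.
    a = AgentSession(tests_passed=10, tests_total=10, edits_made=3,
                     edits_needed=3, followed_instructions=True)
    b = AgentSession(tests_passed=10, tests_total=10, edits_made=12,
                     edits_needed=3, followed_instructions=False)
    print(f"agent A: {composite_score(a):.2f}")  # 1.00
    print(f"agent B: {composite_score(b):.2f}")  # ~0.66
```

On a pass/fail benchmark both example agents score identically; the weighted blend separates the one that thrashed through twelve edits and ignored instructions, which is the kind of behavioral signal the summary says CursorBench captures.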