llm-performance

1 article

Cursor describes CursorBench, its internal benchmark suite for evaluating AI coding agents on real developer tasks. Built from actual user sessions, it measures multi-dimensional agent behavior beyond simple correctness, giving better model discrimination and closer alignment with developer experience than public benchmarks like SWE-bench.

Cursor CursorBench SWE-bench Terminal-Bench OpenAI Haiku GPT-5
cursor.com · xdotli · 16 hours ago