eval-methodology


Cursor describes CursorBench, an internal evaluation suite for measuring AI coding agent quality using real developer workflows rather than public benchmarks. The approach combines offline evals on private task data with online metrics to better distinguish between models and align with actual developer experience.
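The linked article does not publish CursorBench code; the sketch below is only a rough illustration of the pattern the summary describes, blending an offline pass rate on a private task suite with an online signal such as how often developers keep the agent's edits. Every name, field, and the 0.6/0.4 weighting here is an assumption for illustration, not CursorBench's actual design.

```python
# Hypothetical sketch of the offline-plus-online scoring pattern described
# above. None of these names or weights come from CursorBench itself.
from dataclasses import dataclass


@dataclass
class EvalResult:
    model: str
    offline_pass_rate: float   # fraction of private offline tasks solved
    online_acceptance: float   # fraction of agent edits developers kept


def blended_score(r: EvalResult, offline_weight: float = 0.6) -> float:
    """Combine offline and online signals into one comparable number.

    The 0.6/0.4 split is an illustrative assumption; the article's point
    is that offline evals alone separate models poorly, so an online
    signal tied to real developer experience is blended in.
    """
    return offline_weight * r.offline_pass_rate + (1 - offline_weight) * r.online_acceptance


if __name__ == "__main__":
    results = [
        EvalResult("model-a", offline_pass_rate=0.72, online_acceptance=0.64),
        EvalResult("model-b", offline_pass_rate=0.70, online_acceptance=0.71),
    ]
    for r in sorted(results, key=blended_score, reverse=True):
        print(f"{r.model}: blended={blended_score(r):.3f}")
```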

Cursor · CursorBench · SWE-bench Verified · OpenAI · GPT-5 · Haiku · Cursor Blame
cursor.com · ingve · 1 day ago · hn