SWE-CI is a new benchmark that evaluates LLM-powered agents on long-term code maintenance through continuous integration loops. It shifts evaluation from static, one-shot bug fixes to dynamic, multi-iteration codebase evolution, using 100 real-world repository tasks that each span an average of 233 days and 71 commits.
llm-agents
code-generation
software-engineering
benchmark
continuous-integration
code-maintenance
ai-evaluation
SWE-CI
SWE-bench
Jialong Chen
Xander Xu
Hu Wei
Chuan Chen
Bing Zhao