bug-bounty (489)
xss (246)
rce (122)
bragging-post (119)
account-takeover (103)
google (95)
open-source (92)
authentication-bypass (85)
privilege-escalation (83)
csrf (81)
facebook (76)
stored-xss (75)
access-control (69)
malware (68)
microsoft (68)
ai-agents (65)
reflected-xss (63)
web-security (63)
exploit (56)
cve (56)
phishing (53)
input-validation (51)
smart-contract (49)
cross-site-scripting (48)
defi (48)
sql-injection (48)
privacy (47)
ethereum (46)
information-disclosure (46)
tool (46)
ssrf (44)
api-security (41)
vulnerability-disclosure (38)
reverse-engineering (38)
web-application (38)
llm (37)
burp-suite (37)
dos (36)
opinion (36)
apple (35)
automation (35)
ai-security (34)
cloudflare (34)
responsible-disclosure (34)
web3 (33)
html-injection (33)
smart-contract-vulnerability (33)
infrastructure (33)
writeup (33)
machine-learning (32)
Rating: 2/10
A critical analysis rejecting vague claims about the utility of generative models. It proposes a scientific framework built on three factors — encoding cost, verification cost, and how process-dependent the task is — and argues that most current generative-AI deployment lacks rigorous justification, predicting that usefulness decreases as task complexity grows.
Rating: 2/10
SWE-CI is a new benchmark for evaluating LLM-powered agents on long-term code maintenance through continuous-integration loops. It shifts evaluation from static, one-shot bug fixes to dynamic, multi-iteration codebase evolution across 100 real-world repository tasks, each averaging 233 days and 71 commits.
Tags: llm-agents, code-generation, software-engineering, benchmark, continuous-integration, code-maintenance, ai-evaluation, SWE-CI, SWE-bench
Authors: Jialong Chen, Xander Xu, Hu Wei, Chuan Chen, Bing Zhao
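The continuous-integration evaluation loop the summary describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not SWE-CI's actual harness: the names (`Task`, `run_agent_on_task`, the checkpoint predicates) are hypothetical, and real CI would run test suites against a repository rather than evaluate predicates over a patch history.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One repository-maintenance task: a sequence of CI checkpoints
    standing in for successive points in the codebase's evolution."""
    name: str
    checkpoints: list  # each item: predicate over the patch history -> bool


def run_agent_on_task(agent, task, max_iterations=10):
    """Drive an agent through repeated propose-patch / run-CI rounds.

    Unlike a one-shot bug fix, the agent only advances to the next
    checkpoint once "CI" passes, and it sees its own prior patches,
    mirroring multi-iteration, long-term maintenance.
    Returns the fraction of checkpoints cleared.
    """
    history = []  # accumulated patches, visible to the agent as feedback
    passed = 0
    for check in task.checkpoints:
        for _ in range(max_iterations):
            patch = agent(task.name, history)  # agent proposes a change
            history.append(patch)
            if check(history):                 # "CI" verdict on the repo state
                passed += 1
                break
        else:
            break  # checkpoint never passed: task evolution stops here
    return passed / len(task.checkpoints)
```

A trivial usage example: an agent that always emits the same patch clears a task whose second checkpoint merely requires two patches in the history, scoring 1.0; against a checkpoint that never passes, it scores 0.0.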