TabbyML created jj-benchmark, a dataset of 63 evaluation tasks that tests how well current AI coding agents can use the Jujutsu version control system. Claude 4.6 Sonnet leads with a 92% success rate, while open-weight models such as Kimi-k2.5 remain competitive at 79% on this novel VCS tool.
TabbyML
Jujutsu
jj
Harbor
Pochi
Claude 4.6 Sonnet
GPT-5.4
Gemini-3.1-pro
Kimi-k2.5
Meng