bug-bounty (506)
xss (267)
rce (147)
bragging-post (119)
google (116)
account-takeover (112)
authentication-bypass (94)
privilege-escalation (93)
open-source (92)
facebook (91)
csrf (86)
malware (82)
microsoft (81)
exploit (80)
stored-xss (75)
access-control (75)
ai-agents (67)
cve (65)
web-security (64)
reflected-xss (63)
phishing (57)
input-validation (52)
information-disclosure (52)
sql-injection (51)
smart-contract (49)
cross-site-scripting (48)
defi (48)
privacy (48)
reverse-engineering (46)
ssrf (46)
tool (46)
ethereum (46)
api-security (45)
writeup (42)
vulnerability-disclosure (40)
dos (38)
web-application (38)
ai-security (38)
burp-suite (37)
llm (37)
apple (37)
opinion (37)
web3 (35)
responsible-disclosure (35)
automation (35)
remote-code-execution (34)
smart-contract-vulnerability (33)
cloudflare (33)
credential-theft (33)
infrastructure (33)
8/10
A deep technical exploration of porting a Flash Attention kernel from GPU (Triton) to TPU using JAX, covering the fundamental differences in programming models, compiler behavior, and hardware architectures. The author details how JAX's functional, immutable paradigm and XLA compilation differ from explicit GPU kernel writing, and includes benchmarks and a custom systolic-array emulator to illustrate TPU data flow.
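The online-softmax recurrence mentioned in the tags below is the numerical trick that lets Flash Attention process attention scores block by block without materializing a full row. A minimal pure-Python sketch of that recurrence (function names are illustrative, not taken from the article):

```python
import math

def online_softmax(xs):
    """Single-pass softmax: keep a running max `m` and a running sum `s`
    of exp(x - m), rescaling `s` whenever the max grows. Numerically
    stable because every exponent is <= 0."""
    m = float("-inf")  # running maximum seen so far
    s = 0.0            # running sum of exp(x_i - m)
    for x in xs:
        m_new = max(m, x)
        # rescale the previous partial sum to the new max, add the new term
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / s for x in xs]

def naive_softmax(xs):
    """Two-pass reference: subtract the global max, then normalize."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

The two agree to floating-point precision; the online form only ever needs the current block of scores plus two scalars, which is what makes tiling attention across GPU SRAM or a TPU systolic pipeline possible.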
flash-attention
jax
tpu
kernel-optimization
attention-mechanism
llm-internals
xla-compiler
systolic-array
triton
gpu
compiler-optimization
numerical-stability
online-softmax
Archer Zhang
JAX
XLA
Triton
TPU
Colab
Flash Attention