bug-bounty (506)
xss (267)
rce (149)
bragging-post (119)
google (118)
account-takeover (112)
authentication-bypass (94)
privilege-escalation (93)
open-source (92)
facebook (91)
csrf (86)
malware (85)
microsoft (84)
exploit (81)
access-control (75)
stored-xss (75)
ai-agents (67)
cve (66)
web-security (64)
reflected-xss (63)
phishing (60)
input-validation (52)
information-disclosure (52)
sql-injection (51)
smart-contract (49)
cross-site-scripting (48)
defi (48)
privacy (47)
ssrf (46)
ethereum (46)
reverse-engineering (46)
tool (46)
api-security (44)
writeup (42)
vulnerability-disclosure (40)
ai-security (38)
dos (38)
web-application (38)
burp-suite (37)
apple (37)
opinion (37)
llm (37)
automation (35)
cloudflare (35)
web3 (35)
responsible-disclosure (35)
race-condition (33)
infrastructure (33)
supply-chain (33)
smart-contract-vulnerability (33)
8/10
A deep technical exploration of porting a Flash Attention kernel from GPU (Triton) to TPU using JAX, covering the fundamental differences in programming models, compiler behavior, and hardware architectures. The author details how JAX's functional, immutable paradigm and XLA compilation differ from explicit GPU kernel writing, and includes benchmarking and a custom systolic array emulator to understand TPU data flow.
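The summary (and the online-softmax and numerical-stability tags below) refer to the running-max/running-sum trick that lets Flash Attention compute softmax without materialising a full score row. A minimal NumPy sketch of that idea, assuming nothing about the article's actual code (function and variable names here are illustrative only):

```python
import numpy as np

def online_softmax(scores, block=4):
    """Softmax computed one block at a time, keeping only a
    running max (m) and a rescaled running sum (l) -- the
    numerically stable accumulation Flash Attention relies on."""
    m = -np.inf  # running max of all scores seen so far
    l = 0.0      # running sum of exp(score - m)
    for start in range(0, len(scores), block):
        s = scores[start:start + block]
        m_new = max(m, float(s.max()))
        # rescale the old partial sum to the new max, then fold in this block
        l = l * np.exp(m - m_new) + np.exp(s - m_new).sum()
        m = m_new
    # the global max and sum are now known, so weights follow directly
    return np.exp(scores - m) / l
```

Processing blocks this way gives bitwise-equivalent results to the naive two-pass softmax while never needing the whole row at once, which is what makes the kernel tileable on both GPU SMs and TPU systolic arrays.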
flash-attention
jax
tpu
kernel-optimization
attention-mechanism
llm-internals
xla-compiler
systolic-array
triton
gpu
compiler-optimization
numerical-stability
online-softmax
Archer Zhang
JAX
XLA
Triton
TPU
Colab
Flash Attention