online-softmax


A deep technical exploration of porting a Flash Attention kernel from GPU (Triton) to TPU using JAX, covering the fundamental differences in programming models, compiler behavior, and hardware architectures. The author details how JAX's functional, immutable paradigm and XLA compilation differ from explicit GPU kernel writing, and includes benchmarking and a custom systolic array emulator to understand TPU data flow.
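The "online softmax" of the title is the core numerical trick behind Flash Attention: computing a numerically stable softmax in a single streaming pass by maintaining a running maximum and a rescaled running sum, so full rows never need to be materialized. A minimal sketch of that technique (an illustration of the general algorithm, not the author's TPU kernel):

```python
import math

def online_softmax(xs):
    # Streaming pass: track the running max (m) and the running sum of
    # exponentials (d), rescaling d whenever a new maximum is seen.
    m, d = float("-inf"), 0.0
    for x in xs:
        m_new = max(m, x)
        # exp(m - m_new) rescales the old sum to the new reference max;
        # on the first step exp(-inf) == 0.0, so the 0.0 sum is preserved.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # A second pass normalizes; Flash Attention instead fuses this
    # rescaling into the accumulation of the attention output.
    return [math.exp(x - m) / d for x in xs]

probs = online_softmax([1.0, 2.0, 3.0])
```

The result matches the standard two-pass softmax, but the max/sum statistics are updated incrementally, which is what lets the kernel process attention scores tile by tile.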

Archer Zhang JAX XLA Triton TPU Colab Flash Attention
archerzhang.me · azhng · 6 days ago