bug-bounty (506)
xss (267)
rce (149)
bragging-post (119)
google (118)
account-takeover (112)
authentication-bypass (94)
privilege-escalation (93)
open-source (92)
facebook (91)
csrf (86)
malware (85)
microsoft (84)
exploit (81)
access-control (75)
stored-xss (75)
ai-agents (67)
cve (66)
web-security (64)
reflected-xss (63)
phishing (60)
input-validation (52)
information-disclosure (52)
sql-injection (51)
smart-contract (49)
cross-site-scripting (48)
defi (48)
privacy (47)
ssrf (46)
ethereum (46)
reverse-engineering (46)
tool (46)
api-security (44)
writeup (42)
vulnerability-disclosure (40)
ai-security (38)
dos (38)
web-application (38)
burp-suite (37)
apple (37)
opinion (37)
llm (37)
automation (35)
cloudflare (35)
web3 (35)
responsible-disclosure (35)
race-condition (33)
infrastructure (33)
supply-chain (33)
smart-contract-vulnerability (33)
8/10
A deep technical exploration of porting a Flash Attention kernel from GPU (Triton) to TPU using JAX, covering the fundamental differences in programming models, compiler behavior, and hardware architectures. The author details how JAX's functional, immutable paradigm and XLA compilation differ from explicit GPU kernel writing, and includes benchmarking and a custom systolic array emulator to understand TPU data flow.
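The summary (and the online-softmax and numerical-stability tags below) refer to the running-max/running-sum trick that lets Flash Attention compute softmax without materialising a full score row. A minimal NumPy sketch of that idea, assuming nothing about the article's actual code (function and variable names here are illustrative only):

```python
import numpy as np

def online_softmax(scores, block=4):
    """Softmax computed one block at a time, keeping only a
    running max (m) and a rescaled running sum (l) -- the
    numerically stable accumulation Flash Attention relies on."""
    m = -np.inf  # running max of all scores seen so far
    l = 0.0      # running sum of exp(score - m)
    for start in range(0, len(scores), block):
        s = scores[start:start + block]
        m_new = max(m, float(s.max()))
        # rescale the old partial sum to the new max, then fold in this block
        l = l * np.exp(m - m_new) + np.exp(s - m_new).sum()
        m = m_new
    # the global max and sum are now known, so weights follow directly
    return np.exp(scores - m) / l
```

Processing blocks this way gives bitwise-equivalent results to the naive two-pass softmax while never needing the whole row at once, which is what makes the kernel tileable on both GPU SMs and TPU systolic arrays.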
flash-attention
jax
tpu
kernel-optimization
attention-mechanism
llm-internals
xla-compiler
systolic-array
triton
gpu
compiler-optimization
numerical-stability
online-softmax
Archer Zhang
JAX
XLA
Triton
TPU
Colab
Flash Attention