flash-attention

2 articles
6/10

A detailed account of troubleshooting open-source ML infrastructure while post-training the Kimi-K2-Thinking 1T-parameter model, exposing undocumented bugs and inefficiencies in HuggingFace Transformers and quantization libraries that can hide several layers deep in the dependency stack.

Kimi-K2-Thinking HuggingFace LLaMA-Factory KTransformers DeepSeek-V3 PyTorch vLLM compressed_tensors TriviaQA PEFT Transformers
workshoplabs.ai · addiefoote8 · 4 days ago
8/10

A deep technical exploration of porting a Flash Attention kernel from GPU (Triton) to TPU using JAX, covering the fundamental differences in programming models, compiler behavior, and hardware architectures. The author details how JAX's functional, immutable paradigm and XLA compilation differ from explicit GPU kernel writing, and includes benchmarking and a custom systolic array emulator to understand TPU data flow.

Archer Zhang JAX XLA Triton TPU Colab Flash Attention
archerzhang.me · azhng · 6 days ago
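
The programming-model contrast the post describes can be sketched in a few lines of JAX. The snippet below is a minimal illustration, not the author's Triton or Pallas kernel: it writes the flash-attention online-softmax recurrence as a pure functional scan, where jax.lax.scan over K/V tiles stands in for the explicit block loop a GPU kernel would spell out, and XLA decides the actual tiling and memory movement. The function name and block size of 128 are illustrative choices, not taken from the article.

import jax
import jax.numpy as jnp
from functools import partial

@partial(jax.jit, static_argnames="block")
def flash_attention(q, k, v, block=128):
    # Illustrative sketch, not the article's kernel.
    # q: [seq_q, d]; k, v: [seq_k, d]; seq_k must divide evenly into tiles.
    scale = 1.0 / jnp.sqrt(jnp.asarray(q.shape[-1], q.dtype))
    k_tiles = k.reshape(-1, block, k.shape[-1])
    v_tiles = v.reshape(-1, block, v.shape[-1])

    def step(carry, kv):
        m, l, acc = carry                  # running max, normalizer, output
        kt, vt = kv
        s = (q @ kt.T) * scale             # [seq_q, block] scores for this tile
        m_new = jnp.maximum(m, s.max(axis=-1))
        p = jnp.exp(s - m_new[:, None])    # probs relative to the new running max
        corr = jnp.exp(m - m_new)          # rescale stats from earlier tiles
        return (m_new,
                l * corr + p.sum(axis=-1),
                acc * corr[:, None] + p @ vt), None

    init = (jnp.full(q.shape[0], -jnp.inf, q.dtype),
            jnp.zeros(q.shape[0], q.dtype),
            jnp.zeros_like(q))
    (m, l, acc), _ = jax.lax.scan(step, init, (k_tiles, v_tiles))
    return acc / l[:, None]

q = k = v = jnp.ones((256, 64), jnp.float32)
out = flash_attention(q, k, v)  # equals jax.nn.softmax(q @ k.T / 8.0) @ v

Note that everything here is immutable: the running max, normalizer, and output accumulator are threaded through the scan carry rather than updated in place, which is exactly the functional-paradigm shift from explicit kernel writing that the post explores.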