bug-bounty (507)
xss (286)
rce (144)
bragging-post (119)
account-takeover (104)
google (101)
exploit (94)
open-source (93)
csrf (85)
authentication-bypass (80)
facebook (75)
microsoft (75)
stored-xss (74)
cve (73)
privilege-escalation (72)
access-control (67)
ai-agents (64)
web-security (63)
reflected-xss (63)
writeup (58)
ssrf (52)
input-validation (52)
malware (51)
sql-injection (49)
smart-contract (48)
defi (48)
cross-site-scripting (47)
tool (46)
ethereum (45)
privacy (44)
information-disclosure (44)
api-security (41)
phishing (40)
web-application (38)
lfi (37)
apple (37)
llm (37)
opinion (36)
burp-suite (36)
automation (35)
cloudflare (34)
idor (33)
infrastructure (33)
web3 (33)
vulnerability-disclosure (33)
oauth (33)
smart-contract-vulnerability (33)
responsible-disclosure (33)
html-injection (33)
machine-learning (32)
0
6/10
technical-writeup
A detailed account of troubleshooting open-source ML infrastructure while post-training the 1T-parameter Kimi-K2-Thinking model, exposing undocumented bugs and inefficiencies in HuggingFace Transformers and quantization libraries that can hide several layers deep in the dependency stack.
model-training
large-language-models
lora
quantization
huggingface
pytorch
debugging
infrastructure
open-source
mixture-of-experts
flash-attention
Kimi-K2-Thinking
HuggingFace
LLaMA-Factory
KTransformers
DeepSeek-V3
PyTorch
vLLM
compressed_tensors
TriviaQA
PEFT
Transformers
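As context for the lora and quantization tags above: the low-rank adapter forward pass can be sketched in a few lines of framework-free Python. This is a hypothetical toy illustration of the general LoRA formula y = Wx + (alpha/r) * B(Ax), not code from the post's actual training pipeline.

```python
def matvec(M, v):
    # Plain-Python matrix-vector product for the toy example.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    # Toy LoRA forward: frozen base projection plus a scaled
    # low-rank update, y = W x + (alpha / r) * B (A x).
    # W is frozen; only the small matrices A (r x d_in) and
    # B (d_out x r) would be trained.
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]
```

With W the 2x2 identity, A = [[1, 0]], B = [[0], [1]], x = [2, 3], and alpha = r = 1, the adapter adds A·x = 2 into the second output, giving [2, 5].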
0
8/10
A deep technical exploration of porting a Flash Attention kernel from GPU (Triton) to TPU using JAX, covering the fundamental differences in programming models, compiler behavior, and hardware architectures. The author details how JAX's functional, immutable paradigm and XLA compilation differ from explicit GPU kernel writing, and includes benchmarking and a custom systolic array emulator to understand TPU data flow.
flash-attention
jax
tpu
kernel-optimization
attention-mechanism
llm-internals
xla-compiler
systolic-array
triton
gpu
compiler-optimization
numerical-stability
online-softmax
Archer Zhang
JAX
XLA
Triton
TPU
Colab
Flash Attention
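The online-softmax trick mentioned in the entry above is the core numerical device Flash Attention uses to avoid materializing the full score matrix. A minimal plain-Python sketch (a toy illustration, not the author's Triton or JAX kernel):

```python
import math

def online_softmax(xs):
    # Single-pass softmax: keep a running maximum m and a running
    # sum s of exp(x - m), rescaling s whenever m increases.
    # This lets Flash Attention process scores block by block
    # without ever holding all of them at once.
    m = float("-inf")  # running maximum seen so far
    s = 0.0            # running sum of exp(x - m)
    for x in xs:
        m_new = max(m, x)
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / s for x in xs]
```

The result matches the usual two-pass max-then-normalize softmax, but the max and the normalizer are computed in one streaming pass, which is what makes the tiled TPU/GPU formulation possible.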