bug-bounty (498)
google (352)
xss (301)
microsoft (295)
facebook (262)
rce (211)
exploit (199)
malware (171)
apple (163)
cve (136)
account-takeover (115)
bragging-post (102)
privilege-escalation (95)
csrf (90)
phishing (86)
browser (75)
writeup (74)
authentication-bypass (69)
supply-chain (67)
dos (66)
stored-xss (65)
reflected-xss (57)
ssrf (56)
reverse-engineering (55)
react (52)
access-control (52)
input-validation (49)
cross-site-scripting (48)
aws (47)
cloudflare (47)
docker (46)
web-security (46)
lfi (46)
sql-injection (45)
smart-contract (45)
web-application (44)
ethereum (44)
web3 (43)
ctf (43)
oauth (43)
node (43)
defi (43)
pentest (40)
race-condition (39)
open-source (38)
cloud (37)
idor (37)
info-disclosure (36)
burp-suite (36)
vulnerability-disclosure (35)
0
3/10
NVIDIA introduces NVFP4, a 4-bit floating-point format for NVIDIA Blackwell GPUs. It achieves efficient low-precision inference while maintaining model accuracy through a two-level scaling strategy that combines fine-grained E4M3 block-level scales with an FP32 tensor-level scale, reducing memory footprint by 3.5x versus FP16 with less than 1% accuracy degradation on language models.
quantization
low-precision-inference
model-compression
nvfp4
floating-point-formats
nvidia-blackwell
tensor-cores
ai-optimization
fp4
mxfp4
e4m3
hardware-acceleration
NVIDIA
NVIDIA Blackwell
NVFP4
MXFP4
FP4
E4M3
Tensor Cores
Eduardo Alvarez
Omri Almog
Eric Chung
Simon Layton
Dusan Stosic
Ronny Krashinsky
Kyle Aubrey
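The two-level scaling strategy described in the summary above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not NVIDIA's implementation: the block size of 16 is illustrative, the E2M1 magnitude grid is the standard FP4 value set, and the hardware's E4M3 block scales are approximated here by plain floats for clarity.

```python
import numpy as np

# Magnitudes representable in FP4 (E2M1): 0, 0.5, 1, 1.5, 2, 3, 4, 6.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_MAX = 6.0
BLOCK = 16  # illustrative block size (an assumption, not from the article)

def snap_to_fp4(x):
    # Round each value to the nearest representable FP4 magnitude, keeping sign.
    idx = np.abs(np.abs(x)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(x) * FP4_GRID[idx]

def two_level_quantize(w):
    """Fake-quantize a 1-D float32 tensor with two-level scaling:
    an FP32 tensor-level scale plus a fine-grained per-block scale
    (E4M3 in hardware; kept as float64/float32 here for clarity)."""
    w = np.asarray(w, dtype=np.float32)
    # Level 1: tensor-level FP32 scale maps the global max into FP4 range.
    tensor_scale = np.abs(w).max() / FP4_MAX  # assumes w is not all zeros
    scaled = w / tensor_scale
    out = np.empty_like(scaled)
    block_scales = []
    for i in range(0, len(scaled), BLOCK):
        blk = scaled[i:i + BLOCK]
        # Level 2: per-block scale maps the block max into FP4 range.
        bscale = max(np.abs(blk).max() / FP4_MAX, 1e-12)
        block_scales.append(bscale)
        out[i:i + BLOCK] = snap_to_fp4(blk / bscale) * bscale
    return out * tensor_scale, tensor_scale, np.array(block_scales)

w = np.random.default_rng(0).normal(size=64).astype(np.float32)
deq, ts, bs = two_level_quantize(w)
err = np.abs(w - deq).max()
```

Because each block's max is rescaled to the FP4 range before rounding, the worst-case absolute error is bounded by the per-block scale, which is what lets a 4-bit grid track both large and small blocks within the same tensor.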
0
4/10
A technical analysis of sparsity versus quantization as hardware optimization strategies for neural networks, exploring the architectural challenge each poses (the irregular data access of unstructured sparsity vs. the metadata overhead of quantization) and the compromises modern AI accelerators adopt in response (structured sparsity patterns and algorithmic co-design techniques).
hardware-architecture
neural-network-optimization
sparsity
quantization
model-compression
ai-accelerators
tensor-cores
memory-bandwidth
deep-learning
llm-optimization
NVIDIA Ampere
EIE
SCNN
BitNet b1.58
GPTQ
QuIP
SmoothQuant
AWQ
StreamingLLM
OCP Microscaling Formats
Deep Compression
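The "structured sparsity patterns" compromise mentioned in the summary above can be sketched as 2:4 magnitude pruning, the pattern NVIDIA Ampere's sparse Tensor Cores accelerate. Grouping four consecutive weights along the flattened tensor and the largest-magnitude keep criterion are the standard choices, assumed here rather than taken from the article.

```python
import numpy as np

def prune_2_of_4(w):
    """Enforce 2:4 structured sparsity: in every group of 4 consecutive
    weights, keep the 2 largest-magnitude entries and zero the other 2.
    The fixed pattern is what lets hardware index the survivors with only
    2 bits of metadata per kept value, avoiding unstructured-sparse chaos."""
    w = np.asarray(w, dtype=np.float32)
    groups = w.reshape(-1, 4)  # assumes w.size is a multiple of 4
    # Indices of the two smallest magnitudes in each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

w = np.random.default_rng(1).normal(size=(8, 8))
sw = prune_2_of_4(w)
sparsity = (sw == 0).mean()  # exactly 2 zeros per group of 4 -> 50%
```

The fixed 50% density is the trade-off the article alludes to: less flexible than unstructured pruning, but regular enough that the accelerator can skip the zeros without per-element bookkeeping.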