model-compression

2 articles

NVIDIA introduces NVFP4, a 4-bit floating-point format for NVIDIA Blackwell GPUs. It enables efficient low-precision inference while preserving model accuracy through a two-level scaling strategy: fine-grained E4M3 block-level scales combined with an FP32 tensor-level scale. The result is a 3.5x smaller memory footprint than FP16 with less than 1% accuracy degradation on language models.

NVIDIA · NVIDIA Blackwell · NVFP4 · MXFP4 · FP4 · E4M3 · Tensor Cores · Eduardo Alvarez · Omri Almog · Eric Chung · Simon Layton · Dusan Stosic · Ronny Krashinsky · Kyle Aubrey
developer.nvidia.com · tosh · 16 hours ago · hn
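As a rough illustration of the two-level scaling idea in the summary above, the sketch below emulates an NVFP4-style quantize/dequantize round trip in NumPy. This is an assumption-laden approximation, not NVIDIA's actual encoding: real hardware stores 4-bit E2M1 values with E4M3 block scales, while here everything (grid lookup, scale rounding) is simulated in full precision.

```python
import numpy as np

# FP4 (E2M1) representable magnitudes; the sign is a separate bit
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
E4M3_MAX = 448.0  # largest finite FP8 E4M3 value

def nvfp4_roundtrip(x, block=16):
    """Quantize-dequantize sketch of two-level scaling: one FP32
    tensor-level scale plus a per-16-element block scale (stored as
    E4M3 on real hardware; emulated here as plain floats)."""
    flat = np.asarray(x, dtype=np.float32).ravel()
    n = flat.size
    pad = (-n) % block
    blocks = np.pad(flat, (0, pad)).reshape(-1, block)

    # Level 1: tensor-level FP32 scale keeps block scales within E4M3 range
    t_scale = max(float(np.abs(blocks).max()) / (FP4_GRID[-1] * E4M3_MAX), 1e-12)
    # Level 2: fine-grained per-block scale
    b_scale = np.maximum(
        np.abs(blocks).max(axis=1, keepdims=True) / (FP4_GRID[-1] * t_scale),
        1e-12)

    scaled = blocks / (b_scale * t_scale)
    # Round each magnitude to the nearest FP4 grid point, keeping the sign
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return (q * b_scale * t_scale).ravel()[:n].reshape(np.shape(x))
```

Values that land exactly on the E2M1 grid after scaling survive the round trip unchanged; everything else is snapped to the nearest representable magnitude within its 16-element block.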

A technical analysis of sparsity and quantization as hardware optimization strategies for neural networks. It explores the architectural challenges of each (the irregular memory access of unstructured sparse data versus the metadata overhead of quantization) and the compromises modern AI accelerators adopt, such as structured sparsity patterns and algorithm-hardware co-design.

NVIDIA Ampere · EIE · SCNN · BitNet b1.58 · GPTQ · QuIP · SmoothQuant · AWQ · StreamingLLM · OCP Microscaling Formats · Deep Compression
sigarch.org · matt_d · 23 hours ago · hn
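The structured-sparsity compromise mentioned in the summary above can be illustrated with the 2:4 pattern that NVIDIA Ampere sparse Tensor Cores accelerate. The NumPy sketch below is an illustration, not a production pruner: it assumes the weight count is divisible by four and simply keeps the two largest-magnitude weights in every group of four consecutive weights.

```python
import numpy as np

def prune_2_4(weights):
    """2:4 structured sparsity sketch: in every group of 4 consecutive
    weights, keep the 2 with largest magnitude and zero the other 2,
    yielding the regular pattern that sparse hardware can exploit."""
    w = np.asarray(weights, dtype=np.float32)
    groups = w.reshape(-1, 4)  # assumes w.size is a multiple of 4
    # Indices of the two smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)
```

Because exactly two of every four weights are zero, the hardware only needs a small per-group index to locate the survivors, avoiding the bookkeeping chaos of fully unstructured sparsity.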