model-deployment

1 article

0 5/10

Private LLM Inference on Consumer Blackwell GPUs

research

A systematic benchmark of NVIDIA Blackwell consumer GPUs for LLM inference across quantization formats and workloads. The authors report that private deployment is cost-effective for small and medium-sized enterprises (SMEs), with costs 40-200x lower than cloud APIs and sub-second latency for most use cases.

llm-inference gpu-optimization quantization model-deployment privacy performance-benchmarking nvidia-blackwell cost-analysis rag model-serving

NVIDIA Blackwell · RTX 5060 Ti · RTX 5070 Ti · RTX 5090 · Qwen3-8B · Gemma3-12B · Gemma3-27B · GPT-OSS-20B · Jonathan Knoop · Hendrik Holtmann

arxiv.org · rohansood15 · 14 hours ago