model-serving

2 articles

Systematic benchmarking of NVIDIA Blackwell consumer GPUs for LLM inference across quantization formats and workloads, showing that private deployment can be cost-effective for SMEs, with 40-200x lower cost than cloud APIs and sub-second latency for most use cases (a rough cost sketch follows this entry).

NVIDIA Blackwell RTX 5060 Ti RTX 5070 Ti RTX 5090 Qwen3-8B Gemma3-12B Gemma3-27B GPT-OSS-20B Jonathan Knoop Hendrik Holtmann
arxiv.org · rohansood15 · 14 hours ago
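
A back-of-envelope sketch of how a per-million-token cost comparison like this can be set up. Every number below is an illustrative assumption, not a figure from the paper: GPU price, amortization period, utilization, power draw, throughput, and the cloud API price are placeholders to replace with your own measurements, so the resulting ratio will not necessarily match the reported 40-200x range.

# Compare amortized local-GPU cost against a cloud API price, per 1M output tokens.
# All inputs are illustrative assumptions to be replaced with measured values.

def local_cost_per_m_tokens(
    gpu_price_usd: float = 2000.0,        # assumed consumer GPU purchase price
    amortization_years: float = 3.0,      # assumed useful lifetime
    utilization: float = 0.25,            # fraction of wall-clock time the GPU is busy
    power_watts: float = 450.0,           # assumed average draw under load
    electricity_usd_per_kwh: float = 0.30,
    tokens_per_second: float = 400.0,     # assumed aggregate (batched) decode throughput
) -> float:
    busy_seconds = amortization_years * 365 * 24 * 3600 * utilization
    total_tokens = tokens_per_second * busy_seconds
    energy_usd = power_watts / 1000 * (busy_seconds / 3600) * electricity_usd_per_kwh
    return (gpu_price_usd + energy_usd) / (total_tokens / 1e6)

api_cost_per_m_tokens = 15.0              # assumed cloud API output-token price (USD per 1M)

local = local_cost_per_m_tokens()
print(f"local: ${local:.2f} per 1M tokens")
print(f"cloud: ${api_cost_per_m_tokens:.2f} per 1M tokens")
print(f"ratio: {api_cost_per_m_tokens / local:.0f}x cheaper locally")

The ratio this prints moves substantially with the chosen throughput, utilization, and API pricing; the paper's 40-200x range reflects its own measured configurations.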

Step-by-step guide to running open-source LLMs locally with Claude Code via llama.cpp, covering deployment of models such as Qwen3.5 and GLM-4.7-Flash with quantization and GPU optimization for coding tasks (a quick endpoint-check sketch follows this entry).

Unsloth Claude Code Qwen3.5 GLM-4.7-Flash llama.cpp DeepSeek Gemma Qwen3-Coder-Next OpenAI
unsloth.ai · armcat · 1 day ago
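
A minimal sketch for sanity-checking a local setup like this, assuming llama.cpp's llama-server is already running on its default port with a quantized GGUF loaded and GPU offload enabled. The model filename, port, model name, and prompt are illustrative assumptions, not values from the Unsloth guide; how the guide wires the server into Claude Code is covered in the article itself, and the sketch only exercises the local serving endpoint.

# Query a locally running llama.cpp server through its OpenAI-compatible API.
# Assumes llama-server is already up, e.g. started with a quantized GGUF such as:
#   llama-server -m some-quantized-model-Q4_K_M.gguf -ngl 99 --port 8080
# (filename is a placeholder; adjust flags to your hardware)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # llama-server's OpenAI-compatible endpoint
    api_key="not-needed-for-local-server",  # local server does not require a real key
)

response = client.chat.completions.create(
    model="local-gguf",  # name is largely informational when a single model is loaded
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)

If this returns a sensible completion, the local server is serving correctly and can then be connected to a coding assistant per the guide's instructions.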