Systematic benchmarking of NVIDIA Blackwell consumer GPUs for LLM inference across quantization formats and workloads, demonstrating that private deployment is cost-effective for SMEs, with 40-200x lower costs than cloud APIs and sub-second latency for most use cases.
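The cloud-vs-local cost comparison behind a claim like this reduces to simple amortization arithmetic. The sketch below illustrates the shape of that calculation only; every price, wattage, and throughput figure is a hypothetical placeholder, not a number from the benchmark.

```python
# Back-of-envelope cost comparison: cloud API billing vs. amortized local GPU.
# All numbers below are hypothetical placeholders for illustration only.

def cloud_cost_per_mtok(price_per_mtok: float) -> float:
    """Cloud APIs bill per million tokens directly."""
    return price_per_mtok

def local_cost_per_mtok(gpu_price: float, lifetime_years: float,
                        power_watts: float, kwh_price: float,
                        tokens_per_second: float) -> float:
    """Amortize GPU purchase plus electricity over tokens generated,
    assuming the GPU runs saturated for its whole lifetime."""
    seconds = lifetime_years * 365 * 24 * 3600
    total_tokens = tokens_per_second * seconds
    energy_cost = (power_watts / 1000) * (seconds / 3600) * kwh_price
    return (gpu_price + energy_cost) / total_tokens * 1e6

# Hypothetical example: $2000 GPU, 3-year life, 350 W, $0.30/kWh, 50 tok/s.
local = local_cost_per_mtok(2000, 3, 350, 0.30, 50)
cloud = cloud_cost_per_mtok(10.0)  # e.g. a $10-per-Mtok cloud model
print(f"local ~ ${local:.2f}/Mtok, cloud = ${cloud:.2f}/Mtok, "
      f"ratio ~ {cloud / local:.0f}x")
```

The exact ratio depends heavily on utilization: a fully saturated GPU amortizes far better than one serving bursty traffic, which is part of why the reported advantage spans such a wide 40-200x range.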
LAPIS is a compiler framework built on MLIR that optimizes sparse linear algebra operations across diverse architectures, using a Kokkos dialect for performance portability and a partition dialect for distributed-memory execution. The framework demonstrates MLIR's capability to enable linear-algebra-level optimizations for both sparse and dense kernels on GPUs, with applications to graph algorithms, relational databases, and scientific machine learning.
Step-by-step guide to running open-source LLMs locally with Claude Code via llama.cpp, demonstrating deployment of models like Qwen3.5 and GLM-4.7-Flash with quantization and GPU offload for coding tasks.
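Once a quantized model is loaded, llama.cpp's `llama-server` exposes an OpenAI-compatible chat endpoint that tools can target. A minimal sketch of querying it from Python is below; the host, port, and prompt are illustrative defaults, not values taken from the guide, and it assumes a server is already running locally.

```python
import json
from urllib import request

# Illustrative default: llama-server listens on port 8080 and serves an
# OpenAI-compatible /v1/chat/completions endpoint.
LLAMA_SERVER = "http://127.0.0.1:8080/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 256) -> request.Request:
    """Build an OpenAI-style chat-completion request for the local server."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return request.Request(
        LLAMA_SERVER,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Write a Python function that reverses a string.")
# Uncomment when llama-server is running locally:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, any client or coding assistant that accepts a custom base URL can be pointed at the local server the same way.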
A comprehensive survey of 16 open-source reinforcement learning libraries that implement asynchronous training architectures, analyzing design choices across seven axes, including orchestration, buffer design, weight-sync protocols, staleness management, LoRA support, and distributed backends, and showing how disaggregating inference and training workloads improves GPU utilization.
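Two of the surveyed axes, weight synchronization and staleness management, interact directly: rollout workers generate samples against a possibly stale policy while the trainer keeps updating weights. The sketch below is a generic illustration of that interaction, not the API of any surveyed library; all class and method names are invented for the example.

```python
import queue
import threading

class StalenessBuffer:
    """Illustrative buffer that drops rollouts whose policy version lags
    the trainer's current version by more than max_staleness updates."""

    def __init__(self, max_staleness: int):
        self.max_staleness = max_staleness
        self.trainer_version = 0
        self.samples = queue.Queue()
        self.lock = threading.Lock()

    def sync_weights(self) -> None:
        """Called by the trainer after each update (weight-sync axis)."""
        with self.lock:
            self.trainer_version += 1

    def put(self, sample, policy_version: int) -> None:
        """Inference worker submits a rollout tagged with the policy
        version it was generated under (staleness-management axis)."""
        with self.lock:
            if self.trainer_version - policy_version <= self.max_staleness:
                self.samples.put((sample, policy_version))
            # otherwise the rollout is silently dropped as too stale

buf = StalenessBuffer(max_staleness=1)
buf.put("rollout-A", policy_version=0)  # staleness 0 <= 1: accepted
buf.sync_weights()
buf.sync_weights()                      # trainer now at version 2
buf.put("rollout-B", policy_version=0)  # staleness 2 > 1: dropped
print(buf.samples.qsize())  # prints 1
```

Real systems vary along exactly the axes the survey names: some block rollouts during sync instead of dropping them, some reweight stale samples rather than discarding, and some ship only LoRA deltas to keep the sync cheap.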