gpu-optimization

4 articles
0 5/10

Systematic benchmarking of NVIDIA Blackwell consumer GPUs for LLM inference across quantization formats and workloads, demonstrating cost-effective private deployment for SMEs with 40-200x lower costs than cloud APIs and sub-second latency for most use cases.

NVIDIA Blackwell RTX 5060 Ti RTX 5070 Ti RTX 5090 Qwen3-8B Gemma3-12B Gemma3-27B GPT-OSS-20B Jonathan Knoop Hendrik Holtmann
arxiv.org · rohansood15 · 14 hours ago · details · hn
0 1/10

LAPIS is a compiler framework built on MLIR that optimizes sparse linear algebra operations across diverse architectures, using a Kokkos dialect for performance portability and a partition dialect for distributed-memory execution. The framework demonstrates MLIR's ability to enable linear-algebra-level optimizations for both sparse and dense kernels on GPUs, with applications to graph algorithms, relational databases, and scientific machine learning.

LAPIS MLIR Kokkos GraphBLAS TenSQL Sandia National Laboratories Rajamanickam, Sivasankaran Kelley, Brian Michael Sadayappan, Ponnuswamy
osti.gov · matt_d · 22 hours ago · details · hn
0 5/10

Step-by-step guide for running open-source LLMs locally with Claude Code using llama.cpp, demonstrating deployment of models like Qwen3.5 and GLM-4.7-Flash with quantization and GPU optimization for coding tasks.

Unsloth Claude Code Qwen3.5 GLM-4.7-Flash llama.cpp DeepSeek Gemma Qwen3-Coder-Next OpenAI
unsloth.ai · armcat · 1 day ago · details · hn
0 7/10

A comprehensive survey of 16 open-source reinforcement learning libraries that implement asynchronous training architectures. It analyzes design choices across 7 axes (including orchestration, buffer design, weight sync protocols, staleness management, LoRA support, and distributed backends) and shows how disaggregating inference and training workloads improves GPU utilization.

TRL Ray NCCL vLLM GRPO LoRA MiniMax Forge DeepSeek v3.2 Amine Dirhoussi Quentin Gallouédec Kashif Rasul Lewis Tunstall Edward Beeching
huggingface.co · kashifr · 1 day ago · details · hn