llm-optimization

2 articles

A technical analysis of sparsity versus quantization as hardware optimization strategies for neural networks, exploring the architectural challenges of each (the irregular data movement of unstructured sparsity vs. quantization's metadata overhead) and the current compromises (structured sparsity patterns and algorithmic co-design techniques) used in modern AI accelerators.

NVIDIA Ampere · EIE · SCNN · BitNet b1.58 · GPTQ · QuIP · SmoothQuant · AWQ · StreamingLLM · OCP Microscaling Formats · Deep Compression
sigarch.org · matt_d · 23 hours ago · hn
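The structured-sparsity compromise mentioned above can be made concrete: NVIDIA Ampere's 2:4 pattern keeps exactly two of every four consecutive weights, so the hardware can index survivors with tiny fixed-size metadata instead of chasing arbitrary sparse layouts. A minimal numpy sketch of magnitude-based 2:4 pruning (the function name and selection rule are illustrative; production flows such as NVIDIA's ASP library also fine-tune after pruning to recover accuracy):

```python
import numpy as np

def prune_2_to_4(weights: np.ndarray) -> np.ndarray:
    """Apply 2:4 structured sparsity: in every contiguous group of
    four weights, zero the two smallest-magnitude entries.

    Illustrative sketch only; assumes weights.size % 4 == 0.
    """
    flat = weights.reshape(-1, 4)                      # groups of 4
    # column indices of the 2 smallest |w| in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)       # zero those slots
    return (flat * mask).reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01])
pruned = prune_2_to_4(w)
# exactly half of each group of four survives:
# [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.8, 0.0]
```

The fixed 50% density per group is what makes this a hardware-friendly compromise: the accelerator's sparse tensor cores can skip the zeros without the unstructured bookkeeping the article describes.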

Tarvos introduces a relay architecture for AI coding agents that mitigates LLM context degradation by dispatching fresh agents sequentially, each reading a static master plan and receiving only a minimal 40-line handoff note, with automatic token-based switching when context budgets are exceeded.

Tarvos · Claude Code · Chroma Research · MIT
github.com · Photon48 · 23 hours ago · hn
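The relay pattern summarized above amounts to a token-budget check wrapped around a work loop: each agent reads the static plan plus a short handoff note, and once its accumulated context exceeds the budget, a fresh agent is dispatched. A hypothetical sketch (TOKEN_BUDGET, do_step, and the word-count tokenizer are stand-ins, not Tarvos's actual API):

```python
TOKEN_BUDGET = 60                  # assumed per-agent budget (tokens)
MASTER_PLAN = "master plan: refactor module X in 10 steps"

def count_tokens(text: str) -> int:
    return len(text.split())       # crude stand-in for a real tokenizer

def do_step(step: int) -> str:
    """Placeholder for one unit of agent work; a real agent would
    call an LLM and edit files here."""
    return f"completed step {step} with some accumulated detail"

def relay(total_steps: int) -> int:
    """Run all steps, dispatching a fresh agent whenever the current
    context exceeds TOKEN_BUDGET. Returns how many agents ran."""
    agents = 1
    handoff = ""                              # first agent gets no note
    context = MASTER_PLAN + "\n" + handoff    # agent reads plan + note
    for step in range(total_steps):
        context += "\n" + do_step(step)       # context grows each step
        if count_tokens(context) > TOKEN_BUDGET:
            # token-based switch: leave a minimal handoff note and
            # start a fresh agent with only the static plan + note
            handoff = f"resume after step {step}"
            context = MASTER_PLAN + "\n" + handoff
            agents += 1
    return agents
```

The key design point from the article survives even in this toy version: accumulated context never carries over between agents, only the static plan and the minimal handoff note do.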