benchmarking

10 articles

0 2/10

The Shape of the Thing

opinion

An essay analyzing the exponential improvement of AI capabilities and the shift from co-intelligence (interactive AI assistance) to agentic AI systems that autonomously complete complex tasks. The author examines real-world examples like StrongDM's Software Factory, which uses AI agents to write, test, and deploy production code without human involvement, and discusses the organizational and market disruptions this enables.

ai-agents large-language-models ai-capabilities organizational-disruption software-development automation ai-governance benchmarking

Ethan Mollick Claude Code OpenAI Codex OpenClaw StrongDM Bytedance METR Block Citrini Research Anthropic Pentagon

oneusefulthing.org · andyjohnson0 · 1 day ago · details · hn

0 2/10

Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations

news

Shopify CEO Tobias Lütke used an AI-assisted autoresearch pattern with a coding agent to optimize the Liquid template engine, achieving 53% faster parse+render performance and 61% fewer allocations through 120 automated experiments across 93 commits. The effort demonstrates how robust test suites and AI agents make performance optimization effective and let high-level engineers contribute meaningfully to code.

performance-optimization code-optimization ruby template-engine ai-assisted-programming coding-agents benchmarking open-source

Shopify/liquid Tobias Lütke Andrej Karpathy Simon Willison Django Pi David Cortés

simonwillison.net · duck · 2 days ago · details · hn

0 2/10

Qodo Outperforms Claude in Code Review Benchmark

benchmark

Qodo's research team published a standardized code review benchmark evaluating AI tools on realistic defects injected into production PRs, finding their tool outperforms Claude Code Review by 12 F1 points in recall while maintaining equivalent precision at 1/10th the cost.

code-review ai-tools benchmarking multi-agent-systems code-quality static-analysis comparative-analysis vendor-comparison

Qodo Claude Code Review Anthropic NVIDIA Nemotron 3 Super OpenAI Google

qodo.ai · bobismyuncle · 2 days ago · details · hn

0 1/10

Company AI Readiness Scores

tool

Company Profiler is a web tool that scores 500+ companies (0-100) on AI readiness by analyzing AI hiring intensity, R&D investment, public AI initiatives, and competitive positioning across 15 industries, providing data-driven competitive benchmarking rather than PR-based assessments.

ai-readiness competitive-intelligence hiring-analysis r&d-investment benchmarking data-analysis

Company Profiler Spotify Fubo Axios Viacom Adzuna API Yahoo Finance Kimi K2.5 Cloudflare Node.js Express SQLite

company.lost2038.com · mikeberkley · 2 days ago · details · hn

0 2/10

Nvidia Builds Open Data for AI

news

NVIDIA announces a suite of open datasets and training frameworks across multiple AI domains including robotics, autonomous vehicles, synthetic personas, protein modeling, and language model pre-training, with over 2 petabytes of data across 180+ datasets designed to reduce AI development bottlenecks.

ai-datasets open-source machine-learning data-engineering model-training robotics autonomous-vehicles synthetic-data language-models benchmarking rag-systems embedding-models protein-structure drug-discovery

NVIDIA Nemotron GR00T HuggingFace GitHub Runway CrowdStrike NTT Data APTO AI Singapore WideLabs Oxford Mila CIFAR Andrej Karpathy

huggingface.co · gmays · 2 days ago · details · hn

0 3/10

Big data on the cheapest MacBook

benchmark

A performance benchmark of DuckDB running on the entry-level MacBook Neo (Apple A18 Pro, 8GB RAM, $700), evaluating its capability on analytical database workloads using the ClickBench and TPC-DS benchmarks. The MacBook surprisingly outperformed cloud instances on cold-run queries due to its local NVMe storage, though it struggled with memory-intensive TPC-DS SF300 workloads that required extensive disk spilling.

database-performance benchmarking duckdb clickbench tpc-ds hardware-optimization macos arm-architecture memory-constraints out-of-core-processing

DuckDB MacBook Neo Apple A18 Pro ClickBench TPC-DS TPC-H Gábor Szárnyas c6a.4xlarge c8g.metal-48xl Graviton4

duckdb.org · bcye · 2 days ago · details · hn

0 3/10

The Cost of Indirection in Rust

opinion

This article argues that concerns about function call overhead in Rust async code are often unfounded, demonstrating that modern compilers inline small functions in release builds, which makes the cost of indirection negligible compared to actual I/O and system-level operations. The author emphasizes that code readability and maintainability should take priority over micro-optimizations, and provides concrete benchmarking and profiling techniques to measure real performance impact.

rust performance-optimization compiler-optimization code-design async-programming micro-optimization inlining benchmarking best-practices

Rust Criterion valgrind perf flamegraph dtrace Instruments

blog.sebastiansastre.co · sebastianconcpt · 5 days ago · details · hn

0 4/10

MariaDB innovation: vector index performance

benchmark

Benchmark comparison of vector search performance across MariaDB 11.8, MariaDB 12.3, and Postgres 18.2 with pgvector, showing MariaDB 12.3 achieves superior recall/precision and lower CPU usage per query on datasets ranging from 100k to 1M vectors.

vector-search database-performance benchmarking mariadb postgres ann-benchmarks indexing cpu-optimization

MariaDB MariaDB 11.8.5 MariaDB 12.3.0 Postgres 18.2 pgvector 0.8.1 Small Datum LLC MariaDB Foundation Mark Callaghan Hetzner ax162-s

smalldatum.blogspot.com · gslin · 7 days ago · details · hn

0 7/10

LLMs work best when the user defines their acceptance criteria first

research

A detailed technical analysis of an LLM-generated SQLite reimplementation in Rust that shows critical performance failures (~20,000x slower) despite appearing correct. The article identifies two root-cause bugs: a missing INTEGER PRIMARY KEY optimization that forces full table scans instead of O(log n) B-tree lookups, and unnecessary fsync calls on every statement, alongside compounding inefficiencies in AST caching and memory allocation patterns.

llm-generated-code code-quality performance-analysis bug-analysis database-optimization sqlite rust query-optimization b-tree benchmarking plausible-code acceptance-criteria

SQLite Rust Turso libsql METR GitClear Fjall

blog.katanaquant.com · dnw · 8 days ago · details · hn

0 6/10

The hidden compile-time cost of C++26 reflection

research

Benchmark analysis of C++26 reflection compile-time overhead on GCC 16, showing that including <meta> adds ~155ms overhead but the reflection logic itself scales reasonably (~1-2ms per type). Demonstrates that Standard Library headers are the primary bottleneck and precompiled headers are essential for practical compile-time performance in reflection-heavy code.

c++ c++26 reflection compile-time-performance benchmarking metaprogramming standard-library precompiled-headers gcc optimization

Vittorio Romeo Jonathan Müller Barry Revzin Jonathan Wakely GCC 16 C++26 Fedora 44 SFML hyperfine

vittorioromeo.com · SuperV1234 · 8 days ago · details · hn