compiler-optimization

3 articles

0 3/10

The Cost of Indirection in Rust

opinion

This article argues that concerns about function call overhead in Rust async code are often unfounded: modern compilers inline small functions in release builds, so the cost of indirection is negligible compared with actual I/O and system-level operations. The author emphasizes that code readability and maintainability should take priority over micro-optimizations, and provides concrete benchmarking and profiling techniques for measuring real performance impact.

rust performance-optimization compiler-optimization code-design async-programming micro-optimization inlining benchmarking best-practices

Rust Criterion valgrind perf flamegraph dtrace Instruments

blog.sebastiansastre.co · sebastianconcpt · 4 days ago

0 8/10

Forcing Flash Attention onto a TPU and Learning the Hard Way

tutorial

A deep technical exploration of porting a Flash Attention kernel from GPU (Triton) to TPU using JAX, covering the fundamental differences in programming models, compiler behavior, and hardware architectures. The author details how JAX's functional, immutable paradigm and XLA compilation differ from explicit GPU kernel writing, and includes benchmarking and a custom systolic array emulator to understand TPU data flow.

flash-attention jax tpu kernel-optimization attention-mechanism llm-internals xla-compiler systolic-array triton gpu compiler-optimization numerical-stability online-softmax

Archer Zhang JAX XLA Triton TPU Colab Flash Attention

archerzhang.me · azhng · 6 days ago

0 3/10

Practical Guide to Bare Metal C++

tutorial

A practical guide to using C++ for bare metal and embedded systems development, covering STL constraints, template metaprogramming, memory management, and real-time system design without OS services. The author demonstrates how C++'s strengths in code reuse and generic programming can benefit embedded developers, using the embxx library and ARM-based examples.

bare-metal embedded-systems c++ arm tutorial stl templates cross-compilation raspberry-pi real-time-systems memory-constrained compiler-optimization

Alex Robenko embxx embxx_on_rpi RaspberryPi ARM GNU Tools for ARM Embedded Processors Sourcery CodeBench Lite Edition CMake C++11

arobenko.github.io · ibobev · 7 days ago