compiler-optimization

3 articles
0 3/10

This article argues that concerns about function call overhead in Rust async code are often unfounded, demonstrating that modern compilers inline small functions in release builds, which makes the cost of indirection negligible compared to actual I/O and system-level operations. The author emphasizes that code readability and maintainability should take priority over micro-optimizations, and provides concrete benchmarking and profiling techniques to measure real performance impact.

Rust Criterion valgrind perf flamegraph dtrace Instruments
blog.sebastiansastre.co · sebastianconcpt · 4 days ago
0 8/10

A deep technical exploration of porting a Flash Attention kernel from GPU (Triton) to TPU using JAX, covering the fundamental differences in programming models, compiler behavior, and hardware architectures. The author details how JAX's functional, immutable paradigm and XLA compilation differ from explicit GPU kernel writing, and includes benchmarking and a custom systolic array emulator to understand TPU data flow.

Archer Zhang JAX XLA Triton TPU Colab Flash Attention
archerzhang.me · azhng · 6 days ago
0 3/10

A practical guide to using C++ for bare metal and embedded systems development, covering STL constraints, template metaprogramming, memory management, and real-time system design without OS services. The author demonstrates how C++'s advantages in code reuse and generic programming can benefit embedded developers, using the embxx library and ARM-based examples.

Alex Robenko embxx embxx_on_rpi RaspberryPi ARM GNU Tools for ARM Embedded Processors Sourcery CodeBench Lite Edition CMake C++11
arobenko.github.io · ibobev · 7 days ago