arm-neon

1 article

This article explores optimizing prefix sum (scan) operations with ARM NEON SIMD instructions. It shows how to process multiple integer values in parallel using vector operations and interleaved load/store techniques, reaching throughput of up to tens of gigabytes per second, faster than a scalar loop.
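As a rough illustration of the idea in the summary, here is a minimal sketch of a prefix sum over 32-bit integers, with a scalar baseline and a NEON variant that scans four lanes in-register using shift-and-add steps. It is not the linked article's code: the function names `prefix_sum_scalar` and `prefix_sum_neon` are hypothetical, the interleaved load/store variant mentioned above is not shown, and the NEON path is only compiled when the compiler defines `__ARM_NEON` (otherwise the function falls back to the scalar loop).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Scalar baseline: out[i] = in[0] + ... + in[i], wrapping modulo 2^32. */
void prefix_sum_scalar(const uint32_t *in, uint32_t *out, size_t n) {
    assert(n == 0 || (in != NULL && out != NULL));
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += in[i];
        out[i] = sum;
    }
}

/* Hypothetical NEON variant: scans 4 lanes at a time, then carries the
 * running total into the next block. Falls back to scalar without NEON. */
void prefix_sum_neon(const uint32_t *in, uint32_t *out, size_t n) {
    assert(n == 0 || (in != NULL && out != NULL));
    size_t i = 0;
    uint32_t sum = 0;
#if HAVE_NEON
    const uint32x4_t zero = vdupq_n_u32(0);
    uint32x4_t carry = zero; /* running total broadcast to all lanes */
    for (; i + 4 <= n; i += 4) {
        uint32x4_t v = vld1q_u32(in + i);
        /* add [0, v0, v1, v2]: each lane now holds a sum of 2 neighbours */
        v = vaddq_u32(v, vextq_u32(zero, v, 3));
        /* add [0, 0, w0, w1]: each lane now holds its in-block prefix sum */
        v = vaddq_u32(v, vextq_u32(zero, v, 2));
        /* add the total of all previous blocks */
        v = vaddq_u32(v, carry);
        vst1q_u32(out + i, v);
        /* last lane is the new running total */
        carry = vdupq_n_u32(vgetq_lane_u32(v, 3));
    }
    sum = vgetq_lane_u32(carry, 0);
#endif
    /* Scalar tail (or the whole array when NEON is unavailable). */
    for (; i < n; i++) {
        sum += in[i];
        out[i] = sum;
    }
}
```

Both functions should produce identical output for the same input, so the scalar version can serve as a reference when checking the vector one.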

Daniel Lemire ARM NEON
lemire.me · mfiguiere · 6 days ago