<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mmap on Segflow</title><link>https://segflow.github.io/tags/mmap/</link><description>Recent content in Mmap on Segflow</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 29 May 2026 00:40:00 +0200</lastBuildDate><atom:link href="https://segflow.github.io/tags/mmap/index.xml" rel="self" type="application/rss+xml"/><item><title>Finding a needle in a 4 GB haystack: from 0.75 GB/s to 49 GB/s in Go</title><link>https://segflow.github.io/post/fast-file-search-go/</link><pubDate>Fri, 29 May 2026 00:40:00 +0200</pubDate><guid>https://segflow.github.io/post/fast-file-search-go/</guid><description>&lt;p&gt;I had a 4 GiB file that&amp;rsquo;s almost entirely zeros: exactly &lt;strong&gt;one&lt;/strong&gt; non-zero
&lt;code&gt;int64&lt;/code&gt; is hiding at offset &lt;code&gt;Size - 8&lt;/code&gt; (the last aligned slot). The task: find
that offset, as fast as possible, in Go on Linux.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a deliberately silly problem. There&amp;rsquo;s no parsing, no indexing, no
cleverness on the algorithm side. The only thing it measures is how much data
we can pull through a CPU per second. It&amp;rsquo;s exactly the kind of micro-task that
exposes every layer of the stack: the Go runtime, the standard library, the
kernel, the page cache, the memory hierarchy, and SIMD, including Go 1.26&amp;rsquo;s
brand-new &lt;code&gt;simd/archsimd&lt;/code&gt; package that lets you write AVX-512 in pure Go.&lt;/p&gt;
&lt;p&gt;The most obvious approach, &lt;code&gt;os.ReadFile&lt;/code&gt; plus a &lt;code&gt;for range&lt;/code&gt; loop, gets &lt;strong&gt;0.75
GB/s&lt;/strong&gt;. Thirteen variants later we&amp;rsquo;re at &lt;strong&gt;49 GB/s&lt;/strong&gt;, a 66× speedup, and
we&amp;rsquo;ll know exactly which wall we hit and why.&lt;/p&gt;</description></item></channel></rss>