<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Linux on Segflow</title><link>https://segflow.github.io/tags/linux/</link><description>Recent content in Linux on Segflow</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 29 May 2026 00:40:00 +0200</lastBuildDate><atom:link href="https://segflow.github.io/tags/linux/index.xml" rel="self" type="application/rss+xml"/><item><title>Finding a needle in a 4 GB haystack: from 0.75 GB/s to 49 GB/s in Go</title><link>https://segflow.github.io/post/fast-file-search-go/</link><pubDate>Fri, 29 May 2026 00:40:00 +0200</pubDate><guid>https://segflow.github.io/post/fast-file-search-go/</guid><description>&lt;p&gt;I had a 4 GiB file that&amp;rsquo;s almost entirely zeros, exactly &lt;strong&gt;one&lt;/strong&gt; non-zero
&lt;code&gt;int64&lt;/code&gt; hiding at offset &lt;code&gt;Size - 8&lt;/code&gt; (the last aligned slot). The task: find
that offset, as fast as possible, in Go on Linux.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a deliberately silly problem. There&amp;rsquo;s no parsing, no indexing, no
cleverness on the algorithm side. The only thing it measures is how much data
we can pull through a CPU per second. It is exactly the kind of micro-task that
exposes every layer of the stack: the Go runtime, the standard library, the
kernel, the page cache, the memory hierarchy, and SIMD, including Go 1.26&amp;rsquo;s
brand-new &lt;code&gt;simd/archsimd&lt;/code&gt; package that lets you write AVX-512 in pure Go.&lt;/p&gt;
&lt;p&gt;Starting from the most obvious &lt;code&gt;os.ReadFile&lt;/code&gt; + &lt;code&gt;for range&lt;/code&gt; loop, we get &lt;strong&gt;0.75
GB/s&lt;/strong&gt;. Thirteen variants later we&amp;rsquo;re at &lt;strong&gt;49 GB/s&lt;/strong&gt;, a 66× speedup, and
we&amp;rsquo;ll know exactly which wall we hit and why.&lt;/p&gt;</description></item><item><title>Zero-copy in Go: sendfile, splice, and the cost of io.Copy</title><link>https://segflow.github.io/post/zero-copy-sendfile-splice/</link><pubDate>Mon, 22 Jan 2024 10:00:00 +0100</pubDate><guid>https://segflow.github.io/post/zero-copy-sendfile-splice/</guid><description>&lt;p&gt;A small file-serving service of mine slowed to a crawl one afternoon after a
&amp;ldquo;harmless&amp;rdquo; middleware change. CPU on the server box doubled, throughput
roughly halved. The diff was a single line: instead of handing a &lt;code&gt;*os.File&lt;/code&gt;
to &lt;code&gt;io.Copy&lt;/code&gt;, somebody had wrapped it in a tiny logging reader to count
bytes.&lt;/p&gt;
&lt;p&gt;That one wrap quietly turned off &lt;code&gt;sendfile(2)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This post is about that fast path: what Go does for you for free, how to see
it actually fire, and the surprisingly easy ways to lose it.&lt;/p&gt;</description></item><item><title>What strace -c taught me about a fast CLI</title><link>https://segflow.github.io/post/strace-c-fast-cli/</link><pubDate>Mon, 05 Sep 2022 11:20:00 +0100</pubDate><guid>https://segflow.github.io/post/strace-c-fast-cli/</guid><description>&lt;p&gt;The CLI was fast. I had benchmarked it on my laptop, on a fresh clone of the
repo, and it finished in well under a second. Then a coworker pointed it at a
real monorepo, the kind with 30,000 files spread across a few thousand
directories, and the thing crawled. Same code, same machine class, just more
files. The user-visible work had not changed. The wall clock had.&lt;/p&gt;
&lt;p&gt;This is the story of the half hour I spent figuring out why, what &lt;code&gt;strace -c&lt;/code&gt;
showed me, and why I now reach for it before any profiler when something
&amp;ldquo;feels slow&amp;rdquo; on Linux.&lt;/p&gt;
&lt;p&gt;My first instinct was wrong, by the way. I assumed disk. The repo was big,
the laptop has an NVMe drive but it is not magic, and &amp;ldquo;more files&amp;rdquo; sounds
like &amp;ldquo;more IO.&amp;rdquo; So I ran the program twice in a row, expecting the second
run to be fast off the page cache. It was not. Both runs took roughly the
same time. Whatever was slow, it was not waiting on the disk.&lt;/p&gt;</description></item><item><title>Tuning a Go TCP server toward 1M idle connections on a laptop</title><link>https://segflow.github.io/post/million-idle-connections/</link><pubDate>Mon, 22 Mar 2021 18:30:00 +0100</pubDate><guid>https://segflow.github.io/post/million-idle-connections/</guid><description>&lt;p&gt;I had been telling people for months that Go can &amp;ldquo;trivially&amp;rdquo; hold a million
idle TCP connections. The runtime uses epoll, goroutines are cheap; what
could go wrong? Then a colleague asked me to actually do it, and I realised
I had never tried. So I sat down with my laptop, a fresh &lt;code&gt;net.Listen&lt;/code&gt;, and
a client that just wants to open a lot of sockets.&lt;/p&gt;
&lt;p&gt;The first wall I hit was 1024 file descriptors. After that came five more
walls in quick succession, none of them in user code. This post is the log
of every wall I walked into and how to move it. Code, logs, and scripts
are in the &lt;a href="./million-idle-connections/code/"&gt;scratch directory&lt;/a&gt;.&lt;/p&gt;</description></item></channel></rss>