A pipeline has backpressure when the consumer can make the producer slow down, or refuse the work, or both. A bigger buffer does neither. It just defers the same problem, with more memory held hostage in the meantime. The two get confused a lot, and the difference shows up unmistakably in the tail latency numbers, so this post does the comparison directly.
When load exceeds capacity you have three sane options:
- Block the caller until a slot frees. Throughput pins to whatever the consumer sustains; tail latency degrades cleanly.
- Reject the caller so they retry, fall back, or shed. Latency stays low for accepted work; some work never happens.
- Rate-limit the caller so they never get close to overloading you. Predictable throughput, the rest is queued or dropped.
Go has a direct idiom for each: a bounded channel for option 1, the
golang.org/x/sync/semaphore package for option 1 with per-item
weighting, and a time.Ticker token bucket for option 3 (and
trivially option 2). They are not interchangeable. They produce three
very different latency curves under the same workload, which is what we
will see below.
The workload
I built one harness and reused the same request schedule against all three patterns. Knobs:
- 2000 requests at a steady 1000 req/s burst (so the arrival stream completes in 2 seconds).
- Each request does ~100 ms of work (
time.Sleepwith a small jitter so the consumer is the bottleneck, not the harness). - 16 worker slots for the blocking patterns: 160 req/s sustained against a 1000 req/s burst, a 6.25x overload.
- Token bucket at 160 tokens/s, burst 32, matched to the consumer’s actual capacity.
- Latency measured from the moment the producer attempts to submit to the moment the work completes. That is what the caller sees.
Hardware: 12 logical CPUs, Go 1.20, GOMAXPROCS=12. Each pattern ran
twice with different RNG seeds and the numbers below reproduced to
within a couple of milliseconds.
Pattern A: bounded channel
The simplest possible thing that works. A buffered chan is the slot
pool. A fixed number of workers reads from it. Producers send blocks
when the channel is full. That send blocking is the backpressure.
func runBounded(reqs []time.Time, workers int) {
jobs := make(chan int, workers)
var wg sync.WaitGroup
wg.Add(workers)
for i := 0; i < workers; i++ {
go func() {
defer wg.Done()
for idx := range jobs {
doWork(idx) // ~100ms
}
}()
}
for i := range reqs {
// pace producer to the arrival schedule, then submit
jobs <- i // blocks when full -> caller waits
}
close(jobs)
wg.Wait()
}
Capacity is the channel buffer plus the in-flight workers, which is why
I set the buffer equal to workers. Make it bigger and you are just
preallocating queue depth in front of the same consumer.
Fair (FIFO on the same channel), cheap, no imports, no goroutine per request.
Pattern B: weighted semaphore
golang.org/x/sync/semaphore.Weighted is what you reach for when your
work items have different costs. Image resizing, encoding, an “import a
10 MB CSV” handler next to a “lookup by id” handler. You do not want one
fat job hogging a slot another job could share.
import "golang.org/x/sync/semaphore"
func runSemaphore(reqs []time.Time, capacity int64) {
sem := semaphore.NewWeighted(capacity)
var wg sync.WaitGroup
for i := range reqs {
wg.Add(1)
go func(idx int) {
defer wg.Done()
if err := sem.Acquire(ctx, 1); err != nil {
return
}
defer sem.Release(1)
doWork(idx)
}(i)
}
wg.Wait()
}
Big difference from Pattern A: the producer is not blocked. It
spawns a goroutine per request and the goroutines park in Acquire.
That gives you per-item weighting (sem.Acquire(ctx, 5) for a heavy
job) and context cancellation for free, but nothing throttles the
rate at which goroutines accumulate. If the producer outruns the
consumer by 6x, you get 6x more parked goroutines than the bounded
channel would. Sounds harmless. It is not.
Pattern C: token bucket with time.Ticker
The two patterns above are admission-based: the consumer decides who gets in based on whether there is a free slot. A token bucket is rate-based: somebody drips tokens at a fixed rate, and you must hold one to be admitted. No tokens, no admission.
func runTokenBucket(reqs []time.Time, ratePerSec, burst int) {
tokens := make(chan struct{}, burst)
for i := 0; i < burst; i++ {
tokens <- struct{}{}
}
go func() {
t := time.NewTicker(time.Second / time.Duration(ratePerSec))
defer t.Stop()
for range t.C {
select {
case tokens <- struct{}{}:
default: // bucket full, drop new token
}
}
}()
// worker pool (sized generously; admission is what shapes the curve)
jobs := make(chan int, 1024)
for i := 0; i < 64; i++ {
go func() {
for idx := range jobs {
doWork(idx)
}
}()
}
for i := range reqs {
select {
case <-tokens:
jobs <- i
default:
// shed: caller's problem now
}
}
}
Classic time.Ticker + buffered channel. The buffer is the burst
budget, the ticker rate is the steady-state ceiling. The select on
the producer side decides whether to block (omit default) or shed
(default returns immediately). I picked shed because that makes the
contrast with the first two patterns clearest.
The worker pool is intentionally oversized: the bucket, not the workers, is the throttle. Throttle in two places and you have two regulators arguing.
One workload, three answers
Identical 2000-request burst at 1000 req/s. Each consumer slot does ~100ms of work. Latency is end to end (submission to completion). All numbers below are from real runs on the same box.
| Pattern | Served | Dropped | p50 | p95 | p99 | Wall |
|---|---|---|---|---|---|---|
| bounded | 2000 | 0 | 206 ms | 224 ms | 242 ms | 12.66 s |
| semaphore | 2000 | 0 | 5346 ms | 10077 ms | 10502 ms | 12.63 s |
| tokenbkt | 351 | 1649 | 101 ms | 105 ms | 106 ms | 2.10 s |
Three very different stories.
Bounded channel: graceful queue, predictable tail
p50 of 206 ms is what you would expect. 16 workers at 100 ms per job is 160 jobs/s steady-state. The arrival burst piles up in the channel-plus-producer wait, and the average wait grows linearly with queue depth. p99 of 242 ms is the queue length at the worst moment. Memory stayed flat: heap allocation delta was 0.2 MiB for the whole run.
The producer’s send blocks. That blocking propagates up the call
stack and naturally rate-limits whatever generated the arrivals.
No goroutine accumulates.
Semaphore: same throughput, dramatically worse tail
This one surprises people. Same workers, same work, same wall time, but p50 of 5.3 seconds and p99 above 10 seconds.
The reason is in the code. Pattern B spawns a goroutine per request and
parks it in Acquire. The arrival schedule is 1000 req/s for 2
seconds, so by the time the burst ends, roughly 1840 goroutines are
waiting on the semaphore. They drain at 160 req/s. The goroutine that
arrives at t=1.9 s waits for ~1800 jobs ahead of it. That is your p99.
The bounded channel never has more than 16 jobs queued because the producer is blocked on the 17th. The semaphore has all of them queued because the producer was never blocked.
The semaphore is still doing its job: concurrency capped, no OOM, same wall time. The throughput is identical; only the latency distribution is different, because everything queues in-process instead of upstream. If you have weight-asymmetric work, the trade is often worth it. If you do not, a bounded channel is strictly better.
(In a real server the difference is worse, because each parked goroutine holds the request object, headers, body buffers, and so on. 1840 of those is not free. The harness uses trivial state, so memory delta was only 1.6 MiB.)
Token bucket: low latency for accepted work, lots of drops
p99 of 106 ms, which is essentially just the work time. The token bucket admitted 351 of 2000 requests and shed the other 1649. There is no queueing on the accepted-work side because admission was already paced to the consumer’s rate.
Wall time is 2.10 s, which is the arrival burst duration. The consumer goes idle as soon as the arrivals stop, because nothing was queued.
This is the opposite extreme from the semaphore. There you queue everything in-process. Here you queue nothing and just say no.
Decision matrix
You generally want a bounded channel. The interesting question is when to reach for the other two.
| If you… | Reach for |
|---|---|
| have homogeneous work and one producer | bounded channel |
| have weight-asymmetric work (CPU, memory, payload size) | semaphore |
| need per-context cancellation while waiting | semaphore |
| must protect a downstream system at a fixed rate | token bucket |
| can afford to drop, would rather shed than queue | token bucket |
| have nothing to push back on the producer (e.g. HTTP) | token bucket + 429 |
A pattern I have used in production: a token bucket at the edge, shedding 429s to clients during sudden overload, with a bounded channel inside the service to make sure that even if the bucket admits a burst, the internal pipeline never grows past a fixed size. Two layers, each doing the thing it is best at.
The semaphore is a specialist. Use it when you mean it: items truly
differ in cost, or you need context.Context plumbing through the
wait.
What I would not do
Buffering. Specifically, an unbounded channel, or a chan with capacity
“big enough that this never blocks.” That is not backpressure; that is
deferring the problem until the heap runs out, which on a managed
runtime means a long GC pause, then OOM, then a page.
If you must buffer (bursty arrivals with a smoothing window that makes sense), bound it explicitly, and decide ahead of time what happens at the bound. Block? Drop? Both are fine. Silently growing is not.
If you want this productionized
The above is intentionally minimal. Real codebases will want
golang.org/x/time/rate for the token bucket. Same algorithm, plus
per-key limits, Wait(ctx) that respects deadlines, and Reserve for
“tell me how long until I would be allowed”. I just wanted to show what
the patterns underneath the library look like, and why picking the
wrong one quietly puts 10 seconds onto your tail.
The point is not the absolute numbers. It is that the same workload produces three completely different latency distributions depending on which knob you turn, and “I added a bigger buffer” is not on the list.
