Fundamental Concepts
Ring Buffer
- A ring buffer (or circular buffer) is a fixed-size queue in which the write and read pointers wrap around once they reach the end of the underlying array. It is widely used in high-performance systems such as logging pipelines, network buffers, and message queues.
MPSC (Multi-Producer, Single-Consumer)
An MPSC queue allows multiple producers (writers) to insert elements concurrently, while only a single consumer (reader) retrieves elements.
Write operations must address concurrent access among producers.
Read operations are simpler, as they are performed by a single consumer thread.
Lock-Free
- Instead of using traditional mutexes, lock-free structures rely on atomic operations (e.g., Compare-And-Swap, CAS) to ensure correctness under concurrency. This avoids lock contention, reducing latency and improving throughput.
Sharded
- A sharded design partitions a large ring buffer into several smaller independent sub-buffers (shards). Producers are distributed across shards (e.g., by hashing or thread affinity), which minimizes contention. The consumer then sequentially or cyclically retrieves items from all shards.
Operating Principles
A sharded lock-free MPSC ring buffer typically operates as follows:
Initialization
The total buffer capacity is divided into N shards (e.g., 8 sub-buffers).
Each shard itself is implemented as a lock-free MPSC ring buffer.
Producer Writes (Concurrent)
Each producer selects a shard based on a hash function, producer ID, or randomized strategy.
The producer atomically advances the write pointer in that shard using CAS and writes its data.
Because writes are distributed across shards, contention is significantly reduced.
Consumer Reads (Single Thread)
The consumer iterates over all shards, checking each for available entries.
Data is retrieved by advancing the shard’s read pointer.
As only one consumer exists, no synchronization overhead is required for reading.
Motivation for Sharding
Challenge with Conventional MPSC Buffers
- In a non-sharded MPSC buffer, all producers compete on a single shared write pointer, resulting in substantial contention under high concurrency.
Benefits of Sharding
Each shard has fewer competing producers, reducing contention on its write pointer.
The single consumer can still process data deterministically by scanning all shards.
Performance gains are particularly evident when the number of producers is large relative to the consumer.
Application Scenarios
High-performance logging systems (multiple threads writing logs, one thread persisting to storage).
Network servers (multiple connections producing packets, one thread aggregating and processing them).
Data acquisition systems (multiple sensors producing input, one thread consuming for analysis).
Implementation Notes in Golang
In Go, the sync/atomic package is typically used for lock-free synchronization.
Key implementation aspects include:Write Operation
1
2idx := atomic.AddUint64(&shard.writePos, 1) - 1
buffer[idx % shard.size] = value- Read Operation (single-threaded, no CAS required)
1
2
3
4
5if shard.readPos < shard.writePos {
val := buffer[shard.readPos % shard.size]
shard.readPos++
return val
}- Shard Selection
1
2shardID := hash(producerID) % numShards
shard := shards[shardID]- Consumer Loop
1
2
3
4
5
6
7for {
for _, shard := range shards {
if val, ok := shard.TryRead(); ok {
process(val)
}
}
}Advantages and Limitations
Advantages
Lock-free design avoids mutex contention.
Sharding reduces producer contention and increases throughput.
Single-consumer semantics simplify design and maintain order within each shard.
Limitations
The consumer must poll multiple shards, which may increase latency with many shards.
Buffer utilization may be uneven if some shards are heavily loaded while others remain idle.
The design does not extend naturally to multiple consumers.