By: P.E. McKenney
Published in: PLoPD2
Category: Parallel Programming
Locking designs for parallel programs.
To eliminate the complexity of parallelization, construct an entirely sequential program, dispensing with all synchronization primitives and the overhead and complexity they bring.
Use code locking when only a small fraction of execution time is spent in critical sections or when only modest scaling is needed. This is the simplest locking design.
To get better speedups than straightforward parallelizations such as Code Locking, partition data structures so that each portion can be processed in parallel. Each portion has its own independent critical section.
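A data-locking sketch, assuming a hash table as the partitioned structure: each bucket carries its own lock, so operations on different buckets run in parallel. The table layout and names here are illustrative.

```c
/* Data Locking: one lock per hash bucket instead of one per table. */
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 16

struct node {
    struct node *next;
    int key;
    int value;
};

struct bucket {
    pthread_mutex_t lock;       /* guards only this bucket's chain */
    struct node *head;
};

static struct bucket table[NBUCKETS];

void table_init(void)
{
    for (int i = 0; i < NBUCKETS; i++)
        pthread_mutex_init(&table[i].lock, NULL);
}

static struct bucket *key_bucket(int key)
{
    return &table[(unsigned)key % NBUCKETS];
}

void table_insert(struct node *np)
{
    struct bucket *bp = key_bucket(np->key);

    pthread_mutex_lock(&bp->lock);   /* per-bucket critical section */
    np->next = bp->head;
    bp->head = np;
    pthread_mutex_unlock(&bp->lock);
}

int table_lookup(int key, int *value)
{
    struct bucket *bp = key_bucket(key);
    int found = 0;

    pthread_mutex_lock(&bp->lock);
    for (struct node *np = bp->head; np != NULL; np = np->next) {
        if (np->key == key) {
            *value = np->value;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&bp->lock);
    return found;
}
```

Threads touching different buckets never contend; only collisions on the same bucket serialize.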
Each CPU or thread should own its own data and shouldn't need locking primitives to access it, but must use a communication mechanism to access another CPU's or thread's data.
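A data-ownership sketch: each thread increments only its own counter slot, lock-free, and a reader obtains the total by communicating with the owners (here, the simplest mechanism: joining them). Slot padding and all names are illustrative assumptions.

```c
/* Data Ownership: per-thread counters, no locks on the update path. */
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4
#define NINCR    100000

/* One padded slot per thread to avoid false sharing; each slot is
 * owned by exactly one thread, so no lock guards updates to it. */
static struct {
    long count;
    char pad[64 - sizeof(long)];
} counters[NTHREADS];

static void *worker(void *arg)
{
    long me = (long)(intptr_t)arg;

    for (int i = 0; i < NINCR; i++)
        counters[me].count++;        /* owned data: no locking */
    return NULL;
}

long total_after_join(void)
{
    pthread_t tid[NTHREADS];
    long sum = 0;

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);  /* join is the communication step */
    for (int i = 0; i < NTHREADS; i++)
        sum += counters[i].count;    /* safe: owners have exited */
    return sum;
}
```

In a long-running program, the communication mechanism would instead be message passing or an explicit handoff protocol rather than thread exit.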
To get speedups in programs that can't use aggressive locking patterns throughout, use an aggressive locking pattern for most of the workload (the fastpath) and a more conservative approach for the rest.
To speed up programs that rarely modify shared data, allow readers to proceed in parallel, but require writers to exclude both readers and other writers.
To speed up programs whose common updates are complex and expensive, but whose infrequent operations, such as insertion and deletion, require coarse-grained locking, partition data structures into coarse- and fine-grained portions. For example, use a single lock for the internal nodes and links of a search tree, but a separate lock for each leaf.
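A hierarchical-locking sketch, using a linked list of leaves rather than a full search tree for brevity: one coarse lock covers the container's structure, and each leaf has its own lock for expensive per-leaf updates. The search returns the leaf already locked, and the coarse lock is dropped before the per-leaf work begins. All names are illustrative.

```c
/* Hierarchical Locking: coarse structure lock plus per-leaf locks. */
#include <pthread.h>
#include <stddef.h>

struct leaf {
    struct leaf *next;          /* link guarded by tree_lock */
    pthread_mutex_t lock;       /* guards this leaf's data only */
    int key;
    long data;
};

static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static struct leaf *leaves;     /* structure guarded by tree_lock */

/* Infrequent structural change: take the coarse lock. */
void leaf_insert(struct leaf *lp)
{
    pthread_mutex_lock(&tree_lock);
    lp->next = leaves;
    leaves = lp;
    pthread_mutex_unlock(&tree_lock);
}

/* Find a leaf and return it locked; the caller performs its
 * expensive update and then unlocks.  The coarse lock is held only
 * long enough to locate and lock the leaf. */
struct leaf *leaf_find_lock(int key)
{
    pthread_mutex_lock(&tree_lock);
    for (struct leaf *lp = leaves; lp != NULL; lp = lp->next) {
        if (lp->key == key) {
            pthread_mutex_lock(&lp->lock);
            pthread_mutex_unlock(&tree_lock);
            return lp;
        }
    }
    pthread_mutex_unlock(&tree_lock);
    return NULL;
}
```

Expensive updates on different leaves thus proceed in parallel, while the coarse lock is held only briefly during traversal.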
To get speedups in global memory allocators, create a per-CPU or per-process cache of data structure instances. A CPU owns the instances in its cache, so it won't incur overhead and contention penalties to allocate and free them.
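An allocator-cache sketch: each thread keeps a small free list of blocks it owns and touches the lock-protected global pool only when that cache is empty or full. The cache size, names, and the GCC `__thread` storage class are assumptions for illustration.

```c
/* Allocator Caches: per-thread free list in front of a global pool. */
#include <pthread.h>
#include <stdlib.h>

#define CACHE_MAX 8

struct block { struct block *next; };

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static struct block *global_pool;        /* guarded by pool_lock */

static __thread struct block *cache;     /* owned: no lock needed */
static __thread int cache_len;

struct block *block_alloc(void)
{
    struct block *bp = cache;

    if (bp != NULL) {                    /* fastpath: own cache */
        cache = bp->next;
        cache_len--;
        return bp;
    }
    pthread_mutex_lock(&pool_lock);      /* slowpath: global pool */
    bp = global_pool;
    if (bp != NULL)
        global_pool = bp->next;
    pthread_mutex_unlock(&pool_lock);
    if (bp == NULL)
        bp = malloc(sizeof(*bp));        /* pool empty: fall back */
    return bp;
}

void block_free(struct block *bp)
{
    if (cache_len < CACHE_MAX) {         /* fastpath: keep it */
        bp->next = cache;
        cache = bp;
        cache_len++;
        return;
    }
    pthread_mutex_lock(&pool_lock);      /* slowpath: hand back */
    bp->next = global_pool;
    global_pool = bp;
    pthread_mutex_unlock(&pool_lock);
}
```

Most allocations and frees hit only thread-owned data; the global lock is taken only to refill or drain the cache.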
To get speedups in programs with frequent, small critical sections on machines with high synchronization overheads, combine small critical sections into larger ones.
To get speedups in programs with infrequent, large critical sections on machines with low synchronization overheads, split large critical sections into smaller ones.
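The fusing transformation above can be sketched directly; partitioning is simply the reverse rewrite. The statistics counters and function names are illustrative.

```c
/* Critical Section Fusing: pay the lock's overhead once, not twice. */
#include <pthread.h>

static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static long stats_a, stats_b;

/* Before fusing: two small critical sections, two lock round trips. */
void update_split(long da, long db)
{
    pthread_mutex_lock(&stats_lock);
    stats_a += da;
    pthread_mutex_unlock(&stats_lock);

    pthread_mutex_lock(&stats_lock);
    stats_b += db;
    pthread_mutex_unlock(&stats_lock);
}

/* After fusing: one larger critical section, one lock round trip. */
void update_fused(long da, long db)
{
    pthread_mutex_lock(&stats_lock);
    stats_a += da;
    stats_b += db;
    pthread_mutex_unlock(&stats_lock);
}
```

Which direction pays off depends on the machine: fusing wins when lock operations are expensive relative to the work inside, partitioning wins when long hold times cause contention.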