diff options
40 files changed, 3973 insertions, 0 deletions
diff --git a/tools/memory-model/Documentation/cheatsheet.txt b/tools/memory-model/Documentation/cheatsheet.txt new file mode 100644 index 000000000000..1917712bce99 --- /dev/null +++ b/tools/memory-model/Documentation/cheatsheet.txt @@ -0,0 +1,30 @@ + Prior Operation Subsequent Operation + --------------- --------------------------- + C Self R W RWM Self R W DR DW RMW SV + __ ---- - - --- ---- - - -- -- --- -- + +Store, e.g., WRITE_ONCE() Y Y +Load, e.g., READ_ONCE() Y Y Y +Unsuccessful RMW operation Y Y Y +smp_read_barrier_depends() Y Y Y +*_dereference() Y Y Y Y +Successful *_acquire() R Y Y Y Y Y Y +Successful *_release() C Y Y Y W Y +smp_rmb() Y R Y Y R +smp_wmb() Y W Y Y W +smp_mb() & synchronize_rcu() CP Y Y Y Y Y Y Y Y +Successful full non-void RMW CP Y Y Y Y Y Y Y Y Y Y Y +smp_mb__before_atomic() CP Y Y Y a a a a Y +smp_mb__after_atomic() CP a a Y Y Y Y Y + + +Key: C: Ordering is cumulative + P: Ordering propagates + R: Read, for example, READ_ONCE(), or read portion of RMW + W: Write, for example, WRITE_ONCE(), or write portion of RMW + Y: Provides ordering + a: Provides ordering given intervening RMW atomic operation + DR: Dependent read (address dependency) + DW: Dependent write (address, data, or control dependency) + RMW: Atomic read-modify-write operation + SV Same-variable access diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt new file mode 100644 index 000000000000..867e0ea69b6d --- /dev/null +++ b/tools/memory-model/Documentation/explanation.txt @@ -0,0 +1,1840 @@ +Explanation of the Linux-Kernel Memory Model +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:Author: Alan Stern <stern@rowland.harvard.edu> +:Created: October 2017 + +.. Contents + + 1. INTRODUCTION + 2. BACKGROUND + 3. A SIMPLE EXAMPLE + 4. A SELECTION OF MEMORY MODELS + 5. ORDERING AND CYCLES + 6. EVENTS + 7. THE PROGRAM ORDER RELATION: po AND po-loc + 8. A WARNING + 9. DEPENDENCY RELATIONS: data, addr, and ctrl + 10. THE READS-FROM RELATION: rf, rfi, and rfe + 11. CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe + 12. THE FROM-READS RELATION: fr, fri, and fre + 13. AN OPERATIONAL MODEL + 14. PROPAGATION ORDER RELATION: cumul-fence + 15. DERIVATION OF THE LKMM FROM THE OPERATIONAL MODEL + 16. SEQUENTIAL CONSISTENCY PER VARIABLE + 17. ATOMIC UPDATES: rmw + 18. THE PRESERVED PROGRAM ORDER RELATION: ppo + 19. AND THEN THERE WAS ALPHA + 20. THE HAPPENS-BEFORE RELATION: hb + 21. THE PROPAGATES-BEFORE RELATION: pb + 22. RCU RELATIONS: link, gp-link, rscs-link, and rcu-path + 23. ODDS AND ENDS + + + +INTRODUCTION +------------ + +The Linux-kernel memory model (LKMM) is rather complex and obscure. +This is particularly evident if you read through the linux-kernel.bell +and linux-kernel.cat files that make up the formal version of the +memory model; they are extremely terse and their meanings are far from +clear. + +This document describes the ideas underlying the LKMM. It is meant +for people who want to understand how the memory model was designed. +It does not go into the details of the code in the .bell and .cat +files; rather, it explains in English what the code expresses +symbolically. + +Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed +toward beginners; they explain what memory models are and the basic +notions shared by all such models. People already familiar with these +concepts can skim or skip over them. Sections 6 (EVENTS) through 12 +(THE FROM_READS RELATION) describe the fundamental relations used in +many memory models. Starting in Section 13 (AN OPERATIONAL MODEL), +the workings of the LKMM itself are covered. + +Warning: The code examples in this document are not written in the +proper format for litmus tests. They don't include a header line, the +initializations are not enclosed in braces, the global variables are +not passed by pointers, and they don't have an "exists" clause at the +end. Converting them to the right format is left as an exercise for +the reader. + + +BACKGROUND +---------- + +A memory consistency model (or just memory model, for short) is +something which predicts, given a piece of computer code running on a +particular kind of system, what values may be obtained by the code's +load instructions. The LKMM makes these predictions for code running +as part of the Linux kernel. + +In practice, people tend to use memory models the other way around. +That is, given a piece of code and a collection of values specified +for the loads, the model will predict whether it is possible for the +code to run in such a way that the loads will indeed obtain the +specified values. Of course, this is just another way of expressing +the same idea. + +For code running on a uniprocessor system, the predictions are easy: +Each load instruction must obtain the value written by the most recent +store instruction accessing the same location (we ignore complicating +factors such as DMA and mixed-size accesses.) But on multiprocessor +systems, with multiple CPUs making concurrent accesses to shared +memory locations, things aren't so simple. + +Different architectures have differing memory models, and the Linux +kernel supports a variety of architectures. The LKMM has to be fairly +permissive, in the sense that any behavior allowed by one of these +architectures also has to be allowed by the LKMM. + + +A SIMPLE EXAMPLE +---------------- + +Here is a simple example to illustrate the basic concepts. Consider +some code running as part of a device driver for an input device. The +driver might contain an interrupt handler which collects data from the +device, stores it in a buffer, and sets a flag to indicate the buffer +is full. Running concurrently on a different CPU might be a part of +the driver code being executed by a process in the midst of a read(2) +system call. This code tests the flag to see whether the buffer is +ready, and if it is, copies the data back to userspace. The buffer +and the flag are memory locations shared between the two CPUs. + +We can abstract out the important pieces of the driver code as follows +(the reason for using WRITE_ONCE() and READ_ONCE() instead of simple +assignment statements is discussed later): + + int buf = 0, flag = 0; + + P0() + { + WRITE_ONCE(buf, 1); + WRITE_ONCE(flag, 1); + } + + P1() + { + int r1; + int r2 = 0; + + r1 = READ_ONCE(flag); + if (r1) + r2 = READ_ONCE(buf); + } + +Here the P0() function represents the interrupt handler running on one +CPU and P1() represents the read() routine running on another. The +value 1 stored in buf represents input data collected from the device. +Thus, P0 stores the data in buf and then sets flag. Meanwhile, P1 +reads flag into the private variable r1, and if it is set, reads the +data from buf into a second private variable r2 for copying to +userspace. (Presumably if flag is not set then the driver will wait a +while and try again.) + +This pattern of memory accesses, where one CPU stores values to two +shared memory locations and another CPU loads from those locations in +the opposite order, is widely known as the "Message Passing" or MP +pattern. It is typical of memory access patterns in the kernel. + +Please note that this example code is a simplified abstraction. Real +buffers are usually larger than a single integer, real device drivers +usually use sleep and wakeup mechanisms rather than polling for I/O +completion, and real code generally doesn't bother to copy values into +private variables before using them. All that is beside the point; +the idea here is simply to illustrate the overall pattern of memory +accesses by the CPUs. + +A memory model will predict what values P1 might obtain for its loads +from flag and buf, or equivalently, what values r1 and r2 might end up +with after the code has finished running. + +Some predictions are trivial. For instance, no sane memory model would +predict that r1 = 42 or r2 = -7, because neither of those values ever +gets stored in flag or buf. + +Some nontrivial predictions are nonetheless quite simple. For +instance, P1 might run entirely before P0 begins, in which case r1 and +r2 will both be 0 at the end. Or P0 might run entirely before P1 +begins, in which case r1 and r2 will both be 1. + +The interesting predictions concern what might happen when the two +routines run concurrently. One possibility is that P1 runs after P0's +store to buf but before the store to flag. In this case, r1 and r2 +will again both be 0. (If P1 had been designed to read buf +unconditionally then we would instead have r1 = 0 and r2 = 1.) + +However, the most interesting possibility is where r1 = 1 and r2 = 0. +If this were to occur it would mean the driver contains a bug, because +incorrect data would get sent to the user: 0 instead of 1. As it +happens, the LKMM does predict this outcome can occur, and the example +driver code shown above is indeed buggy. + + +A SELECTION OF MEMORY MODELS +---------------------------- + +The first widely cited memory model, and the simplest to understand, +is Sequential Consistency. According to this model, systems behave as +if each CPU executed its instructions in order but with unspecified +timing. In other words, the instructions from the various CPUs get +interleaved in a nondeterministic way, always according to some single +global order that agrees with the order of the instructions in the +program source for each CPU. The model says that the value obtained +by each load is simply the value written by the most recently executed +store to the same memory location, from any CPU. + +For the MP example code shown above, Sequential Consistency predicts +that the undesired result r1 = 1, r2 = 0 cannot occur. The reasoning +goes like this: + + Since r1 = 1, P0 must store 1 to flag before P1 loads 1 from + it, as loads can obtain values only from earlier stores. + + P1 loads from flag before loading from buf, since CPUs execute + their instructions in order. + + P1 must load 0 from buf before P0 stores 1 to it; otherwise r2 + would be 1 since a load obtains its value from the most recent + store to the same address. + + P0 stores 1 to buf before storing 1 to flag, since it executes + its instructions in order. + + Since an instruction (in this case, P1's store to flag) cannot + execute before itself, the specified outcome is impossible. + +However, real computer hardware almost never follows the Sequential +Consistency memory model; doing so would rule out too many valuable +performance optimizations. On ARM and PowerPC architectures, for +instance, the MP example code really does sometimes yield r1 = 1 and +r2 = 0. + +x86 and SPARC follow yet a different memory model: TSO (Total Store +Ordering). This model predicts that the undesired outcome for the MP +pattern cannot occur, but in other respects it differs from Sequential +Consistency. One example is the Store Buffer (SB) pattern, in which +each CPU stores to its own shared location and then loads from the +other CPU's location: + + int x = 0, y = 0; + + P0() + { + int r0; + + WRITE_ONCE(x, 1); + r0 = READ_ONCE(y); + } + + P1() + { + int r1; + + WRITE_ONCE(y, 1); + r1 = READ_ONCE(x); + } + +Sequential Consistency predicts that the outcome r0 = 0, r1 = 0 is +impossible. (Exercise: Figure out the reasoning.) But TSO allows +this outcome to occur, and in fact it does sometimes occur on x86 and +SPARC systems. + +The LKMM was inspired by the memory models followed by PowerPC, ARM, +x86, Alpha, and other architectures. However, it is different in +detail from each of them. + + +ORDERING AND CYCLES +------------------- + +Memory models are all about ordering. Often this is temporal ordering +(i.e., the order in which certain events occur) but it doesn't have to +be; consider for example the order of instructions in a program's +source code. We saw above that Sequential Consistency makes an +important assumption that CPUs execute instructions in the same order +as those instructions occur in the code, and there are many other +instances of ordering playing central roles in memory models. + +The counterpart to ordering is a cycle. Ordering rules out cycles: +It's not possible to have X ordered before Y, Y ordered before Z, and +Z ordered before X, because this would mean that X is ordered before +itself. The analysis of the MP example under Sequential Consistency +involved just such an impossible cycle: + + W: P0 stores 1 to flag executes before + X: P1 loads 1 from flag executes before + Y: P1 loads 0 from buf executes before + Z: P0 stores 1 to buf executes before + W: P0 stores 1 to flag. + +In short, if a memory model requires certain accesses to be ordered, +and a certain outcome for the loads in a piece of code can happen only +if those accesses would form a cycle, then the memory model predicts +that outcome cannot occur. + +The LKMM is defined largely in terms of cycles, as we will see. + + +EVENTS +------ + +The LKMM does not work directly with the C statements that make up +kernel source code. Instead it considers the effects of those +statements in a more abstract form, namely, events. The model +includes three types of events: + + Read events correspond to loads from shared memory, such as + calls to READ_ONCE(), smp_load_acquire(), or + rcu_dereference(). + + Write events correspond to stores to shared memory, such as + calls to WRITE_ONCE(), smp_store_release(), or atomic_set(). + + Fence events correspond to memory barriers (also known as + fences), such as calls to smp_rmb() or rcu_read_lock(). + +These categories are not exclusive; a read or write event can also be +a fence. This happens with functions like smp_load_acquire() or +spin_lock(). However, no single event can be both a read and a write. +Atomic read-modify-write accesses, such as atomic_inc() or xchg(), +correspond to a pair of events: a read followed by a write. (The +write event is omitted for executions where it doesn't occur, such as +a cmpxchg() where the comparison fails.) + +Other parts of the code, those which do not involve interaction with +shared memory, do not give rise to events. Thus, arithmetic and +logical computations, control-flow instructions, or accesses to +private memory or CPU registers are not of central interest to the +memory model. They only affect the model's predictions indirectly. +For example, an arithmetic computation might determine the value that +gets stored to a shared memory location (or in the case of an array +index, the address where the value gets stored), but the memory model +is concerned only with the store itself -- its value and its address +-- not the computation leading up to it. + +Events in the LKMM can be linked by various relations, which we will +describe in the following sections. The memory model requires certain +of these relations to be orderings, that is, it requires them not to +have any cycles. + + +THE PROGRAM ORDER RELATION: po AND po-loc +----------------------------------------- + +The most important relation between events is program order (po). You +can think of it as the order in which statements occur in the source +code after branches are taken into account and loops have been +unrolled. A better description might be the order in which +instructions are presented to a CPU's execution unit. Thus, we say +that X is po-before Y (written as "X ->po Y" in formulas) if X occurs +before Y in the instruction stream. + +This is inherently a single-CPU relation; two instructions executing +on different CPUs are never linked by po. Also, it is by definition +an ordering so it cannot have any cycles. + +po-loc is a sub-relation of po. It links two memory accesses when the +first comes before the second in program order and they access the +same memory location (the "-loc" suffix). + +Although this may seem straightforward, there is one subtle aspect to +program order we need to explain. The LKMM was inspired by low-level +architectural memory models which describe the behavior of machine +code, and it retains their outlook to a considerable extent. The +read, write, and fence events used by the model are close in spirit to +individual machine instructions. Nevertheless, the LKMM describes +kernel code written in C, and the mapping from C to machine code can +be extremely complex. + +Optimizing compilers have great freedom in the way they translate +source code to object code. They are allowed to apply transformations +that add memory accesses, eliminate accesses, combine them, split them +into pieces, or move them around. Faced with all these possibilities, +the LKMM basically gives up. It insists that the code it analyzes +must contain no ordinary accesses to shared memory; all accesses must +be performed using READ_ONCE(), WRITE_ONCE(), or one of the other +atomic or synchronization primitives. These primitives prevent a +large number of compiler optimizations. In particular, it is +guaranteed that the compiler will not remove such accesses from the +generated code (unless it can prove the accesses will never be +executed), it will not change the order in which they occur in the +code (within limits imposed by the C standard), and it will not +introduce extraneous accesses. + +This explains why the MP and SB examples above used READ_ONCE() and +WRITE_ONCE() rather than ordinary memory accesses. Thanks to this +usage, we can be certain that in the MP example, P0's write event to +buf really is po-before its write event to flag, and similarly for the +other shared memory accesses in the examples. + +Private variables are not subject to this restriction. Since they are +not shared between CPUs, they can be accessed normally without +READ_ONCE() or WRITE_ONCE(), and there will be no ill effects. In +fact, they need not even be stored in normal memory at all -- in +principle a private variable could be stored in a CPU register (hence +the convention that these variables have names starting with the +letter 'r'). + + +A WARNING +--------- + +The protections provided by READ_ONCE(), WRITE_ONCE(), and others are +not perfect; and under some circumstances it is possible for the +compiler to undermine the memory model. Here is an example. Suppose +both branches of an "if" statement store the same value to the same +location: + + r1 = READ_ONCE(x); + if (r1) { + WRITE_ONCE(y, 2); + ... /* do something */ + } else { + WRITE_ONCE(y, 2); + ... /* do something else */ + } + +For this code, the LKMM predicts that the load from x will always be +executed before either of the stores to y. However, a compiler could +lift the stores out of the conditional, transforming the code into +something resembling: + + r1 = READ_ONCE(x); + WRITE_ONCE(y, 2); + if (r1) { + ... /* do something */ + } else { + ... /* do something else */ + } + +Given this version of the code, the LKMM would predict that the load +from x could be executed after the store to y. Thus, the memory +model's original prediction could be invalidated by the compiler. + +Another issue arises from the fact that in C, arguments to many +operators and function calls can be evaluated in any order. For +example: + + r1 = f(5) + g(6); + +The object code might call f(5) either before or after g(6); the +memory model cannot assume there is a fixed program order relation +between them. (In fact, if the functions are inlined then the +compiler might even interleave their object code.) + + +DEPENDENCY RELATIONS: data, addr, and ctrl +------------------------------------------ + +We say that two events are linked by a dependency relation when the +execution of the second event depends in some way on a value obtained +from memory by the first. The first event must be a read, and the +value it obtains must somehow affect what the second event does. +There are three kinds of dependencies: data, address (addr), and +control (ctrl). + +A read and a write event are linked by a data dependency if the value +obtained by the read affects the value stored by the write. As a very +simple example: + + int x, y; + + r1 = READ_ONCE(x); + WRITE_ONCE(y, r1 + 5); + +The value stored by the WRITE_ONCE obviously depends on the value +loaded by the READ_ONCE. Such dependencies can wind through +arbitrarily complicated computations, and a write can depend on the +values of multiple reads. + +A read event and another memory access event are linked by an address +dependency if the value obtained by the read affects the location +accessed by the other event. The second event can be either a read or +a write. Here's another simple example: + + int a[20]; + int i; + + r1 = READ_ONCE(i); + r2 = READ_ONCE(a[r1]); + +Here the location accessed by the second READ_ONCE() depends on the +index value loaded by the first. Pointer indirection also gives rise +to address dependencies, since the address of a location accessed +through a pointer will depend on the value read earlier from that +pointer. + +Finally, a read event and another memory access event are linked by a +control dependency if the value obtained by the read affects whether +the second event is executed at all. Simple example: + + int x, y; + + r1 = READ_ONCE(x); + if (r1) + WRITE_ONCE(y, 1984); + +Execution of the WRITE_ONCE() is controlled by a conditional expression +which depends on the value obtained by the READ_ONCE(); hence there is +a control dependency from the load to the store. + +It should be pretty obvious that events can only depend on reads that +come earlier in program order. Symbolically, if we have R ->data X, +R ->addr X, or R ->ctrl X (where R is a read event), then we must also +have R ->po X. It wouldn't make sense for a computation to depend +somehow on a value that doesn't get loaded from shared memory until +later in the code! + + +THE READS-FROM RELATION: rf, rfi, and rfe +----------------------------------------- + +The reads-from relation (rf) links a write event to a read event when +the value loaded by the read is the value that was stored by the +write. In colloquial terms, the load "reads from" the store. We +write W ->rf R to indicate that the load R reads from the store W. We +further distinguish the cases where the load and the store occur on +the same CPU (internal reads-from, or rfi) and where they occur on +different CPUs (external reads-from, or rfe). + +For our purposes, a memory location's initial value is treated as +though it had been written there by an imaginary initial store that +executes on a separate CPU before the program runs. + +Usage of the rf relation implicitly assumes that loads will always +read from a single store. It doesn't apply properly in the presence +of load-tearing, where a load obtains some of its bits from one store +and some of them from another store. Fortunately, use of READ_ONCE() +and WRITE_ONCE() will prevent load-tearing; it's not possible to have: + + int x = 0; + + P0() + { + WRITE_ONCE(x, 0x1234); + } + + P1() + { + int r1; + + r1 = READ_ONCE(x); + } + +and end up with r1 = 0x1200 (partly from x's initial value and partly +from the value stored by P0). + +On the other hand, load-tearing is unavoidable when mixed-size +accesses are used. Consider this example: + + union { + u32 w; + u16 h[2]; + } x; + + P0() + { + WRITE_ONCE(x.h[0], 0x1234); + WRITE_ONCE(x.h[1], 0x5678); + } + + P1() + { + int r1; + + r1 = READ_ONCE(x.w); + } + +If r1 = 0x56781234 (little-endian!) at the end, then P1 must have read +from both of P0's stores. It is possible to handle mixed-size and +unaligned accesses in a memory model, but the LKMM currently does not +attempt to do so. It requires all accesses to be properly aligned and +of the location's actual size. + + +CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe +------------------------------------------------------------------ + +Cache coherence is a general principle requiring that in a +multi-processor system, the CPUs must share a consistent view of the +memory contents. Specifically, it requires that for each location in +shared memory, the stores to that location must form a single global +ordering which all the CPUs agree on (the coherence order), and this +ordering must be consistent with the program order for accesses to +that location. + +To put it another way, for any variable x, the coherence order (co) of +the stores to x is simply the order in which the stores overwrite one +another. The imaginary store which establishes x's initial value +comes first in the coherence order; the store which directly +overwrites the initial value comes second; the store which overwrites +that value comes third, and so on. + +You can think of the coherence order as being the order in which the +stores reach x's location in memory (or if you prefer a more +hardware-centric view, the order in which the stores get written to +x's cache line). We write W ->co W' if W comes before W' in the +coherence order, that is, if the value stored by W gets overwritten, +directly or indirectly, by the value stored by W'. + +Coherence order is required to be consistent with program order. This +requirement takes the form of four coherency rules: |