author		Linus Torvalds <torvalds@linux-foundation.org>	2023-04-24 12:02:25 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2023-04-24 12:02:25 -0700
commit		60eb45074234b90333b6241b4fd8d196aa2dfd98 (patch)
tree		e4843847c30bf2e6e3176edaaff73439de706829 /tools/memory-model/Documentation
parent		406037351e08dea03735178bf11046da85f00125 (diff)
parent		cc4a29819b0df9f3a2e7e0d5dee0830a3072d5aa (diff)
download	linux-60eb45074234b90333b6241b4fd8d196aa2dfd98.tar.gz
		linux-60eb45074234b90333b6241b4fd8d196aa2dfd98.tar.bz2
		linux-60eb45074234b90333b6241b4fd8d196aa2dfd98.zip
Merge tag 'lkmm-scripting.2023.04.07a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull Linux Kernel Memory Model scripting updates from Paul McKenney:
 "This improves litmus-test documentation and improves the ability to
  do before/after tests on the https://github.com/paulmckrcu/litmus repo"

* tag 'lkmm-scripting.2023.04.07a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
  tools/memory-model: Remove out-of-date SRCU documentation
  tools/memory-model: Document LKMM test procedure
  tools/memory-model: Use "grep -E" instead of "egrep"
  tools/memory-model: Use "-unroll 0" to keep --hw runs finite
  tools/memory-model: Make judgelitmus.sh handle scripted Result: tag
  tools/memory-model: Add data-race capabilities to judgelitmus.sh
  tools/memory-model: Add checktheselitmus.sh to run specified litmus tests
  tools/memory-model: Repair parseargs.sh header comment
  tools/memory-model: Add "--" to parseargs.sh for additional arguments
  tools/memory-model: Make history-check scripts use mselect7
  tools/memory-model: Make checkghlitmus.sh use mselect7
  tools/memory-model: Fix scripting --jobs argument
  tools/memory-model: Implement --hw support for checkghlitmus.sh
  tools/memory-model: Add -v flag to jingle7 runs
  tools/memory-model: Make runlitmus.sh check for jingle errors
  tools/memory-model: Allow herd to deduce CPU type
  tools/memory-model: Keep assembly-language litmus tests
  tools/memory-model: Move from .AArch64.litmus.out to .litmus.AArch.out
  tools/memory-model: Make runlitmus.sh generate .litmus.out for --hw
  tools/memory-model: Split runlitmus.sh out of checklitmus.sh
  ...
Diffstat (limited to 'tools/memory-model/Documentation')
-rw-r--r--	tools/memory-model/Documentation/litmus-tests.txt	 27
-rw-r--r--	tools/memory-model/Documentation/locking.txt	298
2 files changed, 299 insertions, 26 deletions
diff --git a/tools/memory-model/Documentation/litmus-tests.txt b/tools/memory-model/Documentation/litmus-tests.txt
index 26554b1c5575..acac527328a1 100644
--- a/tools/memory-model/Documentation/litmus-tests.txt
+++ b/tools/memory-model/Documentation/litmus-tests.txt
@@ -1028,32 +1028,7 @@ Limitations of the Linux-kernel memory model (LKMM) include:
additional call_rcu() process to the site of the
emulated rcu-barrier().
- e. Although sleepable RCU (SRCU) is now modeled, there
- are some subtle differences between its semantics and
- those in the Linux kernel. For example, the kernel
- might interpret the following sequence as two partially
- overlapping SRCU read-side critical sections:
-
- 1 r1 = srcu_read_lock(&my_srcu);
- 2 do_something_1();
- 3 r2 = srcu_read_lock(&my_srcu);
- 4 do_something_2();
- 5 srcu_read_unlock(&my_srcu, r1);
- 6 do_something_3();
- 7 srcu_read_unlock(&my_srcu, r2);
-
- In contrast, LKMM will interpret this as a nested pair of
- SRCU read-side critical sections, with the outer critical
- section spanning lines 1-7 and the inner critical section
- spanning lines 3-5.
-
- This difference would be more of a concern had anyone
- identified a reasonable use case for partially overlapping
- SRCU read-side critical sections. For more information
- on the trickiness of such overlapping, please see:
- https://paulmck.livejournal.com/40593.html
-
- f. Reader-writer locking is not modeled. It can be
+ e. Reader-writer locking is not modeled. It can be
emulated in litmus tests using atomic read-modify-write
operations.
diff --git a/tools/memory-model/Documentation/locking.txt b/tools/memory-model/Documentation/locking.txt
new file mode 100644
index 000000000000..65c898c64a93
--- /dev/null
+++ b/tools/memory-model/Documentation/locking.txt
@@ -0,0 +1,298 @@
+Locking
+=======
+
+Locking is well-known and the common use cases are straightforward: Any
+CPU holding a given lock sees any changes previously seen or made by any
+CPU before it previously released that same lock. This last sentence
+is the only part of this document that most developers will need to read.
+
+However, developers who would like to also access lock-protected shared
+variables outside of their corresponding locks should continue reading.
+
+
+Locking and Prior Accesses
+--------------------------
+
+The basic rule of locking is worth repeating:
+
+ Any CPU holding a given lock sees any changes previously seen
+ or made by any CPU before it previously released that same lock.
+
+Note that this statement is a bit stronger than "Any CPU holding a
+given lock sees all changes made by any CPU during the time that CPU was
+previously holding this same lock". For example, consider the following
+pair of code fragments:
+
+ /* See MP+polocks.litmus. */
+ void CPU0(void)
+ {
+ WRITE_ONCE(x, 1);
+ spin_lock(&mylock);
+ WRITE_ONCE(y, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&mylock);
+ r0 = READ_ONCE(y);
+ spin_unlock(&mylock);
+ r1 = READ_ONCE(x);
+ }
+
+The basic rule guarantees that if CPU0() acquires mylock before CPU1(),
+then both r0 and r1 must be set to the value 1. This also has the
+consequence that if the final value of r0 is equal to 1, then the final
+value of r1 must also be equal to 1. In contrast, the weaker rule would
+say nothing about the final value of r1.
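+
+In litmus-test form, this guarantee shows up as an "exists" clause that
+LKMM reports can never be satisfied. The following sketch shows one way
+to write such a clause; see MP+polocks.litmus itself for the
+authoritative version:
+
+	exists (1:r0=1 /\ 1:r1=0)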
+
+
+Locking and Subsequent Accesses
+-------------------------------
+
+The converse to the basic rule also holds: Any CPU holding a given
+lock will not see any changes that will be made by any CPU after it
+subsequently acquires this same lock. This converse statement is
+illustrated by the following litmus test:
+
+ /* See MP+porevlocks.litmus. */
+ void CPU0(void)
+ {
+ r0 = READ_ONCE(y);
+ spin_lock(&mylock);
+ r1 = READ_ONCE(x);
+ spin_unlock(&mylock);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&mylock);
+ WRITE_ONCE(x, 1);
+ spin_unlock(&mylock);
+ WRITE_ONCE(y, 1);
+ }
+
+This converse to the basic rule guarantees that if CPU0() acquires
+mylock before CPU1(), then both r0 and r1 must be set to the value 0.
+This also has the consequence that if the final value of r1 is equal
+to 0, then the final value of r0 must also be equal to 0. In contrast,
+the weaker rule would say nothing about the final value of r0.
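+
+As before, the forbidden outcome can be sketched as an "exists" clause
+that LKMM reports can never be satisfied; see MP+porevlocks.litmus
+itself for the authoritative version:
+
+	exists (0:r0=1 /\ 0:r1=0)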
+
+These examples show only a single pair of CPUs, but the effects of the
+locking basic rule extend across multiple acquisitions of a given lock
+across multiple CPUs.
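+
+For example, the following hypothetical three-CPU variant (a sketch,
+not one of the litmus tests shipped with the kernel) passes the
+ordering through a second critical section:
+
+	/* Hypothetical sketch extending MP+polocks.litmus to three CPUs. */
+	void CPU0(void)
+	{
+		WRITE_ONCE(x, 1);
+		spin_lock(&mylock);
+		WRITE_ONCE(y, 1);
+		spin_unlock(&mylock);
+	}
+
+	void CPU1(void)
+	{
+		spin_lock(&mylock);
+		r0 = READ_ONCE(y);
+		WRITE_ONCE(z, 1);
+		spin_unlock(&mylock);
+	}
+
+	void CPU2(void)
+	{
+		spin_lock(&mylock);
+		r1 = READ_ONCE(z);
+		spin_unlock(&mylock);
+		r2 = READ_ONCE(x);
+	}
+
+If both r0 and r1 end up equal to 1, then CPU0()'s, CPU1()'s, and
+CPU2()'s critical sections ran in that order, the basic rule applies
+twice in succession, and r2 must therefore also equal 1, even though
+CPU2()'s read from x is outside of its critical section.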
+
+
+Double-Checked Locking
+----------------------
+
+It is well known that more than just a lock is required to make
+double-checked locking work correctly. This litmus test illustrates
+one incorrect approach:
+
+ /* See Documentation/litmus-tests/locking/DCL-broken.litmus. */
+ void CPU0(void)
+ {
+ r0 = READ_ONCE(flag);
+ if (r0 == 0) {
+ spin_lock(&lck);
+ r1 = READ_ONCE(flag);
+ if (r1 == 0) {
+ WRITE_ONCE(data, 1);
+ WRITE_ONCE(flag, 1);
+ }
+ spin_unlock(&lck);
+ }
+ r2 = READ_ONCE(data);
+ }
+ /* CPU1() is exactly the same as CPU0(). */
+
+There are two problems. First, there is no ordering between the first
+READ_ONCE() of "flag" and the READ_ONCE() of "data". Second, there is
+no ordering between the two WRITE_ONCE() calls. It should therefore be
+no surprise that "r2" can be zero, and a quick herd7 run confirms this.
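+
+One way to express this failure in litmus-test form is an "exists"
+clause asking whether either CPU can read a stale zero from "data".
+This is only a sketch; the postcondition in DCL-broken.litmus itself
+may be phrased differently:
+
+	exists (0:r2=0 \/ 1:r2=0)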
+
+One way to fix this is to use smp_load_acquire() and smp_store_release()
+as shown in this corrected version:
+
+ /* See Documentation/litmus-tests/locking/DCL-fixed.litmus. */
+ void CPU0(void)
+ {
+ r0 = smp_load_acquire(&flag);
+ if (r0 == 0) {
+ spin_lock(&lck);
+ r1 = READ_ONCE(flag);
+ if (r1 == 0) {
+ WRITE_ONCE(data, 1);
+ smp_store_release(&flag, 1);
+ }
+ spin_unlock(&lck);
+ }
+ r2 = READ_ONCE(data);
+ }
+ /* CPU1() is exactly the same as CPU0(). */
+
+The smp_load_acquire() guarantees that its load from "flag" will
+be ordered before the READ_ONCE() from "data", thus solving the first
+problem. The smp_store_release() guarantees that its store will be
+ordered after the WRITE_ONCE() to "data", solving the second problem.
+The smp_store_release() pairs with the smp_load_acquire(), thus ensuring
+that the ordering provided by each actually takes effect. Again, a
+quick herd7 run confirms this.
+
+In short, if you access a lock-protected variable without holding the
+corresponding lock, you will need to provide additional ordering, in
+this case, via the smp_load_acquire() and the smp_store_release().
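+
+For reference, here is a minimal sketch of how this pattern often
+appears outside of litmus tests, assuming a hypothetical lazily
+initialized object. The names global_foo, foo_lock, struct foo, and
+init_foo() are illustrative only:
+
+	static struct foo *global_foo;
+	static DEFINE_SPINLOCK(foo_lock);
+
+	struct foo *get_foo(void)
+	{
+		struct foo *p;
+
+		p = smp_load_acquire(&global_foo); /* Lockless fast path. */
+		if (p)
+			return p;
+		spin_lock(&foo_lock);
+		p = READ_ONCE(global_foo); /* Recheck under the lock. */
+		if (!p) {
+			p = init_foo(); /* Hypothetical initialization. */
+			smp_store_release(&global_foo, p); /* Publish after init. */
+		}
+		spin_unlock(&foo_lock);
+		return p;
+	}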
+
+
+Ordering Provided by a Lock to CPUs Not Holding That Lock
+---------------------------------------------------------
+
+It is not necessarily the case that accesses ordered by locking will be
+seen as ordered by CPUs not holding that lock. Consider this example:
+
+ /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
+ void CPU0(void)
+ {
+ spin_lock(&mylock);
+ WRITE_ONCE(x, 1);
+ WRITE_ONCE(y, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&mylock);
+ r0 = READ_ONCE(y);
+ WRITE_ONCE(z, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU2(void)
+ {
+ WRITE_ONCE(z, 2);
+ smp_mb();
+ r1 = READ_ONCE(x);
+ }
+
+Counter-intuitive though it might be, it is quite possible to have
+the final value of r0 be 1, the final value of z be 2, and the final
+value of r1 be 0. The reason for this surprising outcome is that CPU2()
+never acquired the lock, and thus did not fully benefit from the lock's
+ordering properties.
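+
+In litmus-test form, this surprising outcome is the satisfiable
+"exists" clause sketched below; see the litmus test itself for the
+authoritative version:
+
+	exists (1:r0=1 /\ z=2 /\ 2:r1=0)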
+
+Ordering can be extended to CPUs not holding the lock by careful use
+of smp_mb__after_spinlock():
+
+ /* See Z6.0+pooncelock+poonceLock+pombonce.litmus. */
+ void CPU0(void)
+ {
+ spin_lock(&mylock);
+ WRITE_ONCE(x, 1);
+ WRITE_ONCE(y, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&mylock);
+ smp_mb__after_spinlock();
+ r0 = READ_ONCE(y);
+ WRITE_ONCE(z, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU2(void)
+ {
+ WRITE_ONCE(z, 2);
+ smp_mb();
+ r1 = READ_ONCE(x);
+ }
+
+This addition of smp_mb__after_spinlock() strengthens the lock
+acquisition sufficiently to rule out the counter-intuitive outcome.
+In other words, the addition of the smp_mb__after_spinlock() prohibits
+the counter-intuitive result where the final value of r0 is 1, the final
+value of z is 2, and the final value of r1 is 0.
+
+
+No Roach-Motel Locking!
+-----------------------
+
+This example requires familiarity with the herd7 "filter" clause, so
+please read up on that topic in litmus-tests.txt.
+
+It is tempting to allow memory-reference instructions to be pulled
+into a critical section, but this cannot be allowed in the general case.
+For example, consider a spin loop preceding a lock-based critical section.
+Now, herd7 does not model spin loops, but we can emulate one with two
+loads, with a "filter" clause to constrain the first to return the
+initial value and the second to return the updated value, as shown below:
+
+ /* See Documentation/litmus-tests/locking/RM-fixed.litmus. */
+ void CPU0(void)
+ {
+ spin_lock(&lck);
+ r2 = atomic_inc_return(&y);
+ WRITE_ONCE(x, 1);
+ spin_unlock(&lck);
+ }
+
+ void CPU1(void)
+ {
+ r0 = READ_ONCE(x);
+ r1 = READ_ONCE(x);
+ spin_lock(&lck);
+ r2 = atomic_inc_return(&y);
+ spin_unlock(&lck);
+ }
+
+ filter (1:r0=0 /\ 1:r1=1)
+ exists (1:r2=1)
+
+The variable "x" is the control variable for the emulated spin loop.
+CPU0() sets it to "1" while holding the lock, and CPU1() emulates the
+spin loop by reading it twice, first into "1:r0" (which should get the
+initial value "0") and then into "1:r1" (which should get the updated
+value "1").
+
+The "filter" clause takes this into account, constraining "1:r0" to
+equal "0" and "1:r1" to equal 1.
+
+Then the "exists" clause checks to see if CPU1() acquired its lock first,
+which should not happen given the filter clause because CPU0() updates
+"x" while holding the lock. And herd7 confirms this.
+
+But suppose that the compiler was permitted to reorder the spin loop
+into CPU1()'s critical section, like this:
+
+ /* See Documentation/litmus-tests/locking/RM-broken.litmus. */
+ void CPU0(void)
+ {
+ int r2;
+
+ spin_lock(&lck);
+ r2 = atomic_inc_return(&y);
+ WRITE_ONCE(x, 1);
+ spin_unlock(&lck);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&lck);
+ r0 = READ_ONCE(x);
+ r1 = READ_ONCE(x);
+ r2 = atomic_inc_return(&y);
+ spin_unlock(&lck);
+ }
+
+ filter (1:r0=0 /\ 1:r1=1)
+ exists (1:r2=1)
+
+If "1:r0" is equal to "0", "1:r1" can never equal "1" because CPU0()
+cannot update "x" while CPU1() holds the lock. And herd7 confirms this,
+showing zero executions matching the "filter" criteria.
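+
+In real code, pulling the spin loop into the critical section in this
+way would be more than a memory-ordering problem, as this sketch of the
+reordered CPU1() shows:
+
+	/* Hypothetical reordered CPU1(): it spins forever holding the lock. */
+	void CPU1(void)
+	{
+		spin_lock(&lck);
+		while (!READ_ONCE(x)) /* CPU0() needs &lck in order to set "x". */
+			cpu_relax();
+		r2 = atomic_inc_return(&y);
+		spin_unlock(&lck);
+	}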
+
+And this is why Linux-kernel lock and unlock primitives must prevent
+code from entering critical sections. It is not sufficient to only
+prevent code from leaving them.