author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-21 12:04:41 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-21 12:04:41 -0800 |
commit | d089f48fba28db14d0fe7753248f2575a9ddfc73 (patch) | |
tree | a3821c02dd38342193459e41ba453c058f75e3d2 | |
parent | 3f6ec19f2d05d800bbc42d95dece433da7697864 (diff) | |
parent | 2b392cb11c0db645ba81a08b6a2e96c56ec1fc64 (diff) | |
Merge tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"These are the latest RCU updates for v5.12:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide
allocator return addresses to more easily locate bugs. This has a
couple of RCU-related commits, but is mostly MM. Was pulled in with
akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks, which enables
better debugging information and smarter reactions to large numbers
of callbacks.
- The first round of changes to allow CPUs to be runtime switched
from and to callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a
"torture everything" script that runs rcutorture, locktorture,
scftorture, rcuscale, and refscale. Plus does an allmodconfig
build.
- nolibc fixes for the torture tests"
* tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
mm: Make mem_obj_dump() vmalloc() dumps include start and length
mm: Make mem_dump_obj() handle vmalloc() memory
mm: Make mem_dump_obj() handle NULL and zero-sized pointers
mm: Add mem_dump_obj() to print source of memory block
tools/rcutorture: Fix position of -lgcc in mkinitrd.sh
tools/nolibc: Fix position of -lgcc in the documented example
tools/nolibc: Emit detailed error for missing alternate syscall number definitions
tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*
tools/nolibc: Get timeval, timespec and timezone from linux/time.h
tools/nolibc: Implement poll() based on ppoll()
tools/nolibc: Implement fork() based on clone()
tools/nolibc: Make getpgrp() fall back to getpgid(0)
tools/nolibc: Make dup2() rely on dup3() when available
tools/nolibc: Add the definition for dup()
rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
torture: Maintain torture-specific set of CPUs-online books
torture: Clean up after torture-test CPU hotplugging
rcutorture: Make object_debug also double call_rcu() heap object
...
63 files changed, 3108 insertions, 763 deletions
diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
index 72f0f6fbd53c..6f89cf1e567d 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
@@ -38,7 +38,7 @@ sections.
 RCU-preempt Expedited Grace Periods
 ===================================
 
-``CONFIG_PREEMPT=y`` kernels implement RCU-preempt.
+``CONFIG_PREEMPTION=y`` kernels implement RCU-preempt.
 The overall flow of the handling of a given CPU by an RCU-preempt
 expedited grace period is shown in the following diagram:
 
@@ -112,7 +112,7 @@ things.
 RCU-sched Expedited Grace Periods
 ---------------------------------
 
-``CONFIG_PREEMPT=n`` kernels implement RCU-sched. The overall flow of
+``CONFIG_PREEMPTION=n`` kernels implement RCU-sched. The overall flow of
 the handling of a given CPU by an RCU-sched expedited grace period is
 shown in the following diagram:
 
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index d4c9a016074b..38a39476fc24 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -72,13 +72,13 @@ understanding of this guarantee.
 
 RCU's grace-period guarantee allows updaters to wait for the completion
 of all pre-existing RCU read-side critical sections. An RCU read-side
-critical section begins with the marker ``rcu_read_lock()`` and ends
-with the marker ``rcu_read_unlock()``. These markers may be nested, and
+critical section begins with the marker rcu_read_lock() and ends
+with the marker rcu_read_unlock(). These markers may be nested, and
 RCU treats a nested set as one big RCU read-side critical section.
-Production-quality implementations of ``rcu_read_lock()`` and
-``rcu_read_unlock()`` are extremely lightweight, and in fact have
+Production-quality implementations of rcu_read_lock() and
+rcu_read_unlock() are extremely lightweight, and in fact have
 exactly zero overhead in Linux kernels built for production use with
-``CONFIG_PREEMPT=n``.
+``CONFIG_PREEMPTION=n``.
 
 This guarantee allows ordering to be enforced with extremely low
 overhead to readers, for example:
@@ -102,12 +102,12 @@ overhead to readers, for example:
       15   WRITE_ONCE(y, 1);
       16 }
 
-Because the ``synchronize_rcu()`` on line 14 waits for all pre-existing
-readers, any instance of ``thread0()`` that loads a value of zero from
-``x`` must complete before ``thread1()`` stores to ``y``, so that
+Because the synchronize_rcu() on line 14 waits for all pre-existing
+readers, any instance of thread0() that loads a value of zero from
+``x`` must complete before thread1() stores to ``y``, so that
 instance must also load a value of zero from ``y``. Similarly, any
-instance of ``thread0()`` that loads a value of one from ``y`` must have
-started after the ``synchronize_rcu()`` started, and must therefore also
+instance of thread0() that loads a value of one from ``y`` must have
+started after the synchronize_rcu() started, and must therefore also
 load a value of one from ``x``. Therefore, the outcome:
 
    ::
@@ -121,14 +121,14 @@ cannot happen.
 +-----------------------------------------------------------------------+
 | Wait a minute! You said that updaters can make useful forward         |
 | progress concurrently with readers, but pre-existing readers will     |
-| block ``synchronize_rcu()``!!!                                        |
+| block synchronize_rcu()!!!                                            |
 | Just who are you trying to fool???                                    |
 +-----------------------------------------------------------------------+
 | **Answer**:                                                           |
 +-----------------------------------------------------------------------+
 | First, if updaters do not wish to be blocked by readers, they can use |
-| ``call_rcu()`` or ``kfree_rcu()``, which will be discussed later.     |
-| Second, even when using ``synchronize_rcu()``, the other update-side  |
+| call_rcu() or kfree_rcu(), which will be discussed later.             |
+| Second, even when using synchronize_rcu(), the other update-side      |
 | code does run concurrently with readers, whether pre-existing or not. |
 +-----------------------------------------------------------------------+
 
@@ -170,34 +170,34 @@ recovery from node failure, more or less as follows:
       29   WRITE_ONCE(state, STATE_NORMAL);
       30 }
 
-The RCU read-side critical section in ``do_something_dlm()`` works with
-the ``synchronize_rcu()`` in ``start_recovery()`` to guarantee that
-``do_something()`` never runs concurrently with ``recovery()``, but with
-little or no synchronization overhead in ``do_something_dlm()``.
+The RCU read-side critical section in do_something_dlm() works with
+the synchronize_rcu() in start_recovery() to guarantee that
+do_something() never runs concurrently with recovery(), but with
+little or no synchronization overhead in do_something_dlm().
 
 +-----------------------------------------------------------------------+
 | **Quick Quiz**:                                                       |
 +-----------------------------------------------------------------------+
-| Why is the ``synchronize_rcu()`` on line 28 needed?                   |
+| Why is the synchronize_rcu() on line 28 needed?                       |
 +-----------------------------------------------------------------------+
 | **Answer**:                                                           |
 +-----------------------------------------------------------------------+
 | Without that extra grace period, memory reordering could result in    |
-| ``do_something_dlm()`` executing ``do_something()`` concurrently with |
-| the last bits of ``recovery()``.                                      |
+| do_something_dlm() executing do_something() concurrently with         |
+| the last bits of recovery().                                          |
 +-----------------------------------------------------------------------+
 
 In order to avoid fatal problems such as deadlocks, an RCU read-side
-critical section must not contain calls to ``synchronize_rcu()``.
+critical section must not contain calls to synchronize_rcu().
 Similarly, an RCU read-side critical section must not contain anything
 that waits, directly or indirectly, on completion of an invocation of
-``synchronize_rcu()``.
+synchronize_rcu().
 
 Although RCU's grace-period guarantee is useful in and of itself, with
 `quite a few use cases <https://lwn.net/Articles/573497/>`__, it would
 be good to be able to use RCU to coordinate read-side access to linked
 data structures. For this, the grace-period guarantee is not sufficient,
-as can be seen in function ``add_gp_buggy()`` below. We will look at the
+as can be seen in function add_gp_buggy() below. We will look at the
 reader's code later, but in the meantime, just think of the reader as
 locklessly picking up the ``gp`` pointer, and, if the value loaded is
 non-\ ``NULL``, locklessly accessing the ``->a`` and ``->b`` fields.
@@ -256,8 +256,8 @@ Publish/Subscribe Guarantee
 
 RCU's publish-subscribe guarantee allows data to be inserted into a
 linked data structure without disrupting RCU readers. The updater uses
-``rcu_assign_pointer()`` to insert the new data, and readers use
-``rcu_dereference()`` to access data, whether new or old. The following
+rcu_assign_pointer() to insert the new data, and readers use
+rcu_dereference() to access data, whether new or old. The following
 shows an example of insertion:
 
    ::
@@ -279,7 +279,7 @@ shows an example of insertion:
       15   return true;
       16 }
 
-The ``rcu_assign_pointer()`` on line 13 is conceptually equivalent to a
+The rcu_assign_pointer() on line 13 is conceptually equivalent to a
 simple assignment statement, but also guarantees that its assignment
 will happen after the two assignments in lines 11 and 12, similar to the
 C11 ``memory_order_release`` store operation. It also prevents any
@@ -289,7 +289,7 @@ number of “interesting” compiler optimizations, for example, the use of
 +-----------------------------------------------------------------------+
 | **Quick Quiz**:                                                       |
 +-----------------------------------------------------------------------+
-| But ``rcu_assign_pointer()`` does nothing to prevent the two          |
+| But rcu_assign_pointer() does nothing to prevent the two              |
 | assignments to ``p->a`` and ``p->b`` from being reordered. Can't that |
 | also cause problems?                                                  |
 +-----------------------------------------------------------------------+
@@ -303,7 +303,7 @@ number of “interesting” compiler optimizations, for example, the use of
 
 It is tempting to assume that the reader need not do anything special to
 control its accesses to the RCU-protected data, as shown in
-``do_something_gp_buggy()`` below:
+do_something_gp_buggy() below:
 
    ::
 
@@ -321,11 +321,10 @@ control its accesses to the RCU-protected data, as shown in
       12 }
 
 However, this temptation must be resisted because there are a
-surprisingly large number of ways that the compiler (to say nothing of
-`DEC Alpha CPUs <https://h71000.www7.hp.com/wizard/wiz_2637.html>`__)
-can trip this code up. For but one example, if the compiler were short
-of registers, it might choose to refetch from ``gp`` rather than keeping
-a separate copy in ``p`` as follows:
+surprisingly large number of ways that the compiler (or weak ordering
+CPUs like the DEC Alpha) can trip this code up. For but one example, if
+the compiler were short of registers, it might choose to refetch from
+``gp`` rather than keeping a separate copy in ``p`` as follows:
 
    ::
 
@@ -345,7 +344,7 @@ If this function ran concurrently with a series of updates that
 replaced the current structure with a new one, the fetches of ``gp->a``
 and ``gp->b`` might well come from two different structures, which could
 cause serious confusion. To prevent this (and much else besides),
-``do_something_gp()`` uses ``rcu_dereference()`` to fetch from ``gp``:
+do_something_gp() uses rcu_dereference() to fetch from ``gp``:
 
    ::
 
@@ -362,21 +361,21 @@ cause serious confusion. To prevent this (and much else besides),
       11   return false;
       12 }
 
-The ``rcu_dereference()`` uses volatile casts and (for DEC Alpha) memory
+The rcu_dereference() uses volatile casts and (for DEC Alpha) memory
 barriers in the Linux kernel. Should a `high-quality implementation of
 C11 ``memory_order_consume``
 [PDF] <http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf>`__
-ever appear, then ``rcu_dereference()`` could be implemented as a
+ever appear, then rcu_dereference() could be implemented as a
 ``memory_order_consume`` load. Regardless of the exact implementation, a
-pointer fetched by ``rcu_dereference()`` may not be used outside of the
+pointer fetched by rcu_dereference() may not be used outside of the
 outermost RCU read-side critical section containing that
-``rcu_dereference()``, unless protection of the corresponding data
+rcu_dereference(), unless protection of the corresponding data
 element has been passed from RCU to some other synchronization
 mechanism, most commonly locking or `reference
 counting <https://www.kernel.org/doc/Documentation/RCU/rcuref.txt>`__.
 
-In short, updaters use ``rcu_assign_pointer()`` and readers use
-``rcu_dereference()``, and these two RCU API elements work together to
+In short, updaters use rcu_assign_pointer() and readers use
+rcu_dereference(), and these two RCU API elements work together to
 ensure that readers have a consistent view of newly added data elements.
 
 Of course, it is also necessary to remove elements from RCU-protected
@@ -388,9 +387,9 @@ data structures, for example, using the following process:
    the newly removed data element).
 #. At this point, only the updater has a reference to the newly removed
    data element, so it can safely reclaim the data element, for example,
-   by passing it to ``kfree()``.
+   by passing it to kfree().
 
-This process is implemented by ``remove_gp_synchronous()``:
+This process is implemented by remove_gp_synchronous():
 
    ::
 
@@ -413,16 +412,16 @@ This process is implemented by ``remove_gp_synchronous()``:
 
 This function is straightforward, with line 13 waiting for a grace
 period before line 14 frees the old data element. This waiting ensures
-that readers will reach line 7 of ``do_something_gp()`` before the data
-element referenced by ``p`` is freed. The ``rcu_access_pointer()`` on
-line 6 is similar to ``rcu_dereference()``, except that:
+that readers will reach line 7 of do_something_gp() before the data
+element referenced by ``p`` is freed. The rcu_access_pointer() on
+line 6 is similar to rcu_dereference(), except that:
 
-#. The value returned by ``rcu_access_pointer()`` cannot be
+#. The value returned by rcu_access_pointer() cannot be
    dereferenced. If you want to access the value pointed to as well as
-   the pointer itself, use ``rcu_dereference()`` instead of
-   ``rcu_access_pointer()``.
-#. The call to ``rcu_access_pointer()`` need not be protected. In
-   contrast, ``rcu_dereference()`` must either be within an RCU
+   the pointer itself, use rcu_dereference() instead of
+   rcu_access_pointer().
+#. The call to rcu_access_pointer() need not be protected. In
+   contrast, rcu_dereference() must either be within an RCU
    read-side critical section or in a code segment where the pointer
    cannot change, for example, in code protected by the corresponding
    update-side lock.
@@ -430,13 +429,13 @@ line 6 is similar to ``rcu_dereference()``, except that:
 +-----------------------------------------------------------------------+
 | **Quick Quiz**:                                                       |
 +-----------------------------------------------------------------------+
-| Without the ``rcu_dereference()`` or the ``rcu_access_pointer()``,    |
+| Without the rcu_dereference() or the rcu_access_pointer(),            |
 | what destructive optimizations might the compiler make use of?        |
 +-----------------------------------------------------------------------+
 | **Answer**:                                                           |
 +-----------------------------------------------------------------------+
-| Let's start with what happens to ``do_something_gp()`` if it fails to |
-| use ``rcu_dereference()``. It could reuse a value formerly fetched    |
+| Let's start with what happens to do_something_gp() if it fails to     |
+| use rcu_dereference(). It could reuse a value formerly fetched        |
 | from this same pointer. It could also fetch the pointer from ``gp``   |
 | in a byte-at-a-time manner, resulting in *load tearing*, in turn      |
 | resulting a bytewise mash-up of two distinct pointer values. It might |
@@ -445,15 +444,15 @@ line 6 is similar to ``rcu_dereference()``, except that:
 | update has changed the pointer to match the wrong guess. Too bad      |
 | about any dereferences that returned pre-initialization garbage in    |
 | the meantime!                                                         |
-| For ``remove_gp_synchronous()``, as long as all modifications to      |
+| For remove_gp_synchronous(), as long as all modifications to          |
 | ``gp`` are carried out while holding ``gp_lock``, the above           |
 | optimizations are harmless. However, ``sparse`` will complain if you  |
 | define ``gp`` with ``__rcu`` and then access it without using either  |
-| ``rcu_access_pointer()`` or ``rcu_dereference()``.                    |
+| rcu_access_pointer() or rcu_dereference().                            |
 +-----------------------------------------------------------------------+
 
 In short, RCU's publish-subscribe guarantee is provided by the
-combination of ``rcu_assign_pointer()`` and ``rcu_dereference()``. This
+combination of rcu_assign_pointer() and rcu_dereference(). This
 guarantee allows data elements to be safely added to RCU-protected
 linked data structures without disrupting RCU readers. This guarantee
 can be used in combination with the grace-period guarantee to also allow
@@ -462,9 +461,9 @@ again without disrupting RCU readers.
 
 This guarantee was only partially premeditated. DYNIX/ptx used an
 explicit memory barrier for publication, but had nothing resembling
-``rcu_dereference()`` for subscription, nor did it have anything
+rcu_dereference() for subscription, nor did it have anything
 resembling the dependency-ordering barrier that was later subsumed
-into ``rcu_dereference()`` and later still into ``READ_ONCE()``. The
+into rcu_dereference() and later still into READ_ONCE(). The
 need for these operations made itself known quite suddenly at a
 late-1990s meeting with the DEC Alpha architects, back in the days when
 DEC was still a free-standing company. It took the Alpha architects a
@@ -474,7 +473,7 @@ documentation did not make this point clear. More recent work with the
 C and C++ standards committees have provided much education on tricks
 and traps from the compiler. In short, compilers were much less tricky
 in the early 1990s, but in 2015, don't even think about omitting
-``rcu_dereference()``!
+rcu_dereference()!
 
 Memory-Barrier Guarantees
 ~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -484,31 +483,31 @@ demonstrates the need for RCU's stringent memory-ordering guarantees on
 systems with more than one CPU:
 
 #. Each CPU that has an RCU read-side critical section that begins
-   before ``synchronize_rcu()`` starts is guaranteed to execute a full
+   before synchronize_rcu() starts is guaranteed to execute a full
    memory barrier between the time that the RCU read-side critical
-   section ends and the time that ``synchronize_rcu()`` returns. Without
+   section ends and the time that synchronize_rcu() returns. Without
    this guarantee, a pre-existing RCU read-side critical section might
    hold a reference to the newly removed ``struct foo`` after the
-   ``kfree()`` on line 14 of ``remove_gp_synchronous()``.
+   kfree() on line 14 of remove_gp_synchronous().
 #. Each CPU that has an RCU read-side critical section that ends after
-   ``synchronize_rcu()`` returns is guaranteed to execute a full memory
-   barrier between the time that ``synchronize_rcu()`` begins and the
+   synchronize_rcu() returns is guaranteed to execute a full memory
+   barrier between the time that synchronize_rcu() begins and the
    time that the RCU read-side critical section begins. Without this
    guarantee, a later RCU read-side critical section running after the
-   ``kfree()`` on line 14 of ``remove_gp_synchronous()`` might later run
-   ``do_something_gp()`` and find the newly deleted ``struct foo |