<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/scripts/atomic/gen-atomic-fallback.sh, branch v6.18.21</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback</title>
<updated>2023-10-09T16:14:15+00:00</updated>
<author>
<name>Uros Bizjak</name>
<email>ubizjak@gmail.com</email>
</author>
<published>2023-09-25T14:50:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e01cc1e8c2ad73cebb980878ede5584e0f2688f7'/>
<id>e01cc1e8c2ad73cebb980878ede5584e0f2688f7</id>
<content type='text'>
Provide the generic sync_try_cmpxchg() function from the
raw_ prefixed version, also adding explicit instrumentation.

The patch amends the existing scripts to generate the sync_try_cmpxchg()
locking primitive and its raw_sync_try_cmpxchg() fallback, while
leaving the existing macros from the try_cmpxchg() family unchanged.

The target can define its own arch_sync_try_cmpxchg() to override the
generic version of raw_sync_try_cmpxchg(). This allows the target
to generate better assembly than the generic version.
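
The generated fallback is expected to follow the shape of the existing
try_cmpxchg() fallbacks, along these lines (a sketch, not the verbatim
generated header):

| #if defined(arch_sync_try_cmpxchg)
| #define raw_sync_try_cmpxchg arch_sync_try_cmpxchg
| #else
| #define raw_sync_try_cmpxchg(_ptr, _oldp, _new) \
| ({ \
| 	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
| 	___r = raw_sync_cmpxchg((_ptr), ___o, (_new)); \
| 	if (unlikely(___r != ___o)) \
| 		*___op = ___r; \
| 	likely(___r == ___o); \
| })
| #endif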

Additionally, the patch renames two scripts to better reflect
what they really do.

Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: linux-kernel@vger.kernel.org
</content>
</entry>
<entry>
<title>locking/atomic: scripts: fix fallback ifdeffery</title>
<updated>2023-09-20T07:39:03+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-09-19T17:14:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=6d2779ecaeb56f92d7105c56772346c71c88c278'/>
<id>6d2779ecaeb56f92d7105c56772346c71c88c278</id>
<content type='text'>
Since commit:

  9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")

The ordering fallbacks for atomic*_read_acquire() and
atomic*_set_release() erroneously fall back to the implicitly relaxed
atomic*_read() and atomic*_set() variants respectively, without any
additional barriers. This loses the ACQUIRE and RELEASE ordering
semantics, which can result in a wide variety of problems, even on
strongly-ordered architectures where the implementation of
atomic*_read() and/or atomic*_set() allows the compiler to reorder those
relative to other accesses.

In practice this has been observed to break bit spinlocks on arm64,
resulting in dentry cache corruption.

The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to
be defined in terms of FULL ops, but where an op had RELAXED ordering by
default, this unintentionally permitted the ACQUIRE/RELEASE ops to be
defined in terms of the implicitly RELAXED default.

This patch corrects the logic to avoid falling back to implicitly
RELAXED ops, resulting in the same behaviour as prior to commit
9257959a6e5b4fca.
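
As a simplified sketch of the corrected shape (the real generated header
additionally special-cases __native_word() types via smp_load_acquire()),
the acquire fallback now always issues an explicit acquire fence rather
than falling through to the bare relaxed op:

| static __always_inline int
| raw_atomic_read_acquire(const atomic_t *v)
| {
| #if defined(arch_atomic_read_acquire)
| 	return arch_atomic_read_acquire(v);
| #else
| 	int ret = raw_atomic_read(v);
| 	__atomic_acquire_fence();
| 	return ret;
| #endif
| }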

I've verified the resulting assembly on arm64 by generating outlined
wrappers of the atomics. Prior to this patch the compiler generates
sequences using relaxed load (LDR) and store (STR) instructions, e.g.

| &lt;outlined_atomic64_read_acquire&gt;:
|         ldr     x0, [x0]
|         ret
|
| &lt;outlined_atomic64_set_release&gt;:
|         str     x1, [x0]
|         ret

With this patch applied the compiler generates sequences using the
intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.

| &lt;outlined_atomic64_read_acquire&gt;:
|         ldar    x0, [x0]
|         ret
|
| &lt;outlined_atomic64_set_release&gt;:
|         stlr    x1, [x0]
|         ret
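
(For reference, each outlined wrapper used for this check is just a
noinline function around the corresponding atomic op, roughly as sketched
below; the wrapper itself is illustrative and not part of the patch.)

| noinline s64 outlined_atomic64_read_acquire(atomic64_t *v)
| {
| 	return atomic64_read_acquire(v);
| }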

To make sure that there were no other victims of the ifdeffery rewrite,
I generated outlined copies of all of the {atomic,atomic64,atomic_long}
atomic operations before and after commit 9257959a6e5b4fca. A diff of
the generated assembly on arm64 shows that only the read_acquire() and
set_release() operations were changed, and only lost their intended
ordering:

| [mark@lakrids:~/src/linux]% diff -u \
| 	&lt;(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o)
| 	&lt;(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o)
| --- /proc/self/fd/11    2023-09-19 16:51:51.114779415 +0100
| +++ /proc/self/fd/16    2023-09-19 16:51:51.114779415 +0100
| @@ -1,5 +1,5 @@
|
| -before-9257959a6e5b4fca.o:     file format elf64-littleaarch64
| +after-9257959a6e5b4fca.o:     file format elf64-littleaarch64
|
|
|  Disassembly of section .text:
| @@ -9,7 +9,7 @@
|         4:      d65f03c0        ret
|
|  0000000000000008 &lt;outlined_atomic_read_acquire&gt;:
| -       8:      88dffc00        ldar    w0, [x0]
| +       8:      b9400000        ldr     w0, [x0]
|         c:      d65f03c0        ret
|
|  0000000000000010 &lt;outlined_atomic_set&gt;:
| @@ -17,7 +17,7 @@
|        14:      d65f03c0        ret
|
|  0000000000000018 &lt;outlined_atomic_set_release&gt;:
| -      18:      889ffc01        stlr    w1, [x0]
| +      18:      b9000001        str     w1, [x0]
|        1c:      d65f03c0        ret
|
|  0000000000000020 &lt;outlined_atomic_add&gt;:
| @@ -1230,7 +1230,7 @@
|      1070:      d65f03c0        ret
|
|  0000000000001074 &lt;outlined_atomic64_read_acquire&gt;:
| -    1074:      c8dffc00        ldar    x0, [x0]
| +    1074:      f9400000        ldr     x0, [x0]
|      1078:      d65f03c0        ret
|
|  000000000000107c &lt;outlined_atomic64_set&gt;:
| @@ -1238,7 +1238,7 @@
|      1080:      d65f03c0        ret
|
|  0000000000001084 &lt;outlined_atomic64_set_release&gt;:
| -    1084:      c89ffc01        stlr    x1, [x0]
| +    1084:      f9000001        str     x1, [x0]
|      1088:      d65f03c0        ret
|
|  000000000000108c &lt;outlined_atomic64_add&gt;:
| @@ -2427,7 +2427,7 @@
|      207c:      d65f03c0        ret
|
|  0000000000002080 &lt;outlined_atomic_long_read_acquire&gt;:
| -    2080:      c8dffc00        ldar    x0, [x0]
| +    2080:      f9400000        ldr     x0, [x0]
|      2084:      d65f03c0        ret
|
|  0000000000002088 &lt;outlined_atomic_long_set&gt;:
| @@ -2435,7 +2435,7 @@
|      208c:      d65f03c0        ret
|
|  0000000000002090 &lt;outlined_atomic_long_set_release&gt;:
| -    2090:      c89ffc01        stlr    x1, [x0]
| +    2090:      f9000001        str     x1, [x0]
|      2094:      d65f03c0        ret
|
|  0000000000002098 &lt;outlined_atomic_long_add&gt;:

I've build tested this with a variety of configs for alpha, arm, arm64,
csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv,
s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I
was unable to build test for ia64 and parisc due to existing build
breakage in v6.6-rc2.

Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
Reported-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reported-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com
</content>
</entry>
<entry>
<title>locking/atomic: scripts: generate kerneldoc comments</title>
<updated>2023-06-05T07:57:23+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-06-05T07:01:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ad8110706f381170c9f9975f1cb06010fd3ca381'/>
<id>ad8110706f381170c9f9975f1cb06010fd3ca381</id>
<content type='text'>
Currently the atomics are documented in Documentation/atomic_t.txt, and
have no kerneldoc comments. There are a sufficient number of gotchas
(e.g. semantics, noinstr-safety) that it would be nice to have comments
to call these out, and it would be nice to have kerneldoc comments such
that these can be collated.

While it's possible to derive the semantics from the code, this can be
painful given the amount of indirection we currently have (e.g. fallback
paths), and it's easy to be misled by naming, e.g.

* The unconditional void-returning ops *only* have relaxed variants
  without a _relaxed suffix, and can easily be mistaken for being fully
  ordered.

  It would be nice to give these a _relaxed() suffix, but this would
  result in significant churn throughout the kernel.

* Our naming of conditional and unconditional+test ops is rather
  inconsistent, and it can be difficult to derive the name of an
  operation, or to identify whether an op is conditional or
  unconditional+test.

  Some ops are clearly conditional:
  - dec_if_positive
  - add_unless
  - dec_unless_positive
  - inc_unless_negative

  Some ops are clearly unconditional+test:
  - sub_and_test
  - dec_and_test
  - inc_and_test

  However, what exactly those ops test is not obvious. A _test_zero suffix
  might be clearer.

  Others could be read ambiguously:
  - inc_not_zero	// conditional
  - add_negative	// unconditional+test

  It would probably be worth renaming these, e.g. to inc_unless_zero and
  add_test_negative.

As a step towards making this more consistent and easier to understand,
this patch adds kerneldoc comments for all generated *atomic*_*()
functions. These are generated from templates, with some common text
shared, making it easy to extend these in future if necessary.
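
For example, the generated comment for a conditional op follows roughly
this shape (illustrative; the exact wording comes from the shared
templates):

| /**
|  * atomic_add_unless() - atomic add unless value with full ordering
|  * @v: pointer to atomic_t
|  * @a: int value to add
|  * @u: int value to compare with
|  *
|  * If (@v != @u), atomically updates @v to (@v + @a) with full ordering.
|  *
|  * Return: @true if @v was updated, @false otherwise.
|  */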

I've tried to make these as consistent and clear as possible, and I've
deliberately ensured:

* All ops have their ordering explicitly mentioned in the short and long
  description.

* All test ops have "test" in their short description.

* All ops are described as an expression using their usual C operator.
  For example:

  andnot: "Atomically updates @v to (@v &amp; ~@i)"
  inc:    "Atomically updates @v to (@v + 1)"

  Which may be clearer to non-native English speakers, and allows all
  the operations to be described in the same style.

* All conditional ops have their condition described as an expression
  using the usual C operators. For example:

  add_unless: "If (@v != @u), atomically updates @v to (@v + @i)"
  cmpxchg:    "If (@v == @old), atomically updates @v to @new"

  Which may be clearer to non-native English speakers, and allows all
  the operations to be described in the same style.

* All bitwise ops (and,andnot,or,xor) explicitly mention that they are
  bitwise in their short description, so that they are not mistaken for
  performing their logical equivalents.

* The noinstr safety of each op is explicitly described, with a
  description of whether or not to use the raw_ form of the op.

There should be no functional change as a result of this patch.

Reported-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20230605070124.3741859-26-mark.rutland@arm.com
</content>
</entry>
<entry>
<title>locking/atomic: scripts: simplify raw_atomic*() definitions</title>
<updated>2023-06-05T07:57:22+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-06-05T07:01:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=1d78814d41701c216e28fcf2656526146dec4a1a'/>
<id>1d78814d41701c216e28fcf2656526146dec4a1a</id>
<content type='text'>
Currently each ordering variant has several potential definitions,
with a mixture of preprocessor and C definitions, including several
copies of its C prototype, e.g.

| #if defined(arch_atomic_fetch_andnot_acquire)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
|       int ret = arch_atomic_fetch_andnot_relaxed(i, v);
|       __atomic_acquire_fence();
|       return ret;
| }
| #elif defined(arch_atomic_fetch_andnot)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
| #else
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
|       return raw_atomic_fetch_and_acquire(~i, v);
| }
| #endif

Make this a bit simpler by defining the C prototype once, and writing
the various potential definitions as plain C code guarded by ifdeffery.
For example, the above becomes:

| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| #if defined(arch_atomic_fetch_andnot_acquire)
|         return arch_atomic_fetch_andnot_acquire(i, v);
| #elif defined(arch_atomic_fetch_andnot_relaxed)
|         int ret = arch_atomic_fetch_andnot_relaxed(i, v);
|         __atomic_acquire_fence();
|         return ret;
| #elif defined(arch_atomic_fetch_andnot)
|         return arch_atomic_fetch_andnot(i, v);
| #else
|         return raw_atomic_fetch_and_acquire(~i, v);
| #endif
| }

Which is far easier to read. As we now always have a single copy of the
C prototype wrapping all the potential definitions, we now have an
obvious single location for kerneldoc comments.

At the same time, the fallbacks for raw_atomic*_xchg() are made to use
'new' rather than 'i' as the name of the new value. This is what the
existing fallback template used, and is more consistent with the
raw_atomic{_try,}cmpxchg() fallbacks.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20230605070124.3741859-24-mark.rutland@arm.com
</content>
</entry>
<entry>
<title>locking/atomic: scripts: restructure fallback ifdeffery</title>
<updated>2023-06-05T07:57:21+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-06-05T07:01:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=9257959a6e5b4fca6fc8e985790bff62c2046f20'/>
<id>9257959a6e5b4fca6fc8e985790bff62c2046f20</id>
<content type='text'>
Currently the various ordering variants of an atomic operation are
defined in groups of full/acquire/release/relaxed ordering variants with
some shared ifdeffery and several potential definitions of each ordering
variant in different branches of the shared ifdeffery.

As an ordering variant can have several potential definitions down
different branches of the shared ifdeffery, it can be painful for a
human to find a relevant definition, and we don't have a good location
to place anything common to all definitions of an ordering variant (e.g.
kerneldoc).

Historically the grouping of full/acquire/release/relaxed ordering
variants was necessary as we filled in the missing atomics in the same
namespace as the architecture used. It would be easy to accidentally
define one ordering fallback in terms of another ordering fallback with
redundant barriers, and avoiding that would otherwise require a lot of
baroque ifdeffery.

With recent changes we no longer need to fill in the missing atomics in
the arch_atomic*_&lt;op&gt;() namespace, and only need to fill in the
raw_atomic*_&lt;op&gt;() namespace. Due to this, there's no risk of a
namespace collision, and we can define each raw_atomic*_&lt;op&gt; ordering
variant with its own ifdeffery checking for the arch_atomic*_&lt;op&gt;
ordering variants.

Restructure the fallbacks in this way, with each ordering variant having
its own ifdeffery of the form:

| #if defined(arch_atomic_fetch_andnot_acquire)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	int ret = arch_atomic_fetch_andnot_relaxed(i, v);
| 	__atomic_acquire_fence();
| 	return ret;
| }
| #elif defined(arch_atomic_fetch_andnot)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
| #else
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	return raw_atomic_fetch_and_acquire(~i, v);
| }
| #endif

Note that where there's no relevant arch_atomic*_&lt;op&gt;() ordering
variant, we'll define the operation in terms of a distinct
raw_atomic*_&lt;otherop&gt;(), as this itself might have been filled in with a
fallback.

As we now generate the raw_atomic*_&lt;op&gt;() implementations directly, we
no longer need the trivial wrappers, so they are removed.
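
For reference, each removed trivial wrapper was roughly of this form (an
illustrative sketch):

| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	return arch_atomic_fetch_andnot_acquire(i, v);
| }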

This makes the ifdeffery easier to follow, and will allow for further
improvements in subsequent patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
</content>
</entry>
<entry>
<title>locking/atomic: scripts: factor out order template generation</title>
<updated>2023-06-05T07:57:19+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-06-05T07:01:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7ed7a1564090fdd265f49d1ad94ee92845b14c76'/>
<id>7ed7a1564090fdd265f49d1ad94ee92845b14c76</id>
<content type='text'>
Currently gen_proto_order_variants() hard codes the path for the templates used
for order fallbacks. Factor this out into a helper so that it can be reused
elsewhere.

This results in no change to the generated headers, so there should be
no functional change as a result of this patch.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20230605070124.3741859-17-mark.rutland@arm.com
</content>
</entry>
<entry>
<title>locking/atomic: scripts: remove bogus order parameter</title>
<updated>2023-06-05T07:57:18+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-06-05T07:01:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=a083ecc9333c62237551ad93f42e86a42a3c7cc2'/>
<id>a083ecc9333c62237551ad93f42e86a42a3c7cc2</id>
<content type='text'>
At the start of gen_proto_order_variants(), the ${order} variable is not
yet defined, and will be substituted with an empty string.

Replace the current bogus use of ${order} with an empty string instead.

This results in no change to the generated headers.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20230605070124.3741859-15-mark.rutland@arm.com
</content>
</entry>
<entry>
<title>instrumentation: Wire up cmpxchg128()</title>
<updated>2023-06-05T07:36:36+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-05-31T13:08:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8c8b096a23d12fedf3c0f50524f30113ef97aa8c'/>
<id>8c8b096a23d12fedf3c0f50524f30113ef97aa8c</id>
<content type='text'>
Wire up the cmpxchg128 family in the atomic wrapper scripts.

These provide the generic cmpxchg128 family of functions from the
arch_ prefixed version, adding explicit instrumentation where needed.
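
The instrumented wrapper generated for these is roughly of this shape (a
sketch; the exact generated macro may differ):

| #define cmpxchg128(ptr, ...) \
| ({ \
| 	typeof(ptr) __ai_ptr = (ptr); \
| 	kcsan_mb(); \
| 	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
| 	arch_cmpxchg128(__ai_ptr, __VA_ARGS__); \
| })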

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Reviewed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Tested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20230531132323.519237070@infradead.org
</content>
</entry>
<entry>
<title>locking/atomic: Add generic try_cmpxchg{,64}_local() support</title>
<updated>2023-04-29T07:09:02+00:00</updated>
<author>
<name>Uros Bizjak</name>
<email>ubizjak@gmail.com</email>
</author>
<published>2023-04-05T14:17:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e6ce9d741163af0b846637ce6550ae8a671b1588'/>
<id>e6ce9d741163af0b846637ce6550ae8a671b1588</id>
<content type='text'>
Add generic support for try_cmpxchg{,64}_local() and their fallbacks.

These provide the generic try_cmpxchg_local family of functions
from the arch_ prefixed version, also adding explicit instrumentation.
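
The instrumented wrapper is roughly of this shape, instrumenting both the
target and the old-value pointer (a sketch; the exact generated macro may
differ):

| #define try_cmpxchg_local(ptr, oldp, ...) \
| ({ \
| 	typeof(ptr) __ai_ptr = (ptr); \
| 	typeof(oldp) __ai_oldp = (oldp); \
| 	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
| 	instrument_read_write(__ai_oldp, sizeof(*__ai_oldp)); \
| 	arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
| })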

Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20230405141710.3551-2-ubizjak@gmail.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>locking/atomic: Add generic try_cmpxchg64 support</title>
<updated>2022-05-17T22:08:27+00:00</updated>
<author>
<name>Uros Bizjak</name>
<email>ubizjak@gmail.com</email>
</author>
<published>2022-05-15T18:42:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=0aa7be05d83cc584da0782405e8007e351dfb6cc'/>
<id>0aa7be05d83cc584da0782405e8007e351dfb6cc</id>
<content type='text'>
Add generic support for try_cmpxchg64{,_acquire,_release,_relaxed}
and their fallbacks involving cmpxchg64.

Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20220515184205.103089-2-ubizjak@gmail.com
</content>
</entry>
</feed>
