linux.git/kernel/ucount.c, branch v6.18.21

ucount: check for CAP_SYS_RESOURCE using ns_capable_noaudit()

2026-02-26T22:59:20+00:00

[ Upstream commit 0895a000e4fff9e950a7894210db45973e485c35 ]

The user.* sysctls implement the ctl_table_root::permissions hook and they
override the file access mode based on the CAP_SYS_RESOURCE capability (at
most rwx if capable, at most r-- if not).  The capability is being checked
unconditionally, so if an LSM denies the capability, an audit record may
be logged even when access is in fact granted.

Given the logic in the set_permissions() function in kernel/ucount.c and
the unfortunate way the permission checking is implemented, it doesn't
seem viable to avoid false positive denials by deferring the capability
check.  Thus, do the same as in net_ctl_permissions() (net/sysctl_net.c) -
switch from ns_capable() to ns_capable_noaudit(), so that the check never
logs an audit record.
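
The mode override described above can be modelled in userspace roughly as
follows.  This is a minimal sketch, not the kernel code: user_sysctl_mode()
and its "capable" flag are hypothetical stand-ins for set_permissions() and
the result of the CAP_SYS_RESOURCE check, and the permission masks are
written out as octal constants.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the mode override.  "capable" stands in
 * for the outcome of the CAP_SYS_RESOURCE check.
 */
static int user_sysctl_mode(int table_mode, bool capable)
{
	int mode;

	if (capable)
		mode = (table_mode & 0700) >> 6;	/* at most rwx (owner bits) */
	else
		mode = table_mode & 0004;		/* at most r-- (other read bit) */

	/* Replicate the 3-bit result into the owner, group and other fields. */
	return (mode << 6) | (mode << 3) | mode;
}
```

Under these assumptions a table mode of 0644 becomes 0666 for a capable
caller and 0444 for everyone else.  The check runs on every permission
lookup, which is why an audited capability check can log a denial even
when read access is granted anyway.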

Link: https://lkml.kernel.org/r/20260122140745.239428-1-omosnace@redhat.com
Fixes: dbec28460a89 ("userns: Add per user namespace sysctls.")
Signed-off-by: Ondrej Mosnacek 
Reviewed-by: Paul Moore 
Acked-by: Serge Hallyn 
Cc: Eric Biederman 
Cc: Alexey Gladkov 
Signed-off-by: Andrew Morton 
Signed-off-by: Sasha Levin

ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()

2025-08-02T19:01:38+00:00

Use atomic_long_try_cmpxchg() instead of
atomic_long_cmpxchg(*ptr, old, new) == old in atomic_long_inc_below().
The x86 CMPXCHG instruction returns success in the ZF flag, so this change
saves a compare after cmpxchg (and the related move instruction in front
of cmpxchg).

Also, atomic_long_try_cmpxchg() implicitly assigns the old *ptr value to
"old" when cmpxchg fails, enabling further code simplifications.

No functional change intended.
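
The shape of the resulting loop can be illustrated in userspace with C11
atomics, whose atomic_compare_exchange_weak() also writes the current
value back into the expected-value argument on failure.  This is a sketch
under that analogy only; inc_below() is a hypothetical name and not the
kernel function.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *v only while it is below u.  On a failed exchange the
 * current value is stored back into c, so no explicit reload is needed.
 */
static bool inc_below(atomic_long *v, long u)
{
	long c = atomic_load(v);

	do {
		if (c >= u)
			return false;	/* limit reached, nothing changed */
	} while (!atomic_compare_exchange_weak(v, &c, c + 1));

	return true;
}
```

With a limit of 2 and a counter starting at 0, the first two calls succeed
and the third one fails, leaving the counter at 2.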

Link: https://lkml.kernel.org/r/20250721174610.28361-2-ubizjak@gmail.com
Signed-off-by: Uros Bizjak 
Reviewed-by: Alexey Gladkov 
Cc: Sebastian Andrzej Siewior 
Cc: "Paul E. McKenney" 
Cc: Alexey Gladkov 
Cc: Roman Gushchin 
Cc: MengEn Sun 
Cc: "Thomas Weißschuh" 
Signed-off-by: Andrew Morton

ucount: fix atomic_long_inc_below() argument type

2025-08-02T19:01:38+00:00

The type of the u argument of atomic_long_inc_below() should be long to
avoid unwanted truncation to int.

The patch fixes the wrong argument type of an internal function to
prevent unwanted argument truncation.  It fixes an internal locking
primitive; it should not have any direct effect on userspace.

Mark said

: AFAICT there's no problem in practice because atomic_long_inc_below()
: is only used by inc_ucount(), and it looks like the value is
: constrained between 0 and INT_MAX.
: 
: In inc_ucount() the limit value is taken from
: user_namespace::ucount_max[], and AFAICT that's only written by
: sysctls, to the table setup by setup_userns_sysctls(), where
: UCOUNT_ENTRY() limits the value between 0 and INT_MAX.
: 
: This is certainly a cleanup, but there might be no functional issue in
: practice as above.

Link: https://lkml.kernel.org/r/20250721174610.28361-1-ubizjak@gmail.com
Fixes: f9c82a4ea89c ("Increase size of ucounts to atomic_long_t")
Signed-off-by: Uros Bizjak 
Reviewed-by: "Eric W. Biederman" 
Cc: Sebastian Andrzej Siewior 
Cc: "Paul E. McKenney" 
Cc: Alexey Gladkov 
Cc: Roman Gushchin 
Cc: MengEn Sun 
Cc: "Thomas Weißschuh" 
Cc: Mark Rutland 
Signed-off-by: Andrew Morton

ucount: use rcuref_t for reference counting

2025-03-17T05:30:50+00:00

Use rcuref_t for reference counting.  This eliminates the cmpxchg loop in
the get and put paths.  It also eliminates the need to acquire the lock
in the put path, because once the final user returns the reference, it
can no longer be obtained.
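
The idea of a loop-free get and put can be sketched in userspace as below.
This is a heavily simplified model and not the kernel's rcuref
implementation: struct ref, ref_get(), ref_put() and the two marker values
are hypothetical, and saturation handling is omitted entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REF_NOREF	(-1)		/* the last reference was just dropped */
#define REF_DEAD	(-0x20000000)	/* the object is going away */

struct ref {
	atomic_int cnt;			/* 0 means one reference is held */
};

/* A single unconditional increment, no compare-and-exchange loop. */
static bool ref_get(struct ref *r)
{
	int old = atomic_fetch_add(&r->cnt, 1);

	if (old >= REF_NOREF)
		return true;
	/* The counter was already dead: restore the marker and fail. */
	atomic_store(&r->cnt, REF_DEAD);
	return false;
}

/* Returns true only for the caller that dropped the final reference. */
static bool ref_put(struct ref *r)
{
	int old = atomic_fetch_sub(&r->cnt, 1);
	int expected = REF_NOREF;

	if (old > 0)
		return false;		/* other references remain */
	if (old < 0) {
		/* Underflow or already dead: keep the counter dead. */
		atomic_store(&r->cnt, REF_DEAD);
		return false;
	}
	/* Mark the object dead unless a concurrent get revived it. */
	return atomic_compare_exchange_strong(&r->cnt, &expected, REF_DEAD);
}
```

In this model, once ref_put() has returned true every later ref_get()
fails, so the put path does not need a lock to keep new users away.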

Link: https://lkml.kernel.org/r/20250203150525.456525-5-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Paul E. McKenney 
Cc: Thomas Gleixner 
Cc: Boqun Feng 
Cc: Joel Fernandes 
Cc: Josh Triplett 
Cc: Lai jiangshan 
Cc: Mathieu Desnoyers 
Cc: Mengen Sun 
Cc: Steven Rostedt 
Cc: "Uladzislau Rezki (Sony)" 
Cc: YueHong Wu 
Cc: Zqiang 
Signed-off-by: Andrew Morton

ucount: use RCU for ucounts lookups

2025-03-17T05:30:50+00:00

The ucounts element is looked up under ucounts_lock.  This can be
optimized by using RCU for a lockless lookup and returning an element
only if the reference can be obtained.

Replace hlist_head with hlist_nulls_head which is RCU compatible.  Let
find_ucounts() search for the required item within an RCU section and
return the item if a reference could be obtained.  This means
alloc_ucounts() will always return an element (unless the memory
allocation failed).  Let put_ucounts() RCU free the element if the
reference counter dropped to zero.

Link: https://lkml.kernel.org/r/20250203150525.456525-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Paul E. McKenney 
Cc: Thomas Gleixner 
Cc: Boqun Feng 
Cc: Joel Fernandes 
Cc: Josh Triplett 
Cc: Lai jiangshan 
Cc: Mathieu Desnoyers 
Cc: Mengen Sun 
Cc: Steven Rostedt 
Cc: "Uladzislau Rezki (Sony)" 
Cc: YueHong Wu 
Cc: Zqiang 
Signed-off-by: Andrew Morton

ucount: replace get_ucounts_or_wrap() with atomic_inc_not_zero()

2025-03-17T05:30:50+00:00

get_ucounts_or_wrap() increments the counter and if the counter is
negative then it decrements it again in order to reset the previous
increment.  This statement can be replaced with atomic_inc_not_zero() to
only increment the counter if it is not yet 0.

This simplifies the get function because the put (if the get failed) can
be removed.  atomic_inc_not_zero() is implemented as a cmpxchg() loop,
which can be repeated several times if another get/put is performed in
parallel.  This will be optimized later.
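
An increment-if-not-zero operation built on a compare-and-exchange loop
looks roughly like this in userspace C11.  It is a sketch of the
technique only; inc_not_zero() is a hypothetical name, not the kernel
helper.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the counter has not dropped to zero yet.
 * The loop repeats whenever another thread changed the counter between
 * the load and the exchange.
 */
static bool inc_not_zero(atomic_int *v)
{
	int c = atomic_load(v);

	do {
		if (c == 0)
			return false;	/* already released, do not revive */
	} while (!atomic_compare_exchange_weak(v, &c, c + 1));

	return true;
}
```

Because a failed get never modifies the counter, the caller has nothing to
undo, which is what allows the compensating put to be dropped.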

Increment the reference counter only if not yet dropped to zero.

Link: https://lkml.kernel.org/r/20250203150525.456525-3-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Paul E. McKenney 
Cc: Thomas Gleixner 
Cc: Boqun Feng 
Cc: Joel Fernandes 
Cc: Josh Triplett 
Cc: Lai jiangshan 
Cc: Mathieu Desnoyers 
Cc: Mengen Sun 
Cc: Steven Rostedt 
Cc: "Uladzislau Rezki (Sony)" 
Cc: YueHong Wu 
Cc: Zqiang 
Signed-off-by: Andrew Morton

ucounts: move kfree() out of critical zone protected by ucounts_lock

2025-01-13T04:21:00+00:00

Although kfree() does not sleep, it can occasionally run into a long
chain of calls, so it is better to move the kfree() in alloc_ucounts()
out of the critical section protected by ucounts_lock.
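
The pattern can be sketched in userspace with a pthread mutex.  This is an
illustration under simplifying assumptions, not the alloc_ucounts() code:
struct entry, the list head, find_locked() and get_or_create() are all
hypothetical, and the sketch always allocates up front.

```c
#include <pthread.h>
#include <stdlib.h>

struct entry {
	int key;
	struct entry *next;
};

static struct entry *head;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold the lock. */
static struct entry *find_locked(int key)
{
	struct entry *e;

	for (e = head; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

static struct entry *get_or_create(int key)
{
	struct entry *spare = malloc(sizeof(*spare));
	struct entry *found;

	if (!spare)
		return NULL;
	spare->key = key;

	pthread_mutex_lock(&lock);
	found = find_locked(key);
	if (!found) {
		/* Not present yet: publish the new entry. */
		spare->next = head;
		head = spare;
		found = spare;
		spare = NULL;
	}
	pthread_mutex_unlock(&lock);

	/* Release the unused allocation only after dropping the lock. */
	free(spare);
	return found;
}
```

The critical section then covers only the lookup and the list insertion,
and free(NULL) is a no-op when the new entry was actually used.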

Link: https://lkml.kernel.org/r/1733458427-11794-1-git-send-email-mengensun@tencent.com
Signed-off-by: MengEn Sun 
Reviewed-by: YueHong Wu 
Reviewed-by: Andrew Morton 
Cc: Andrei Vagin 
Cc: Joel Granados 
Cc: Thomas Weißschuh 
Signed-off-by: Andrew Morton

Merge tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

2024-11-23T04:36:11+00:00

Pull sysctl updates from Joel Granados:
 "sysctl ctl_table constification:

   - Constifying ctl_table structs prevents the modification of
     proc_handler function pointers. All ctl_table struct arguments are
     const qualified in the sysctl API in such a way that the ctl_table
     arrays being defined elsewhere and passed through sysctl can be
     constified one-by-one.

     We kick the constification off by qualifying user_table in
     kernel/ucount.c and expect all the ctl_tables to be constified in
     the coming releases.

  Misc fixes:

   - Adjust comments in two places to better reflect the code

   - Remove superfluous dput calls

   - Remove Luis from sysctl maintainership

   - Replace comments about holding a lock with calls to
     lockdep_assert_held"

* tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: Reduce dput(child) calls in proc_sys_fill_cache()
  sysctl: Reorganize kerneldoc parameter names
  ucounts: constify sysctl table user_table
  sysctl: update comments to new registration APIs
  MAINTAINERS: remove me from sysctl
  sysctl: Convert locking comments to lockdep assertions
  const_structs.checkpatch: add ctl_table
  sysctl: make internal ctl_tables const
  sysctl: allow registration of const struct ctl_table
  sysctl: move internal interfaces to const struct ctl_table
  bpf: Constify ctl_table argument of filter function

signal: restore the override_rlimit logic

2024-11-07T22:14:59+00:00

Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts"), the UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a
class of signals.  However, it is now enforced unconditionally, even if
override_rlimit is set.  This behavior change caused production issues.

For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc() fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with siginfo.
This prevents the process from correctly identifying the fault address and
handling the error.  From the user-space perspective, applications are
unaware that the limit has been reached and that the siginfo is
effectively 'corrupted'.  This can lead to unpredictable behavior and
crashes, as we observed with Java applications.

Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and
skipping the comparison to max there if override_rlimit is set.  This
effectively restores the old behavior.
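
The effect of the flag can be modelled in userspace for a single counter
as follows.  This is a sketch, not the kernel function: inc_pending() and
its parameters are hypothetical, the per-namespace hierarchy is left out,
and counter overflow is not handled.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Account one more pending item.  Returns the new count, or 0 if the
 * limit would be exceeded.  With override_rlimit set, the comparison
 * against the configured limit is effectively skipped.
 */
static long inc_pending(atomic_long *pending, long limit, bool override_rlimit)
{
	long max = override_rlimit ? LONG_MAX : limit;
	long new_val = atomic_fetch_add(pending, 1) + 1;

	if (new_val > max) {
		atomic_fetch_sub(pending, 1);	/* undo the increment */
		return 0;
	}
	return new_val;
}
```

With a limit of 1 and one item already accounted, a normal request is
refused while a request with override_rlimit set still succeeds.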

Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin 
Co-developed-by: Andrei Vagin 
Signed-off-by: Andrei Vagin 
Acked-by: Oleg Nesterov 
Acked-by: Alexey Gladkov 
Cc: Kees Cook 
Cc: "Eric W. Biederman" 
Cc: 
Signed-off-by: Andrew Morton

ucounts: fix counter leak in inc_rlimit_get_ucounts()

2024-11-07T22:14:59+00:00

inc_rlimit_get_ucounts() increments the specified rlimit counter and
then checks its limit.  If the value exceeds the limit, the function
returns an error without decrementing the counter, so the increment is
leaked.
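
A leak-free version of such an increment-then-check walk over a chain of
counters can be sketched in userspace like this.  It is an illustration
only: struct level and inc_chain() are hypothetical, and giving each level
its own max is a simplification of how the kernel derives the limit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct level {
	atomic_long count;
	long max;
};

/*
 * Charge one unit at every level.  If any level exceeds its limit, undo
 * the increments already made, including the one at the failing level.
 */
static bool inc_chain(struct level *levels, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		long new_val = atomic_fetch_add(&levels[i].count, 1) + 1;

		if (new_val > levels[i].max) {
			for (j = 0; j <= i; j++)
				atomic_fetch_sub(&levels[j].count, 1);
			return false;
		}
	}
	return true;
}
```

Stopping the undo loop one level early would leave the failing level's
counter permanently one too high, which is the kind of leak the subject
line refers to.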

Link: https://lkml.kernel.org/r/20241101191940.3211128-1-roman.gushchin@linux.dev
Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting")
Signed-off-by: Andrei Vagin 
Co-developed-by: Roman Gushchin 
Signed-off-by: Roman Gushchin 
Tested-by: Roman Gushchin 
Acked-by: Alexey Gladkov 
Cc: Kees Cook 
Cc: Andrei Vagin 
Cc: "Eric W. Biederman" 
Cc: Alexey Gladkov 
Cc: Oleg Nesterov 
Cc: 
Signed-off-by: Andrew Morton