| Age | Commit message (Collapse) | Author | Files | Lines |
|
lockdep_unregister_key() is called from critical code paths, including
sections where rtnl_lock() is held. For example, when replacing a qdisc
in a network device, network egress traffic is disabled while
__qdisc_destroy() is called for every network queue.
If lockdep is enabled, __qdisc_destroy() calls lockdep_unregister_key(),
which gets blocked waiting for synchronize_rcu() to complete.
For example, a simple tc command to replace a qdisc could take 13
seconds:
# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq
real 0m13.195s
user 0m0.001s
sys 0m2.746s
During this time, network egress is completely frozen while waiting for
RCU synchronization.
Use synchronize_rcu_expedited() instead to minimize the impact on
critical operations like network connectivity changes.
This improves 10x the function call to tc, when replacing the qdisc for
a network card.
# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq
real 0m1.789s
user 0m0.000s
sys 0m1.613s
[boqun: Fixed the comment and add more information for the temporary
workaround, and add TODO information for hazptr]
Reported-by: Erik Lundgren <elundgren@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250321-lockdep-v1-1-78b732d195fb@debian.org
|
|
Returning a large structure from the lock_stats() function causes clang
to have multiple copies of it on the stack and copy between them, which
can end up exceeding the frame size warning limit:
kernel/locking/lockdep.c:300:25: error: stack frame size (1464) exceeds limit (1280) in 'lock_stats' [-Werror,-Wframe-larger-than]
300 | struct lock_class_stats lock_stats(struct lock_class *class)
Change the calling conventions to directly operate on the caller's copy,
which apparently is what gcc does already.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250610092941.2642847-1-arnd@kernel.org
|
|
There have been recent reports about running out of lockdep keys:
MAX_LOCKDEP_KEYS too low!
One possible reason is that too many dynamic keys have been registered.
A possible culprit is the lockdep_register_key() call in qdisc_alloc()
of net/sched/sch_generic.c.
Currently, there is no way to find out how many dynamic keys have been
registered. Add such a stat to the /proc/lockdep_stats to get better
clarity.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20250506042049.50060-4-boqun.feng@gmail.com
|
|
To catch the code trying to use a subclass value >= MAX_LOCKDEP_SUBCLASSES (8),
add a DEBUG_LOCKS_WARN_ON() statement to notify the users that such a
large value is not allowed.
[ boqun: Reword the commit log with a more objective tone ]
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20250506042049.50060-3-boqun.feng@gmail.com
|
|
When hlock_equal() is unused, it prevents kernel builds with clang,
`make W=1` and CONFIG_WERROR=y, CONFIG_LOCKDEP=y and
CONFIG_LOCKDEP_SMALL=n:
lockdep.c:2005:20: error: unused function 'hlock_equal' [-Werror,-Wunused-function]
Fix this by moving the function to the respective existing ifdeffery
for its the only user.
See also:
6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build")
Fixes: 68e305678583 ("lockdep: Adjust check_redundant() for recursive read change")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20250506042049.50060-2-boqun.feng@gmail.com
|
|
Currently, when a lock class is allocated, nr_unused_locks will be
increased by 1, until it gets used: nr_unused_locks will be decreased by
1 in mark_lock(). However, one scenario is missed: a lock class may be
zapped without even being used once. This could result into a situation
that nr_unused_locks != 0 but no unused lock class is active in the
system, and when `cat /proc/lockdep_stats`, a WARN_ON() will
be triggered in a CONFIG_DEBUG_LOCKDEP=y kernel:
[...] DEBUG_LOCKS_WARN_ON(debug_atomic_read(nr_unused_locks) != nr_unused)
[...] WARNING: CPU: 41 PID: 1121 at kernel/locking/lockdep_proc.c:283 lockdep_stats_show+0xba9/0xbd0
And as a result, lockdep will be disabled after this.
Therefore, nr_unused_locks needs to be accounted correctly at
zap_class() time.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250326180831.510348-1-boqun.feng@gmail.com
|
|
KASAN instrumentation of lockdep has been disabled, as we don't need
KASAN to check the validity of lockdep internal data structures and
incur unnecessary performance overhead. However, the lockdep_map pointer
passed in externally may not be valid (e.g. use-after-free) and we run
the risk of using garbage data resulting in false lockdep reports.
Add kasan_check_byte() call in lock_acquire() for non kernel core data
object to catch invalid lockdep_map and print out a KASAN report before
any lockdep splat, if any.
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20250214195242.2480920-1-longman@redhat.com
Link: https://lore.kernel.org/r/20250307232717.1759087-7-boqun.feng@gmail.com
|
|
Add some lock events to lockdep to profile its behavior.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-5-boqun.feng@gmail.com
|
|
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
When lockdep_assert_locked() is unused, it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y, CONFIG_LOCKDEP=y and
CONFIG_PROVE_LOCKING=n:
kernel/locking/lockdep.c:160:20: error: unused function 'lockdep_assert_locked' [-Werror,-Wunused-function]
Fix this by moving it under the respective ifdeffery.
See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
[Boqun: add more config information of the error]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20241202193445.769567-1-andriy.shevchenko@linux.intel.com
|
|
When chain_hlock_class_idx() is unused, it prevents kernel builds with
clang, `make W=1` and CONFIG_WERROR=y, CONFIG_LOCKDEP=y and
CONFIG_PROVE_LOCKING=n:
kernel/locking/lockdep.c:435:28: error: unused function 'chain_hlock_class_idx' [-Werror,-Wunused-function]
Fix this by marking it with __maybe_unused.
See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
[Boqun: add more config information of the error]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20241209170810.1485183-1-andriy.shevchenko@linux.intel.com
|
|
All those:
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES: 8
... MAX_LOCK_DEPTH: 48
... MAX_LOCKDEP_KEYS: 8192
and so on are dumped with the KERN_WARNING level. It is due to missing
KERN_* annotation.
Use pr_info() instead of bare printk() to dump the info with the info
level.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20241007065457.20128-1-jirislaby@kernel.org
|
|
Add a function to check that an offline CPU has left the tracing
infrastructure in a sane state.
Commit 9bb69ba4c177 ("ACPI: processor_idle: use raw_safe_halt() in
acpi_idle_play_dead()") fixed an issue where the acpi_idle_play_dead()
function called safe_halt() instead of raw_safe_halt(), which had the
side-effect of setting the hardirqs_enabled flag for the offline CPU.
On x86 this triggered warnings from lockdep_assert_irqs_disabled() when
the CPU was brought back online again later. These warnings were too
early for the exception to be handled correctly, leading to a
triple-fault.
Add lockdep_cleanup_dead_cpu() to check for this kind of failure mode,
print the events leading up to it, and correct it so that the CPU can
come online again correctly. Re-introducing the original bug now merely
results in this warning instead:
[ 61.556652] smpboot: CPU 1 is now offline
[ 61.556769] CPU 1 left hardirqs enabled!
[ 61.556915] irq event stamp: 128149
[ 61.556965] hardirqs last enabled at (128149): [<ffffffff81720a36>] acpi_idle_play_dead+0x46/0x70
[ 61.557055] hardirqs last disabled at (128148): [<ffffffff81124d50>] do_idle+0x90/0xe0
[ 61.557117] softirqs last enabled at (128078): [<ffffffff81cec74c>] __do_softirq+0x31c/0x423
[ 61.557199] softirqs last disabled at (128065): [<ffffffff810baae1>] __irq_exit_rcu+0x91/0x100
[boqun: Capitalize the title and reword the message a bit]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/f7bd2b3b999051bb3ef4be34526a9262008285f5.camel@infradead.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"lockdep:
- Fix potential deadlock between lockdep and RCU (Zhiguo Niu)
- Use str_plural() to address Coccinelle warning (Thorsten Blum)
- Add debuggability enhancement (Luis Claudio R. Goncalves)
static keys & calls:
- Fix static_key_slow_dec() yet again (Peter Zijlstra)
- Handle module init failure correctly in static_call_del_module()
(Thomas Gleixner)
- Replace pointless WARN_ON() in static_call_module_notify() (Thomas
Gleixner)
<linux/cleanup.h>:
- Add usage and style documentation (Dan Williams)
rwsems:
- Move is_rwsem_reader_owned() and rwsem_owner() under
CONFIG_DEBUG_RWSEMS (Waiman Long)
atomic ops, x86:
- Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak)
- Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros
Bizjak)"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS
jump_label: Fix static_key_slow_dec() yet again
static_call: Replace pointless WARN_ON() in static_call_module_notify()
static_call: Handle module init failure correctly in static_call_del_module()
locking/lockdep: Simplify character output in seq_line()
lockdep: fix deadlock issue between lockdep and rcu
lockdep: Use str_plural() to fix Coccinelle warning
cleanup: Add usage and style documentation
lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug
locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void
locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
|
|
Merge all pending locking commits into a single branch.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Mark emergency sections wherever multiple lines of
lock debugging output are generated. In an emergency
section, every printk() call will attempt to directly
flush to the consoles using the EMERGENCY priority.
Note that debug_show_all_locks() and
lockdep_print_held_locks() rely on their callers to
enter the emergency section. This is because these
functions can also be called in non-emergency
situations (such as sysrq).
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240820063001.36405-36-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
We won't find a contended lock if it's not being tracked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There is a deadlock scenario between lockdep and rcu when
rcu nocb feature is enabled, just as following call stack:
rcuop/x
-000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
-001|queued_spin_lock(inline) // try to hold nocb_gp_lock
-001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
-002|__raw_spin_lock_irqsave(inline)
-002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
-003|wake_nocb_gp_defer(inline)
-003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680)
-004|__call_rcu_common(inline)
-004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
-005|call_rcu_zapped(inline)
-005|free_zapped_rcu(ch = ?)// hold graph lock
-006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
-007|nocb_cb_wait(inline)
-007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
-008|kthread(_create = 0xFFFFFF80803122C0)
-009|ret_from_fork(asm)
rcuop/y
-000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0)
-001|queued_spin_lock()
-001|lockdep_lock()
-001|graph_lock() // try to hold graph lock
-002|lookup_chain_cache_add()
-002|validate_chain()
-003|lock_acquire
-004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80)
-005|lock_timer_base(inline)
-006|mod_timer(inline)
-006|wake_nocb_gp_defer(inline)// hold nocb_gp_lock
-006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680)
-007|__call_rcu_common(inline)
-007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?)
-008|call_rcu_hurry(inline)
-008|rcu_sync_call(inline)
-008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58)
-009|rcu_do_batch(rdp = 0xFFFFFF817F266680)
-010|nocb_cb_wait(inline)
-010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680)
-011|kthread(_create = 0xFFFFFF8080363740)
-012|ret_from_fork(asm)
rcuop/x and rcuop/y are rcu nocb threads with the same nocb gp thread.
This patch release the graph lock before lockdep call_rcu.
Fixes: a0b0fd53e1e6 ("locking/lockdep: Free lock classes that are no longer in use")
Cc: stable@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20240620225436.3127927-1-cmllamas@google.com
|
|
Fixes the following Coccinelle/coccicheck warning reported by
string_choices.cocci:
opportunity for str_plural(depth)
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20240528120008.403511-2-thorsten.blum@toblux.com
|
|
When lockdep fails while performing the Breadth-first-search operation
due to lack of memory, hint that increasing the value of the config
switch LOCKDEP_CIRCULAR_QUEUE_BITS should fix the warning.
Preface the scary backtrace with the suggestion:
[ 163.849242] Increase LOCKDEP_CIRCULAR_QUEUE_BITS to avoid this warning:
[ 163.849248] ------------[ cut here ]------------
[ 163.849250] lockdep bfs error:-1
[ 163.849263] WARNING: CPU: 24 PID: 2454 at kernel/locking/lockdep.c:2091 print_bfs_bug+0x27/0x40
...
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/Zqkmy0lS-9Sw0M9j@uudg.org
|
|
Pull bcachefs updates from Kent Overstreet:
- Metadata version 1.8: Stripe sectors accounting, BCH_DATA_unstriped
This splits out the accounting of dirty sectors and stripe sectors in
alloc keys; this lets us see stripe buckets that still have unstriped
data in them.
This is needed for ensuring that erasure coding is working correctly,
as well as completing stripe creation after a crash.
- Metadata version 1.9: Disk accounting rewrite
The previous disk accounting scheme relied heavily on percpu counters
that were also sharded by outstanding journal buffer; it was fast but
not extensible or scalable, and meant that all accounting counters
were recorded in every journal entry.
The new disk accounting scheme stores accounting as normal btree
keys; updates are deltas until they are flushed by the btree write
buffer.
This means we have no practical limit on the number of counters, and
a new tagged union format that's easy to extend.
We now have counters for compression type/ratio, per-snapshot-id
usage, per-btree-id usage, and pending rebalance work.
- Self healing on read IO/checksum error
Data is now automatically rewritten if we get a read error and then a
successful retry
- Mount API conversion (thanks to Thomas Bertschinger)
- Better lockdep coverage
Previously, btree node locks were tracked individually by lockdep,
like any other lock. But we may take _many_ btree node locks
simultaneously, we easily blow through the limit of 48 locks that
lockdep can track, leading to lockdep turning itself off.
Tracking each btree node lock individually isn't really necessary
since we have our own cycle detector for deadlock avoidance and
centralized tracking of btree node locks, so we now have a single
lockdep_map in btree_trans for "any btree nodes are locked".
- Some more small incremental work towards online check_allocations
- Lots more debugging improvements
- Fixes, including:
- undefined behaviour fixes, originally noted as breaking userspace
LTO builds
- fix a spurious warning in fsck_err, reported by Marcin
- fix an integer overflow on trans->nr_updates, also reported by
Marcin; this broke during deletion of highly fragmented indirect
extents
* tag 'bcachefs-2024-07-18.2' of https://evilpiepirate.org/git/bcachefs: (120 commits)
lockdep: Add comments for lockdep_set_no{validate,track}_class()
bcachefs: Fix integer overflow on trans->nr_updates
bcachefs: silence silly kdoc warning
bcachefs: Fix fsck warning about btree_trans not passed to fsck error
bcachefs: Add an error message for insufficient rw journal devs
bcachefs: varint: Avoid left-shift of a negative value
bcachefs: darray: Don't pass NULL to memcpy()
bcachefs: Kill bch2_assert_btree_nodes_not_locked()
bcachefs: Rename BCH_WRITE_DONE -> BCH_WRITE_SUBMITTED
bcachefs: __bch2_read(): call trans_begin() on every loop iter
bcachefs: show none if label is not set
bcachefs: drop packed, aligned from bkey_inode_buf
bcachefs: btree node scan: fall back to comparing by journal seq
bcachefs: Add lockdep support for btree node locks
lockdep: lockdep_set_notrack_class()
bcachefs: Improve copygc_wait_to_text()
bcachefs: Convert clock code to u64s
bcachefs: Improve startup message
bcachefs: Self healing on read IO error
bcachefs: Make read_only a mount option again, but hidden
...
|
|
Add a new helper to disable lockdep tracking entirely for a given class.
This is needed for bcachefs, which takes too many btree node locks for
lockdep to track. Instead, we have a single lockdep_map for "btree_trans
has any btree nodes locked", which makes more since given that we have
centralized lock management and a cycle detector.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This commit is part of a greater effort to remove all
empty elements at the end of the ctl_table arrays (sentinels) which will
reduce the overall build time size of the kernel and run time memory
bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Signed-off-by: Joel Granados <j.granados@samsung.com>
|
|
Kent reported an occasional KASAN splat in lockdep. Mark then noted:
> I suspect the dodgy access is to chain_block_buckets[-1], which hits the last 4
> bytes of the redzone and gets (incorrectly/misleadingly) attributed to
> nr_large_chain_blocks.
That would mean @size == 0, at which point size_to_bucket() returns -1
and the above happens.
alloc_chain_hlocks() has 'size - req', for the first with the
precondition 'size >= rq', which allows the 0.
This code is trying to split a block, del_chain_block() takes what we
need, and add_chain_block() puts back the remainder, except in the
above case the remainder is 0 sized and things go sideways.
Fixes: 810507fe6fd5 ("locking/lockdep: Reuse freed chain_hlocks entries")
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lkml.kernel.org/r/20231121114126.GH8262@noisy.programming.kicks-ass.net
|
|
On the parisc architecture, lockdep reports for all static objects which
are in the __initdata section (e.g. "setup_done" in devtmpfs,
"kthreadd_done" in init/main.c) this warning:
INFO: trying to register non-static key.
The warning itself is wrong, because those objects are in the __initdata
section, but the section itself is on parisc outside of range from
_stext to _end, which is why the static_obj() functions returns a wrong
answer.
While fixing this issue, I noticed that the whole existing check can
be simplified a lot.
Instead of checking against the _stext and _end symbols (which include
code areas too) just check for the .data and .bss segments (since we check a
data object). This can be done with the existing is_kernel_core_data()
macro.
In addition objects in the __initdata section can be checked with
init_section_contains(), and is_kernel_rodata() allows keys to be in the
_ro_after_init section.
This partly reverts and simplifies commit bac59d18c701 ("x86/setup: Fix static
memory detection").
Link: https://lkml.kernel.org/r/ZNqrLRaOi/3wPAdp@p100
Fixes: bac59d18c701 ("x86/setup: Fix static memory detection")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()
The cmpxchg128() family of functions is basically & functionally the
same as cmpxchg_double(), but with a saner interface.
Instead of a 6-parameter horror that forced u128 - u64/u64-halves
layout details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128
types.
- Restructure the generated atomic headers, and add kerneldoc comments
for all of the generic atomic{,64,_long}_t operations.
The generated definitions are much cleaner now, and come with
documentation.
- Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
taking multiple locks of the same type.
This gets rid of one use of lockdep_set_novalidate_class() in the
bcache code.
- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
shadowing generating garbage code on Clang on certain ARM builds.
* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
locking/atomic: treewide: delete arch_atomic_*() kerneldoc
locking/atomic: docs: Add atomic operations to the driver basic API documentation
locking/atomic: scripts: generate kerneldoc comments
docs: scripts: kernel-doc: accept bitwise negation like ~@var
locking/atomic: scripts: simplify raw_atomic*() definitions
locking/atomic: scripts: simplify raw_atomic_long*() definitions
locking/atomic: scripts: split pfx/name/sfx/order
locking/atomic: scripts: restructure fallback ifdeffery
locking/atomic: scripts: build raw_atomic_long*() directly
locking/atomic: treewide: use raw_atomic*_<op>()
locking/atomic: scripts: add trivial raw_atomic*_<op>()
locking/atomic: scripts: factor out order template generation
locking/atomic: scripts: remove leftover "${mult}"
locking/atomic: scripts: remove bogus order parameter
locking/atomic: xtensa: add preprocessor symbols
locking/atomic: x86: add preprocessor symbols
locking/atomic: sparc: add preprocessor symbols
locking/atomic: sh: add preprocessor symbols
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects fixes from Thomas Gleixner:
"Two fixes for debugobjects:
- Prevent the allocation path from waking up kswapd.
That's a long standing issue due to the GFP_ATOMIC allocation flag.
As debug objects can be invoked from pretty much any context waking
kswapd can end up in arbitrary lock chains versus the waitqueue
lock
- Correct the explicit lockdep wait-type violation in
debug_object_fill_pool()"
* tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Don't wake up kswapd from fill_pool()
debugobjects,locking: Annotate debug_object_fill_pool() wait type violation
|
|
This implements a new interface to lockdep, lock_set_cmp_fn(), for
defining a custom ordering when taking multiple locks of the same
class.
This is an alternative to subclasses, but can not fully replace them
since subclasses allow lock hierarchies with other clasees
inter-twined, while this relies on pure class nesting.
Specifically, if A is our nesting class then:
A/0 <- B <- A/1
Would be a valid lock order with subclasses (each subclass really is a
full class from the validation PoV) but not with this annotation,
which requires all nesting to be consecutive.
Example output:
| ============================================
| WARNING: possible recursive locking detected
| 6.2.0-rc8-00003-g7d81e591ca6a-dirty #15 Not tainted
| --------------------------------------------
| kworker/14:3/938 is trying to acquire lock:
| ffff8880143218c8 (&b->lock l=0 0:2803368){++++}-{3:3}, at: bch_btree_node_get.part.0+0x81/0x2b0
|
| but task is already holding lock:
| ffff8880143de8c8 (&b->lock l=1 1048575:9223372036854775807){++++}-{3:3}, at: __bch_btree_map_nodes+0xea/0x1e0
| and the lock comparison function returns 1:
|
| other info that might help us debug this:
| Possible unsafe locking scenario:
|
| CPU0
| ----
| lock(&b->lock l=1 1048575:9223372036854775807);
| lock(&b->lock l=0 0:2803368);
|
| *** DEADLOCK ***
|
| May be due to missing lock nesting notation
|
| 3 locks held by kworker/14:3/938:
| #0: ffff888005ea9d38 ((wq_completion)bcache){+.+.}-{0:0}, at: process_one_work+0x1ec/0x530
| #1: ffff8880098c3e70 ((work_completion)(&cl->work)#3){+.+.}-{0:0}, at: process_one_work+0x1ec/0x530
| #2: ffff8880143de8c8 (&b->lock l=1 1048575:9223372036854775807){++++}-{3:3}, at: __bch_btree_map_nodes+0xea/0x1e0
[peterz: extended changelog]
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230509195847.1745548-1-kent.overstreet@linux.dev
|
|
There is an explicit wait-type violation in debug_object_fill_pool()
for PREEMPT_RT=n kernels which allows them to more easily fill the
object pool and reduce the chance of allocation failures.
Lockdep's wait-type checks are designed to check the PREEMPT_RT
locking rules even for PREEMPT_RT=n kernels and object to this, so
create a lockdep annotation to allow this to stand.
Specifically, create a 'lock' type that overrides the inner wait-type
while it is held -- allowing one to temporarily raise it, such that
the violation is hidden.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Qi Zheng <zhengqi.arch@bytedance.com>
Link: https://lkml.kernel.org/r/20230429100614.GA1489784@hirez.programming.kicks-ass.net
|
|
Lock scenario print is always a weak spot of lockdep splats. Improvement
can be made if we rework the dependency search and the error printing.
However without touching the graph search, we can improve a little for
the circular deadlock case, since we have the to-be-added lock
dependency, and know whether these two locks are read/write/sync.
In order to know whether a held_lock is sync or not, a bit was
"stolen" from ->references, which reduce our limit for the same lock
class nesting from 2^12 to 2^11, and it should still be good enough.
Besides, since we now have bit in held_lock for sync, we don't need the
"hardirqoffs being 1" trick, and also we can avoid the __lock_release()
if we jump out of __lock_acquire() before the held_lock stored.
With these changes, a deadlock case evolved with read lock and sync gets
a better print-out from:
[...] Possible unsafe locking scenario:
[...]
[...] CPU0 CPU1
[...] ---- ----
[...] lock(srcuA);
[...] lock(srcuB);
[...] lock(srcuA);
[...] lock(srcuB);
to
[...] Possible unsafe locking scenario:
[...]
[...] CPU0 CPU1
[...] ---- ----
[...] rlock(srcuA);
[...] lock(srcuB);
[...] lock(srcuA);
[...] sync(srcuB);
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
|
Currently, functions like synchronize_srcu() do not have lockdep
annotations resembling those of other write-side locking primitives.
Such annotations might look as follows:
lock_acquire();
lock_release();
Such annotations would tell lockdep that synchronize_srcu() acts like
an empty critical section that waits for other (read-side) critical
sections to finish. This would definitely catch some deadlock, but
as pointed out by Paul Mckenney [1], this could also introduce false
positives because of irq-safe/unsafe detection. Of course, there are
tricks could help with this:
might_sleep(); // Existing statement in __synchronize_srcu().
if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
local_irq_disable();
lock_acquire();
lock_release();
local_irq_enable();
}
But it would be better for lockdep to provide a separate annonation for
functions like synchronize_srcu(), so that people won't need to repeat
the ugly tricks above.
Therefore introduce lock_sync(), which is simply an lock+unlock
pair with no irq safe/unsafe deadlock check. This works because the
to-be-annontated functions do not create real critical sections, and
there is therefore no way that irq can create extra dependencies.
[1]: https://lore.kernel.org/lkml/20180412021233.ewncg5jjuzjw3x62@tardis/
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ boqun: Fix typos reported by Davidlohr Bueso and Paul E. Mckenney ]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
|
In order to avoid WARN/BUG from generating nested or even recursive
warnings, force rcu_is_watching() true during
WARN/lockdep_rcu_suspicious().
Notably things like unwinding the stack can trigger rcu_dereference()
warnings, which then triggers more unwinding which then triggers more
warnings etc..
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230126151323.408156109@infradead.org
|
|
look_up_lock_class() got confused
Printing this information will be helpful:
------------[ cut here ]------------
Looking for class "l2tp_sock" with key l2tp_socket_class, but found a different class "slock-AF_INET6" with the same key
WARNING: CPU: 1 PID: 14195 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140
Modules linked in:
CPU: 1 PID: 14195 Comm: a.out Not tainted 6.0.0-rc6-dirty #863
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:look_up_lock_class+0xcc/0x140
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/bd99391e-f787-efe9-5ec6-3c6dc4c587b0@I-love.SAKURA.ne.jp
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offload updates, perhaps most notably a new
RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
offloaded at boot time, regardless of kernel boot parameters.
This is useful to battery-powered systems such as ChromeOS and
Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
parameter prevents offloaded callbacks from interfering with
real-time workloads and with energy-efficiency mechanisms
- Polled grace-period updates, perhaps most notably making these APIs
account for both normal and expedited grace periods
- Tasks RCU updates, perhaps most notably reducing the CPU overhead of
RCU tasks trace grace periods by more than a factor of two on a
system with 15,000 tasks.
The reduction is expected to increase with the number of tasks, so it
seems reasonable to hypothesize that a system with 150,000 tasks
might see a 20-fold reduction in CPU overhead
- Torture-test updates
- Updates that merge RCU's dyntick-idle tracking into context tracking,
thus reducing the overhead of transitioning to kernel mode from
either idle or nohz_full userspace execution for kernels that track
context independently of RCU.
This is expected to be helpful primarily for kernels built with
CONFIG_NO_HZ_FULL=y
* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
rcu: Diagnose extended sync_rcu_do_polled_gp() loops
rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
rcutorture: Test polled expedited grace-period primitives
rcu: Add polled expedited grace-period primitives
rcutorture: Verify that polled GP API sees synchronous grace periods
rcu: Make Tiny RCU grace periods visible to polled APIs
rcu: Make polled grace-period API account for expedited grace periods
rcu: Switch polled grace-period APIs to ->gp_seq_polled
rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
rcu/nocb: Add option to opt rcuo kthreads out of RT priority
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
rcu/nocb: Add an option to offload all CPUs on boot
rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
rcu/nocb: Add/del rdp to iterate from rcuog itself
rcu/tree: Add comment to describe GP-done condition in fqs loop
rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
rcu/kvfree: Remove useless monitor_todo flag
rcu: Cleanup RCU urgency state for offline CPU
...
|
|
The RCU dynticks counter is going to be merged into the context tracking
subsystem. Start with moving the idle extended quiescent states
entrypoints to context tracking. For now those are dumb redirections to
existing RCU calls.
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
|
|
Commit dfd5e3f5fe27 ("locking/lockdep: Mark local_lock_t") added yet
another lockdep_init_map_*() variant, but forgot to update all the
existing users of the most complicated version.
This could lead to a loss of lock_type and hence an incorrect report.
Given the relative rarity of both local_lock and these annotations,
this is unlikely to happen in practise, still, best fix things.
Fixes: dfd5e3f5fe27 ("locking/lockdep: Mark local_lock_t")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YqyEDtoan20K0CVD@worktop.programming.kicks-ass.net
|
|
Since the rewrote of prandom_u32(), in the commit mentioned below, the
function uses sleeping locks which extracing random numbers and filling
the batch.
This breaks lockdep on PREEMPT_RT because lock_pin_lock() disables
interrupts while calling __lock_pin_lock(). This can't be moved earlier
because the main user of the function (rq_pin_lock()) invokes that
function after disabling interrupts in order to acquire the lock.
The cookie does not require random numbers as its goal is to provide a
random value in order to notice unexpected "unlock + lock" sites.
Use sched_clock() to provide random numbers.
Fixes: a0103f4d86f88 ("random32: use real rng for non-deterministic randomness")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YoNn3pTkm5+QzE5k@linutronix.de
|