summaryrefslogtreecommitdiff
path: root/kernel/debug/debug_core.c
AgeCommit message (Collapse)AuthorFilesLines
2023-11-28kgdb: Flush console before entering kgdb on panicDouglas Anderson1-0/+3
[ Upstream commit dd712d3d45807db9fcae28a522deee85c1f2fde6 ] When entering kdb/kgdb on a kernel panic, it was be observed that the console isn't flushed before the `kdb` prompt came up. Specifically, when using the buddy lockup detector on arm64 and running: echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT I could see: [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6 [ 32.552865] Sending NMI from CPU 5 to CPUs 6: [ 32.557359] NMI backtrace for cpu 6 ... [backtrace for cpu 6] ... [ 32.558353] NMI backtrace for cpu 5 ... [backtrace for cpu 5] ... [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7: [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry [5]kdb> As you can see, backtraces for the other CPUs start printing and get interleaved with the kdb PANIC print. Let's replicate the commands to flush the console in the kdb panic entry point to avoid this. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230822131945.1.I5b460ae8f954e4c4f628a373d6e74713c06dd26f@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()Kees Cook1-1/+1
[ Upstream commit b13fecb1c3a603c4b8e99b306fecf4f668c11b32 ] This converts all the existing DECLARE_TASKLET() (and ...DISABLED) macros with DECLARE_TASKLET_OLD() in preparation for refactoring the tasklet callback type. All existing DECLARE_TASKLET() users had a "0" data argument, it has been removed here as well. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: 1fdeb8b9f29d ("wifi: iwl3945: Add missing check for create_singlethread_workqueue") Signed-off-by: Sasha Levin <sashal@kernel.org> [Tom: fix backport to 5.4.y] AUTOSEL backport to 5.4.y of: b13fecb1c3a6 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") changed all locations of DECLARE_TASKLET with DECLARE_TASKLET_OLD, except one, in arch/mips/lasat/pcivue_proc.c. This is due to: 10760dde9be3 ("MIPS: Remove support for LASAT") preceeding b13fecb1c3a6 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") upstream and the former not being present in 5.4.y. Fix this by changing DECLARE_TASKLET to DECLARE_TASKLET_OLD in arch/mips/lasat/pcivue_proc.c. Fixes: 5de7a4254eb2 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20Revert "treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()"Tom Saeger1-1/+1
This reverts commit 5de7a4254eb2d501cbb59918a152665b29c02109 which caused mips build failures. kernelci.org bot reports: arch/mips/lasat/picvue_proc.c:87:20: error: ‘pvc_display_tasklet’ undeclared (first use in this function) arch/mips/lasat/picvue_proc.c:42:44: error: expected ‘)’ before ‘&’ token arch/mips/lasat/picvue_proc.c:33:13: error: ‘pvc_display’ defined but not used [-Werror=unused-function] Link: https://lore.kernel.org/stable/64041dda.170a0220.8cc25.79c9@mx.google.com/ Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()Kees Cook1-1/+1
[ Upstream commit b13fecb1c3a603c4b8e99b306fecf4f668c11b32 ] This converts all the existing DECLARE_TASKLET() (and ...DISABLED) macros with DECLARE_TASKLET_OLD() in preparation for refactoring the tasklet callback type. All existing DECLARE_TASKLET() users had a "0" data argument, it has been removed here as well. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: 1fdeb8b9f29d ("wifi: iwl3945: Add missing check for create_singlethread_workqueue") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-06lockdown: also lock down previous kgdb useDaniel Thompson1-0/+24
commit eadb2f47a3ced5c64b23b90fd2a3463f63726066 upstream. KGDB and KDB allow read and write access to kernel memory, and thus should be restricted during lockdown. An attacker with access to a serial port (for example, via a hypervisor console, which some cloud vendors provide over the network) could trigger the debugger so it is important that the debugger respect the lockdown mode when/if it is triggered. Fix this by integrating lockdown into kdb's existing permissions mechanism. Unfortunately kgdb does not have any permissions mechanism (although it certainly could be added later) so, for now, kgdb is simply and brutally disabled by immediately exiting the gdb stub without taking any action. For lockdowns established early in the boot (e.g. the normal case) then this should be fine but on systems where kgdb has set breakpoints before the lockdown is enacted than "bad things" will happen. CVE: CVE-2022-21499 Co-developed-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05kgdb: Make "kgdbcon" work properly with "kgdb_earlycon"Douglas Anderson1-8/+14
[ Upstream commit b18b099e04f450cdc77bec72acefcde7042bd1f3 ] On my system the kernel processes the "kgdb_earlycon" parameter before the "kgdbcon" parameter. When we setup "kgdb_earlycon" we'll end up in kgdb_register_callbacks() and "kgdb_use_con" won't have been set yet so we'll never get around to starting "kgdbcon". Let's remedy this by detecting that the IO module was already registered when setting "kgdb_use_con" and registering the console then. As part of this, to avoid pre-declaring things, move the handling of the "kgdbcon" further down in the file. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200630151422.1.I4aa062751ff5e281f5116655c976dff545c09a46@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-09kgdb: Avoid suspicious RCU usage warningDouglas Anderson1-0/+4
[ Upstream commit 440ab9e10e2e6e5fd677473ee6f9e3af0f6904d6 ] At times when I'm using kgdb I see a splat on my console about suspicious RCU usage. I managed to come up with a case that could reproduce this that looked like this: WARNING: suspicious RCU usage 5.7.0-rc4+ #609 Not tainted ----------------------------- kernel/pid.c:395 find_task_by_pid_ns() needs rcu_read_lock() protection! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by swapper/0/1: #0: ffffff81b6b8e988 (&dev->mutex){....}-{3:3}, at: __device_attach+0x40/0x13c #1: ffffffd01109e9e8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x20c/0x7ac #2: ffffffd01109ea90 (dbg_slave_lock){....}-{2:2}, at: kgdb_cpu_enter+0x3ec/0x7ac stack backtrace: CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #609 Hardware name: Google Cheza (rev3+) (DT) Call trace: dump_backtrace+0x0/0x1b8 show_stack+0x1c/0x24 dump_stack+0xd4/0x134 lockdep_rcu_suspicious+0xf0/0x100 find_task_by_pid_ns+0x5c/0x80 getthread+0x8c/0xb0 gdb_serial_stub+0x9d4/0xd04 kgdb_cpu_enter+0x284/0x7ac kgdb_handle_exception+0x174/0x20c kgdb_brk_fn+0x24/0x30 call_break_hook+0x6c/0x7c brk_handler+0x20/0x5c do_debug_exception+0x1c8/0x22c el1_sync_handler+0x3c/0xe4 el1_sync+0x7c/0x100 rpmh_rsc_probe+0x38/0x420 platform_drv_probe+0x94/0xb4 really_probe+0x134/0x300 driver_probe_device+0x68/0x100 __device_attach_driver+0x90/0xa8 bus_for_each_drv+0x84/0xcc __device_attach+0xb4/0x13c device_initial_probe+0x18/0x20 bus_probe_device+0x38/0x98 device_add+0x38c/0x420 If I understand properly we should just be able to blanket kgdb under one big RCU read lock and the problem should go away. We'll add it to the beast-of-a-function known as kgdb_cpu_enter(). With this I no longer get any splats and things seem to work fine. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200602154729.v2.1.I70e0d4fd46d5ed2aaf0c98a355e8e1b7a5bb7e4e@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22kgdb: Prevent infinite recursive entries to the debuggerDouglas Anderson1-0/+1
[ Upstream commit 3ca676e4ca60d1834bb77535dafe24169cadacef ] If we detect that we recursively entered the debugger we should hack our I/O ops to NULL so that the panic() in the next line won't actually cause another recursion into the debugger. The first line of kgdb_panic() will check this and return. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200507130644.v4.6.I89de39f68736c9de610e6f241e68d8dbc44bc266@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22kgdb: Disable WARN_CONSOLE_UNLOCKED for all kgdbDouglas Anderson1-0/+4
[ Upstream commit 202164fbfa2b2ffa3e66b504e0f126ba9a745006 ] In commit 81eaadcae81b ("kgdboc: disable the console lock when in kgdb") we avoided the WARN_CONSOLE_UNLOCKED() yell when we were in kgdboc. That still works fine, but it turns out that we get a similar yell when using other I/O drivers. One example is the "I/O driver" for the kgdb test suite (kgdbts). When I enabled that I again got the same yells. Even though "kgdbts" doesn't actually interact with the user over the console, using it still causes kgdb to print to the consoles. That trips the same warning: con_is_visible+0x60/0x68 con_scroll+0x110/0x1b8 lf+0x4c/0xc8 vt_console_print+0x1b8/0x348 vkdb_printf+0x320/0x89c kdb_printf+0x68/0x90 kdb_main_loop+0x190/0x860 kdb_stub+0x2cc/0x3ec kgdb_cpu_enter+0x268/0x744 kgdb_handle_exception+0x1a4/0x200 kgdb_compiled_brk_fn+0x34/0x44 brk_handler+0x7c/0xb8 do_debug_exception+0x1b4/0x228 Let's increment/decrement the "ignore_console_lock_warning" variable all the time when we enter the debugger. This will allow us to later revert commit 81eaadcae81b ("kgdboc: disable the console lock when in kgdb"). Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200507130644.v4.1.Ied2b058357152ebcc8bf68edd6f20a11d98d7d4e@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-25kgdb: don't use a notifier to enter kgdb at panic; call directlyDouglas Anderson1-20/+11
Right now kgdb/kdb hooks up to debug panics by registering for the panic notifier. This works OK except that it means that kgdb/kdb gets called _after_ the CPUs in the system are taken offline. That means that if anything important was happening on those CPUs (like something that might have contributed to the panic) you can't debug them. Specifically I ran into a case where I got a panic because a task was "blocked for more than 120 seconds" which was detected on CPU 2. I nicely got shown stack traces in the kernel log for all CPUs including CPU 0, which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of __mmc_switch(). I then ended up at the kdb prompt where switched over to kgdb to try to look at local variables of the process on CPU 0. I found that I couldn't. Digging more, I found that I had no info on any tasks running on CPUs other than CPU 2 and that asking kdb for help showed me "Error: no saved data for this cpu". This was because all the CPUs were offline. Let's move the entry of kdb/kgdb to a direct call from panic() and stop using the generic notifier. Putting a direct call in allows us to order things more properly and it also doesn't seem like we're breaking any abstractions by calling into the debugger from the panic function. Daniel said: : This patch changes the way kdump and kgdb interact with each other. : However it would seem rather odd to have both tools simultaneously armed : and, even if they were, the user still has the option to use panic_timeout : to force a kdump to happen. Thus I think the change of order is : acceptable. Link: http://lkml.kernel.org/r/20190703170354.217312-1-dianders@chromium.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Feng Tang <feng.tang@intel.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-03kgdb: fix comment regarding static functionNadav Amit1-4/+1
The comment that says that module_event() is not static is clearly wrong. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30kdb: Don't back trace on a cpu that didn't round upDouglas Anderson1-0/+4
If you have a CPU that fails to round up and then run 'btc' you'll end up crashing in kdb becaue we dereferenced NULL. Let's add a check. It's wise to also set the task to NULL when leaving the debugger so that if we fail to round up on a later entry into the debugger we won't backtrace a stale task. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30kgdb: Don't round up a CPU that failed rounding up beforeDouglas Anderson1-1/+19
If we're using the default implementation of kgdb_roundup_cpus() that uses smp_call_function_single_async() we can end up hanging kgdb_roundup_cpus() if we try to round up a CPU that failed to round up before. Specifically smp_call_function_single_async() will try to wait on the csd lock for the CPU that we're trying to round up. If the previous round up never finished then that lock could still be held and we'll just sit there hanging. There's not a lot of use trying to round up a CPU that failed to round up before. Let's keep a flag that indicates whether the CPU started but didn't finish to round up before. If we see that flag set then we'll skip the next round up. In general we have a few goals here: - We never want to end up calling smp_call_function_single_async() when the csd is still locked. This is accomplished because flush_smp_call_function_queue() unlocks the csd _before_ invoking the callback. That means that when kgdb_nmicallback() runs we know for sure the the csd is no longer locked. Thus when we set "rounding_up = false" we know for sure that the csd is unlocked. - If there are no timeouts rounding up we should never skip a round up. NOTE #1: In general trying to continue running after failing to round up CPUs doesn't appear to be supported in the debugger. When I simulate this I find that kdb reports "Catastrophic error detected" when I try to continue. I can overrule and continue anyway, but it should be noted that we may be entering the land of dragons here. Possibly the "Catastrophic error detected" was added _because_ of the future failure to round up, but even so this is an area of the code that hasn't been strongly tested. NOTE #2: I did a bit of testing before and after this change. I introduced a 10 second hang in the kernel while holding a spinlock that I could invoke on a certain CPU with 'taskset -c 3 cat /sys/...". Before this change if I did: - Invoke hang - Enter debugger - g (which warns about Catastrophic error, g again to go anyway) - g - Enter debugger ...I'd hang the rest of the 10 seconds without getting a debugger prompt. After this change I end up in the debugger the 2nd time after only 1 second with the standard warning about 'Timed out waiting for secondary CPUs.' I'll also note that once the CPU finished waiting I could actually debug it (aka "btc" worked) I won't promise that everything works perfectly if the errant CPU comes back at just the wrong time (like as we're entering or exiting the debugger) but it certainly seems like an improvement. NOTE #3: setting 'kgdb_info[cpu].rounding_up = false' is in kgdb_nmicallback() instead of kgdb_call_nmi_hook() because some implementations override kgdb_call_nmi_hook(). It shouldn't hurt to have it in kgdb_nmicallback() in any case. NOTE #4: this logic is really only needed because there is no API call like "smp_try_call_function_single_async()" or "smp_csd_is_locked()". If such an API existed then we'd use it instead, but it seemed a bit much to add an API like this just for kgdb. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()Douglas Anderson1-0/+41
When I had lockdep turned on and dropped into kgdb I got a nice splat on my system. Specifically it hit: DEBUG_LOCKS_WARN_ON(current->hardirq_context) Specifically it looked like this: sysrq: SysRq : DEBUG ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(current->hardirq_context) WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27 pstate: 604003c9 (nZCv DAIF +PAN -UAO) pc : lockdep_hardirqs_on+0xf0/0x160 ... Call trace: lockdep_hardirqs_on+0xf0/0x160 trace_hardirqs_on+0x188/0x1ac kgdb_roundup_cpus+0x14/0x3c kgdb_cpu_enter+0x53c/0x5cc kgdb_handle_exception+0x180/0x1d4 kgdb_compiled_brk_fn+0x30/0x3c brk_handler+0x134/0x178 do_debug_exception+0xfc/0x178 el1_dbg+0x18/0x78 kgdb_breakpoint+0x34/0x58 sysrq_handle_dbg+0x54/0x5c __handle_sysrq+0x114/0x21c handle_sysrq+0x30/0x3c qcom_geni_serial_isr+0x2dc/0x30c ... ... irq event stamp: ...45 hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4 hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130 softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34 softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100 ---[ end trace adf21f830c46e638 ]--- Looking closely at it, it seems like a really bad idea to be calling local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems like it could violate spinlock semantics and cause a deadlock. Instead, let's use a private csd alongside smp_call_function_single_async() to round up the other CPUs. Using smp_call_function_single_async() doesn't require interrupts to be enabled so we can remove the offending bit of code. In order to avoid duplicating this across all the architectures that use the default kgdb_roundup_cpus(), we'll add a "weak" implementation to debug_core.c. Looking at all the people who previously had copies of this code, there were a few variants. I've attempted to keep the variants working like they used to. Specifically: * For arch/arc we passed NULL to kgdb_nmicallback() instead of get_irq_regs(). * For arch/mips there was a bit of extra code around kgdb_nmicallback() NOTE: In this patch we will still get into trouble if we try to round up a CPU that failed to round up before. We'll try to round it up again and potentially hang when we try to grab the csd lock. That's not new behavior but we'll still try to do better in a future patch. Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30kgdb: Remove irq flags from roundupDouglas Anderson1-1/+1
The function kgdb_roundup_cpus() was passed a parameter that was documented as: > the flags that will be used when restoring the interrupts. There is > local_irq_save() call before kgdb_roundup_cpus(). Nobody used those flags. Anyone who wanted to temporarily turn on interrupts just did local_irq_enable() and local_irq_disable() without looking at them. So we can definitely remove the flags. Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/nmi.h> We are going to move softlockup APIs out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. <linux/nmi.h> already includes <linux/sched.h>. Include the <linux/nmi.h> header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02mm/vmacache, sched/headers: Introduce 'struct vmacache' and move it from ↵Ingo Molnar1-2/+2
<linux/sched.h> to <linux/mm_types> The <linux/sched.h> header includes various vmacache related defines, which are arguably misplaced. Move them to mm_types.h and minimize the sched.h impact by putting all task vmacache state into a new 'struct vmacache' structure. No change in functionality. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-14kernel/debug/debug_core.c: more properly delay for secondary CPUsDouglas Anderson1-2/+2
We've got a delay loop waiting for secondary CPUs. That loop uses loops_per_jiffy. However, loops_per_jiffy doesn't actually mean how many tight loops make up a jiffy on all architectures. It is quite common to see things like this in the boot log: Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=24000) In my case I was seeing lots of cases where other CPUs timed out entering the debugger only to print their stack crawls shortly after the kdb> prompt was written. Elsewhere in kgdb we already use udelay(), so that should be safe enough to use to implement our timeout. We'll delay 1 ms for 1000 times, which should give us a full second of delay (just like the old code wanted) but allow us to notice that we're done every 1 ms. [akpm@linux-foundation.org: simplifications, per Daniel] Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Brian Norris <briannorris@chromium.org> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-19debug: prevent entering debug mode on panic/exception.Colin Cross1-0/+17
On non-developer devices, kgdb prevents the device from rebooting after a panic. Incase of panics and exceptions, to allow the device to reboot, prevent entering debug mode to avoid getting stuck waiting for the user to interact with debugger. To avoid entering the debugger on panic/exception without any extra configuration, panic_timeout is being used which can be set via /proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the default value. Setting panic_timeout indicates that the user requested machine to perform unattended reboot after panic. We dont want to get stuck waiting for the user input incase of panic. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: kgdb-bugreport@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org Cc: Android Kernel Team <kernel-team@android.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Colin Cross <ccross@android.com> [Kiran: Added context to commit message. panic_timeout is used instead of break_on_panic and break_on_exception to honor CONFIG_PANIC_TIMEOUT Modified the commit as per community feedback] Signed-off-by: Kiran Raparthy <kiran.kumar@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19kdb: Fix off by one error in kdb_cpu()Jason Wessel1-1/+1
There was a follow on replacement patch against the prior "kgdb: Timeout if secondary CPUs ignore the roundup". See: https://lkml.org/lkml/2015/1/7/442 This patch is the delta vs the patch that was committed upstream: * Fix an off-by-one error in kdb_cpu(). * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we really want a static limit. * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c (kgdb-next contains a patch which introduced pr_fmt() to this file to the tag will now be applied automatically). Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kernel/debug/debug_core.c: Logging clean-upFabian Frederick1-22/+21
-Convert printk( to pr_foo() -Add pr_fmt -Coalesce formats Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kgdb: timeout if secondary CPUs ignore the roundupDaniel Thompson1-2/+7
Currently if an active CPU fails to respond to a roundup request the CPU that requested the roundup will become stuck. This needlessly reduces the robustness of the debugger. This patch introduces a timeout allowing the system state to be examined even when the system contains unresponsive processors. It also modifies kdb's cpu command to make it censor attempts to switch to unresponsive processors and to report their state as (D)ead. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-04-18arch: Mass conversion of smp_mb__*()Peter Zijlstra1-2/+2
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-07mm: per-thread vma cachingDavidlohr Bueso1-3/+11
This patch is a continuation of efforts trying to optimize find_vma(), avoiding potentially expensive rbtree walks to locate a vma upon faults. The original approach (https://lkml.org/lkml/2013/11/1/410), where the largest vma was also cached, ended up being too specific and random, thus further comparison with other approaches were needed. There are two things to consider when dealing with this, the cache hit rate and the latency of find_vma(). Improving the hit-rate does not necessarily translate in finding the vma any faster, as the overhead of any fancy caching schemes can be too high to consider. We currently cache the last used vma for the whole address space, which provides a nice optimization, reducing the total cycles in find_vma() by up to 250%, for workloads with good locality. On the other hand, this simple scheme is pretty much useless for workloads with poor locality. Analyzing ebizzy runs shows that, no matter how many threads are running, the mmap_cache hit rate is less than 2%, and in many situations below 1%. The proposed approach is to replace this scheme with a small per-thread cache, maximizing hit rates at a very low maintenance cost. Invalidations are performed by simply bumping up a 32-bit sequence number. The only expensive operation is in the rare case of a seq number overflow, where all caches that share the same address space are flushed. Upon a miss, the proposed replacement policy is based on the page number that contains the virtual address in question. Concretely, the following results are seen on an 80 core, 8 socket x86-64 box: 1) System bootup: Most programs are single threaded, so the per-thread scheme does improve ~50% hit rate by just adding a few more slots to the cache. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 50.61% | 19.90 | | patched | 73.45% | 13.58 | +----------------+----------+------------------+ 2) Kernel build: This one is already pretty good with the current approach as we're dealing with good locality. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 75.28% | 11.03 | | patched | 88.09% | 9.31 | +----------------+----------+------------------+ 3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 70.66% | 17.14 | | patched | 91.15% | 12.57 | +----------------+----------+------------------+ 4) Ebizzy: There's a fair amount of variation from run to run, but this approach always shows nearly perfect hit rates, while baseline is just about non-existent. The amounts of cycles can fluctuate between anywhere from ~60 to ~116 for the baseline scheme, but this approach reduces it considerably. For instance, with 80 threads: +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 1.06% | 91.54 | | patched | 99.97% | 14.18 | +----------------+----------+------------------+ [akpm@linux-foundation.org: fix nommu build, per Davidlohr] [akpm@linux-foundation.org: document vmacache_valid() logic] [akpm@linux-foundation.org: attempt to untangle header files] [akpm@linux-foundation.org: add vmacache_find() BUG_ON] [hughd@google.com: add vmacache_valid_mm() (from Oleg)] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: adjust and enhance comments] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-26KGDB: make kgdb_breakpoint() as noinlineVijaya Kumar K1-1/+1
The function kgdb_breakpoint() sets up break point at compile time by calling arch_kgdb_breakpoint(); Though this call is surrounded by wmb() barrier, the compile can still re-order the break point, because this scheduling barrier is not a code motion barrier in gcc. Making kgdb_breakpoint() as noinline solves this problem of code reording around break point instruction and also avoids problem of being called as inline function from other places More details about discussion on this can be found here http://comments.gmane.org/gmane.linux.ports.arm.kernel/269732 Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-25kgdb/kdb: Fix no KDB config problemMike Travis1-2/+3
Some code added to the debug_core module had KDB dependencies that it shouldn't have. Move the KDB dependent REASON back to the caller to remove the dependency in the debug core code. Update the call from the UV NMI handler to conform to the new interface. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Hedi Berriche <hedi@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/20140114162551.318251993@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-03kdb: Add support for external NMI handler to call KGDB/KDBMike Travis1-2/+30
This patch adds a kgdb_nmicallin() interface that can be used by external NMI handlers to call the KGDB/KDB handler. The primary need for this is for those types of NMI interrupts where all the CPUs have already received the NMI signal. Therefore no send_IPI(NMI) is required, and in fact it will cause a 2nd unhandled NMI to occur. This generates the "Dazed and Confuzed" messages. Since all the CPUs are getting the NMI at roughly the same time, it's not guaranteed that the first CPU that hits the NMI handler will manage to enter KGDB and set the dbg_master_lock before the slaves start entering. The new argument "send_ready" was added for KGDB to signal the NMI handler to release the slave CPUs for entry into KGDB. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Jason Wessel <jason.wessel@windriver.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Reviewed-by: Hedi Berriche <hedi@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/20131002151417.928886849@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-30kgdb/sysrq: fix inconstistent help message of sysrq keyzhangwei(Jovi)1-1/+1
Currently help message of /proc/sysrq-trigger highlight its upper-case characters, like below: SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) ... this would confuse user trigger sysrq by upper-case character, which is inconsistent with the real lower-case character registed key. This inconsistent help message will also lead more confused when 26 upper-case letters put into use in future. This patch fix kgdb sysrq key: "debug(g)" Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-04kgdb: remove #include <linux/serial_8250.h> from kgdb.hGreg Kroah-Hartman1-0/+1
There's no reason kgdb.h itself needs to include the 8250 serial port header file. So push it down to the _very_ limited number of individual drivers that need the values in that file, and fix up the places where people really wanted serial_core.h and platform_device.h. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13Merge tag 'for_linus-3.7' of ↵Linus Torvalds1-0/+18
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull KGDB/KDB fixes and cleanups from Jason Wessel: "Cleanups - Clean up compile warnings in kgdboc.c and x86/kernel/kgdb.c - Add module event hooks for simplified debugging with gdb Fixes - Fix kdb to stop paging with 'q' on bta and dmesg - Fix for data that scrolls off the vga console due to line wrapping when using the kdb pager New - The debug core registers for kernel module events which allows a kernel aware gdb to automatically load symbols and break on entry to a kernel module - Allow kgdboc=kdb to setup kdb on the vga console" * tag 'for_linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: tty/console: fix warnings in drivers/tty/serial/kgdboc.c kdb,vt_console: Fix missed data due to pager overruns kdb: Fix dmesg/bta scroll to quit with 'q' kgdboc: Accept either kbd or kdb to activate the vga + keyboard kdb shell kgdb,x86: fix warning about unused variable mips,kgdb: fix recursive page fault with CONFIG_KPROBES kgdb: Add module event hooks
2012-10-12kgdb: Add module event hooksJason Wessel1-0/+18
Allow gdb to auto load kernel modules when it is attached, which makes it trivially easy to debug module init functions or pre-set breakpoints in a kernel module that has not loaded yet. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-09-26kernel/debug: Mask KGDB NMI upon entryAnton Vorontsov1-3/+11
The new arch callback should manage NMIs that usually cause KGDB to enter. That is, not all NMIs should be enabled/disabled, but only those that issue kgdb_handle_exception(). We must mask it as serial-line interrupt can be used as an NMI, so if the original KGDB-entry cause was say a breakpoint, then every input to KDB console will cause KGDB to reenter, which we don't want. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-04Merge tag 'for_linus-3.4-rc2' of ↵Linus Torvalds1-29/+24
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull KGDB/KDB regression fixes from Jason Wessel: - Fix a Smatch warning that appeared in the 3.4 merge window - Fix kgdb test suite with SMP for all archs without HW single stepping - Fix kgdb sw breakpoints with CONFIG_DEBUG_RODATA=y limitations on x86 - Fix oops on kgdb test suite with CONFIG_DEBUG_RODATA - Fix kgdb test suite with SMP for all archs with HW single stepping * tag 'for_linus-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: x86,kgdb: Fix DEBUG_RODATA limitation using text_poke() kgdb,debug_core: pass the breakpoint struct instead of address and memory kgdbts: (2 of 2) fix single step awareness to work correctly with SMP kgdbts: (1 of 2) fix single step awareness to work correctly with SMP kgdbts: Fix kernel oops with CONFIG_DEBUG_RODATA kdb: Fix smatch warning on dbg_io_ops->is_console
2012-03-29kgdb,debug_core: pass the breakpoint struct instead of address and memoryJason Wessel1-29/+24
There is extra state information that needs to be exposed in the kgdb_bpt structure for tracking how a breakpoint was installed. The debug_core only uses the the probe_kernel_write() to install breakpoints, but this is not enough for all the archs. Some arch such as x86 need to use text_poke() in order to install a breakpoint into a read only page. Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() allows other archs to set the type variable which indicates how the breakpoint was installed. Cc: stable@vger.kernel.org # >= 2.6.36 Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-03-28Merge tag 'split-asm_system_h-for-linus-20120328' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
2012-03-28Remove all #inclusions of asm/system.hDavid Howells1-1/+0
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-22kgdb,debug_core: add the ability to control the reboot notifierJason Wessel1-0/+1