linux.git/kernel/kexec.c, branch v6.6.132

kernel: kexec: copy user-array safely

2023-11-28T17:19:40+00:00

[ Upstream commit 569c8d82f95eb5993c84fb61a649a9c4ddd208b3 ]

Currently, there is no overflow-check with memdup_user().

Use the new function memdup_array_user() instead of memdup_user() for
duplicating the user-space array safely.

Suggested-by: David Airlie 
Signed-off-by: Philipp Stanner 
Acked-by: Baoquan He 
Reviewed-by: Kees Cook 
Reviewed-by: Zack Rusin 
Signed-off-by: Dave Airlie 
Link: https://patchwork.freedesktop.org/patch/msgid/20230920123612.16914-4-pstanner@redhat.com
Signed-off-by: Sasha Levin

crash: hotplug support for kexec_load()

2023-08-24T23:25:14+00:00

The hotplug support for kexec_load() requires changes to the userspace
kexec-tools and a little extra help from the kernel.

Given a kdump capture kernel loaded via kexec_load(), and a subsequent
hotplug event, the crash hotplug handler finds the elfcorehdr and rewrites
it to reflect the hotplug change.  That is the desired outcome, however,
at kernel panic time, the purgatory integrity check fails (because the
elfcorehdr changed), and the capture kernel does not boot and no vmcore is
generated.

Therefore, the userspace kexec-tools/kexec must indicate to the kernel
that the elfcorehdr can be modified (because the kexec excluded the
elfcorehdr from the digest, and sized the elfcorehdr memory buffer
appropriately).

To facilitate hotplug support with kexec_load():
 - a new kexec flag KEXEC_UPATE_ELFCOREHDR indicates that it is
   safe for the kernel to modify the kexec_load()'d elfcorehdr
 - the /sys/kernel/crash_elfcorehdr_size node communicates the
   preferred size of the elfcorehdr memory buffer
 - The sysfs crash_hotplug nodes (ie.
   /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
   take into account kexec_file_load() vs kexec_load() and
   KEXEC_UPDATE_ELFCOREHDR.
   This is critical so that the udev rule processing of crash_hotplug
   is all that is needed to determine if the userspace unload-then-load
   of the kdump image is to be skipped, or not. The proposed udev
   rule change looks like:
   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The table below indicates the behavior of kexec_load()'d kdump image
updates (with the new udev crash_hotplug rule in place):

 Kernel |Kexec
 -------+-----+----
 Old    |Old  |New
        |  a  | a
 -------+-----+----
 New    |  a  | b
 -------+-----+----

where kexec 'old' and 'new' delineate kexec-tools has the needed
modifications for the crash hotplug feature, and kernel 'old' and 'new'
delineate the kernel supports this crash hotplug feature.

Behavior 'a' indicates the unload-then-reload of the entire kdump image. 
For the kexec 'old' column, the unload-then-reload occurs due to the
missing flag KEXEC_UPDATE_ELFCOREHDR.  An 'old' kernel (with 'new' kexec)
does not present the crash_hotplug sysfs node, which leads to the
unload-then-reload of the kdump image.

Behavior 'b' indicates the desired optimized behavior of the kernel
directly modifying the elfcorehdr and avoiding the unload-then-reload of
the kdump image.

If the udev rule is not updated with crash_hotplug node check, then no
matter any combination of kernel or kexec is new or old, the kdump image
continues to be unload-then-reload on hotplug changes.

To fully support crash hotplug feature, there needs to be a rollout of
kernel, kexec-tools and udev rule changes.  However, the order of the
rollout of these pieces does not matter; kexec_load()'d kdump images still
function for hotplug as-is.

Link: https://lkml.kernel.org/r/20230814214446.6659-7-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder 
Suggested-by: Hari Bathini 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
Cc: Akhil Raj 
Cc: Bjorn Helgaas 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric W. Biederman 
Cc: Greg Kroah-Hartman 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Jonathan Corbet 
Cc: Konrad Rzeszutek Wilk 
Cc: Mimi Zohar 
Cc: Naveen N. Rao 
Cc: Oscar Salvador 
Cc: "Rafael J. Wysocki" 
Cc: Sean Christopherson 
Cc: Sourabh Jain 
Cc: Takashi Iwai 
Cc: Thomas Gleixner 
Cc: Thomas Weißschuh 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton

kexec: introduce sysctl parameters kexec_load_limit_*

2023-02-03T06:50:05+00:00

kexec allows replacing the current kernel with a different one.  This is
usually a source of concerns for sysadmins that want to harden a system.

Linux already provides a way to disable loading new kexec kernel via
kexec_load_disabled, but that control is very coard, it is all or nothing
and does not make distinction between a panic kexec and a normal kexec.

This patch introduces new sysctl parameters, with finer tuning to specify
how many times a kexec kernel can be loaded.  The sysadmin can set
different limits for kexec panic and kexec reboot kernels.  The value can
be modified at runtime via sysctl, but only with a stricter value.

With these new parameters on place, a system with loadpin and verity
enabled, using the following kernel parameters:
sysctl.kexec_load_limit_reboot=0 sysct.kexec_load_limit_panic=1 can have a
good warranty that if initrd tries to load a panic kernel, a malitious
user will have small chances to replace that kernel with a different one,
even if they can trigger timeouts on the disk where the panic kernel
lives.

Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-3-6a8531a09b9a@chromium.org
Signed-off-by: Ricardo Ribalda 
Reviewed-by: Steven Rostedt (Google) 
Acked-by: Baoquan He 
Cc: Bagas Sanjaya 
Cc: "Eric W. Biederman" 
Cc: Guilherme G. Piccoli  # Steam Deck
Cc: Joel Fernandes (Google) 
Cc: Jonathan Corbet 
Cc: Philipp Rudo 
Cc: Ross Zwisler 
Cc: Sergey Senozhatsky 
Signed-off-by: Andrew Morton

kexec: factor out kexec_load_permitted

2023-02-03T06:50:04+00:00

Both syscalls (kexec and kexec_file) do the same check, let's factor it
out.

Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-2-6a8531a09b9a@chromium.org
Signed-off-by: Ricardo Ribalda 
Reviewed-by: Steven Rostedt (Google) 
Acked-by: Baoquan He 
Cc: Bagas Sanjaya 
Cc: "Eric W. Biederman" 
Cc: Guilherme G. Piccoli 
Cc: Joel Fernandes (Google) 
Cc: Jonathan Corbet 
Cc: Philipp Rudo 
Cc: Ross Zwisler 
Cc: Sergey Senozhatsky 
Signed-off-by: Andrew Morton

panic, kexec: make __crash_kexec() NMI safe

2022-09-12T04:55:06+00:00

Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
panic() doesn't work.  The cause of that lies in the PREEMPT_RT definition
of mutex_trylock():

	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
		return 0;

This prevents an nmi_panic() from executing the main body of
__crash_kexec() which does the actual kexec into the kdump kernel.  The
warning and return are explained by:

  6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
  [...]
  The reasons for this are:

      1) There is a potential deadlock in the slowpath

      2) Another cpu which blocks on the rtmutex will boost the task
	 which allegedly locked the rtmutex, but that cannot work
	 because the hard/softirq context borrows the task context.

Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex
and replace it with an atomic variable.  This is somewhat overzealous as
*some* callsites could keep using a mutex (e.g.  the sysfs-facing ones
like crash_shrink_memory()), but this has the benefit of involving a
single unified lock and preventing any future NMI-related surprises.

Tested by triggering NMI panics via:

  $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
  $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
  $ echo 1 > /proc/sys/kernel/panic

  $ ipmitool power diag

Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com
Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
Signed-off-by: Valentin Schneider 
Cc: Arnd Bergmann 
Cc: Baoquan He 
Cc: "Eric W . Biederman" 
Cc: Juri Lelli 
Cc: Luis Claudio R. Goncalves 
Cc: Miaohe Lin 
Cc: Petr Mladek 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton

kexec: avoid compat_alloc_user_space

2021-09-08T22:32:34+00:00

kimage_alloc_init() expects a __user pointer, so compat_sys_kexec_load()
uses compat_alloc_user_space() to convert the layout and put it back onto
the user space caller stack.

Moving the user space access into the syscall handler directly actually
makes the code simpler, as the conversion for compat mode can now be done
on kernel memory.

Link: https://lkml.kernel.org/r/20210727144859.4150043-3-arnd@kernel.org
Link: https://lore.kernel.org/lkml/YPbtsU4GX6PL7%2F42@infradead.org/
Link: https://lore.kernel.org/lkml/m1y2cbzmnw.fsf@fess.ebiederm.org/
Signed-off-by: Arnd Bergmann 
Co-developed-by: Eric Biederman 
Co-developed-by: Christoph Hellwig 
Acked-by: "Eric W. Biederman" 
Cc: Al Viro 
Cc: Benjamin Herrenschmidt 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Christian Borntraeger 
Cc: Christoph Hellwig 
Cc: "David S. Miller" 
Cc: Feng Tang 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: "James E.J. Bottomley" 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Vasily Gorbik 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kexec: move locking into do_kexec_load

2021-09-08T22:32:34+00:00

Patch series "compat: remove compat_alloc_user_space", v5.

Going through compat_alloc_user_space() to convert indirect system call
arguments tends to add complexity compared to handling the native and
compat logic in the same code.

This patch (of 6):

The locking is the same between the native and compat version of
sys_kexec_load(), so it can be done in the common implementation to reduce
duplication.

Link: https://lkml.kernel.org/r/20210727144859.4150043-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20210727144859.4150043-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann 
Co-developed-by: Eric Biederman 
Co-developed-by: Christoph Hellwig 
Acked-by: "Eric W. Biederman" 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Thomas Bogendoerfer 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Al Viro 
Cc: Feng Tang 
Cc: Christoph Hellwig 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

LSM: Introduce kernel_post_load_data() hook

2020-10-05T11:37:03+00:00

There are a few places in the kernel where LSMs would like to have
visibility into the contents of a kernel buffer that has been loaded or
read. While security_kernel_post_read_file() (which includes the
buffer) exists as a pairing for security_kernel_read_file(), no such
hook exists to pair with security_kernel_load_data().

Earlier proposals for just using security_kernel_post_read_file() with a
NULL file argument were rejected (i.e. "file" should always be valid for
the security_..._file hooks, but it appears at least one case was
left in the kernel during earlier refactoring. (This will be fixed in
a subsequent patch.)

Since not all cases of security_kernel_load_data() can have a single
contiguous buffer made available to the LSM hook (e.g. kexec image
segments are separately loaded), there needs to be a way for the LSM to
reason about its expectations of the hook coverage. In order to handle
this, add a "contents" argument to the "kernel_load_data" hook that
indicates if the newly added "kernel_post_load_data" hook will be called
with the full contents once loaded. That way, LSMs requiring full contents
can choose to unilaterally reject "kernel_load_data" with contents=false
(which is effectively the existing hook coverage), but when contents=true
they can allow it and later evaluate the "kernel_post_load_data" hook
once the buffer is loaded.

With this change, LSMs can gain coverage over non-file-backed data loads
(e.g. init_module(2) and firmware userspace helper), which will happen
in subsequent patches.

Additionally prepare IMA to start processing these cases.

Signed-off-by: Kees Cook 
Reviewed-by: KP Singh 
Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman

kexec: add machine_kexec_post_load()

2020-01-08T16:32:55+00:00

It is the same as machine_kexec_prepare(), but is called after segments are
loaded. This way, can do processing work with already loaded relocation
segments. One such example is arm64: it has to have segments loaded in
order to create a page table, but it cannot do it during kexec time,
because at that time allocations won't be possible anymore.

Signed-off-by: Pavel Tatashin 
Acked-by: Dave Young 
Signed-off-by: Will Deacon

kexec_load: Disable at runtime if the kernel is locked down

2019-08-20T04:54:15+00:00

The kexec_load() syscall permits the loading and execution of arbitrary
code in ring 0, which is something that lock-down is meant to prevent. It
makes sense to disable kexec_load() in this situation.

This does not affect kexec_file_load() syscall which can check for a
signature on the image to be booted.

Signed-off-by: David Howells 
Signed-off-by: Matthew Garrett 
Acked-by: Dave Young 
Reviewed-by: Kees Cook 
cc: kexec@lists.infradead.org
Signed-off-by: James Morris