| Age | Commit message (Collapse) | Author | Files | Lines |
|
mmap_base by default"
commit 14d7c92f8df9c0964ae6f8b813c1b3ac38120825 upstream.
This reverts commit 3afb76a66b5559a7b595155803ce23801558a7a9.
This was a wrongheaded workaround for an issue that had already been
fixed much better by commit 4ef9ad19e176 ("mm: huge_memory: don't force
huge page alignment on 32 bit").
Asking users questions at kernel compile time that they can't make sense
of is not a viable strategy. And the fact that even the kernel VM
maintainers apparently didn't catch that this "fix" is not a fix any
more pretty much proves the point that people can't be expected to
understand the implications of the question.
It may well be the case that we could improve things further, and that
__thp_get_unmapped_area() should take the mapping randomization into
account even for 64-bit kernels. Maybe we should not be so eager to use
THP mappings.
But in no case should this be a kernel config option.
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
default
commit 3afb76a66b5559a7b595155803ce23801558a7a9 upstream.
An ASLR regression was noticed [1] and tracked down to file-mapped areas
being backed by THP in recent kernels. The 21-bit alignment constraint
for such mappings reduces the entropy for randomizing the placement of
64-bit library mappings and breaks ASLR completely for 32-bit libraries.
The reported issue is easily addressed by increasing vm.mmap_rnd_bits and
vm.mmap_rnd_compat_bits. This patch just provides a simple way to set
ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS to their maximum values
allowed by the architecture at build time.
[1] https://zolutal.github.io/aslrnt/
[akpm@linux-foundation.org: default to `y' if 32-bit, per Rafael]
Link: https://lkml.kernel.org/r/20240606180622.102099-1-aquini@redhat.com
Fixes: 1854bc6e2420 ("mm/readahead: Align file mappings for non-DAX")
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fe42754b94a42d08cf9501790afc25c4f6a5f631 upstream.
Rename x86's to CPU_MITIGATIONS, define it in generic code, and force it
on for all architectures exception x86. A recent commit to turn
mitigations off by default if SPECULATION_MITIGATIONS=n kinda sorta
missed that "cpu_mitigations" is completely generic, whereas
SPECULATION_MITIGATIONS is x86-specific.
Rename x86's SPECULATIVE_MITIGATIONS instead of keeping both and have it
select CPU_MITIGATIONS, as having two configs for the same thing is
unnecessary and confusing. This will also allow x86 to use the knob to
manage mitigations that aren't strictly related to speculative
execution.
Use another Kconfig to communicate to common code that CPU_MITIGATIONS
is already defined instead of having x86's menu depend on the common
CPU_MITIGATIONS. This allows keeping a single point of contact for all
of x86's mitigations, and it's not clear that other architectures *want*
to allow disabling mitigations at compile-time.
Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n")
Closes: https://lkml.kernel.org/r/20240413115324.53303a68%40canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240420000556.2645001-2-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6f9dc684cae638dda0570154509884ee78d0f75c upstream.
The shadow call stack implementation fails to build without CONFIG_MMU:
ld.lld: error: undefined symbol: vfree_atomic
>>> referenced by scs.c
>>> kernel/scs.o:(scs_free) in archive vmlinux.a
Link: https://lkml.kernel.org/r/20240122175204.2371009-1-samuel.holland@sifive.com
Fixes: a2abe7cbd8fe ("scs: switch to vmapped shadow stacks")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 shadow stack support from Dave Hansen:
"This is the long awaited x86 shadow stack support, part of Intel's
Control-flow Enforcement Technology (CET).
CET consists of two related security features: shadow stacks and
indirect branch tracking. This series implements just the shadow stack
part of this feature, and just for userspace.
The main use case for shadow stack is providing protection against
return oriented programming attacks. It works by maintaining a
secondary (shadow) stack using a special memory type that has
protections against modification. When executing a CALL instruction,
the processor pushes the return address to both the normal stack and
to the special permission shadow stack. Upon RET, the processor pops
the shadow stack copy and compares it to the normal stack copy.
For more information, refer to the links below for the earlier
versions of this patch set"
Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/
* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/shstk: Change order of __user in type
x86/ibt: Convert IBT selftest to asm
x86/shstk: Don't retry vm_munmap() on -EINTR
x86/kbuild: Fix Documentation/ reference
x86/shstk: Move arch detail comment out of core mm
x86/shstk: Add ARCH_SHSTK_STATUS
x86/shstk: Add ARCH_SHSTK_UNLOCK
x86: Add PTRACE interface for shadow stack
selftests/x86: Add shadow stack test
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/shstk: Wire in shadow stack interface
x86: Expose thread features in /proc/$PID/status
x86/shstk: Support WRSS for userspace
x86/shstk: Introduce map_shadow_stack syscall
x86/shstk: Check that signal frame is shadow stack mem
x86/shstk: Check that SSP is aligned on sigreturn
x86/shstk: Handle signals for shadow stack
x86/shstk: Introduce routines modifying shstk
x86/shstk: Handle thread shadow stack
x86/shstk: Add user-mode shadow stack support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options")
- kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
couple of macros to args.h")
- gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
commands")
- vsprintf inclusion rationalization from Andy Shevchenko
("lib/vsprintf: Rework header inclusions")
- Switch the handling of kdump from a udev scheme to in-kernel
handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
hot un/plug")
- Many singleton patches to various parts of the tree
* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
document while_each_thread(), change first_tid() to use for_each_thread()
drivers/char/mem.c: shrink character device's devlist[] array
x86/crash: optimize CPU changes
crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
crash: hotplug support for kexec_load()
x86/crash: add x86 crash hotplug support
crash: memory and CPU hotplug sysfs attributes
kexec: exclude elfcorehdr from the segment digest
crash: add generic infrastructure for crash hotplug support
crash: move a few code bits to setup support of crash hotplug
kstrtox: consistently use _tolower()
kill do_each_thread()
nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
scripts/bloat-o-meter: count weak symbol sizes
treewide: drop CONFIG_EMBEDDED
lockdep: fix static memory detection even more
lib/vsprintf: declare no_hash_pointers in sprintf.h
lib/vsprintf: split out sprintf() and friends
kernel/fork: stop playing lockless games for exe_file replacement
adfs: delete unused "union adfs_dirtail" definition
...
|
|
Both KASAN and KCOV had issues with LTO_CLANG if DEBUG_INFO is enabled.
With LTO inlinable function calls are required to have debug info if they
are inlined into a function that has debug info.
Starting with LLVM 17 this will be fixed ([1],[2]) and enabling LTO with
KASAN/KCOV and DEBUG_INFO doesn't cause linker errors anymore.
Link: https://github.com/llvm/llvm-project/commit/913f7e93dac67ecff47bade862ba42f27cb68ca9
Link: https://github.com/llvm/llvm-project/commit/4a8b1249306ff11f229320abdeadf0c215a00400
Link: https://lkml.kernel.org/r/20230717-enable-kasan-lto1-v3-1-650e1efc19d1@gmail.com
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jakob Koschel <jkl820.git@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "refactor Kconfig to consolidate KEXEC and CRASH options", v6.
The Kconfig is refactored to consolidate KEXEC and CRASH options from
various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec.
The Kconfig.kexec is now a submenu titled "Kexec and crash features"
located under "General Setup".
The following options are impacted:
- KEXEC
- KEXEC_FILE
- KEXEC_SIG
- KEXEC_SIG_FORCE
- KEXEC_IMAGE_VERIFY_SIG
- KEXEC_BZIMAGE_VERIFY_SIG
- KEXEC_JUMP
- CRASH_DUMP
Over time, these options have been copied between Kconfig files and
are very similar to one another, but with slight differences.
The following architectures are impacted by the refactor (because of
use of one or more KEXEC/CRASH options):
- arm
- arm64
- ia64
- loongarch
- m68k
- mips
- parisc
- powerpc
- riscv
- s390
- sh
- x86
More information:
In the patch series "crash: Kernel handling of CPU and memory hot
un/plug"
https://lore.kernel.org/lkml/20230503224145.7405-1-eric.devolder@oracle.com/
the new kernel feature introduces the config option CRASH_HOTPLUG.
In reviewing, Thomas Gleixner requested that the new config option
not be placed in x86 Kconfig. Rather the option needs a generic/common
home. To Thomas' point, the KEXEC and CRASH options have largely been
duplicated in the various arch/<arch>/Kconfig files, with minor
differences. This kind of proliferation is to be avoid/stopped.
https://lore.kernel.org/lkml/875y91yv63.ffs@tglx/
To that end, I have refactored the arch Kconfigs so as to consolidate
the various KEXEC and CRASH options. Generally speaking, this work has
the following themes:
- KEXEC and CRASH options are moved into new file kernel/Kconfig.kexec
- These items from arch/Kconfig:
CRASH_CORE KEXEC_CORE KEXEC_ELF HAVE_IMA_KEXEC
- These items from arch/x86/Kconfig form the common options:
KEXEC KEXEC_FILE KEXEC_SIG KEXEC_SIG_FORCE
KEXEC_BZIMAGE_VERIFY_SIG KEXEC_JUMP CRASH_DUMP
- These items from arch/arm64/Kconfig form the common options:
KEXEC_IMAGE_VERIFY_SIG
- The crash hotplug series appends CRASH_HOTPLUG to Kconfig.kexec
- The Kconfig.kexec is now a submenu titled "Kexec and crash features"
and is now listed in "General Setup" submenu from init/Kconfig.
- To control the common options, each has a new ARCH_SUPPORTS_<option>
option. These gateway options determine whether the common options
options are valid for the architecture.
- To account for the slight differences in the original architecture
coding of the common options, each now has a corresponding
ARCH_SELECTS_<option> which are used to elicit the same side effects
as the original arch/<arch>/Kconfig files for KEXEC and CRASH options.
An example, 'make menuconfig' illustrating the submenu:
> General setup > Kexec and crash features
[*] Enable kexec system call
[*] Enable kexec file based system call
[*] Verify kernel signature during kexec_file_load() syscall
[ ] Require a valid signature in kexec_file_load() syscall
[ ] Enable bzImage signature verification support
[*] kexec jump
[*] kernel crash dumps
[*] Update the crash elfcorehdr on system configuration changes
In the process of consolidating the common options, I encountered
slight differences in the coding of these options in several of the
architectures. As a result, I settled on the following solution:
- Each of the common options has a 'depends on ARCH_SUPPORTS_<option>'
statement. For example, the KEXEC_FILE option has a 'depends on
ARCH_SUPPORTS_KEXEC_FILE' statement.
This approach is needed on all common options so as to prevent
options from appearing for architectures which previously did
not allow/enable them. For example, arm supports KEXEC but not
KEXEC_FILE. The arch/arm/Kconfig does not provide
ARCH_SUPPORTS_KEXEC_FILE and so KEXEC_FILE and related options
are not available to arm.
- The boolean ARCH_SUPPORTS_<option> in effect allows the arch to
determine when the feature is allowed. Archs which don't have the
feature simply do not provide the corresponding ARCH_SUPPORTS_<option>.
For each arch, where there previously were KEXEC and/or CRASH
options, these have been replaced with the corresponding boolean
ARCH_SUPPORTS_<option>, and an appropriate def_bool statement.
For example, if the arch supports KEXEC_FILE, then the
ARCH_SUPPORTS_KEXEC_FILE simply has a 'def_bool y'. This permits
the KEXEC_FILE option to be available.
If the arch has a 'depends on' statement in its original coding
of the option, then that expression becomes part of the def_bool
expression. For example, arm64 had:
config KEXEC
depends on PM_SLEEP_SMP
and in this solution, this converts to:
config ARCH_SUPPORTS_KEXEC
def_bool PM_SLEEP_SMP
- In order to account for the architecture differences in the
coding for the common options, the ARCH_SELECTS_<option> in the
arch/<arch>/Kconfig is used. This option has a 'depends on
<option>' statement to couple it to the main option, and from
there can insert the differences from the common option and the
arch original coding of that option.
For example, a few archs enable CRYPTO and CRYTPO_SHA256 for
KEXEC_FILE. These require a ARCH_SELECTS_KEXEC_FILE and
'select CRYPTO' and 'select CRYPTO_SHA256' statements.
Illustrating the option relationships:
For each of the common KEXEC and CRASH options:
ARCH_SUPPORTS_<option> <- <option> <- ARCH_SELECTS_<option>
<option> # in Kconfig.kexec
ARCH_SUPPORTS_<option> # in arch/<arch>/Kconfig, as needed
ARCH_SELECTS_<option> # in arch/<arch>/Kconfig, as needed
For example, KEXEC:
ARCH_SUPPORTS_KEXEC <- KEXEC <- ARCH_SELECTS_KEXEC
KEXEC # in Kconfig.kexec
ARCH_SUPPORTS_KEXEC # in arch/<arch>/Kconfig, as needed
ARCH_SELECTS_KEXEC # in arch/<arch>/Kconfig, as needed
To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
enabled, and the ARCH_SELECTS_<option> handles side effects (ie.
select statements).
Examples:
A few examples to show the new strategy in action:
===== x86 (minus the help section) =====
Original:
config KEXEC
bool "kexec system call"
select KEXEC_CORE
config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
select HAVE_IMA_KEXEC if IMA
depends on X86_64
depends on CRYPTO=y
depends on CRYPTO_SHA256=y
config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
config KEXEC_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
config KEXEC_SIG_FORCE
bool "Require a valid signature in kexec_file_load() syscall"
depends on KEXEC_SIG
config KEXEC_BZIMAGE_VERIFY_SIG
bool "Enable bzImage signature verification support"
depends on KEXEC_SIG
depends on SIGNED_PE_FILE_VERIFICATION
select SYSTEM_TRUSTED_KEYRING
config CRASH_DUMP
bool "kernel crash dumps"
depends on X86_64 || (X86_32 && HIGHMEM)
config KEXEC_JUMP
bool "kexec jump"
depends on KEXEC && HIBERNATION
help
becomes...
New:
config ARCH_SUPPORTS_KEXEC
def_bool y
config ARCH_SUPPORTS_KEXEC_FILE
def_bool X86_64 && CRYPTO && CRYPTO_SHA256
config ARCH_SELECTS_KEXEC_FILE
def_bool y
depends on KEXEC_FILE
select HAVE_IMA_KEXEC if IMA
config ARCH_SUPPORTS_KEXEC_PURGATORY
def_bool KEXEC_FILE
config ARCH_SUPPORTS_KEXEC_SIG
def_bool y
config ARCH_SUPPORTS_KEXEC_SIG_FORCE
def_bool y
config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG
def_bool y
config ARCH_SUPPORTS_KEXEC_JUMP
def_bool y
config ARCH_SUPPORTS_CRASH_DUMP
def_bool X86_64 || (X86_32 && HIGHMEM)
===== powerpc (minus the help section) =====
Original:
config KEXEC
bool "kexec system call"
depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP)
select KEXEC_CORE
config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
select HAVE_IMA_KEXEC if IMA
select KEXEC_ELF
depends on PPC64
depends on CRYPTO=y
depends on CRYPTO_SHA256=y
config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
config CRASH_DUMP
bool "Build a dump capture kernel"
depends on PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
select RELOCATABLE if PPC64 || 44x || PPC_85xx
becomes...
New:
config ARCH_SUPPORTS_KEXEC
def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
config ARCH_SUPPORTS_KEXEC_FILE
def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y
config ARCH_SUPPORTS_KEXEC_PURGATORY
def_bool KEXEC_FILE
config ARCH_SELECTS_KEXEC_FILE
def_bool y
depends on KEXEC_FILE
select KEXEC_ELF
select HAVE_IMA_KEXEC if IMA
config ARCH_SUPPORTS_CRASH_DUMP
def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
config ARCH_SELECTS_CRASH_DUMP
def_bool y
depends on CRASH_DUMP
select RELOCATABLE if PPC64 || 44x || PPC_85xx
Testing Approach and Results
There are 388 config files in the arch/<arch>/configs directories.
For each of these config files, a .config is generated both before and
after this Kconfig series, and checked for equivalence. This approach
allows for a rather rapid check of all architectures and a wide
variety of configs wrt/ KEXEC and CRASH, and avoids requiring
compiling for all architectures and running kernels and run-time
testing.
For each config file, the olddefconfig, allnoconfig and allyesconfig
targets are utilized. In testing the randconfig has revealed problems
as well, but is not used in the before and after equivalence check
since one can not generate the "same" .config for before and after,
even if using the same KCONFIG_SEED since the option list is
different.
As such, the following script steps compare the before and after
of 'make olddefconfig'. The new symbols introduced by this series
are filtered out, but otherwise the config files are PASS only if
they were equivalent, and FAIL otherwise.
The script performs the test by doing the following:
# Obtain the "golden" .config output for given config file
# Reset test sandbox
git checkout master
git branch -D test_Kconfig
git checkout -B test_Kconfig master
make distclean
# Write out updated config
cp -f <config file> .config
make ARCH=<arch> olddefconfig
# Track each item in .config, LHSB is "golden"
scoreboard .config
# Obtain the "changed" .config output for given config file
# Reset test sandbox
make distclean
# Apply this Kconfig series
git am <this Kconfig series>
# Write out updated config
cp -f <config file> .config
make ARCH=<arch> olddefconfig
# Track each item in .config, RHSB is "changed"
scoreboard .config
# Determine test result
# Filter-out new symbols introduced by this series
# Filter-out symbol=n which not in either scoreboard
# Compare LHSB "golden" and RHSB "changed" scoreboards and issue PASS/FAIL
The script was instrumental during the refactoring of Kconfig as it
continually revealed problems. The end result being that the solution
presented in this series passes all configs as checked by the script,
with the following exceptions:
- arch/ia64/configs/zx1_config with olddefconfig
This config file has:
# CONFIG_KEXEC is not set
CONFIG_CRASH_DUMP=y
and this refactor now couples KEXEC to CRASH_DUMP, so it is not
possible to enable CRASH_DUMP without KEXEC.
- arch/sh/configs/* with allyesconfig
The arch/sh/Kconfig codes CRASH_DUMP as dependent upon BROKEN_ON_MMU
(which clearly is not meant to be set). This symbol is not provided
but with the allyesconfig it is set to yes which enables CRASH_DUMP.
But KEXEC is coded as dependent upon MMU, and is set to no in
arch/sh/mm/Kconfig, so KEXEC is not enabled.
This refactor now couples KEXEC to CRASH_DUMP, so it is not
possible to enable CRASH_DUMP without KEXEC.
While the above exceptions are not equivalent to their original,
the config file produced is valid (and in fact better wrt/ CRASH_DUMP
handling).
This patch (of 14)
The config options for kexec and crash features are consolidated
into new file kernel/Kconfig.kexec. Under the "General Setup" submenu
is a new submenu "Kexec and crash handling". All the kexec and
crash options that were once in the arch-dependent submenu "Processor
type and features" are now consolidated in the new submenu.
The following options are impacted:
- KEXEC
- KEXEC_FILE
- KEXEC_SIG
- KEXEC_SIG_FORCE
- KEXEC_BZIMAGE_VERIFY_SIG
- KEXEC_JUMP
- CRASH_DUMP
The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP.
Architectures specify support of certain KEXEC and CRASH features with
similarly named new ARCH_SUPPORTS_<option> config options.
Architectures can utilize the new ARCH_SELECTS_<option> config
options to specify additional components when <option> is enabled.
To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
enabled, and the ARCH_SELECTS_<option> handles side effects (ie.
select statements).
Link: https://lkml.kernel.org/r/20230712161545.87870-1-eric.devolder@oracle.com
Link: https://lkml.kernel.org/r/20230712161545.87870-2-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cc. "H. Peter Anvin" <hpa@zytor.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com> # for x86
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Juerg Haefliger <juerg.haefliger@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Marc Aurèle La France <tsi@tuyoix.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Xin Li <xin3.li@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Some architectures allows partial SMT states, i.e. when not all SMT threads
are brought online.
To support that, add an architecture helper which checks whether a given
CPU is allowed to be brought online depending on how many SMT threads are
currently enabled. Since this is only applicable to architecture supporting
partial SMT, only these architectures should select the new configuration
variable CONFIG_SMT_NUM_THREADS_DYNAMIC. For the other architectures, not
supporting the partial SMT states, there is no need to define
topology_cpu_smt_allowed(), the generic code assumed that all the threads
are allowed or only the primary ones.
Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is
enabled, to check if the particular thread should be onlined. Notably,
also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow
offlining some threads to move from a higher to lower number of threads
online.
[ ldufour: Slightly reword the commit's description ]
[ ldufour: Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20230705145143.40545-7-ldufour@linux.ibm.com
|
|
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.
One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable or shadow stack mappings.
But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.
So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.
Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and
adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same
pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(),
create a new arch config HAS_HUGE_PAGE that can be used to tell if
pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the
compiler would not be able to find pmd_mkwrite_novma().
No functional change.
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/
Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:
- Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level
directories
- Douglas Anderson has added a new "buddy" mode to the hardlockup
detector. It permits the detector to work on architectures which
cannot provide the required interrupts, by having CPUs periodically
perform checks on other CPUs
- Zhen Lei has enhanced kexec's ability to support two crash regions
- Petr Mladek has done a lot of cleanup on the hard lockup detector's
Kconfig entries
- And the usual bunch of singleton patches in various places
* tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
kernel/time/posix-stubs.c: remove duplicated include
ocfs2: remove redundant assignment to variable bit_off
watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
devres: show which resource was invalid in __devm_ioremap_resource()
watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
watchdog/hardlockup: make the config checks more straightforward
watchdog/hardlockup: sort hardlockup detector related config values a logical way
watchdog/hardlockup: move SMP barriers from common code to buddy code
watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"Add support for Landlock to UML.
To do this, this fixes the way hostfs manages inodes according to the
underlying filesystem [1]. They are now properly handled as for other
filesystems, which enables Landlock support (and probably other
features).
This also extends Landlock's tests with 6 pseudo filesystems,
including hostfs"
[1] https://lore.kernel.org/all/20230612191430.339153-1-mic@digikod.net/
* tag 'landlock-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add hostfs tests
selftests/landlock: Add tests for pseudo filesystems
selftests/landlock: Make mounts configurable
selftests/landlock: Add supports_filesystem() helper
selftests/landlock: Don't create useless file layouts
hostfs: Fix ephemeral inodes
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP updates from Thomas Gleixner:
"A large update for SMP management:
- Parallel CPU bringup
The reason why people are interested in parallel bringup is to
shorten the (kexec) reboot time of cloud servers to reduce the
downtime of the VM tenants.
The current fully serialized bringup does the following per AP:
1) Prepare callbacks (allocate, intialize, create threads)
2) Kick the AP alive (e.g. INIT/SIPI on x86)
3) Wait for the AP to report alive state
4) Let the AP continue through the atomic bringup
5) Let the AP run the threaded bringup to full online state
There are two significant delays:
#3 The time for an AP to report alive state in start_secondary()
on x86 has been measured in the range between 350us and 3.5ms
depending on vendor and CPU type, BIOS microcode size etc.
#4 The atomic bringup does the microcode update. This has been
measured to take up to ~8ms on the primary threads depending
on the microcode patch size to apply.
On a two socket SKL server with 56 cores (112 threads) the boot CPU
spends on current mainline about 800ms busy waiting for the APs to
come up and apply microcode. That's more than 80% of the actual
onlining procedure.
This can be reduced significantly by splitting the bringup
mechanism into two parts:
1) Run the prepare callbacks and kick the AP alive for each AP
which needs to be brought up.
The APs wake up, do their firmware initialization and run the
low level kernel startup code including microcode loading in
parallel up to the first synchronization point. (#1 and #2
above)
2) Run the rest of the bringup code strictly serialized per CPU
(#3 - #5 above) as it's done today.
Parallelizing that stage of the CPU bringup might be possible
in theory, but it's questionable whether required surgery
would be justified for a pretty small gain.
If the system is large enough the first AP is already waiting at
the first synchronization point when the boot CPU finished the
wake-up of the last AP. That reduces the AP bringup time on that
SKL from ~800ms to ~80ms, i.e. by a factor ~10x.
The actual gain varies wildly depending on the system, CPU,
microcode patch size and other factors. There are some
opportunities to reduce the overhead further, but that needs some
deep surgery in the x86 CPU bringup code.
For now this is only enabled on x86, but the core functionality
obviously works for all SMP capable architectures.
- Enhancements for SMP function call tracing so it is possible to
locate the scheduling and the actual execution points. That allows
to measure IPI delivery time precisely"
* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
trace,smp: Add tracepoints for scheduling remotelly called functions
trace,smp: Add tracepoints around remotelly called functions
MAINTAINERS: Add CPU HOTPLUG entry
x86/smpboot: Fix the parallel bringup decision
x86/realmode: Make stack lock work in trampoline_compat()
x86/smp: Initialize cpu_primary_thread_mask late
cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
x86/smpboot: Support parallel startup of secondary CPUs
x86/smpboot: Implement a bit spinlock to protect the realmode stack
x86/apic: Save the APIC virtual base address
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
x86/apic: Provide cpu_primary_thread mask
x86/smpboot: Enable split CPU startup
cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
cpu/hotplug: Reset task stack state in _cpu_up()
cpu/hotplug: Remove unused state functions
riscv: Switch to hotplug core state synchronization
parisc: Switch to hotplug core state synchronization
...
|
|
There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.
CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.
The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.
CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.
The support for blackfin and mn10300 architectures has been completely
dropped some time ago. And sparc is the only architecture with the historic
NMI watchdog at the moment.
And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.
There are only few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detectors code:
+ implements arch_touch_nmi_watchdog() which is called from the generic
touch_nmi_watchdog()
+ implements watchdog_hardlockup_enable()/disable() to support
/proc/sys/kernel/nmi_watchdog
+ is always preferred over other generic watchdogs, see
CONFIG_HARDLOCKUP_DETECTOR
+ includes asm/nmi.h into linux/nmi.h because some sparc-specific
functions are needed in sparc-specific code which includes
only linux/nmi.h.
The situation became more complicated after the commit 05a4a95279311c3
("kernel/watchdog: split up config options") and commit 2104180a53698df5
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc
specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.
HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:
+ implemented arch_touch_nmi_watchdog()
+ included asm/nmi.h into linux/nmi.h
+ defined the default value for /proc/sys/kernel/nmi_watchdog
But it actually has made things pretty complicated when the generic
buddy hardlockup detector was added. Before the generic perf detector
was newer supported together with an arch-specific one. But the buddy
detector could work on any SMP system. It means that an architecture
could support both the arch-specific and buddy detector.
As a result, there are few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:
((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
The problem is that the very special sparc implementation is defined as:
HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH
Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without reading understanding the history.
Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.
Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is not longer enabled when HAVE_NMI_WATCHDOG is set.
Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are four possible variants of hardlockup detectors:
+ buddy: available when SMP is set.
+ perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.
+ arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.
+ sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.
The check for the sparc64 variant is more complicated because
HAVE_NMI_WATCHDOG is used to #ifdef code used by both arch-specific
and sparc64 specific variant. Therefore it is automatically
selected with HAVE_HARDLOCKUP_DETECTOR_ARCH.
This complexity is partly hidden in HAVE_HARDLOCKUP_DETECTOR_NON_ARCH.
It reduces the size of some checks but it makes them harder to follow.
Finally, the other temporary variable HARDLOCKUP_DETECTOR_NON_ARCH
is used to re-compute HARDLOCKUP_DETECTOR_PERF/BUDDY when the global
HARDLOCKUP_DETECTOR switch is enabled/disabled.
Make the logic more straightforward by the following changes:
+ Better explain the role of HAVE_HARDLOCKUP_DETECTOR_ARCH and
HAVE_NMI_WATCHDOG in comments.
+ Add HAVE_HARDLOCKUP_DETECTOR_BUDDY so that there is separate
HAVE_* for all four hardlockup detector variants.
Use it in the other conditions instead of SMP. It makes it
clear that it is about the buddy detector.
+ Open code HAVE_HARDLOCKUP_DETECTOR_NON_ARCH in HARDLOCKUP_DETECTOR
and HARDLOCKUP_DETECTOR_PREFER_BUDDY. It helps to understand
the conditions between the four hardlockup detector variants.
+ Define the exact conditions when HARDLOCKUP_DETECTOR_PERF/BUDDY
can be enabled. It explains the dependency on the other
hardlockup detector variants.
Also it allows to remove HARDLOCKUP_DETECTOR_NON_ARCH by using "imply".
It triggers re-evaluating HARDLOCKUP_DETECTOR_PERF/BUDDY when
the global HARDLOCKUP_DETECTOR switch is changed.
+ Add dependency on HARDLOCKUP_DETECTOR so that the affected variables
disappear when the hardlockup detectors are disabled.
Another nice side effect is that HARDLOCKUP_DETECTOR_PREFER_BUDDY
value is not preserved when the global switch is disabled.
The user has to make the decision again when it gets re-enabled.
Link: https://lkml.kernel.org/r/20230616150618.6073-3-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
watchdog_hardlockup_probe()
Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG
without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one
architecture, we have some special case code in the watchdog core to
handle the fact that watchdog_hardlockup_probe() isn't implemented.
Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of the
special case.
As a side effect of doing this, code inspection tells us that we could fix
a minor bug where the system won't properly realize that NMI watchdogs are
disabled. Specifically, on powerpc if CONFIG_PPC_WATCHDOG is turned off
the arch might still select CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which
selects CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then
nothing will override the "weak" watchdog_hardlockup_probe() and we'll
fallback to looking at CONFIG_HAVE_NMI_WATCHDOG.
Link: https://lkml.kernel.org/r/20230526184139.2.Ic6ebbf307ca0efe91f08ce2c1eb4a037ba6b0700@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
check_bugs() has become a dumping ground for all sorts of activities to
finalize the CPU initialization before running the rest of the init code.
Most are empty, a few do actual bug checks, some do alternative patching
and some cobble a CPU advertisement string together....
Aside of that the current implementation requires duplicated function
declaration and mostly empty header files for them.
Provide a new function arch_cpu_finalize_init(). Provide a generic
declaration if CONFIG_ARCH_HAS_CPU_FINALIZE_INIT is selected and a stub
inline otherwise.
This requires a temporary #ifdef in start_kernel() which will be removed
along with check_bugs() once the architectures are converted over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224544.957805717@linutronix.de
|
|
hostfs creates a new inode for each opened or created file, which
created useless inode allocations and forbade identifying a host file
with a kernel inode.
Fix this uncommon filesystem behavior by tying kernel inodes to host
file's inode and device IDs. Even if the host filesystem inodes may be
recycled, this cannot happen while a file referencing it is opened,
which is the case with hostfs. It should be noted that hostfs inode IDs
may not be unique for the same hostfs superblock because multiple host's
(backed) superblocks may be used.
Delete inodes when dropping them to force backed host's file descriptors
closing.
This enables to entirely remove ARCH_EPHEMERAL_INODES, and then makes
Landlock fully supported by UML. This is very useful for testing
changes.
These changes also factor out and simplify some helpers thanks to the
new hostfs_inode_update() and the hostfs_iget() revamp: read_name(),
hostfs_create(), hostfs_lookup(), hostfs_mknod(), and
hostfs_fill_sb_common().
A following commit with new Landlock tests check this new hostfs inode
consistency.
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Richard Weinberger <richard@nod.at>
Link: https://lore.kernel.org/r/20230612191430.339153-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
There is often significant latency in the early stages of CPU bringup, and
time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and
then waiting for it to respond before moving on to the next.
Allow a platform to enable parallel setup which brings all to be onlined
CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
control CPU (BP) is single-threaded the important part is the last state
CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.
This allows the CPUs to run up to the first sychronization point
cpuhp_ap_sync_alive() where they wait for the control CPU to release them
one by one for the full onlining procedure.
This parallelism depends on the CPU hotplug core sync mechanism which
ensures that the parallel brought up CPUs wait for release before touching
any state which would make the CPU visible to anything outside the hotplug
control mechanism.
To handle the SMT constraints of X86 correctly the bringup happens in two
iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up
the primary SMT threads of each core first, which can load the microcode
without the need to rendevouz with the thread siblings. Once that's
completed it brings up the secondary SMT threads.
Co-developed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.240231377@linutronix.de
|
|
The bring up logic of a to be onlined CPU consists of several parts, which
are considered to be a single hotplug state:
1) Control CPU issues the wake-up
2) To be onlined CPU starts up, does the minimal initialization,
reports to be alive and waits for release into the complete bring-up.
3) Control CPU waits for the alive report and releases the upcoming CPU
for the complete bring-up.
Allow to split this into two states:
1) Control CPU issues the wake-up
After that the to be onlined CPU starts up, does the minimal
initialization, reports to be alive and waits for release into the
full bring-up. As this can run after the control CPU dropped the
hotplug locks the code which is executed on the AP before it reports
alive has to be carefully audited to not violate any of the hotplug
constraints, especially not modifying any of the various cpumasks.
This is really only meant to avoid waiting for the AP to react on the
wake-up. Of course an architecture can move strict CPU related setup
functionality, e.g. microcode loading, with care before the
synchronization point to save further pointless waiting time.
2) Control CPU waits for the alive report and releases the upcoming CPU
for the complete bring-up.
This allows that the two states can be split up to run all to be onlined
CPUs up to state #1 on the control CPU and then at a later point run state
#2. This spares some of the latencies of the full serialized per CPU
bringup by avoiding the per CPU wakeup/wait serialization. The assumption
is that the first AP already waits when the last AP has been woken up. This
obvioulsy depends on the hardware latencies and depending on the timings
this might still not completely eliminate all wait scenarios.
This split is just a preparatory step for enabling the parallel bringup
later. The boot time bringup is st |