linux.git/include/linux, branch master

Merge tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2024-11-10T17:04:27+00:00

Pull misc fixes from Andrew Morton:
 "20 hotfixes, 14 of which are cc:stable.

  Three affect DAMON. Lorenzo's five-patch series to address the
  mmap_region error handling is here also.

  Apart from that, various singletons"

* tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: add entry for Thorsten Blum
  ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
  signal: restore the override_rlimit logic
  fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
  ucounts: fix counter leak in inc_rlimit_get_ucounts()
  selftests: hugetlb_dio: check for initial conditions to skip in the start
  mm: fix docs for the kernel parameter ``thp_anon=``
  mm/damon/core: avoid overflow in damon_feed_loop_next_input()
  mm/damon/core: handle zero schemes apply interval
  mm/damon/core: handle zero {aggregation,ops_update} intervals
  mm/mlock: set the correct prev on failure
  objpool: fix to make percpu slot allocation more robust
  mm/page_alloc: keep track of free highatomic
  mm: resolve faulty mmap_region() error path behaviour
  mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
  mm: refactor map_deny_write_exec()
  mm: unconditionally close VMAs on error
  mm: avoid unsafe VMA hook invocation when error arises on mmap hook
  mm/thp: fix deferred split unqueue naming and locking
  mm/thp: fix deferred split queue not partially_mapped

Merge tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

2024-11-08T23:08:23+00:00

Pull ACPI fix from Rafael Wysocki:
 "Fix the ACPI processor driver initialization ordering after recent
  changes to avoid calling init_freq_invariance_cppc() too early on AMD
  platforms (Mario Limonciello)"

* tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Move arch_init_invariance_cppc() call later

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

2024-11-08T17:19:58+00:00

Pull arm64 fixes from Will Deacon:
 "Here is a (hopefully) final round of arm64 fixes for 6.12 that address
  some user-visible floating point register corruption. Both of the
  Marks have been working on this for a couple of weeks and we've ended
  up in a position where SVE is solid but SME still has enough pending
  issues that the most pragmatic solution for the release and stable
  backports is to disable the feature. Yes, it's a shame, but the
  hardware is rare as hen's teeth at the moment and we're better off
  getting back to a known good state before fixing it all properly.
  We're also improving the selftests for 6.13 to help avoid merging
  broken code in the future.

  Anyway, the good news is that we're removing a lot more code than
  we're adding.

  Summary:

   - Fix handling of SVE traps from userspace on preemptible kernels
     when converting the saved floating point state into SVE state.

   - Remove broken support for the SMCCCv1.3 "SVE discard hint"
     optimisation.

   - Disable SME support, as the current support code suffers from
     numerous issues around signal delivery, ptrace access and
     context-switch which can lead to user-visible corruption of the
     register state"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Kconfig: Make SME depend on BROKEN for now
  arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hint
  arm64/sve: Discard stale CPU state when handling SVE traps

signal: restore the override_rlimit logic

2024-11-07T22:14:59+00:00

Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of
signals.  However now it's enforced unconditionally, even if
override_rlimit is set.  This behavior change caused production issues.  

For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with siginfo. 
This prevents the process from correctly identifying the fault address and
handling the error.  From the user-space perspective, applications are
unaware that the limit has been reached and that the siginfo is
effectively 'corrupted'.  This can lead to unpredictable behavior and
crashes, as we observed with java applications.

Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip
the comparison to max there if override_rlimit is set.  This effectively
restores the old behavior.

Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin 
Co-developed-by: Andrei Vagin 
Signed-off-by: Andrei Vagin 
Acked-by: Oleg Nesterov 
Acked-by: Alexey Gladkov 
Cc: Kees Cook 
Cc: "Eric W. Biederman" 
Cc: 
Signed-off-by: Andrew Morton

mm/page_alloc: keep track of free highatomic

2024-11-07T22:14:58+00:00

OOM kills due to vastly overestimated free highatomic reserves were
observed:

  ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
  Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
  Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB

The second line above shows that the OOM kill was due to the following
condition:

  free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)

And the third line shows there were no free pages in any
MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'. 
Therefore __zone_watermark_unusable_free() underestimated the usable free
memory by over 1GB, which resulted in the unnecessary OOM kill above.

The comments in __zone_watermark_unusable_free() warns about the potential
risk, i.e.,

  If the caller does not have rights to reserves below the min
  watermark then subtract the high-atomic reserves. This will
  over-estimate the size of the atomic reserve but it avoids a search.

However, it is possible to keep track of free pages in reserved highatomic
pageblocks with a new per-zone counter nr_free_highatomic protected by the
zone lock, to avoid a search when calculating the usable free memory.  And
the cost would be minimal, i.e., simple arithmetics in the highatomic
alloc/free/move paths.

Note that since nr_free_highatomic can be relatively small, using a
per-cpu counter might cause too much drift and defeat its purpose, in
addition to the extra memory overhead.

Dependson e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting") - see [1]

[akpm@linux-foundation.org: s/if/else if/, per Johannes, stealth whitespace tweak]
Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com
Link: https://lkml.kernel.org/r/0d0ddb33-fcdc-43e2-801f-0c1df2031afb@suse.cz [1]
Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Signed-off-by: Yu Zhao 
Reported-by: Link Lin 
Acked-by: David Rientjes 
Acked-by: Vlastimil Babka 
Acked-by: Johannes Weiner 
Signed-off-by: Andrew Morton

arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hint

2024-11-07T11:18:52+00:00

SMCCCv1.3 added a hint bit which callers can set in an SMCCC function ID
(AKA "FID") to indicate that it is acceptable for the SMCCC
implementation to discard SVE and/or SME state over a specific SMCCC
call. The kernel support for using this hint is broken and SMCCC calls
may clobber the SVE and/or SME state of arbitrary tasks, though FPSIMD
state is unaffected.

The kernel support is intended to use the hint when there is no SVE or
SME state to save, and to do this it checks whether TIF_FOREIGN_FPSTATE
is set or TIF_SVE is clear in assembly code:

|        ldr     , [, #TSK_TI_FLAGS]
|        tbnz    , #TIF_FOREIGN_FPSTATE, 1f   // Any live FP state?
|        tbnz    , #TIF_SVE, 2f               // Does that state include SVE?
|
| 1:     orr     , , ARM_SMCCC_1_3_SVE_HINT
| 2:
|        << SMCCC call using FID >>

This is not safe as-is:

(1) SMCCC calls can be made in a preemptible context and preemption can
    result in TIF_FOREIGN_FPSTATE being set or cleared at arbitrary
    points in time. Thus checking for TIF_FOREIGN_FPSTATE provides no
    guarantee.

(2) TIF_FOREIGN_FPSTATE only indicates that the live FP/SVE/SME state in
    the CPU does not belong to the current task, and does not indicate
    that clobbering this state is acceptable.

    When the live CPU state is clobbered it is necessary to update
    fpsimd_last_state.st to ensure that a subsequent context switch will
    reload FP/SVE/SME state from memory rather than consuming the
    clobbered state. This and the SMCCC call itself must happen in a
    critical section with preemption disabled to avoid races.

(3) Live SVE/SME state can exist with TIF_SVE clear (e.g. with only
    TIF_SME set), and checking TIF_SVE alone is insufficient.

Remove the broken support for the SMCCCv1.3 SVE saving hint. This is
effectively a revert of commits:

* cfa7ff959a78 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint")
* a7c3acca5380 ("arm64: smccc: Save lr before calling __arm_smccc_sve_check()")

... leaving behind the ARM_SMCCC_VERSION_1_3 and ARM_SMCCC_1_3_SVE_HINT
definitions, since these are simply definitions from the SMCCC
specification, and the latter is used in KVM via ARM_SMCCC_CALL_HINTS.

If we want to bring this back in future, we'll probably want to handle
this logic in C where we can use all the usual FPSIMD/SVE/SME helper
functions, and that'll likely require some rework of the SMCCC code
and/or its callers.

Fixes: cfa7ff959a78 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint")
Signed-off-by: Mark Rutland 
Cc: Ard Biesheuvel 
Cc: Catalin Marinas 
Cc: Marc Zyngier 
Cc: Mark Brown 
Cc: Will Deacon 
Cc: stable@vger.kernel.org
Reviewed-by: Mark Brown 
Link: https://lore.kernel.org/r/20241106160448.2712997-1-mark.rutland@arm.com
Signed-off-by: Will Deacon

Merge tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

2024-11-06T23:09:22+00:00

Pull NFS client fixes from Anna Schumaker:
 "These are mostly fixes that came up during the nfs bakeathon the other
  week.

  Stable Fixes:
   - Fix KMSAN warning in decode_getfattr_attrs()

  Other Bugfixes:
   - Handle -ENOTCONN in xs_tcp_setup_socked()
   - NFSv3: only use NFS timeout for MOUNT when protocols are compatible
   - Fix attribute delegation behavior on exclusive create and a/mtime
     changes
   - Fix localio to cope with racing nfs_local_probe()
   - Avoid i_lock contention in fs_clear_invalid_mapping()"

* tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  nfs: avoid i_lock contention in nfs_clear_invalid_mapping
  nfs_common: fix localio to cope with racing nfs_local_probe()
  NFS: Further fixes to attribute delegation a/mtime changes
  NFS: Fix attribute delegation behaviour on exclusive create
  nfs: Fix KMSAN warning in decode_getfattr_attrs()
  NFSv3: only use NFS timeout for MOUNT when protocols are compatible
  sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()

ACPI: processor: Move arch_init_invariance_cppc() call later

2024-11-06T20:31:36+00:00

arch_init_invariance_cppc() is called at the end of
acpi_cppc_processor_probe() in order to configure frequency invariance
based upon the values from _CPC.

This however doesn't work on AMD CPPC shared memory designs that have
AMD preferred cores enabled because _CPC needs to be analyzed from all
cores to judge if preferred cores are enabled.

This issue manifests to users as a warning since commit 21fb59ab4b97
("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"):
```
Could not retrieve highest performance (-19)
```

However the warning isn't the cause of this, it was actually
commit 279f838a61f9 ("x86/amd: Detect preferred cores in
amd_get_boost_ratio_numerator()") which exposed the issue.

To fix this problem, change arch_init_invariance_cppc() into a new weak
symbol that is called at the end of acpi_processor_driver_init().
Each architecture that supports it can declare the symbol to override
the weak one.

Define it for x86, in arch/x86/kernel/acpi/cppc.c, and for all of the
architectures using the generic arch_topology.c code.

Fixes: 279f838a61f9 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()")
Reported-by: Ivan Shapovalov 
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219431
Tested-by: Oleksandr Natalenko 
Signed-off-by: Mario Limonciello 
Link: https://patch.msgid.link/20241104222855.3959267-1-superm1@kernel.org
[ rjw: Changelog edit ]
Signed-off-by: Rafael J. Wysocki

mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling

2024-11-06T00:49:55+00:00

Currently MTE is permitted in two circumstances (desiring to use MTE
having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is
specified, as checked by arch_calc_vm_flag_bits() and actualised by
setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is
shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap
hook is activated in mmap_region().

The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also
set is the arm64 implementation of arch_validate_flags().

Unfortunately, we intend to refactor mmap_region() to perform this check
earlier, meaning that in the case of a shmem backing we will not have
invoked shmem_mmap() yet, causing the mapping to fail spuriously.

It is inappropriate to set this architecture-specific flag in general mm
code anyway, so a sensible resolution of this issue is to instead move the
check somewhere else.

We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
the arch_calc_vm_flag_bits() call.

This is an appropriate place to do this as we already check for the
MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
the same idea - we permit RAM-backed memory.

This requires a modification to the arch_calc_vm_flag_bits() signature to
pass in a pointer to the struct file associated with the mapping, however
this is not too egregious as this is only used by two architectures anyway
- arm64 and parisc.

So this patch performs this adjustment and removes the unnecessary
assignment of VM_MTE_ALLOWED in shmem_mmap().

[akpm@linux-foundation.org: fix whitespace, per Catalin]
Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes 
Suggested-by: Catalin Marinas 
Reported-by: Jann Horn 
Reviewed-by: Catalin Marinas 
Reviewed-by: Vlastimil Babka 
Cc: Andreas Larsson 
Cc: David S. Miller 
Cc: Helge Deller 
Cc: James E.J. Bottomley 
Cc: Liam R. Howlett 
Cc: Linus Torvalds 
Cc: Mark Brown 
Cc: Peter Xu 
Cc: Will Deacon 
Cc: 
Signed-off-by: Andrew Morton

mm: refactor map_deny_write_exec()

2024-11-06T00:49:55+00:00

Refactor the map_deny_write_exec() to not unnecessarily require a VMA
parameter but rather to accept VMA flags parameters, which allows us to
use this function early in mmap_region() in a subsequent commit.

While we're here, we refactor the function to be more readable and add
some additional documentation.

Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes 
Reported-by: Jann Horn 
Reviewed-by: Liam R. Howlett 
Reviewed-by: Vlastimil Babka 
Reviewed-by: Jann Horn 
Cc: Andreas Larsson 
Cc: Catalin Marinas 
Cc: David S. Miller 
Cc: Helge Deller 
Cc: James E.J. Bottomley 
Cc: Linus Torvalds 
Cc: Mark Brown 
Cc: Peter Xu 
Cc: Will Deacon 
Cc: 
Signed-off-by: Andrew Morton