summaryrefslogtreecommitdiff
path: root/arch
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2023-04-27 19:42:02 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2023-04-27 19:42:02 -0700
commit7fa8a8ee9400fe8ec188426e40e481717bc5e924 (patch)
treecc8fd6b4f936ec01e73238643757451e20478c07 /arch
parent91ec4b0d11fe115581ce2835300558802ce55e6c (diff)
parent4d4b6d66db63ceed399f1fb1a4b24081d2590eb1 (diff)
downloadlinux-7fa8a8ee9400fe8ec188426e40e481717bc5e924.tar.gz
linux-7fa8a8ee9400fe8ec188426e40e481717bc5e924.tar.bz2
linux-7fa8a8ee9400fe8ec188426e40e481717bc5e924.zip
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...
Diffstat (limited to 'arch')
-rw-r--r--arch/Kconfig32
-rw-r--r--arch/arc/Kconfig4
-rw-r--r--arch/arc/mm/init.c5
-rw-r--r--arch/arm/Kconfig27
-rw-r--r--arch/arm/configs/imx_v6_v7_defconfig2
-rw-r--r--arch/arm/configs/milbeaut_m10v_defconfig2
-rw-r--r--arch/arm/configs/pxa_defconfig2
-rw-r--r--arch/arm/configs/sama7_defconfig2
-rw-r--r--arch/arm/configs/sp7021_defconfig2
-rw-r--r--arch/arm/mach-rpc/ecard.c2
-rw-r--r--arch/arm64/Kconfig52
-rw-r--r--arch/arm64/include/asm/memory.h8
-rw-r--r--arch/arm64/include/asm/mte-kasan.h81
-rw-r--r--arch/arm64/include/asm/mte.h12
-rw-r--r--arch/arm64/include/asm/pgtable.h2
-rw-r--r--arch/arm64/include/asm/sparsemem.h2
-rw-r--r--arch/arm64/include/asm/uaccess.h66
-rw-r--r--arch/arm64/include/asm/word-at-a-time.h4
-rw-r--r--arch/arm64/kvm/hyp/include/nvhe/gfp.h2
-rw-r--r--arch/arm64/kvm/hyp/nvhe/page_alloc.c10
-rw-r--r--arch/arm64/mm/fault.c36
-rw-r--r--arch/csky/Kconfig4
-rw-r--r--arch/ia64/Kconfig7
-rw-r--r--arch/ia64/include/asm/sparsemem.h4
-rw-r--r--arch/ia64/mm/hugetlbpage.c2
-rw-r--r--arch/loongarch/Kconfig14
-rw-r--r--arch/m68k/Kconfig.cpu19
-rw-r--r--arch/m68k/include/asm/mcf_pgtable.h2
-rw-r--r--arch/mips/Kconfig15
-rw-r--r--arch/mips/include/asm/fixmap.h2
-rw-r--r--arch/mips/include/asm/pgtable.h3
-rw-r--r--arch/nios2/Kconfig22
-rw-r--r--arch/powerpc/Kconfig38
-rw-r--r--arch/powerpc/configs/85xx/ge_imp3a_defconfig2
-rw-r--r--arch/powerpc/configs/fsl-emb-nonhw.config2
-rw-r--r--arch/powerpc/include/asm/book3s/64/tlbflush.h3
-rw-r--r--arch/powerpc/kernel/smp.c2
-rw-r--r--arch/powerpc/mm/book3s64/iommu_api.c2
-rw-r--r--arch/powerpc/mm/book3s64/radix_tlb.c4
-rw-r--r--arch/powerpc/mm/fault.c37
-rw-r--r--arch/powerpc/mm/hugetlbpage.c2
-rw-r--r--arch/powerpc/platforms/powernv/Kconfig1
-rw-r--r--arch/powerpc/platforms/powernv/pci-ioda.c2
-rw-r--r--arch/powerpc/platforms/pseries/Kconfig1
-rw-r--r--arch/s390/Kconfig3
-rw-r--r--arch/s390/include/asm/pgtable.h12
-rw-r--r--arch/s390/mm/fault.c24
-rw-r--r--arch/s390/mm/gmap.c7
-rw-r--r--arch/s390/mm/hugetlbpage.c2
-rw-r--r--arch/s390/mm/mmap.c2
-rw-r--r--arch/sh/configs/ecovec24_defconfig2
-rw-r--r--arch/sh/mm/Kconfig30
-rw-r--r--arch/sparc/Kconfig21
-rw-r--r--arch/sparc/include/asm/pgtable_64.h116
-rw-r--r--arch/sparc/kernel/pci_sun4v.c2
-rw-r--r--arch/sparc/kernel/traps_64.c2
-rw-r--r--arch/sparc/mm/tsb.c4
-rw-r--r--arch/x86/Kconfig3
-rw-r--r--arch/x86/include/asm/pgtable.h2
-rw-r--r--arch/x86/include/asm/string_64.h23
-rw-r--r--arch/x86/mm/fault.c36
-rw-r--r--arch/x86/mm/pat/memtype.c12
-rw-r--r--arch/xtensa/Kconfig23
63 files changed, 521 insertions, 350 deletions
diff --git a/arch/Kconfig b/arch/Kconfig
index e3511afbb7f2..205fd23e0cad 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -465,6 +465,38 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
irqs disabled over activate_mm. Architectures that do IPI based TLB
shootdowns should enable this.
+# Use normal mm refcounting for MMU_LAZY_TLB kernel thread references.
+# MMU_LAZY_TLB_REFCOUNT=n can improve the scalability of context switching
+# to/from kernel threads when the same mm is running on a lot of CPUs (a large
+# multi-threaded application), by reducing contention on the mm refcount.
+#
+# This can be disabled if the architecture ensures no CPUs are using an mm as a
+# "lazy tlb" beyond its final refcount (i.e., by the time __mmdrop frees the mm
+# or its kernel page tables). This could be arranged by arch_exit_mmap(), or
+# final exit(2) TLB flush, for example.
+#
+# To implement this, an arch *must*:
+# Ensure the _lazy_tlb variants of mmgrab/mmdrop are used when manipulating
+# the lazy tlb reference of a kthread's ->active_mm (non-arch code has been
+# converted already).
+config MMU_LAZY_TLB_REFCOUNT
+ def_bool y
+ depends on !MMU_LAZY_TLB_SHOOTDOWN
+
+# This option allows MMU_LAZY_TLB_REFCOUNT=n. It ensures no CPUs are using an
+# mm as a lazy tlb beyond its last reference count, by shooting down these
+# users before the mm is deallocated. __mmdrop() first IPIs all CPUs that may
+# be using the mm as a lazy tlb, so that they may switch themselves to using
+# init_mm for their active mm. mm_cpumask(mm) is used to determine which CPUs
+# may be using mm as a lazy tlb mm.
+#
+# To implement this, an arch *must*:
+# - At the time of the final mmdrop of the mm, ensure mm_cpumask(mm) contains
+# at least all possible CPUs in which the mm is lazy.
+# - It must meet the requirements for MMU_LAZY_TLB_REFCOUNT=n (see above).
+config MMU_LAZY_TLB_SHOOTDOWN
+ bool
+
config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index d9a13ccf89a3..ab6d701365bb 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -556,7 +556,7 @@ endmenu # "ARC Architecture Configuration"
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
- default "12" if ARC_HUGEPAGE_16M
- default "11"
+ default "11" if ARC_HUGEPAGE_16M
+ default "10"
source "kernel/power/Kconfig"
diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
index ce4e939a7f07..2b89b6c53801 100644
--- a/arch/arc/mm/init.c
+++ b/arch/arc/mm/init.c
@@ -74,11 +74,6 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
base, TO_MB(size), !in_use ? "Not used":"");
}
-bool arch_has_descending_max_zone_pfns(void)
-{
- return !IS_ENABLED(CONFIG_ARC_HAS_PAE40);
-}
-
/*
* First memory setup routine called from setup_arch()
* 1. setup swapper's mm @init_mm
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a8b82c2f333b..0fb4b218f665 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1352,20 +1352,19 @@ config ARM_MODULE_PLTS
configurations. If unsure, say y.
config ARCH_FORCE_MAX_ORDER
- int "Maximum zone order"
- default "12" if SOC_AM33XX
- default "9" if SA1111
- default "11"
- help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages. This option selects the largest power of two that the kernel
- keeps in the memory allocator. If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
-
- This config option is actually maximum order plus one. For example,
- a value of 11 means that the largest free memory block is 2^10 pages.
+ int "Order of maximal physically contiguous allocations"
+ default "11" if SOC_AM33XX
+ default "8" if SA1111
+ default "10"
+ help
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ Don't change if unsure.
config ALIGNMENT_TRAP
def_bool CPU_CP15_MMU
diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
index 62f2b8972160..4de293da4789 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
CONFIG_SMP=y
CONFIG_ARM_PSCI=y
CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=14
+CONFIG_ARCH_FORCE_MAX_ORDER=13
CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
CONFIG_KEXEC=y
CONFIG_CPU_FREQ=y
diff --git a/arch/arm/configs/milbeaut_m10v_defconfig b/arch/arm/configs/milbeaut_m10v_defconfig
index bd29e5012cb0..385ad0f391a8 100644
--- a/arch/arm/configs/milbeaut_m10v_defconfig
+++ b/arch/arm/configs/milbeaut_m10v_defconfig
@@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
# CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
# CONFIG_ARM_PATCH_IDIV is not set
CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
CONFIG_SECCOMP=y
CONFIG_KEXEC=y
CONFIG_EFI=y
diff --git a/arch/arm/configs/pxa_defconfig b/arch/arm/configs/pxa_defconfig
index e656d3af2266..b46e39369dbb 100644
--- a/arch/arm/configs/pxa_defconfig
+++ b/arch/arm/configs/pxa_defconfig
@@ -20,7 +20,7 @@ CONFIG_PXA_SHARPSL=y
CONFIG_MACH_AKITA=y
CONFIG_MACH_BORZOI=y
CONFIG_AEABI=y
-CONFIG_ARCH_FORCE_MAX_ORDER=9
+CONFIG_ARCH_FORCE_MAX_ORDER=8
CONFIG_CMDLINE="root=/dev/ram0 ro"
CONFIG_KEXEC=y
CONFIG_CPU_FREQ=y
diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig
index 0d964c613d71..954112041403 100644
--- a/arch/arm/configs/sama7_defconfig
+++ b/arch/arm/configs/sama7_defconfig
@@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
# CONFIG_CACHE_L2X0 is not set
# CONFIG_ARM_PATCH_IDIV is not set
# CONFIG_CPU_SW_DOMAIN_PAN is not set
-CONFIG_ARCH_FORCE_MAX_ORDER=15
+CONFIG_ARCH_FORCE_MAX_ORDER=14
CONFIG_UACCESS_WITH_MEMCPY=y
# CONFIG_ATAGS is not set
CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
diff --git a/arch/arm/configs/sp7021_defconfig b/arch/arm/configs/sp7021_defconfig
index 5bca2eb59b86..c6448ac860b6 100644
--- a/arch/arm/configs/sp7021_defconfig
+++ b/arch/arm/configs/sp7021_defconfig
@@ -17,7 +17,7 @@ CONFIG_ARCH_SUNPLUS=y
# CONFIG_VDSO is not set
CONFIG_SMP=y
CONFIG_THUMB2_KERNEL=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
CONFIG_VFP=y
CONFIG_NEON=y
CONFIG_MODULES=y
diff --git a/arch/arm/mach-rpc/ecard.c b/arch/arm/mach-rpc/ecard.c
index 53813f9464a2..c30df1097c52 100644
--- a/arch/arm/mach-rpc/ecard.c
+++ b/arch/arm/mach-rpc/ecard.c
@@ -253,7 +253,7 @@ static int ecard_init_mm(void)
current->mm = mm;
current->active_mm = mm;
activate_mm(active_mm, mm);
- mmdrop(active_mm);
+ mmdrop_lazy_tlb(active_mm);
ecard_init_pgtables(mm);
return 0;
}
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3f5bf55050e8..b1201d25a8a4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_SUPPORTS_PAGE_TABLE_CHECK
+ select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
select ARCH_WANT_DEFAULT_BPF_JIT
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
@@ -1505,39 +1506,34 @@ config XEN
# include/linux/mmzone.h requires the following to be true:
#
-# MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS
+# MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
#
-# so the maximum value of MAX_ORDER is SECTION_SIZE_BITS + 1 - PAGE_SHIFT:
+# so the maximum value of MAX_ORDER is SECTION_SIZE_BITS - PAGE_SHIFT:
#
# | SECTION_SIZE_BITS | PAGE_SHIFT | max MAX_ORDER | default MAX_ORDER |
# ----+-------------------+--------------+-----------------+--------------------+
-# 4K | 27 | 12 | 16 | 11 |
-# 16K | 27 | 14 | 14 | 12 |
-# 64K | 29 | 16 | 14 | 14 |
+# 4K | 27 | 12 | 15 | 10 |
+# 16K | 27 | 14 | 13 | 11 |
+# 64K | 29 | 16 | 13 | 13 |
config ARCH_FORCE_MAX_ORDER
- int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
- default "14" if ARM64_64K_PAGES
- range 12 14 if ARM64_16K_PAGES
- default "12" if ARM64_16K_PAGES
- range 11 16 if ARM64_4K_PAGES
- default "11"
- help
- The kernel memory allocator divides physically contiguous memory
- blocks into "zones", where each zone is a power of two number of
- pages. This option selects the largest power of two that the kernel
- keeps in the memory allocator. If you need to allocate very large
- blocks of physically contiguous memory, then you may need to
- increase this value.
-
- This config option is actually maximum order plus one. For example,
- a value of 11 means that the largest free memory block is 2^10 pages.
-
- We make sure that we can allocate up to a HugePage size for each configuration.
- Hence we have :
- MAX_ORDER = (PMD_SHIFT - PAGE_SHIFT) + 1 => PAGE_SHIFT - 2
-
- However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
- 4M allocations matching the default size used by generic code.
+ int "Order of maximal physically contiguous allocations" if EXPERT && (ARM64_4K_PAGES || ARM64_16K_PAGES)
+ default "13" if ARM64_64K_PAGES
+ default "11" if ARM64_16K_PAGES
+ default "10"
+ help
+ The kernel page allocator limits the size of maximal physically
+ contiguous allocations. The limit is called MAX_ORDER and it
+ defines the maximal power of two of number of pages that can be
+ allocated as a single contiguous block. This option allows
+ overriding the default setting when ability to allocate very
+ large blocks of physically contiguous memory is required.
+
+ The maximal size of allocation cannot exceed the size of the
+ section, so the value of MAX_ORDER should satisfy
+
+ MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
+
+ Don't change if unsure.
config UNMAP_KERNEL_AT_EL0
bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index efcd68154a3a..c735afdf639b 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -261,9 +261,11 @@ static inline const void *__tag_set(const void *addr, u8 tag)
}
#ifdef CONFIG_KASAN_HW_TAGS
-#define arch_enable_tagging_sync() mte_enable_kernel_sync()
-#define arch_enable_tagging_async() mte_enable_kernel_async()
-#define arch_enable_tagging_asymm() mte_enable_kernel_asymm()
+#define arch_enable_tag_checks_sync() mte_enable_kernel_sync()
+#define arch_enable_tag_checks_async() mte_enable_kernel_async()
+#define arch_enable_tag_checks_asymm() mte_enable_kernel_asymm()
+#define arch_suppress_tag_checks_start() mte_enable_tco()
+#define arch_suppress_tag_checks_stop() mte_disable_tco()
#define arch_force_async_tag_fault() mte_check_tfsr_exit()
#define arch_get_random_tag() mte_get_random_tag()
#define arch_get_mem_tag(addr) mte_get_mem_tag(addr)
diff --git a/arch/arm64/include/asm/mte-kasan.h b/arch/arm64/include/asm/mte-kasan.h
index 9f79425fc65a..2e98028c1965 100644
--- a/arch/arm64/include/asm/mte-kasan.h
+++ b/arch/arm64/include/asm/mte-kasan.h
@@ -13,9 +13,74 @@
#include <linux/types.h>
+#ifdef CONFIG_KASAN_HW_TAGS
+
+/* Whether the MTE asynchronous mode is enabled. */
+DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
+
+static inline bool system_uses_mte_async_or_asymm_mode(void)
+{
+ return static_branch_unlikely(&mte_async_or_asymm_mode);
+}
+
+#else /* CONFIG_KASAN_HW_TAGS */
+
+static inline bool system_uses_mte_async_or_asymm_mode(void)
+{
+ return false;
+}
+
+#endif /* CONFIG_KASAN_HW_TAGS */
+
#ifdef CONFIG_ARM64_MTE
/*
+ * The Tag Check Flag (TCF) mode for MTE is per EL, hence TCF0
+ * affects EL0 and TCF affects EL1 irrespective of which TTBR is
+ * used.
+ * The kernel accesses TTBR0 usually with LDTR/STTR instructions
+ * when UAO is available, so these would act as EL0 accesses using
+ * TCF0.
+ * However futex.h code uses exclusives which would be executed as
+ * EL1, this can potentially cause a tag check fault even if the
+ * user disables TCF0.
+ *
+ * To address the problem we set the PSTATE.TCO bit in uaccess_enable()
+ * and reset it in uaccess_disable().
+ *
+ * The Tag check override (TCO) bit disables temporarily the tag checking
+ * preventing the issue.
+ */
+static inline void mte_disable_tco(void)
+{
+ asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(0),
+ ARM64_MTE, CONFIG_KASAN_HW_TAGS));
+}
+
+static inline void mte_enable_tco(void)
+{
+ asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(1),
+ ARM64_MTE, CONFIG_KASAN_HW_TAGS));
+}
+
+/*
+ * These functions disable tag checking only if in MTE async mode
+ * since the sync mode generates exceptions synchronously and the
+ * nofault or load_unaligned_zeropad can handle them.
+ */
+static inline void __mte_disable_tco_async(void)
+{
+ if (system_uses_mte_async_or_asymm_mode())
+ mte_disable_tco();
+}
+
+static inline void __mte_enable_tco_async(void)
+{
+ if (system_uses_mte_async_or_asymm_mode())
+ mte_enable_tco();
+}
+
+/*
* These functions are meant to be only used from KASAN runtime through
* the arch_*() interface defined in asm/memory.h.
* These functions don't include system_supports_mte() checks,
@@ -138,6 +203,22 @@ void mte_enable_kernel_asymm(void);
#else /* CONFIG_ARM64_MTE */
+static inline void mte_disable_tco(void)
+{
+}
+
+static inline void mte_enable_tco(void)
+{
+}
+
+static inline void __mte_disable_tco_async(void)
+{
+}
+
+static inline void __mte_enable_tco_async(void)
+{
+}
+
static inline u8 mte_get_ptr_tag(void *ptr)
{
return 0xFF;
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 20dd06d70af5..c028afb1cd0b 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -178,14 +178,6 @@ static inline void mte_disable_tco_entry(struct task_struct *task)
}
#ifdef CONFIG_KASAN_HW_TAGS
-/* Whether the MTE asynchronous mode is enabled. */
-DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
-
-static inline bool system_uses_mte_async_or_asymm_mode(void)
-{
- return static_branch_unlikely(&mte_async_or_asymm_mode);
-}
-
void mte_check_tfsr_el1(void);
static inline void mte_check_tfsr_entry(void)
@@ -212,10 +204,6 @@ static inline void mte_check_tfsr_exit(void)
mte_check_tfsr_el1();
}
#else
-static inline bool system_uses_mte_async_or_asymm_mode(void)
-{
- return false;
-}
static inline void mte_check_tfsr_el1(void)
{
}
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b6ba466e2e8a..0bd18de9fd97 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/