linux.git/mm/memory_hotplug.c, branch v6.6.131

hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio

2025-05-22T12:12:25+00:00

commit af288a426c3e3552b62595c6138ec6371a17dbba upstream.

Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined) add page poison checks in do_migrate_range in order to make
offline hwpoisoned page possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned page.  However folio lock must be held before
calling try_to_unmap.  Add it to fix this problem.

Warning will be produced if folio is not locked during unmap:

  ------------[ cut here ]------------
  kernel BUG at ./include/linux/swapops.h:400!
  Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G        W          6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
  Tainted: [W]=WARN
  Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
  pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : try_to_unmap_one+0xb08/0xd3c
  lr : try_to_unmap_one+0x3dc/0xd3c
  Call trace:
   try_to_unmap_one+0xb08/0xd3c (P)
   try_to_unmap_one+0x3dc/0xd3c (L)
   rmap_walk_anon+0xdc/0x1f8
   rmap_walk+0x3c/0x58
   try_to_unmap+0x88/0x90
   unmap_poisoned_folio+0x30/0xa8
   do_migrate_range+0x4a0/0x568
   offline_pages+0x5a4/0x670
   memory_block_action+0x17c/0x374
   memory_subsys_offline+0x3c/0x78
   device_offline+0xa4/0xd0
   state_store+0x8c/0xf0
   dev_attr_store+0x18/0x2c
   sysfs_kf_write+0x44/0x54
   kernfs_fop_write_iter+0x118/0x1a8
   vfs_write+0x3a8/0x4bc
   ksys_write+0x6c/0xf8
   __arm64_sys_write+0x1c/0x28
   invoke_syscall+0x44/0x100
   el0_svc_common.constprop.0+0x40/0xe0
   do_el0_svc+0x1c/0x28
   el0_svc+0x30/0xd0
   el0t_64_sync_handler+0xc8/0xcc
   el0t_64_sync+0x198/0x19c
  Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
  ---[ end trace 0000000000000000 ]---

Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng 
Acked-by: David Hildenbrand 
Acked-by: Miaohe Lin 
Cc: Michal Hocko 
Cc: Naoya Horiguchi 
Cc: Oscar Salvador 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Xiangyu Chen 
Signed-off-by: He Zhe 
Signed-off-by: Greg Kroah-Hartman

x86/kaslr: Expose and use the end of the physical memory address space

2024-09-12T09:11:25+00:00

commit ea72ce5da22806d5713f3ffb39a6d5ae73841f93 upstream.

iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this
happens due to KASLR.

KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions.  It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory.  This limitation is done to gain more randomization space
because otherwise only the holes between the direct map, vmalloc,
vmemmap and vaddr_end would be usable for randomizing.

The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.

request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards.  That means the first allocation happens past the end of the
direct map and if unlucky this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.

MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually require to know the maximum number of address bits.  All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.

Cure this by exposing the end of the direct map via PHYSMEM_END and use
that for the memory hot-plug and resource management related places
instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END
maps to a variable which is initialized by the KASLR initialization and
otherwise it is based on MAX_PHYSMEM_BITS as before.

To prevent future hickups add a check into add_pages() to catch callers
trying to add memory above PHYSMEM_END.

Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions")
Reported-by: Max Ramanouski 
Reported-by: Alistair Popple 
Signed-off-by: Thomas Gleixner 
Tested-By: Max Ramanouski 
Tested-by: Alistair Popple 
Reviewed-by: Dan Williams 
Reviewed-by: Alistair Popple 
Reviewed-by: Kees Cook 
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
Signed-off-by: Greg Kroah-Hartman

mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval

2024-01-20T10:51:49+00:00

commit 11684134140bb708b6e6de969a060535630b1b53 upstream.

set_memmap_mode() stores the kernel parameter memmap mode as an integer.
However, the get_memmap_mode() function utilizes param_get_bool() to fetch
the value as a boolean, leading to potential endianness issue.  On
Big-endian architectures, the memmap_on_memory is consistently displayed
as 'N' regardless of its actual status.

To address this endianness problem, the solution involves obtaining the
mode as an integer.  This adjustment ensures the proper display of the
memmap_on_memory parameter, presenting it as one of the following options:
Force, Y, or N.

Link: https://lkml.kernel.org/r/20240110140127.241451-1-sumanthk@linux.ibm.com
Fixes: 2d1f649c7c08 ("mm/memory_hotplug: support memmap_on_memory when memmap is not aligned to pageblocks")
Signed-off-by: Sumanth Korikkar 
Suggested-by: Gerald Schaefer 
Acked-by: David Hildenbrand 
Cc: Alexander Gordeev 
Cc: Aneesh Kumar K.V 
Cc: Heiko Carstens 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Vasily Gorbik 
Cc: 	[6.6+]
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman

mm/memory_hotplug: fix error handling in add_memory_resource()

2023-12-13T17:45:25+00:00

commit f42ce5f087eb69e47294ababd2e7e6f88a82d308 upstream.

In add_memory_resource(), creation of memory block devices occurs after
successful call to arch_add_memory().  However, creation of memory block
devices could fail.  In that case, arch_remove_memory() is called to
perform necessary cleanup.

Currently with or without altmap support, arch_remove_memory() is always
passed with altmap set to NULL during error handling.  This leads to
freeing of struct pages using free_pages(), eventhough the allocation
might have been performed with altmap support via
altmap_alloc_block_buf().

Fix the error handling by passing altmap in arch_remove_memory(). This
ensures the following:
* When altmap is disabled, deallocation of the struct pages array occurs
  via free_pages().
* When altmap is enabled, deallocation occurs via vmem_altmap_free().

Link: https://lkml.kernel.org/r/20231120145354.308999-3-sumanthk@linux.ibm.com
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: Sumanth Korikkar 
Reviewed-by: Gerald Schaefer 
Acked-by: David Hildenbrand 
Cc: Alexander Gordeev 
Cc: Aneesh Kumar K.V 
Cc: Anshuman Khandual 
Cc: Heiko Carstens 
Cc: kernel test robot 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Vasily Gorbik 
Cc: 	[5.15+]
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman

mm/memory_hotplug: add missing mem_hotplug_lock

2023-12-13T17:45:24+00:00

commit 001002e73712cdf6b8d9a103648cda3040ad7647 upstream.

From Documentation/core-api/memory-hotplug.rst:
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock
in write mode to serialise memory hotplug (e.g. access to global/zone
variables).

mhp_(de)init_memmap_on_memory() functions can change zone stats and
struct page content, but they are currently called w/o the
mem_hotplug_lock.

When memory block is being offlined and when kmemleak goes through each
populated zone, the following theoretical race conditions could occur:
CPU 0:					     | CPU 1:
memory_offline()			     |
-> offline_pages()			     |
	-> mem_hotplug_begin()		     |
	   ...				     |
	-> mem_hotplug_done()		     |
					     | kmemleak_scan()
					     | -> get_online_mems()
					     |    ...
-> mhp_deinit_memmap_on_memory()	     |
  [not protected by mem_hotplug_begin/done()]|
  Marks memory section as offline,	     |   Retrieves zone_start_pfn
  poisons vmemmap struct pages and updates   |   and struct page members.
  the zone related data			     |
   					     |    ...
   					     | -> put_online_mems()

Fix this by ensuring mem_hotplug_lock is taken before performing
mhp_init_memmap_on_memory().  Also ensure that
mhp_deinit_memmap_on_memory() holds the lock.

online/offline_pages() are currently only called from
memory_block_online/offline(), so it is safe to move the locking there.

Link: https://lkml.kernel.org/r/20231120145354.308999-2-sumanthk@linux.ibm.com
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: Sumanth Korikkar 
Reviewed-by: Gerald Schaefer 
Acked-by: David Hildenbrand 
Cc: Alexander Gordeev 
Cc: Aneesh Kumar K.V 
Cc: Anshuman Khandual 
Cc: Heiko Carstens 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Vasily Gorbik 
Cc: kernel test robot 
Cc: 	[5.15+]
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman

mm/memory_hotplug: use pfn math in place of direct struct page manipulation

2023-11-28T17:20:06+00:00

commit 1640a0ef80f6d572725f5b0330038c18e98ea168 upstream.

When dealing with hugetlb pages, manipulating struct page pointers
directly can get to wrong struct page, since struct page is not guaranteed
to be contiguous on SPARSEMEM without VMEMMAP.  Use pfn calculation to
handle it properly.

Without the fix, a wrong number of page might be skipped. Since skip cannot be
negative, scan_movable_page() will end early and might miss a movable page with
-ENOENT. This might fail offline_pages(). No bug is reported. The fix comes
from code inspection.

Link: https://lkml.kernel.org/r/20230913201248.452081-4-zi.yan@sent.com
Fixes: eeb0efd071d8 ("mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages")
Signed-off-by: Zi Yan 
Reviewed-by: Muchun Song 
Acked-by: David Hildenbrand 
Cc: Matthew Wilcox (Oracle) 
Cc: Mike Kravetz 
Cc: Mike Rapoport (IBM) 
Cc: Thomas Bogendoerfer 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman

mm/memory_hotplug: embed vmem_altmap details in memory block

2023-08-21T20:37:49+00:00

With memmap on memory, some architecture needs more details w.r.t altmap
such as base_pfn, end_pfn, etc to unmap vmemmap memory.  Instead of
computing them again when we remove a memory block, embed vmem_altmap
details in struct memory_block if we are using memmap on memory block
feature.

[yangyingliang@huawei.com: fix error return code in add_memory_resource()]
  Link: https://lkml.kernel.org/r/20230809081552.1351184-1-yangyingliang@huawei.com
Link: https://lkml.kernel.org/r/20230808091501.287660-7-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Yang Yingliang 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton

mm/memory_hotplug: support memmap_on_memory when memmap is not aligned to pageblocks

2023-08-21T20:37:49+00:00

Currently, memmap_on_memory feature is only supported with memory block
sizes that result in vmemmap pages covering full page blocks.  This is
because memory onlining/offlining code requires applicable ranges to be
pageblock-aligned, for example, to set the migratetypes properly.

This patch helps to lift that restriction by reserving more pages than
required for vmemmap space.  This helps the start address to be page block
aligned with different memory block sizes.  Using this facility implies
the kernel will be reserving some pages for every memoryblock.  This
allows the memmap on memory feature to be widely useful with different
memory block size values.

For ex: with 64K page size and 256MiB memory block size, we require 4
pages to map vmemmap pages, To align things correctly we end up adding a
reserve of 28 pages.  ie, for every 4096 pages 28 pages get reserved.

Link: https://lkml.kernel.org/r/20230808091501.287660-5-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton

mm/memory_hotplug: allow architecture to override memmap on memory support check

2023-08-21T20:37:48+00:00

Some architectures would want different restrictions. Hence add an
architecture-specific override.

The PMD_SIZE check is moved there.

Link: https://lkml.kernel.org/r/20230808091501.287660-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton

mm/memory_hotplug: allow memmap on memory hotplug request to fallback

2023-08-21T20:37:48+00:00

If not supported, fallback to not using memap on memmory. This avoids
the need for callers to do the fallback.

Link: https://lkml.kernel.org/r/20230808091501.287660-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton