author     Linus Torvalds <torvalds@linux-foundation.org>    2023-08-29 14:25:26 -0700
committer  Linus Torvalds <torvalds@linux-foundation.org>    2023-08-29 14:25:26 -0700
commit     b96a3e9142fdf346b05b20e867b4f0dfca119e96 (patch)
tree       b338a8f8930abc24888fc3871c6627f6ad46e23b /mm
parent     651a00bc56403161351090a9d7ddbd7095975324 (diff)
parent     52ae298e3e5c9be5bb95e1c6d9199e5210f2a156 (diff)
Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in
add_to_avail_list")
- Peter Xu has a series ("mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved the performance of zsmalloc during
compaction ("zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has done some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support
tracking KSM-placed zero-pages").
- Jeff Xu has fixed the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with
UFFD"). A rough usage sketch appears after this list.
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the
GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
architectures to take GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency
improvements ("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation,
from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code
("Two minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
most file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()"). A sketch of the helpers appears after
this list.
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()"). The macro is sketched after this
list.
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
memmap on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock
migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table
range API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM
subsystem documentation ("Improve mm documentation").
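
As a companion to the UFFDIO_POISON item above, the sketch below shows
roughly how the new ioctl would be driven from userspace. It is an
illustrative assumption, not a reference: the uffd is assumed to be set
up and the range registered through the usual userfaultfd flow, and the
uffdio_poison field names (range, mode) are inferred from the series
and may differ in detail.

    /* Hedged sketch: ask the kernel to treat a registered range as poisoned
     * so that later accesses raise SIGBUS instead of faulting in fresh pages.
     * Error handling omitted; struct field names are assumptions. */
    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>
    #include <string.h>

    static int poison_range(int uffd, unsigned long start, unsigned long len)
    {
            struct uffdio_poison args;

            memset(&args, 0, sizeof(args));
            args.range.start = start;   /* page-aligned start of the region */
            args.range.len = len;       /* page-aligned length in bytes */
            args.mode = 0;              /* no special flags */

            return ioctl(uffd, UFFDIO_POISON, &args);
    }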
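
For the vma_is_initial_heap()/vma_is_initial_stack() conversion above,
the helpers amount to range checks against the markers mm_struct
already keeps for the brk area and the initial stack. The following is
a paraphrased sketch of such helpers, not a verbatim copy of the merged
definitions; the point of the cleanup is that callers stop open-coding
these comparisons.

    /* Sketch: does this VMA overlap the process's brk heap? */
    static inline bool vma_is_initial_heap(const struct vm_area_struct *vma)
    {
            return vma->vm_start <= vma->vm_mm->brk &&
                   vma->vm_end >= vma->vm_mm->start_brk;
    }

    /* Sketch: does this VMA contain the stack start recorded at exec time? */
    static inline bool vma_is_initial_stack(const struct vm_area_struct *vma)
    {
            return vma->vm_start <= vma->vm_mm->start_stack &&
                   vma->vm_end >= vma->vm_mm->start_stack;
    }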
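
The K() macro mentioned above is the pages-to-KiB conversion that
several mm files (including mm/backing-dev.c, visible in the diff
below) used to define locally; the cleanup consolidates it so callers
share one copy. A minimal sketch of the definition, plus a hypothetical
caller for illustration only:

    /* A page is (1 << PAGE_SHIFT) bytes, so shifting a page count left by
     * (PAGE_SHIFT - 10) converts pages to kilobytes (KiB). */
    #define K(x) ((x) << (PAGE_SHIFT - 10))

    /* Hypothetical caller: print a node-wide page counter in KiB. */
    static void report_dirty_kib(void)
    {
            pr_info("dirty: %lu kB\n",
                    K(global_node_page_state(NR_FILE_DIRTY)));
    }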
* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
maple_tree: shrink struct maple_tree
maple_tree: clean up mas_wr_append()
secretmem: convert page_is_secretmem() to folio_is_secretmem()
nios2: fix flush_dcache_page() for usage from irq context
hugetlb: add documentation for vma_kernel_pagesize()
mm: add orphaned kernel-doc to the rst files.
mm: fix clean_record_shared_mapping_range kernel-doc
mm: fix get_mctgt_type() kernel-doc
mm: fix kernel-doc warning from tlb_flush_rmaps()
mm: remove enum page_entry_size
mm: allow ->huge_fault() to be called without the mmap_lock held
mm: move PMD_ORDER to pgtable.h
mm: remove checks for pte_index
memcg: remove duplication detection for mem_cgroup_uncharge_swap
mm/huge_memory: work on folio->swap instead of page->private when splitting folio
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
mm/swap: use dedicated entry for swap in folio
mm/swap: stop using page->private on tail pages for THP_SWAP
selftests/mm: fix WARNING comparing pointer to 0
selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
...
Diffstat (limited to 'mm')
79 files changed, 2638 insertions, 2625 deletions
diff --git a/mm/Kconfig b/mm/Kconfig
index 4bf7dc5ae5ef..264a2df5ecf5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -25,7 +25,6 @@ menuconfig SWAP
 config ZSWAP
     bool "Compressed cache for swap pages"
     depends on SWAP
-    select FRONTSWAP
     select CRYPTO
     select ZPOOL
     help
@@ -504,7 +503,10 @@ config SPARSEMEM_VMEMMAP
 # Select this config option from the architecture Kconfig, if it is preferred
 # to enable the feature of HugeTLB/dev_dax vmemmap optimization.
 #
-config ARCH_WANT_OPTIMIZE_VMEMMAP
+config ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
+    bool
+
+config ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
     bool
 
 config HAVE_MEMBLOCK_PHYS_MAP
@@ -586,6 +588,9 @@ config MHP_MEMMAP_ON_MEMORY
 
 endif # MEMORY_HOTPLUG
 
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
+    bool
+
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.
@@ -887,9 +892,6 @@ config USE_PERCPU_NUMA_NODE_ID
 config HAVE_SETUP_PER_CPU_AREA
     bool
 
-config FRONTSWAP
-    bool
-
 config CMA
     bool "Contiguous Memory Allocator"
     depends on MMU
@@ -1161,6 +1163,9 @@ config KMAP_LOCAL_NON_LINEAR_PTE_ARRAY
 config IO_MAPPING
     bool
 
+config MEMFD_CREATE
+    bool "Enable memfd_create() system call" if EXPERT
+
 config SECRETMEM
     default y
     bool "Enable memfd_secret() system call" if EXPERT
diff --git a/mm/Makefile b/mm/Makefile
index d4ee20988dd1..ec65984e2ade 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -72,7 +72,6 @@ ifdef CONFIG_MMU
 endif
 
 obj-$(CONFIG_SWAP)  += page_io.o swap_state.o swapfile.o swap_slots.o
-obj-$(CONFIG_FRONTSWAP) += frontswap.o
 obj-$(CONFIG_ZSWAP) += zswap.o
 obj-$(CONFIG_HAS_DMA)   += dmapool.o
 obj-$(CONFIG_HUGETLBFS) += hugetlb.o
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 3ffc3cfa7a14..1e3447bccdb1 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -16,6 +16,7 @@
 #include <linux/writeback.h>
 #include <linux/device.h>
 #include <trace/events/writeback.h>
+#include "internal.h"
 
 struct backing_dev_info noop_backing_dev_info;
 EXPORT_SYMBOL_GPL(noop_backing_dev_info);
@@ -34,8 +35,6 @@ LIST_HEAD(bdi_list);
 /* bdi_wq serves all asynchronous writeback tasks */
 struct workqueue_struct *bdi_wq;
 
-#define K(x) ((x) << (PAGE_SHIFT - 10))
-
 #ifdef CONFIG_DEBUG_FS
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
@@ -733,9 +732,6 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
 
     might_alloc(gfp);
 
-    if (!memcg_css->parent)
-        return &bdi->wb;
-
     do {
         wb = wb_get_lookup(bdi, memcg_css);
     } while (!wb && !cgwb_create(bdi, memcg_css, gfp));
@@ -436,8 +436,8 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
     if (!cma || !cma->count || !cma->bitmap)
         goto out;
 
-    pr_debug("%s(cma %p, count %lu, align %d)\n", __func__, (void *)cma,
-         count, align);
+    pr_debug("%s(cma %p, name: %s, count %lu, align %d)\n", __func__,
+        (void *)cma, cma->name, count, align);
 
     if (!count)
         goto out;
diff --git a/mm/compaction.c b/mm/compaction.c
index eacca2794e47..38c8d216c6a3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -249,11 +249,36 @@ static unsigned long skip_offline_sections(unsigned long start_pfn)
 
     return 0;
 }
+
+/*
+ * If the PFN falls into an offline section, return the end PFN of the
+ * next online section in reverse. If the PFN falls into an online section
+ * or if there is no next online section in reverse, return 0.
+ */
+static unsigned long skip_offline_sections_reverse(unsigned long start_pfn)
+{
+    unsigned long start_nr = pfn_to_section_nr(start_pfn);
+
+    if (!start_nr || online_section_nr(start_nr))
+        return 0;
+
+    while (start_nr-- > 0) {
+        if (online_section_nr(start_nr))
+            return section_nr_to_pfn(start_nr) + PAGES_PER_SECTION;
+    }
+
+    return 0;
+}
 #else
 static unsigned long skip_offline_sections(unsigned long start_pfn)
 {
     return 0;
 }
+
+static unsigned long skip_offline_sections_reverse(unsigned long start_pfn)
+{
+    return 0;
+}
 #endif
 
 /*
@@ -438,12 +463,13 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
 {
     struct zone *zone = cc->zone;
 
-    pfn = pageblock_end_pfn(pfn);
-
     /* Set for isolation rather than compaction */
     if (cc->no_set_skip_hint)
         return;
 
+    pfn = pageblock_end_pfn(pfn);
+
+    /* Update where async and sync compaction should restart */
     if (pfn > zone->compact_cached_migrate_pfn[0])
         zone->compact_cached_migrate_pfn[0] = pfn;
     if (cc->mode != MIGRATE_ASYNC &&
@@ -465,7 +491,6 @@ static void update_pageblock_skip(struct compact_control *cc,
 
     set_pageblock_skip(page);
 
-    /* Update where async and sync compaction should restart */
     if (pfn < zone->compact_cached_free_pfn)
         zone->compact_cached_free_pfn = pfn;
 }
@@ -564,7 +589,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
                 bool strict)
 {
     int nr_scanned = 0, total_isolated = 0;
-    struct page *cursor;
+    struct page *page;
     unsigned long flags = 0;
     bool locked = false;
     unsigned long blockpfn = *start_pfn;
@@ -574,12 +599,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
     if (strict)
         stride = 1;
 
-    cursor = pfn_to_page(blockpfn);
+    page = pfn_to_page(blockpfn);
 
     /* Isolate free pages. */
-    for (; blockpfn < end_pfn; blockpfn += stride, cursor += stride) {
+    for (; blockpfn < end_pfn; blockpfn += stride, page += stride) {
         int isolated;
-        struct page *page = cursor;
 
         /*
          * Periodically drop the lock (if held) regardless of its
@@ -604,7 +628,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 
             if (likely(order <= MAX_ORDER)) {
                 blockpfn += (1UL << order) - 1;
-                cursor += (1UL << order) - 1;
+                page += (1UL << order) - 1;
                 nr_scanned += (1UL << order) - 1;
             }
             goto isolate_fail;
@@ -641,14 +665,12 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
         }
         /* Advance to the end of split page */
         blockpfn += isolated - 1;
-        cursor += isolated - 1;
+        page += isolated - 1;
         continue;
 
 isolate_fail:
         if (strict)
             break;
-        else
-            continue;
 
     }
 
@@ -715,8 +737,6 @@ isolate_freepages_range(struct compact_control *cc,
         /* Protect pfn from changing by isolate_freepages_block */
         unsigned long isolate_start_pfn = pfn;
 
-        block_end_pfn = min(block_end_pfn, end_pfn);
-
         /*
          * pfn could pass the block_end_pfn if isolated freepage
          * is more than pageblock order. In this case, we adjust
@@ -725,9 +745,10 @@ isolate_freepages_range(struct compact_control *cc,
         if (pfn >= block_end_pfn) {
             block_start_pfn = pageblock_start_pfn(pfn);
             block_end_pfn = pageblock_end_pfn(pfn);
-            block_end_pfn = min(block_end_pfn, end_pfn);
         }
 
+        block_end_pfn = min(block_end_pfn, end_pfn);
+
         if (!pageblock_pfn_to_page(block_start_pfn, block_end_pfn, cc->zone))
             break;
 
@@ -1076,13 +1097,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
             bool migrate_dirty;
 
             /*
-             * Only pages without mappings or that have a
-             * ->migrate_folio callback are possible to migrate
-             * without blocking. However, we can be racing with
-             * truncation so it's necessary to lock the page
-             * to stabilise the mapping as truncation holds
-             * the page lock until after the page is removed
-             * from the page cache.
+             * Only folios without mappings or that have
+             * a ->migrate_folio callback are possible to
+             * migrate without blocking. However, we may
+             * be racing with truncation, which can free
+             * the mapping. Truncation holds the folio lock
+             * until after the folio is removed from the page
+             * cache so holding it ourselves is sufficient.
              */
             if (!folio_trylock(folio))
                 goto isolate_fail_put;
@@ -1120,6 +1141,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
             skip_updated = true;
             if (test_and_set_skip(cc, valid_page) &&
                 !cc->finish_pageblock) {
+                low_pfn = end_pfn;
                 goto isolate_abort;
             }
         }
@@ -1421,10 +1443,8 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
     isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
 
     /* Skip this pageblock in the future as it's full or nearly full */
-    if (start_pfn == end_pfn)
+    if (start_pfn == end_pfn && !cc->no_set_skip_hint)
         set_pageblock_skip(page);
-
-    return;
 }
 
 /* Search orders in round-robin fashion */
@@ -1501,7 +1521,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
 
         spin_lock_irqsave(&cc->zone->lock, flags);
         freelist = &area->free_list[MIGRATE_MOVABLE];
-        list_for_each_entry_reverse(freepage, freelist, lru) {
+        list_for_each_entry_reverse(freepage, freelist, buddy_list) {
             unsigned long pfn;
 
             order_scanned++;
@@ -1530,7 +1550,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
                 break;
         }
 
-        /* Use a minimum pfn if a preferred one was not found */
+        /* Use a maximum candidate pfn if a preferred one was not found */
         if (!page && high_pfn) {
             page = pfn_to_page(high_pfn);
 
@@ -1669,8 +1689,15 @@ static void isolate_freepages(struct compact_control *cc)
 
         page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
                             zone);
-        if (!page)
+        if (!page) {
+            unsigned long next_pfn;
+
+            next_pfn = skip_offline_sections_reverse(block_start_pfn);
+            if (next_pfn)
+                block_start_pfn = max(next_pfn, low_pfn);
+
             continue;
+        }
 
         /* Check the block is suitable for migration */
         if (!suitable_migration_target(cc, page))
@@ -1686,7 +1713,8 @@ static void isolate_freepages(struct compact_control *cc)
 
         /* Update the skip hint if the full pageblock was scanned */
         if (isolate_start_pfn == block_end_pfn)
-            update_pageblock_skip(cc, page, block_start_pfn);
+            update_pageblock_skip(cc, page, block_start_pfn -
+                          pageblock_nr_pages);
 
         /* Are enough freepages isolated? */
         if (cc->nr_freepages >= cc->nr_migratepages) {
@@ -1884,7 +1912,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 
         spin_lock_irqsave(&cc->zone->lock, flags);
         freelist = &area->free_list[MIGRATE_MOVABLE];
-        list_for_each_entry(freepage, freelist, lru) {
+        list_for_each_entry(freepage, freelist, buddy_list) {
             unsigned long free_pfn;
 
             if (nr_scanned++ >= limit) {
@@ -1958,9 +1986,9 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
         block_start_pfn = cc->zone->zone_start_pfn;
 
     /*
-     * fast_find_migrateblock marks a pageblock skipped so to avoid
-     * the isolation_suitable check below, check whether the fast
-     * search was successful.
+     * fast_find_migrateblock() has already ensured the pageblock is not
+     * set with a skipped flag, so to avoid the isolation_suitable check
+     * below again, check whether the fast search was successful.
      */
     fast_find_block = low_pfn != cc->migrate_pfn && !cc->fast_search_fail;
 
@@ -2114,7 +2142,7 @@ static unsigned int fragmentation_score_node(pg_data_t *pgdat)
     return score;
 }
 
-static unsigned int fragmentation_score_wmark(pg_data_t *pgdat, bool low)
+static