summaryrefslogtreecommitdiff
path: root/mm/util.c
AgeCommit message (Collapse)AuthorFilesLines
2024-12-30mm/util: make memdup_user_nul() similar to memdup_user()Tetsuo Handa1-6/+1
Since the string data to copy from userspace is likely less than PAGE_SIZE bytes, replace GFP_KERNEL with GFP_USER like commit 6c2c97a24f09 ("memdup_user(): switch to GFP_USER") does and add __GFP_NOWARN like commit 6c8fcc096be9 ("mm: don't let userspace spam allocations warnings") does. Also, use dedicated slab buckets like commit d73778e4b867 ("mm/util: Use dedicated slab buckets for memdup_user()") does. Link: https://lkml.kernel.org/r/014cd694-cc27-4a07-a34a-2ae95d744515@I-love.SAKURA.ne.jp Reported-by: syzbot+7e12e97b36154c54414b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7e12e97b36154c54414b Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-25Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of ↵Linus Torvalds1-35/+27
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "resource: A couple of cleanups" from Andy Shevchenko performs some cleanups in the resource management code - The series "Improve the copy of task comm" from Yafang Shao addresses possible race-induced overflows in the management of task_struct.comm[] - The series "Remove unnecessary header includes from {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a small fix to the list_sort library code and to its selftest - The series "Enhance min heap API with non-inline functions and optimizations" also from Kuan-Wei Chiu optimizes and cleans up the min_heap library code - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi finishes off nilfs2's folioification - The series "add detect count for hung tasks" from Lance Yang adds more userspace visibility into the hung-task detector's activity - Apart from that, singelton patches in many places - please see the individual changelogs for details * tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) gdb: lx-symbols: do not error out on monolithic build kernel/reboot: replace sprintf() with sysfs_emit() lib: util_macros_kunit: add kunit test for util_macros.h util_macros.h: fix/rework find_closest() macros Improve consistency of '#error' directive messages ocfs2: fix uninitialized value in ocfs2_file_read_iter() hung_task: add docs for hung_task_detect_count hung_task: add detect count for hung tasks dma-buf: use atomic64_inc_return() in dma_buf_getfile() fs/proc/kcore.c: fix coccinelle reported ERROR instances resource: avoid unnecessary resource tree walking in __region_intersects() ocfs2: remove unused errmsg function and table ocfs2: cluster: fix a typo lib/scatterlist: use sg_phys() helper checkpatch: always parse orig_commit in fixes tag nilfs2: convert metadata aops from writepage to writepages nilfs2: convert nilfs_recovery_copy_block() to take a folio nilfs2: convert nilfs_page_count_clean_buffers() to take a folio nilfs2: remove nilfs_writepage nilfs2: convert checkpoint file to be folio-based ...
2024-11-07mm: renovate page_address_in_vma()Matthew Wilcox (Oracle)1-1/+1
This function doesn't modify any of its arguments, so if we make a few other functions take const pointers, we can make page_address_in_vma() take const pointers too. All of its callers have the containing folio already, so pass that in as an argument instead of recalculating it. Also add kernel-doc Link: https://lkml.kernel.org/r/20241005200121.3231142-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05mm/util: deduplicate code in {kstrdup,kstrndup,kmemdup_nul}Yafang Shao1-42/+27
These three functions follow the same pattern. To deduplicate the code, let's introduce a common helper __kmemdup_nul(). Link: https://lkml.kernel.org/r/20241007144911.27693-7-laoar.shao@gmail.com Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Simon Horman <horms@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alejandro Colomar <alx@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Jan Kara <jack@suse.cz> Cc: Justin Stitt <justinstitt@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Matus Jokay <matus.jokay@stuba.sk> Cc: Maxime Ripard <mripard@kernel.org> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Quentin Monnet <qmo@kernel.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05mm/util: fix possible race condition in kstrdup()Yafang Shao1-1/+8
In kstrdup(), it is critical to ensure that the dest string is always NUL-terminated. However, potential race condition can occur between a writer and a reader. Consider the following scenario involving task->comm: reader writer len = strlen(s) + 1; strlcpy(tsk->comm, buf, sizeof(tsk->comm)); memcpy(buf, s, len); In this case, there is a race condition between the reader and the writer. The reader calculates the length of the string `s` based on the old value of task->comm. However, during the memcpy(), the string `s` might be updated by the writer to a new value of task->comm. If the new task->comm is larger than the old one, the `buf` might not be NUL-terminated. This can lead to undefined behavior and potential security vulnerabilities. Let's fix it by explicitly adding a NUL terminator after the memcpy. It is worth noting that memcpy() is not atomic, so the new string can be shorter when memcpy() already copied past the new NUL. Link: https://lkml.kernel.org/r/20241007144911.27693-6-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Alejandro Colomar <alx@kernel.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Jan Kara <jack@suse.cz> Cc: Justin Stitt <justinstitt@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matus Jokay <matus.jokay@stuba.sk> Cc: Maxime Ripard <mripard@kernel.org> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Quentin Monnet <qmo@kernel.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Simon Horman <horms@kernel.org> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: only enforce minimum stack gap size if it's sensibleDavid Gow1-1/+1
The generic mmap_base code tries to leave a gap between the top of the stack and the mmap base address, but enforces a minimum gap size (MIN_GAP) of 128MB, which is too large on some setups. In particular, on arm tasks without ADDR_LIMIT_32BIT, the STACK_TOP value is less than 128MB, so it's impossible to fit such a gap in. Only enforce this minimum if MIN_GAP < MAX_GAP, as we'd prefer to honour MAX_GAP, which is defined proportionally, so scales better and always leaves us with both _some_ stack space and some room for mmap. This fixes the usercopy KUnit test suite on 32-bit arm, as it doesn't set any personality flags so gets the default (in this case 26-bit) task size. This test can be run with: ./tools/testing/kunit/kunit.py run --arch arm usercopy --make_options LLVM=1 Link: https://lkml.kernel.org/r/20240803074642.1849623-2-davidgow@google.com Fixes: dba79c3df4a2 ("arm: use generic mmap top-down layout and brk randomization") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: kvmalloc: align kvrealloc() with krealloc()Danilo Krummrich1-30/+70
Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. Besides that, implementing kvrealloc() by making use of krealloc() and vrealloc() provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: disable KASAN when switching to vmalloc] Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-24sysctl: treewide: constify the ctl_table argument of proc_handlersJoel Granados1-3/+3
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` * Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds1-0/+17
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-18Merge tag 'slab-for-6.11' of ↵Linus Torvalds1-6/+17
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: "The most prominent change this time is the kmem_buckets based hardening of kmalloc() allocations from Kees Cook. We have also extended the kmalloc() alignment guarantees for non-power-of-two sizes in a way that benefits rust. The rest are various cleanups and non-critical fixups. - Dedicated bucket allocator (Kees Cook) This series [1] enhances the probabilistic defense against heap spraying/grooming of CONFIG_RANDOM_KMALLOC_CACHES from last year. kmalloc() users that are known to be useful for exploits can get completely separate set of kmalloc caches that can't be shared with other users. The first converted users are alloc_msg() and memdup_user(). The hardening is enabled by CONFIG_SLAB_BUCKETS. - Extended kmalloc() alignment guarantees (Vlastimil Babka) For years now we have guaranteed natural alignment for power-of-two allocations, but nothing was defined for other sizes (in practice, we have two such buckets, kmalloc-96 and kmalloc-192). To avoid unnecessary padding in the rust layer due to its alignment rules, extend the guarantee so that the alignment is at least the largest power-of-two divisor of the requested size. This fits what rust needs, is a superset of the existing power-of-two guarantee, and does not in practice change the layout (and thus does not add overhead due to padding) of the kmalloc-96 and kmalloc-192 caches, unless slab debugging is enabled for them. - Cleanups and non-critical fixups (Chengming Zhou, Suren Baghdasaryan, Matthew Willcox, Alex Shi, and Vlastimil Babka) Various tweaks related to the new alloc profiling code, folio conversion, debugging and more leftovers after SLAB" Link: https://lore.kernel.org/all/20240701190152.it.631-kees@kernel.org/ [1] * tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/memcg: alignment memcg_data define condition mm, slab: move prepare_slab_obj_exts_hook under CONFIG_MEM_ALLOC_PROFILING mm, slab: move allocation tagging code in the alloc path into a hook mm/util: Use dedicated slab buckets for memdup_user() ipc, msg: Use dedicated slab buckets for alloc_msg() mm/slab: Introduce kmem_buckets_create() and family mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument mm/slab: Plumb kmem_buckets into __do_kmalloc_node() mm/slab: Introduce kmem_buckets typedef slab, rust: extend kmalloc() alignment guarantees to remove Rust padding slab: delete useless RED_INACTIVE and RED_ACTIVE slab: don't put freepointer outside of object if only orig_size slab: make check_object() more consistent mm: Reduce the number of slab->folio casts mm, slab: don't wrap internal functions with alloc_hooks()
2024-07-16Merge tag 'linux_kselftest-kunit-6.11-rc1' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: - add vm_mmap() allocation resource manager - convert usercopy kselftest to KUnit - disable usercopy testing on !CONFIG_MMU - add MODULE_DESCRIPTION() to core, list, and usercopy tests - add tests for assertion formatting functions - assert.c - introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros - fix KUNIT_ASSERT_STRNEQ comments to make it clear that it is an assertion - rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT * tag 'linux_kselftest-kunit-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros kunit: Rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT for readability kunit: Fix the comment of KUNIT_ASSERT_STRNEQ as assertion kunit: executor: Simplify string allocation handling kunit/usercopy: Add missing MODULE_DESCRIPTION() kunit/usercopy: Disable testing on !CONFIG_MMU usercopy: Convert test_user_copy to KUnit test kunit: test: Add vm_mmap() allocation resource manager list: test: add the missing MODULE_DESCRIPTION() macro kunit: add missing MODULE_DESCRIPTION() macros to core modules list: test: remove unused struct 'klist_test_struct' kunit: Cover 'assert.c' with tests
2024-07-15Merge branch 'slab/for-6.11/buckets' into slab/for-nextVlastimil Babka1-6/+17
Merge all the slab patches previously collected on top of v6.10-rc1, over cleanups/fixes that had to be based on rc6.
2024-07-06mm: add folio_mc_copy()Kefeng Wang1-0/+17
Add a #MC variant of folio_copy() which uses copy_mc_highpage() to support #MC handled during folio copy, it will be used in folio migration soon. Link: https://lkml.kernel.org/r/20240626085328.608006-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03mm/util: Use dedicated slab buckets for memdup_user()Kees Cook1-2/+12
Both memdup_user() and vmemdup_user() handle allocations that are regularly used for exploiting use-after-free type confusion flaws in the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4] respectively). Since both are designed for contents coming from userspace, it allows for userspace-controlled allocation sizes. Use a dedicated set of kmalloc buckets so these allocations do not share caches with the global kmalloc buckets. After a fresh boot under Ubuntu 23.10, we can see the caches are already in active use: # grep ^memdup /proc/slabinfo memdup_user-8k 4 4 8192 4 8 : ... memdup_user-4k 8 8 4096 8 8 : ... memdup_user-2k 16 16 2048 16 8 : ... memdup_user-1k 0 0 1024 16 4 : ... memdup_user-512 0 0 512 16 2 : ... memdup_user-256 0 0 256 16 1 : ... memdup_user-128 0 0 128 32 1 : ... memdup_user-64 256 256 64 64 1 : ... memdup_user-32 512 512 32 128 1 : ... memdup_user-16 1024 1024 16 256 1 : ... memdup_user-8 2048 2048 8 512 1 : ... memdup_user-192 0 0 192 21 1 : ... memdup_user-96 168 168 96 42 1 : ... Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1] Link: https://duasynt.com/blog/linux-kernel-heap-spray [2] Link: https://etenal.me/archives/1336 [3] Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4] Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argumentKees Cook1-4/+5
Plumb kmem_buckets arguments through kvmalloc_node_noprof() so it is possible to provide an API to perform kvmalloc-style allocations with a particular set of buckets. Introduce kvmalloc_buckets_node() that takes a kmem_buckets argument. Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-02kunit/usercopy: Disable testing on !CONFIG_MMUKees Cook1-0/+2
Since arch_pick_mmap_layout() is an inline for non-MMU systems, disable this test there. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406160505.uBge6TMY-lkp@intel.com/ Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-06-17Merge tag 'hardening-v6.10-rc5' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - yama: document function parameter (Christian Göttsche) - mm/util: Swap kmemdup_array() arguments (Jean-Philippe Brucker) - kunit/overflow: Adjust for __counted_by with DEFINE_RAW_FLEX() - MAINTAINERS: Update entries for Kees Cook * tag 'hardening-v6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: MAINTAINERS: Update entries for Kees Cook kunit/overflow: Adjust for __counted_by with DEFINE_RAW_FLEX() yama: document function parameter mm/util: Swap kmemdup_array() arguments
2024-06-14kunit: test: Add vm_mmap() allocation resource managerKees Cook1-0/+3
For tests that need to allocate using vm_mmap() (e.g. usercopy and execve), provide the interface to have the allocation tracked by KUnit itself. This requires bringing up a placeholder userspace mm. This combines my earlier attempt at this with Mark Rutland's version[1]. Normally alloc_mm() and arch_pick_mmap_layout() aren't exported for modules, so export these only for KUnit testing. Link: https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/ [1] Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-06-06mm/util: Swap kmemdup_array() argumentsJean-Philippe Brucker1-2/+2
GCC 14.1 complains about the argument usage of kmemdup_array(): drivers/soc/tegra/fuse/fuse-tegra.c:130:65: error: 'kmemdup_array' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Werror=calloc-transposed-args] 130 | fuse->lookups = kmemdup_array(fuse->soc->lookups, sizeof(*fuse->lookups), | ^ drivers/soc/tegra/fuse/fuse-tegra.c:130:65: note: earlier argument should specify number of elements, later size of each element The annotation introduced by commit 7d78a7773355 ("string: Add additional __realloc_size() annotations for "dup" helpers") lets the compiler think that kmemdup_array() follows the same format as calloc(), with the number of elements preceding the size of one element. So we could simply swap the arguments to __realloc_size() to get rid of that warning, but it seems cleaner to instead have kmemdup_array() follow the same format as krealloc_array(), memdup_array_user(), calloc() etc. Fixes: 7d78a7773355 ("string: Add additional __realloc_size() annotations for "dup" helpers") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20240606144608.97817-2-jean-philippe@linaro.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-05mm: fix xyz_noprof functions calling profiled functionsSuren Baghdasaryan1-5/+5
Grepping /proc/allocinfo for "noprof" reveals several xyz_noprof functions, which means internally they are calling profiled functions. This should never happen as such calls move allocation charge from a higher level location where it should be accounted for into these lower level helpers. Fix this by replacing profiled function calls with noprof ones. Link: https://lkml.kernel.org/r/20240531205350.3973009-1-surenb@google.com Fixes: b951aaff5035 ("mm: enable page allocation tagging") Fixes: e26d8769da6d ("mempool: hook up to memory allocation profiling") Fixes: 88ae5fb755b0 ("mm: vmalloc: enable memory allocation profiling") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25mm: switch mm->get_unmapped_area() to a flagRick Edgecombe1-3/+3
The mm_struct contains a function pointer *get_unmapped_area(), which is set to either arch_get_unmapped_area() or arch_get_unmapped_area_topdown() during the initialization of the mm. Since the function pointer only ever points to two functions that are named the same across all arch's, a function pointer is not really required. In addition future changes will want to add versions of the functions that take additional arguments. So to save a pointers worth of bytes in mm_struct, and prevent adding additional function pointers to mm_struct in future changes, remove it and keep the information about which get_unmapped_area() to use in a flag. Add the new flag to MMF_INIT_MASK so it doesn't get clobbered on fork by mmf_init_flags(). Most MM flags get clobbered on fork. In the pre-existing behavior mm->get_unmapped_area() would get copied to the new mm in dup_mm(), so not clobbering the flag preserves the existing behavior around inheriting the topdown-ness. Introduce a helper, mm_get_unmapped_area(), to easily convert code that refers to the old function pointer to instead select and call either arch_get_unmapped_area() or arch_get_unmapped_area_topdown() based on the flag. Then drop the mm->get_unmapped_area() function pointer. Leave the get_unmapped_area() pointer in struct file_operations alone. The main purpose of this change is to reorganize in preparation for future changes, but it also converts the calls of mm->get_unmapped_area() from indirect branches into a direct ones. The stress-ng bigheap benchmark calls realloc a lot, which calls through get_unmapped_area() in the kernel. On x86, the change yielded a ~1% improvement there on a retpoline config. In testing a few x86 configs, removing the pointer unfortunately didn't result in any actual size reductions in the compiled layout of mm_struct. But depending on compiler or arch alignment requirements, the change could shrink the size of mm_struct. Link: https://lkml.kernel.org/r/20240326021656.202649-3-rick.p.edgecombe@intel.com Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: H. Peter Anvin (Intel) <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25mm: vmalloc: enable memory allocation profilingKent Overstreet1-9/+9
This wrapps all external vmalloc allocation functions with the alloc_hooks() wrapper, and switches internal allocations to _noprof variants where appropriate, for the new memory allocation profiling feature. [surenb@google.com: arch/um: fix forward declaration for vmalloc] Link: https://lkml.kernel.org/r/20240326073750.726636-1-surenb@google.com [surenb@google.com: undo _noprof additions in the documentation] Link: https://lkml.kernel.org/r/20240326231453.1206227-5-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-31-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25mm/slab: enable slab allocation tagging for kmalloc and friendsSuren Baghdasaryan1-8/+8
Redefine kmalloc, krealloc, kzalloc, kcalloc, etc. to record allocations and deallocations done by these functions. [surenb@google.com: undo _noprof additions in the documentation] Link: https://lkml.kernel.org/r/20240326231453.1206227-7-surenb@google.com [rdunlap@infradead.org: fix kcalloc() kernel-doc warnings] Link: https://lkml.kernel.org/r/20240327044649.9199-1-rdunlap@infradead.org Link: https://lkml.kernel.org/r/20240321163705.3067592-26-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-03-04mm/util.c: add byte count to __vm_enough_memory failure warningMatthew Cassell1