<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm, branch v6.10-rc6</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>mm/memory: don't require head page for do_set_pmd()</title>
<updated>2024-06-25T03:52:11+00:00</updated>
<author>
<name>Andrew Bresticker</name>
<email>abrestic@rivosinc.com</email>
</author>
<published>2024-06-11T15:32:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ab1ffc86cb5bec1c92387b9811d9036512f8f4eb'/>
<id>ab1ffc86cb5bec1c92387b9811d9036512f8f4eb</id>
<content type='text'>
The requirement that the head page be passed to do_set_pmd() was added in
commit ef37b2ea08ac ("mm/memory: page_add_file_rmap() -&gt;
folio_add_file_rmap_[pte|pmd]()") and prevents pmd-mapping in the
finish_fault() and filemap_map_pages() paths if the page to be inserted is
anything but the head page for an otherwise suitable vma and pmd-sized
page.

Matthew said:

: We're going to stop using PMDs to map large folios unless the fault is
: within the first 4KiB of the PMD.  No idea how many workloads that
: affects, but it only needs to be backported as far as v6.8, so we may
: as well backport it.
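
A rough user-space model of the check may make the difference clearer. It assumes 4KiB base pages and 2MiB PMDs (so a PMD-sized folio has order 9). Every name in it is hypothetical: struct model_folio, model_can_pmd_map_old(), model_can_pmd_map_new() and model_pmd_map_pfn(). It sketches the condition described above and is not the actual do_set_pmd() code.

```c
/* Assumed geometry: 4KiB base pages, 2MiB PMD, so a PMD-sized folio has order 9. */
#define MODEL_HPAGE_PMD_ORDER 9

/* Hypothetical stand-in for a folio: only what the check needs. */
struct model_folio {
	unsigned long head_pfn;	/* pfn of the folio's first (head) page */
	unsigned int order;	/* the folio spans 2^order base pages */
};

/* Old condition: PMD-map only if the faulting page is the head page. */
int model_can_pmd_map_old(const struct model_folio *folio, unsigned long page_pfn)
{
	if (page_pfn != folio->head_pfn)
		return 0;
	return folio->order == MODEL_HPAGE_PMD_ORDER;
}

/*
 * New condition: any page of a PMD-sized folio qualifies (the faulting
 * page is assumed to lie inside the folio); only the order matters.
 */
int model_can_pmd_map_new(const struct model_folio *folio, unsigned long page_pfn)
{
	(void)page_pfn;
	return folio->order == MODEL_HPAGE_PMD_ORDER;
}

/* In this model the PMD entry always maps the folio from its head page. */
unsigned long model_pmd_map_pfn(const struct model_folio *folio)
{
	return folio->head_pfn;
}
```

Take a fault on the second page (pfn 0x201) of a PMD-sized folio that starts at pfn 0x200. The old check rejects it and falls back to PTEs. The new check accepts it, and the PMD still points at the head page.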

Link: https://lkml.kernel.org/r/20240611153216.2794513-1-abrestic@rivosinc.com
Fixes: ef37b2ea08ac ("mm/memory: page_add_file_rmap() -&gt; folio_add_file_rmap_[pte|pmd]()")
Signed-off-by: Andrew Bresticker &lt;abrestic@rivosinc.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The requirement that the head page be passed to do_set_pmd() was added in
commit ef37b2ea08ac ("mm/memory: page_add_file_rmap() -&gt;
folio_add_file_rmap_[pte|pmd]()") and prevents pmd-mapping in the
finish_fault() and filemap_map_pages() paths if the page to be inserted is
anything but the head page for an otherwise suitable vma and pmd-sized
page.

Matthew said:

: We're going to stop using PMDs to map large folios unless the fault is
: within the first 4KiB of the PMD.  No idea how many workloads that
: affects, but it only needs to be backported as far as v6.8, so we may
: as well backport it.

Link: https://lkml.kernel.org/r/20240611153216.2794513-1-abrestic@rivosinc.com
Fixes: ef37b2ea08ac ("mm/memory: page_add_file_rmap() -&gt; folio_add_file_rmap_[pte|pmd]()")
Signed-off-by: Andrew Bresticker &lt;abrestic@rivosinc.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/page_alloc: Separate THP PCP into movable and non-movable categories</title>
<updated>2024-06-25T03:52:11+00:00</updated>
<author>
<name>yangge</name>
<email>yangge1116@126.com</email>
</author>
<published>2024-06-20T00:59:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=bf14ed81f571f8dba31cd72ab2e50fbcc877cc31'/>
<id>bf14ed81f571f8dba31cd72ab2e50fbcc877cc31</id>
<content type='text'>
Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
THP-sized allocations"), the THP-sized PCP list no longer differentiates
the migration type of its pages, so a non-movable allocation request may
get a CMA page from the list, which in some cases is not acceptable.

If a large amount of CMA memory is configured in the system (for example,
CMA accounts for 50% of the system memory), starting a virtual machine
with device passthrough will get stuck.  While the virtual machine is
starting, it calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin
memory.  Normally, if a page is present and in a CMA area,
pin_user_pages_remote() migrates it from the CMA area to a non-CMA area
because of the FOLL_LONGTERM flag.  But if non-movable allocation
requests return CMA memory, migrate_longterm_unpinnable_pages() will
migrate a CMA page to another CMA page, which fails the check in
check_and_migrate_movable_pages() and makes the migration loop endlessly.

Call trace:
pin_user_pages_remote
--__gup_longterm_locked // endless loops in this function
----_get_user_pages_locked
----check_and_migrate_movable_pages
------migrate_longterm_unpinnable_pages
--------alloc_migration_target

This problem also has a negative impact on CMA itself.  For example,
when CMA memory is borrowed by THP and we need to reclaim it through
cma_alloc() or dma_alloc_coherent(), we must move those pages out so that
CMA's users can retrieve that contiguous memory.  Currently, CMA's memory
may be occupied by non-movable pages, which we cannot relocate.  As a
result, cma_alloc() is more likely to fail.

To fix the problem above, add one more PCP list for THP, which does not
introduce a new cacheline for struct per_cpu_pages.  THP will then have two
PCP lists: one used by MOVABLE allocations and the other used by UNMOVABLE
allocations.  MOVABLE allocations are those with GFP_MOVABLE, and
UNMOVABLE allocations are those with GFP_UNMOVABLE or GFP_RECLAIMABLE.
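
The list selection described above can be sketched as a small user-space model of the PCP index calculation. It assumes 4KiB pages (so a THP has order 9) and a PAGE_ALLOC_COSTLY_ORDER of 3. It also assumes that unmovable, movable and reclaimable are the first three migrate types. The model_ names and constants are illustrative assumptions, and the kernel's own index helper may differ in detail.

```c
/* Assumed values; in the kernel they depend on the architecture and config. */
enum model_migratetype {
	MODEL_MIGRATE_UNMOVABLE,
	MODEL_MIGRATE_MOVABLE,
	MODEL_MIGRATE_RECLAIMABLE,
	MODEL_MIGRATE_PCPTYPES	/* number of migrate types with PCP lists */
};

#define MODEL_PAGE_ALLOC_COSTLY_ORDER	3
#define MODEL_HPAGE_PMD_ORDER		9

/* One list per (order, migrate type) pair for orders 0 to COSTLY_ORDER. */
#define MODEL_NR_LOWORDER_PCP_LISTS \
	(MODEL_MIGRATE_PCPTYPES * (MODEL_PAGE_ALLOC_COSTLY_ORDER + 1))

/* After the fix there are two THP lists instead of one. */
#define MODEL_NR_PCP_THP	2
#define MODEL_NR_PCP_LISTS	(MODEL_NR_LOWORDER_PCP_LISTS + MODEL_NR_PCP_THP)

/* Return the PCP list index for (mt, order), or -1 for unsupported orders. */
int model_order_to_pindex(enum model_migratetype mt, unsigned int order)
{
	if (order > MODEL_PAGE_ALLOC_COSTLY_ORDER) {
		if (order != MODEL_HPAGE_PMD_ORDER)
			return -1;
		/*
		 * Unmovable and reclaimable THPs share the first THP list;
		 * movable THPs get the second one.
		 */
		return MODEL_NR_LOWORDER_PCP_LISTS + (mt == MODEL_MIGRATE_MOVABLE);
	}
	return (int)(MODEL_MIGRATE_PCPTYPES * order) + (int)mt;
}
```

Under these assumptions an order-9 movable request uses list 13, and order-9 unmovable and reclaimable requests share list 12. Before the fix, all three shared a single THP list.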

Link: https://lkml.kernel.org/r/1718845190-4456-1-git-send-email-yangge1116@126.com
Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
Signed-off-by: yangge &lt;yangge1116@126.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;21cnbao@gmail.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
THP-sized allocations"), the THP-sized PCP list no longer differentiates
the migration type of its pages, so a non-movable allocation request may
get a CMA page from the list, which in some cases is not acceptable.

If a large amount of CMA memory is configured in the system (for example,
CMA accounts for 50% of the system memory), starting a virtual machine
with device passthrough will get stuck.  While the virtual machine is
starting, it calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin
memory.  Normally, if a page is present and in a CMA area,
pin_user_pages_remote() migrates it from the CMA area to a non-CMA area
because of the FOLL_LONGTERM flag.  But if non-movable allocation
requests return CMA memory, migrate_longterm_unpinnable_pages() will
migrate a CMA page to another CMA page, which fails the check in
check_and_migrate_movable_pages() and makes the migration loop endlessly.

Call trace:
pin_user_pages_remote
--__gup_longterm_locked // endless loops in this function
----_get_user_pages_locked
----check_and_migrate_movable_pages
------migrate_longterm_unpinnable_pages
--------alloc_migration_target

This problem also has a negative impact on CMA itself.  For example,
when CMA memory is borrowed by THP and we need to reclaim it through
cma_alloc() or dma_alloc_coherent(), we must move those pages out so that
CMA's users can retrieve that contiguous memory.  Currently, CMA's memory
may be occupied by non-movable pages, which we cannot relocate.  As a
result, cma_alloc() is more likely to fail.

To fix the problem above, add one more PCP list for THP, which does not
introduce a new cacheline for struct per_cpu_pages.  THP will then have two
PCP lists: one used by MOVABLE allocations and the other used by UNMOVABLE
allocations.  MOVABLE allocations are those with GFP_MOVABLE, and
UNMOVABLE allocations are those with GFP_UNMOVABLE or GFP_RECLAIMABLE.

Link: https://lkml.kernel.org/r/1718845190-4456-1-git-send-email-yangge1116@126.com
Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
Signed-off-by: yangge &lt;yangge1116@126.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;21cnbao@gmail.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/migrate: make migrate_pages_batch() stats consistent</title>
<updated>2024-06-25T03:52:10+00:00</updated>
<author>
<name>Zi Yan</name>
<email>ziy@nvidia.com</email>
</author>
<published>2024-06-18T13:41:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=c6408250703530187cc6250dcd702d12a71c44f5'/>
<id>c6408250703530187cc6250dcd702d12a71c44f5</id>
<content type='text'>
As Ying pointed out in [1], stats-&gt;nr_thp_failed needs to be updated to
avoid stats inconsistency between MIGRATE_SYNC and MIGRATE_ASYNC when
calling migrate_pages_batch().

Otherwise, when migrate_pages_batch() is called via
migrate_pages(MIGRATE_ASYNC), nr_thp_failed is not increased, whereas
when it is called via migrate_pages(MIGRATE_SYNC*), nr_thp_failed is
increased in migrate_pages_sync() by
stats-&gt;nr_thp_failed += astats.nr_thp_split.

[1] https://lore.kernel.org/linux-mm/87msnq7key.fsf@yhuang6-desk2.ccr.corp.intel.com/

Link: https://lkml.kernel.org/r/20240620012712.19804-1-zi.yan@sent.com
Link: https://lkml.kernel.org/r/20240618134151.29214-1-zi.yan@sent.com
Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
Signed-off-by: Zi Yan &lt;ziy@nvidia.com&gt;
Suggested-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Yin Fengwei &lt;fengwei.yin@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As Ying pointed out in [1], stats-&gt;nr_thp_failed needs to be updated to
avoid stats inconsistency between MIGRATE_SYNC and MIGRATE_ASYNC when
calling migrate_pages_batch().

Otherwise, when migrate_pages_batch() is called via
migrate_pages(MIGRATE_ASYNC), nr_thp_failed is not increased, whereas
when it is called via migrate_pages(MIGRATE_SYNC*), nr_thp_failed is
increased in migrate_pages_sync() by
stats-&gt;nr_thp_failed += astats.nr_thp_split.

[1] https://lore.kernel.org/linux-mm/87msnq7key.fsf@yhuang6-desk2.ccr.corp.intel.com/

Link: https://lkml.kernel.org/r/20240620012712.19804-1-zi.yan@sent.com
Link: https://lkml.kernel.org/r/20240618134151.29214-1-zi.yan@sent.com
Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
Signed-off-by: Zi Yan &lt;ziy@nvidia.com&gt;
Suggested-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Yin Fengwei &lt;fengwei.yin@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kasan: fix bad call to unpoison_slab_object</title>
<updated>2024-06-25T03:52:09+00:00</updated>
<author>
<name>Andrey Konovalov</name>
<email>andreyknvl@gmail.com</email>
</author>
<published>2024-06-14T14:32:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=1c61990d3762a020817daa353da0a0af6794140b'/>
<id>1c61990d3762a020817daa353da0a0af6794140b</id>
<content type='text'>
Commit 29d7355a9d05 ("kasan: save alloc stack traces for mempool") messed
up one of the calls to unpoison_slab_object: the last two arguments are
supposed to be GFP flags and whether to init the object memory.

Fix the call.

Without this fix, __kasan_mempool_unpoison_object provides the object's
size as GFP flags to unpoison_slab_object, which can cause LOCKDEP reports
(and probably other issues).

Link: https://lkml.kernel.org/r/20240614143238.60323-1-andrey.konovalov@linux.dev
Fixes: 29d7355a9d05 ("kasan: save alloc stack traces for mempool")
Signed-off-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Reported-by: Brad Spengler &lt;spender@grsecurity.net&gt;
Acked-by: Marco Elver &lt;elver@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 29d7355a9d05 ("kasan: save alloc stack traces for mempool") messed
up one of the calls to unpoison_slab_object: the last two arguments are
supposed to be GFP flags and whether to init the object memory.

Fix the call.

Without this fix, __kasan_mempool_unpoison_object provides the object's
size as GFP flags to unpoison_slab_object, which can cause LOCKDEP reports
(and probably other issues).

Link: https://lkml.kernel.org/r/20240614143238.60323-1-andrey.konovalov@linux.dev
Fixes: 29d7355a9d05 ("kasan: save alloc stack traces for mempool")
Signed-off-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Reported-by: Brad Spengler &lt;spender@grsecurity.net&gt;
Acked-by: Marco Elver &lt;elver@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: handle profiling for fake memory allocations during compaction</title>
<updated>2024-06-25T03:52:09+00:00</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2024-06-14T23:05:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=34a023dc88696afed9ade7825f11f87ba657b133'/>
<id>34a023dc88696afed9ade7825f11f87ba657b133</id>
<content type='text'>
During compaction, isolated free pages are marked as allocated so that
they can be split and/or freed.  For that, post_alloc_hook() is used
inside split_map_pages() and release_free_list().  split_map_pages()
marks free pages as allocated, splits them and then lets
alloc_contig_range_noprof() free those pages.  release_free_list() marks
free pages as allocated and immediately frees them.  This usage of
post_alloc_hook() affects memory allocation profiling because these
functions might not be called from an instrumented allocator, in which
case current-&gt;alloc_tag is NULL, and when debugging is enabled
(CONFIG_MEM_ALLOC_PROFILING_DEBUG=y) that causes warnings.  To avoid
that, wrap such post_alloc_hook() calls in an instrumented function that
acts as an allocator and is charged for these fake allocations.  Note
that these allocations are very short-lived (they are freed right away),
so the associated counters should usually read 0.
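
The approach can be illustrated with a small user-space model. A global pointer stands in for the task's alloc_tag, and a bare hook counts a warning when that pointer is NULL. An instrumented wrapper installs its own static tag around the hook, so the fake allocation is charged, then uncharged when the pages are freed. All names here (model_post_alloc_hook(), model_mark_allocated(), model_free() and so on) are hypothetical. The real change uses the kernel's allocation-tag machinery, which this model does not reproduce.

```c
/* Hypothetical stand-in for an allocation tag: counts live allocations. */
struct model_alloc_tag {
	const char *site;
	long live;
};

/* Stand-in for the task's alloc_tag: set only inside instrumented allocators. */
static struct model_alloc_tag *model_current_tag;

/* Number of warnings a debug build would have printed. */
static int model_warnings;

/* Stand-in for post_alloc_hook(): charge the allocation to the current tag. */
struct model_alloc_tag *model_post_alloc_hook(void)
{
	if (!model_current_tag) {
		model_warnings++;	/* called outside any instrumented allocator */
		return 0;
	}
	model_current_tag->live++;
	return model_current_tag;
}

/* Stand-in for the free path: uncharge the tag recorded at allocation. */
void model_free(struct model_alloc_tag *tag)
{
	if (tag)
		tag->live--;
}

/* One static tag that all fake compaction allocations are charged to. */
static struct model_alloc_tag model_compaction_tag[1] = { { "compaction", 0 } };

/* Instrumented wrapper: install the tag, call the hook, restore the caller's tag. */
struct model_alloc_tag *model_mark_allocated(void)
{
	struct model_alloc_tag *saved = model_current_tag;
	struct model_alloc_tag *charged;

	model_current_tag = model_compaction_tag;
	charged = model_post_alloc_hook();
	model_current_tag = saved;
	return charged;
}
```

A bare hook call increments the warning count. A call through the wrapper charges the compaction tag instead, and freeing brings its counter back to 0, in line with the expectation stated above.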

Link: https://lkml.kernel.org/r/20240614230504.3849136-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Sourav Panda &lt;souravpanda@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During compaction, isolated free pages are marked as allocated so that
they can be split and/or freed.  For that, post_alloc_hook() is used
inside split_map_pages() and release_free_list().  split_map_pages()
marks free pages as allocated, splits them and then lets
alloc_contig_range_noprof() free those pages.  release_free_list() marks
free pages as allocated and immediately frees them.  This usage of
post_alloc_hook() affects memory allocation profiling because these
functions might not be called from an instrumented allocator, in which
case current-&gt;alloc_tag is NULL, and when debugging is enabled
(CONFIG_MEM_ALLOC_PROFILING_DEBUG=y) that causes warnings.  To avoid
that, wrap such post_alloc_hook() calls in an instrumented function that
acts as an allocator and is charged for these fake allocations.  Note
that these allocations are very short-lived (they are freed right away),
so the associated counters should usually read 0.

Link: https://lkml.kernel.org/r/20240614230504.3849136-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Sourav Panda &lt;souravpanda@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/slab: fix 'variable obj_exts set but not used' warning</title>
<updated>2024-06-25T03:52:09+00:00</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2024-06-14T22:59:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=b4601d096aac8ed26afa88ef8b249975b0530ca1'/>
<id>b4601d096aac8ed26afa88ef8b249975b0530ca1</id>
<content type='text'>
slab_post_alloc_hook() uses prepare_slab_obj_exts_hook() to obtain a
slabobj_ext object.  Currently the only user of the slabobj_ext object in
this path is memory allocation profiling, so when profiling is not enabled
the object is not needed, and the unused variable generates a warning when
compiling with CONFIG_MEM_ALLOC_PROFILING=n.  Move the code under this
configuration option to fix the warning.  If more slabobj_ext users appear
in the future, the code will have to be changed back to call
prepare_slab_obj_exts_hook().
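
The shape of the fix can be sketched as follows. The lookup of the extension object and its only consumer sit under the same preprocessor condition, so with profiling disabled no variable is assigned and then ignored. MODEL_MEM_ALLOC_PROFILING, struct model_obj_ext and the model_ functions are hypothetical stand-ins. They stand for the kernel option, struct slabobj_ext, prepare_slab_obj_exts_hook() and slab_post_alloc_hook() respectively.

```c
/* Hypothetical switch mirroring CONFIG_MEM_ALLOC_PROFILING (1 = y, 0 = n). */
#define MODEL_MEM_ALLOC_PROFILING 1

/* Stand-in for struct slabobj_ext: per-object profiling data. */
struct model_obj_ext {
	long charged_bytes;
};

static struct model_obj_ext model_exts[4];

/* Stand-in for prepare_slab_obj_exts_hook(); may return NULL. */
struct model_obj_ext *model_prepare_obj_exts(unsigned int idx)
{
	if (idx > 3)
		return 0;
	return model_exts + idx;
}

/*
 * Stand-in for slab_post_alloc_hook(): the lookup and its only user share
 * one #if, so with profiling off nothing is set and left unused.
 */
void model_slab_post_alloc_hook(unsigned int idx, long size)
{
#if MODEL_MEM_ALLOC_PROFILING
	struct model_obj_ext *obj_exts = model_prepare_obj_exts(idx);

	if (obj_exts)
		obj_exts->charged_bytes += size;
#else
	(void)idx;
	(void)size;
#endif
}
```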

Link: https://lkml.kernel.org/r/20240614225951.3845577-1-surenb@google.com
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202406150444.F6neSaiy-lkp@intel.com/
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
slab_post_alloc_hook() uses prepare_slab_obj_exts_hook() to obtain a
slabobj_ext object.  Currently the only user of the slabobj_ext object in
this path is memory allocation profiling, so when profiling is not enabled
the object is not needed, and the unused variable generates a warning when
compiling with CONFIG_MEM_ALLOC_PROFILING=n.  Move the code under this
configuration option to fix the warning.  If more slabobj_ext users appear
in the future, the code will have to be changed back to call
prepare_slab_obj_exts_hook().

Link: https://lkml.kernel.org/r/20240614225951.3845577-1-surenb@google.com
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202406150444.F6neSaiy-lkp@intel.com/
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>/proc/pid/smaps: add mseal info for vma</title>
<updated>2024-06-25T03:52:09+00:00</updated>
<author>
<name>Jeff Xu</name>
<email>jeffxu@chromium.org</email>
</author>
<published>2024-06-14T23:20:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=399ab86ea55039f9d0a5f621a68cb4631f796f37'/>
<id>399ab86ea55039f9d0a5f621a68cb4631f796f37</id>
<content type='text'>
Add an "sl" flag to the VmFlags field of /proc/pid/smaps to indicate that
the vma is sealed.
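
From userspace, the new flag can be checked by scanning a VMA's VmFlags: line in /proc/pid/smaps for the two-letter sl token. This is a minimal sketch. It assumes the flag appears as a space-separated token alongside the other VmFlags mnemonics. model_vmflags_has_sealed() is a hypothetical helper that makes no library calls, and reading the file is left to the caller.

```c
/* Return 1 if the len characters at tok spell exactly the string word. */
static int model_token_equals(const char *tok, long len, const char *word)
{
	long i;

	for (i = 0; i != len; i++)
		if (word[i] == '\0' || word[i] != tok[i])
			return 0;
	return word[len] == '\0';
}

/* Tokens end at a space, a newline or the end of the string. */
static int model_is_token_end(char c)
{
	return c == ' ' || c == '\n' || c == '\0';
}

/*
 * Return 1 if a "VmFlags:" line from /proc/pid/smaps contains the "sl"
 * token (the VMA is sealed), 0 otherwise.
 */
int model_vmflags_has_sealed(const char *line)
{
	const char *p = line;

	while (*p) {
		const char *start;

		while (*p == ' ')	/* skip separators */
			p++;
		start = p;
		while (!model_is_token_end(*p))	/* find the end of the token */
			p++;
		if (model_token_equals(start, p - start, "sl"))
			return 1;
		if (*p == '\n')	/* stop at the end of the line */
			break;
	}
	return 0;
}
```

Matching whole tokens, not substrings, keeps a longer mnemonic that merely starts with "sl" from being reported as sealed.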

Link: https://lkml.kernel.org/r/20240614232014.806352-2-jeffxu@google.com
Fixes: 8be7258aad44 ("mseal: add mseal syscall")
Signed-off-by: Jeff Xu &lt;jeffxu@chromium.org&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Adhemerval Zanella &lt;adhemerval.zanella@linaro.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jorge Lucangeli Obes &lt;jorgelo@chromium.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Stephen Röttger &lt;sroettger@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add an "sl" flag to the VmFlags field of /proc/pid/smaps to indicate that
the vma is sealed.

Link: https://lkml.kernel.org/r/20240614232014.806352-2-jeffxu@google.com
Fixes: 8be7258aad44 ("mseal: add mseal syscall")
Signed-off-by: Jeff Xu &lt;jeffxu@chromium.org&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Adhemerval Zanella &lt;adhemerval.zanella@linaro.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jorge Lucangeli Obes &lt;jorgelo@chromium.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Stephen Röttger &lt;sroettger@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix incorrect vbq reference in purge_fragmented_block</title>
<updated>2024-06-25T03:52:08+00:00</updated>
<author>
<name>Zhaoyang Huang</name>
<email>zhaoyang.huang@unisoc.com</email>
</author>
<published>2024-06-07T02:31:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8c61291fd8500e3b35c7ec0c781b273d8cc96cde'/>
<id>8c61291fd8500e3b35c7ec0c781b273d8cc96cde</id>
<content type='text'>
xa_for_each() in _vm_unmap_aliases() loops through all vbs.  However,
since commit 062eacf57ad9 ("mm: vmalloc: remove a global vmap_blocks
xarray"), a vb found in a CPU's xarray may not be on that CPU's
vmap_block_queue.  Consequently, purge_fragmented_block() might take the
wrong vbq-&gt;lock to protect the free list, corrupting the vbq-&gt;free list.

Incorrect lock protection can exhaust all vmalloc space as follows:
CPU0                                            CPU1
+--------------------------------------------+
|    +--------------------+     +-----+      |
+--&gt; |                    |----&gt;|     |------+
     | CPU1:vbq free_list |     | vb1 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

_vm_unmap_aliases()                             vb_alloc()
                                                new_vmap_block()
xa_for_each(&amp;vbq-&gt;vmap_blocks, idx, vb)
--&gt; vb in CPU1:vbq-&gt;freelist

purge_fragmented_block(vb)
spin_lock(&amp;vbq-&gt;lock)                           spin_lock(&amp;vbq-&gt;lock)
--&gt; use CPU0:vbq-&gt;lock                          --&gt; use CPU1:vbq-&gt;lock

list_del_rcu(&amp;vb-&gt;free_list)                    list_add_tail_rcu(&amp;vb-&gt;free_list, &amp;vbq-&gt;free)
    __list_del(vb-&gt;prev, vb-&gt;next)
        next-&gt;prev = prev
    +--------------------+
    |                    |
    | CPU1:vbq free_list |
+---|                    |&lt;--+
|   +--------------------+   |
+----------------------------+
                                                __list_add(new, head-&gt;prev, head)
+--------------------------------------------+
|    +--------------------+     +-----+      |
+--&gt; |                    |----&gt;|     |------+
     | CPU1:vbq free_list |     | vb2 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

        prev-&gt;next = next
+--------------------------------------------+
|----------------------------+               |
|    +--------------------+  |  +-----+      |
+--&gt; |                    |--+  |     |------+
     | CPU1:vbq free_list |     | vb2 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

The free list is now broken: all vbs that were added after 'prev' can no
longer be reached by list_for_each_entry_rcu(vb, &amp;vbq-&gt;free,
free_list) in vb_alloc(), so vmalloc space is exhausted.

This issue affects both erofs and f2fs; the stack traces are as follows:
erofs:
[&lt;ffffffd4ffb93ad4&gt;] __switch_to+0x174
[&lt;ffffffd4ffb942f0&gt;] __schedule+0x624
[&lt;ffffffd4ffb946f4&gt;] schedule+0x7c
[&lt;ffffffd4ffb947cc&gt;] schedule_preempt_disabled+0x24
[&lt;ffffffd4ffb962ec&gt;] __mutex_lock+0x374
[&lt;ffffffd4ffb95998&gt;] __mutex_lock_slowpath+0x14
[&lt;ffffffd4ffb95954&gt;] mutex_lock+0x24
[&lt;ffffffd4fef2900c&gt;] reclaim_and_purge_vmap_areas+0x44
[&lt;ffffffd4fef25908&gt;] alloc_vmap_area+0x2e0
[&lt;ffffffd4fef24ea0&gt;] vm_map_ram+0x1b0
[&lt;ffffffd4ff1b46f4&gt;] z_erofs_lz4_decompress+0x278
[&lt;ffffffd4ff1b8ac4&gt;] z_erofs_decompress_queue+0x650
[&lt;ffffffd4ff1b8328&gt;] z_erofs_runqueue+0x7f4
[&lt;ffffffd4ff1b66a8&gt;] z_erofs_read_folio+0x104
[&lt;ffffffd4feeb6fec&gt;] filemap_read_folio+0x6c
[&lt;ffffffd4feeb68c4&gt;] filemap_fault+0x300
[&lt;ffffffd4fef0ecac&gt;] __do_fault+0xc8
[&lt;ffffffd4fef0c908&gt;] handle_mm_fault+0xb38
[&lt;ffffffd4ffb9f008&gt;] do_page_fault+0x288
[&lt;ffffffd4ffb9ed64&gt;] do_translation_fault[jt]+0x40
[&lt;ffffffd4fec39c78&gt;] do_mem_abort+0x58
[&lt;ffffffd4ffb8c3e4&gt;] el0_ia+0x70
[&lt;ffffffd4ffb8c260&gt;] el0t_64_sync_handler[jt]+0xb0
[&lt;ffffffd4fec11588&gt;] ret_to_user[jt]+0x0

f2fs:
[&lt;ffffffd4ffb93ad4&gt;] __switch_to+0x174
[&lt;ffffffd4ffb942f0&gt;] __schedule+0x624
[&lt;ffffffd4ffb946f4&gt;] schedule+0x7c
[&lt;ffffffd4ffb947cc&gt;] schedule_preempt_disabled+0x24
[&lt;ffffffd4ffb962ec&gt;] __mutex_lock+0x374
[&lt;ffffffd4ffb95998&gt;] __mutex_lock_slowpath+0x14
[&lt;ffffffd4ffb95954&gt;] mutex_lock+0x24
[&lt;ffffffd4fef2900c&gt;] reclaim_and_purge_vmap_areas+0x44
[&lt;ffffffd4fef25908&gt;] alloc_vmap_area+0x2e0
[&lt;ffffffd4fef24ea0&gt;] vm_map_ram+0x1b0
[&lt;ffffffd4ff1a3b60&gt;] f2fs_prepare_decomp_mem+0x144
[&lt;ffffffd4ff1a6c24&gt;] f2fs_alloc_dic+0x264
[&lt;ffffffd4ff175468&gt;] f2fs_read_multi_pages+0x428
[&lt;ffffffd4ff17b46c&gt;] f2fs_mpage_readpages+0x314
[&lt;ffffffd4ff1785c4&gt;] f2fs_readahead+0x50
[&lt;ffffffd4feec3384&gt;] read_pages+0x80
[&lt;ffffffd4feec32c0&gt;] page_cache_ra_unbounded+0x1a0
[&lt;ffffffd4feec39e8&gt;] page_cache_ra_order+0x274
[&lt;ffffffd4feeb6cec&gt;] do_sync_mmap_readahead+0x11c
[&lt;ffffffd4feeb6764&gt;] filemap_fault+0x1a0
[&lt;ffffffd4ff1423bc&gt;] f2fs_filemap_fault+0x28
[&lt;ffffffd4fef0ecac&gt;] __do_fault+0xc8
[&lt;ffffffd4fef0c908&gt;] handle_mm_fault+0xb38
[&lt;ffffffd4ffb9f008&gt;] do_page_fault+0x288
[&lt;ffffffd4ffb9ed64&gt;] do_translation_fault[jt]+0x40
[&lt;ffffffd4fec39c78&gt;] do_mem_abort+0x58
[&lt;ffffffd4ffb8c3e4&gt;] el0_ia+0x70
[&lt;ffffffd4ffb8c260&gt;] el0t_64_sync_handler[jt]+0xb0
[&lt;ffffffd4fec11588&gt;] ret_to_user[jt]+0x0

To fix this, introduce a cpu field in vmap_block to record which CPU's
queue this vb belongs to.
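
The idea behind the fix can be modelled in a few lines. Each block records the CPU whose free list it was added to. The purge path then picks the queue, and therefore the lock, from that field instead of from the xarray it is iterating. Everything here is hypothetical. struct model_vbq reduces a vmap_block_queue to a lock id, struct model_vb keeps only the fields needed, and MODEL_NR_CPUS is 4. model_addr_to_xa_cpu() is an assumed address hash standing in for however the kernel chooses a block's xarray.

```c
#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for struct vmap_block_queue; the lock is just an id. */
struct model_vbq {
	int lock_id;
};

/* Hypothetical stand-in for struct vmap_block, including the new cpu field. */
struct model_vb {
	unsigned long va_start;
	int cpu;	/* CPU whose free list this block was added to */
};

static struct model_vbq model_queues[MODEL_NR_CPUS] = { { 0 }, { 1 }, { 2 }, { 3 } };

/* Assumed hash: blocks are filed in an xarray chosen by address, not by CPU. */
static int model_addr_to_xa_cpu(unsigned long va_start)
{
	return (int)((va_start / 4096) % MODEL_NR_CPUS);
}

/* Creating a block on this_cpu records that CPU, as the fix does with vb->cpu. */
void model_new_vmap_block(struct model_vb *vb, unsigned long va_start, int this_cpu)
{
	vb->va_start = va_start;
	vb->cpu = this_cpu;
}

/* Before the fix: the queue whose xarray is being walked. */
struct model_vbq *model_purge_queue_old(const struct model_vb *vb)
{
	return model_queues + model_addr_to_xa_cpu(vb->va_start);
}

/* After the fix: the queue that owns the block's free list. */
struct model_vbq *model_purge_queue_new(const struct model_vb *vb)
{
	return model_queues + vb->cpu;
}
```

In the model, a block at address 0 created on CPU 1 is filed in CPU 0's xarray. The old lookup therefore takes CPU 0's lock. The new lookup takes CPU 1's lock, which is the one that protects the list the block is actually on.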

Link: https://lkml.kernel.org/r/20240614021352.1822225-1-zhaoyang.huang@unisoc.com
Link: https://lkml.kernel.org/r/20240607023116.1720640-1-zhaoyang.huang@unisoc.com
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
Signed-off-by: Zhaoyang Huang &lt;zhaoyang.huang@unisoc.com&gt;
Suggested-by: Hailong.Liu &lt;hailong.liu@oppo.com&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
xa_for_each() in _vm_unmap_aliases() loops through all vbs.  However,
since commit 062eacf57ad9 ("mm: vmalloc: remove a global vmap_blocks
xarray"), a vb found in a CPU's xarray may not be on that CPU's
vmap_block_queue.  Consequently, purge_fragmented_block() might take the
wrong vbq-&gt;lock to protect the free list, corrupting the vbq-&gt;free list.

Incorrect lock protection can exhaust all vmalloc space as follows:
CPU0                                            CPU1
+--------------------------------------------+
|    +--------------------+     +-----+      |
+--&gt; |                    |----&gt;|     |------+
     | CPU1:vbq free_list |     | vb1 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

_vm_unmap_aliases()                             vb_alloc()
                                                new_vmap_block()
xa_for_each(&amp;vbq-&gt;vmap_blocks, idx, vb)
--&gt; vb in CPU1:vbq-&gt;freelist

purge_fragmented_block(vb)
spin_lock(&amp;vbq-&gt;lock)                           spin_lock(&amp;vbq-&gt;lock)
--&gt; use CPU0:vbq-&gt;lock                          --&gt; use CPU1:vbq-&gt;lock

list_del_rcu(&amp;vb-&gt;free_list)                    list_add_tail_rcu(&amp;vb-&gt;free_list, &amp;vbq-&gt;free)
    __list_del(vb-&gt;prev, vb-&gt;next)
        next-&gt;prev = prev
    +--------------------+
    |                    |
    | CPU1:vbq free_list |
+---|                    |&lt;--+
|   +--------------------+   |
+----------------------------+
                                                __list_add(new, head-&gt;prev, head)
+--------------------------------------------+
|    +--------------------+     +-----+      |
+--&gt; |                    |----&gt;|     |------+
     | CPU1:vbq free_list |     | vb2 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

        prev-&gt;next = next
+--------------------------------------------+
|----------------------------+               |
|    +--------------------+  |  +-----+      |
+--&gt; |                    |--+  |     |------+
     | CPU1:vbq free_list |     | vb2 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

The free list is now broken: all vbs that were added after 'prev' can no
longer be reached by list_for_each_entry_rcu(vb, &amp;vbq-&gt;free,
free_list) in vb_alloc(), so vmalloc space is exhausted.

This issue affects both erofs and f2fs; the stack traces are as follows:
erofs:
[&lt;ffffffd4ffb93ad4&gt;] __switch_to+0x174
[&lt;ffffffd4ffb942f0&gt;] __schedule+0x624
[&lt;ffffffd4ffb946f4&gt;] schedule+0x7c
[&lt;ffffffd4ffb947cc&gt;] schedule_preempt_disabled+0x24
[&lt;ffffffd4ffb962ec&gt;] __mutex_lock+0x374
[&lt;ffffffd4ffb95998&gt;] __mutex_lock_slowpath+0x14
[&lt;ffffffd4ffb95954&gt;] mutex_lock+0x24
[&lt;ffffffd4fef2900c&gt;] reclaim_and_purge_vmap_areas+0x44
[&lt;ffffffd4fef25908&gt;] alloc_vmap_area+0x2e0
[&lt;ffffffd4fef24ea0&gt;] vm_map_ram+0x1b0
[&lt;ffffffd4ff1b46f4&gt;] z_erofs_lz4_decompress+0x278
[&lt;ffffffd4ff1b8ac4&gt;] z_erofs_decompress_queue+0x650
[&lt;ffffffd4ff1b8328&gt;] z_erofs_runqueue+0x7f4
[&lt;ffffffd4ff1b66a8&gt;] z_erofs_read_folio+0x104
[&lt;ffffffd4feeb6fec&gt;] filemap_read_folio+0x6c
[&lt;ffffffd4feeb68c4&gt;] filemap_fault+0x300
[&lt;ffffffd4fef0ecac&gt;] __do_fault+0xc8
[&lt;ffffffd4fef0c908&gt;] handle_mm_fault+0xb38
[&lt;ffffffd4ffb9f008&gt;] do_page_fault+0x288
[&lt;ffffffd4ffb9ed64&gt;] do_translation_fault[jt]+0x40
[&lt;ffffffd4fec39c78&gt;] do_mem_abort+0x58
[&lt;ffffffd4ffb8c3e4&gt;] el0_ia+0x70
[&lt;ffffffd4ffb8c260&gt;] el0t_64_sync_handler[jt]+0xb0
[&lt;ffffffd4fec11588&gt;] ret_to_user[jt]+0x0

f2fs:
[&lt;ffffffd4ffb93ad4&gt;] __switch_to+0x174
[&lt;ffffffd4ffb942f0&gt;] __schedule+0x624
[&lt;ffffffd4ffb946f4&gt;] schedule+0x7c
[&lt;ffffffd4ffb947cc&gt;] schedule_preempt_disabled+0x24
[&lt;ffffffd4ffb962ec&gt;] __mutex_lock+0x374
[&lt;ffffffd4ffb95998&gt;] __mutex_lock_slowpath+0x14
[&lt;ffffffd4ffb95954&gt;] mutex_lock+0x24
[&lt;ffffffd4fef2900c&gt;] reclaim_and_purge_vmap_areas+0x44
[&lt;ffffffd4fef25908&gt;] alloc_vmap_area+0x2e0
[&lt;ffffffd4fef24ea0&gt;] vm_map_ram+0x1b0
[&lt;ffffffd4ff1a3b60&gt;] f2fs_prepare_decomp_mem+0x144
[&lt;ffffffd4ff1a6c24&gt;] f2fs_alloc_dic+0x264
[&lt;ffffffd4ff175468&gt;] f2fs_read_multi_pages+0x428
[&lt;ffffffd4ff17b46c&gt;] f2fs_mpage_readpages+0x314
[&lt;ffffffd4ff1785c4&gt;] f2fs_readahead+0x50
[&lt;ffffffd4feec3384&gt;] read_pages+0x80
[&lt;ffffffd4feec32c0&gt;] page_cache_ra_unbounded+0x1a0
[&lt;ffffffd4feec39e8&gt;] page_cache_ra_order+0x274
[&lt;ffffffd4feeb6cec&gt;] do_sync_mmap_readahead+0x11c
[&lt;ffffffd4feeb6764&gt;] filemap_fault+0x1a0
[&lt;ffffffd4ff1423bc&gt;] f2fs_filemap_fault+0x28
[&lt;ffffffd4fef0ecac&gt;] __do_fault+0xc8
[&lt;ffffffd4fef0c908&gt;] handle_mm_fault+0xb38
[&lt;ffffffd4ffb9f008&gt;] do_page_fault+0x288
[&lt;ffffffd4ffb9ed64&gt;] do_translation_fault[jt]+0x40
[&lt;ffffffd4fec39c78&gt;] do_mem_abort+0x58
[&lt;ffffffd4ffb8c3e4&gt;] el0_ia+0x70
[&lt;ffffffd4ffb8c260&gt;] el0t_64_sync_handler[jt]+0xb0
[&lt;ffffffd4fec11588&gt;] ret_to_user[jt]+0x0

To fix this, introduce a cpu field in vmap_block to record which CPU this
vb belongs to.

Link: https://lkml.kernel.org/r/20240614021352.1822225-1-zhaoyang.huang@unisoc.com
Link: https://lkml.kernel.org/r/20240607023116.1720640-1-zhaoyang.huang@unisoc.com
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
Signed-off-by: Zhaoyang Huang &lt;zhaoyang.huang@unisoc.com&gt;
Suggested-by: Hailong.Liu &lt;hailong.liu@oppo.com&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'fixes-2024-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock</title>
<updated>2024-06-23T14:32:24+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-06-23T14:32:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=0971e82ea34c5e01cd3e68d231caa81780e8cafb'/>
<id>0971e82ea34c5e01cd3e68d231caa81780e8cafb</id>
<content type='text'>
Pull memblock fix from Mike Rapoport:
 "Fix fragility in checks for unset node ID.

  Use the numa_valid_node() function to verify that nid is a valid node
  ID instead of inconsistent comparisons with either NUMA_NO_NODE or
  MAX_NUMNODES"

* tag 'fixes-2024-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock: use numa_valid_node() helper to check for invalid node ID
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull memblock fix from Mike Rapoport:
 "Fix fragility in checks for unset node ID.

  Use the numa_valid_node() function to verify that nid is a valid node
  ID instead of inconsistent comparisons with either NUMA_NO_NODE or
  MAX_NUMNODES"

* tag 'fixes-2024-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock: use numa_valid_node() helper to check for invalid node ID
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2024-06-17T19:30:07+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-06-17T19:30:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e6b324fbf2de1797a4756fe2a489442464738dad'/>
<id>e6b324fbf2de1797a4756fe2a489442464738dad</id>
<content type='text'>
Pull misc fixes from Andrew Morton:
 "Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"

* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kcov: don't lose track of remote references during softirqs
  mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
  mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
  mm: fix possible OOB in numa_rebuild_large_mapping()
  mm/migrate: fix kernel BUG at mm/compaction.c:2761!
  selftests: mm: make map_fixed_noreplace test names stable
  mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
  mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
  gcov: add support for GCC 14
  zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
  mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
  lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
  MAINTAINERS: remove Lorenzo as vmalloc reviewer
  Revert "mm: init_mlocked_on_free_v3"
  mm/page_table_check: fix crash on ZONE_DEVICE
  gcc: disable '-Warray-bounds' for gcc-9
  ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
  ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull misc fixes from Andrew Morton:
 "Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"

* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kcov: don't lose track of remote references during softirqs
  mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
  mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
  mm: fix possible OOB in numa_rebuild_large_mapping()
  mm/migrate: fix kernel BUG at mm/compaction.c:2761!
  selftests: mm: make map_fixed_noreplace test names stable
  mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
  mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
  gcov: add support for GCC 14
  zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
  mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
  lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
  MAINTAINERS: remove Lorenzo as vmalloc reviewer
  Revert "mm: init_mlocked_on_free_v3"
  mm/page_table_check: fix crash on ZONE_DEVICE
  gcc: disable '-Warray-bounds' for gcc-9
  ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
  ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
</pre>
</div>
</content>
</entry>
</feed>
