summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2024-07-25netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume()Baokun Li1-0/+6
[ Upstream commit 85b08b31a22b481ec6528130daf94eee4452e23f ] Export fscache_put_volume() and add fscache_try_get_volume() helper function to allow cachefiles to get/put fscache_volume via linux/fscache-cache.h. Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-2-libaokun@huaweicloud.com Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: 522018a0de6b ("cachefiles: fix slab-use-after-free in fscache_withdraw_volume()") Stable-dep-of: 5d8f80578907 ("cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()") Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25minmax: relax check to allow comparison between unsigned arguments and ↵David Laight1-7/+17
signed constants commit 867046cc7027703f60a46339ffde91a1970f2901 upstream. Allow (for example) min(unsigned_var, 20). The opposite min(signed_var, 20u) is still errored. Since a comparison between signed and unsigned never makes the unsigned value negative it is only necessary to adjust the __types_ok() test. Link: https://lkml.kernel.org/r/633b64e2f39e46bb8234809c5595b8c7@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 867046cc7027703f60a46339ffde91a1970f2901) Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25minmax: allow comparisons of 'int' against 'unsigned char/short'David Laight1-2/+3
commit 4ead534fba42fc4fd41163297528d2aa731cd121 upstream. Since 'unsigned char/short' get promoted to 'signed int' it is safe to compare them against an 'int' value. Link: https://lkml.kernel.org/r/8732ef5f809c47c28a7be47c938b28d4@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4ead534fba42fc4fd41163297528d2aa731cd121) Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25minmax: allow min()/max()/clamp() if the arguments have the same signedness.David Laight1-28/+32
commit d03eba99f5bf7cbc6e2fdde3b6fa36954ad58e09 upstream. The type-check in min()/max() is there to stop unexpected results if a negative value gets converted to a large unsigned value. However it also rejects 'unsigned int' v 'unsigned long' compares which are common and never problematc. Replace the 'same type' check with a 'same signedness' check. The new test isn't itself a compile time error, so use static_assert() to report the error and give a meaningful error message. Due to the way builtin_choose_expr() works detecting the error in the 'non-constant' side (where static_assert() can be used) also detects errors when the arguments are constant. Link: https://lkml.kernel.org/r/fe7e6c542e094bfca655abcd323c1c98@AcuMS.aculab.com Signed-off-by: David Laight <david.laight@aculab.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit d03eba99f5bf7cbc6e2fdde3b6fa36954ad58e09) Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25minmax: fix header inclusionsAndy Shevchenko1-0/+2
commit f6e9d38f8eb00ac8b52e6d15f6aa9bcecacb081b upstream. BUILD_BUG_ON*() macros are defined in build_bug.h. Include it. Replace compiler_types.h by compiler.h, which provides the former, to have a definition of the __UNIQUE_ID(). Link: https://lkml.kernel.org/r/20230912092355.79280-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Herve Codina <herve.codina@bootlin.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit f6e9d38f8eb00ac8b52e6d15f6aa9bcecacb081b) Signed-off-by: SeongJae Park <sj@kernel.org> [Fix a conflict due to absence of compiler_types.h include] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25minmax: clamp more efficiently by avoiding extra comparisonJason A. Donenfeld1-1/+1
commit 2122e2a4efc2cd139474079e11939b6e07adfacd upstream. Currently the clamp algorithm does: if (val > hi) val = hi; if (val < lo) val = lo; But since hi > lo by definition, this can be made more efficient with: if (val > hi) val = hi; else if (val < lo) val = lo; So fix up the clamp and clamp_t functions to do this, adding the same argument checking as for min and min_t. For simple cases, code generation on x86_64 and aarch64 stay about the same: before: cmp edi, edx mov eax, esi cmova edi, edx cmp edi, esi cmovnb eax, edi ret after: cmp edi, esi mov eax, edx cmovnb esi, edi cmp edi, edx cmovb eax, esi ret before: cmp w0, w2 csel w8, w0, w2, lo cmp w8, w1 csel w0, w8, w1, hi ret after: cmp w0, w1 csel w8, w0, w1, hi cmp w0, w2 csel w0, w8, w2, lo ret On MIPS64, however, code generation improves, by removing arithmetic in the second branch: before: sltu $3,$6,$4 bne $3,$0,.L2 move $2,$6 move $2,$4 .L2: sltu $3,$2,$5 bnel $3,$0,.L7 move $2,$5 .L7: jr $31 nop after: sltu $3,$4,$6 beq $3,$0,.L13 move $2,$6 sltu $3,$4,$5 bne $3,$0,.L12 move $2,$4 .L13: jr $31 nop .L12: jr $31 move $2,$5 For more complex cases with surrounding code, the effects are a bit more complicated. For example, consider this simplified version of timestamp_truncate() from fs/inode.c on x86_64: struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode) { struct super_block *sb = inode->i_sb; unsigned int gran = sb->s_time_gran; t.tv_sec = clamp(t.tv_sec, sb->s_time_min, sb->s_time_max); if (t.tv_sec == sb->s_time_max || t.tv_sec == sb->s_time_min) t.tv_nsec = 0; return t; } before: mov r8, rdx mov rdx, rsi mov rcx, QWORD PTR [r8] mov rax, QWORD PTR [rcx+8] mov rcx, QWORD PTR [rcx+16] cmp rax, rdi mov r8, rcx cmovge rdi, rax cmp rdi, rcx cmovle r8, rdi cmp rax, r8 je .L4 cmp rdi, rcx jge .L4 mov rax, r8 ret .L4: xor edx, edx mov rax, r8 ret after: mov rax, QWORD PTR [rdx] mov rdx, QWORD PTR [rax+8] mov rax, QWORD PTR [rax+16] cmp rax, rdi jg .L6 mov r8, rax xor edx, edx .L2: mov rax, r8 ret .L6: cmp rdx, rdi mov r8, rdi cmovge r8, rdx cmp rax, r8 je .L4 xor eax, eax cmp rdx, rdi cmovl rax, rsi mov rdx, rax mov rax, r8 ret .L4: xor edx, edx jmp .L2 In this case, we actually gain a branch, unfortunately, because the compiler's replacement axioms no longer as cleanly apply. So all and all, this change is a bit of a mixed bag. Link: https://lkml.kernel.org/r/20220926133435.1333846-2-Jason@zx2c4.com Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 2122e2a4efc2cd139474079e11939b6e07adfacd) Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25minmax: sanity check constant bounds when clampingJason A. Donenfeld1-2/+24
commit 5efcecd9a3b18078d3398b359a84c83f549e22cf upstream. The clamp family of functions only makes sense if hi>=lo. If hi and lo are compile-time constants, then raise a build error. Doing so has already caught buggy code. This also introduces the infrastructure to improve the clamping function in subsequent commits. [akpm@linux-foundation.org: coding-style cleanups] [akpm@linux-foundation.org: s@&&\@&& \@] Link: https://lkml.kernel.org/r/20220926133435.1333846-1-Jason@zx2c4.com Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 5efcecd9a3b18078d3398b359a84c83f549e22cf) Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-18bpf: use bpf_map_kvcalloc in bpf_local_storageYafang Shao1-0/+8
[ Upstream commit ddef81b5fd1da4d7c3cc8785d2043b73b72f38ef ] Introduce new helper bpf_map_kvcalloc() for the memory allocation in bpf_local_storage(). Then the allocation will charge the memory from the map instead of from current, though currently they are the same thing as it is only used in map creation path now. By charging map's memory into the memcg from the map, it will be more clear. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Stable-dep-of: af253aef183a ("bpf: fix order of args in call to bpf_map_kvcalloc") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-18bpf: Refactor some inode/task/sk storage functions for reuseYonghong Song1-10/+7
[ Upstream commit c83597fa5dc6b322e9bdf929e5f4136a3f4aa4db ] Refactor codes so that inode/task/sk storage implementation can maximally share the same code. I also added some comments in new function bpf_local_storage_unlink_nolock() to make codes easy to understand. There is no functionality change. Acked-by: David Vernet <void@manifault.com> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221026042845.672944-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Stable-dep-of: af253aef183a ("bpf: fix order of args in call to bpf_map_kvcalloc") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-18mm: prevent derefencing NULL ptr in pfn_section_valid()Waiman Long1-1/+2
[ Upstream commit 82f0b6f041fad768c28b4ad05a683065412c226e ] Commit 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing memory_section->usage") changed pfn_section_valid() to add a READ_ONCE() call around "ms->usage" to fix a race with section_deactivate() where ms->usage can be cleared. The READ_ONCE() call, by itself, is not enough to prevent NULL pointer dereference. We need to check its value before dereferencing it. Link: https://lkml.kernel.org/r/20240626001639.1350646-1-longman@redhat.com Fixes: 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing memory_section->usage") Signed-off-by: Waiman Long <longman@redhat.com> Cc: Charan Teja Kalla <quic_charante@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-18Compiler Attributes: Add __uninitialized macroHeiko Carstens1-0/+12
commit fd7eea27a3aed79b63b1726c00bde0d50cf207e2 upstream. With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled the kernel will be compiled with -ftrivial-auto-var-init=<...> which causes initialization of stack variables at function entry time. In order to avoid the performance impact that comes with this users can use the "uninitialized" attribute to prevent such initialization. Therefore provide the __uninitialized macro which can be used for cases where INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled, but only selected variables should not be initialized. Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240205154844.3757121-2-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-11ima: Avoid blocking in RCU read-side critical sectionGUO Zihua2-3/+4
commit 9a95c5bfbf02a0a7f5983280fe284a0ff0836c34 upstream. A panic happens in ima_match_policy: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 PGD 42f873067 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 5 PID: 1286325 Comm: kubeletmonit.sh Kdump: loaded Tainted: P Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 RIP: 0010:ima_match_policy+0x84/0x450 Code: 49 89 fc 41 89 cf 31 ed 89 44 24 14 eb 1c 44 39 7b 18 74 26 41 83 ff 05 74 20 48 8b 1b 48 3b 1d f2 b9 f4 00 0f 84 9c 01 00 00 <44> 85 73 10 74 ea 44 8b 6b 14 41 f6 c5 01 75 d4 41 f6 c5 02 74 0f RSP: 0018:ff71570009e07a80 EFLAGS: 00010207 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000200 RDX: ffffffffad8dc7c0 RSI: 0000000024924925 RDI: ff3e27850dea2000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffabfce739 R10: ff3e27810cc42400 R11: 0000000000000000 R12: ff3e2781825ef970 R13: 00000000ff3e2785 R14: 000000000000000c R15: 0000000000000001 FS: 00007f5195b51740(0000) GS:ff3e278b12d40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000626d24002 CR4: 0000000000361ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ima_get_action+0x22/0x30 process_measurement+0xb0/0x830 ? page_add_file_rmap+0x15/0x170 ? alloc_set_pte+0x269/0x4c0 ? prep_new_page+0x81/0x140 ? simple_xattr_get+0x75/0xa0 ? selinux_file_open+0x9d/0xf0 ima_file_check+0x64/0x90 path_openat+0x571/0x1720 do_filp_open+0x9b/0x110 ? page_counter_try_charge+0x57/0xc0 ? files_cgroup_alloc_fd+0x38/0x60 ? __alloc_fd+0xd4/0x250 ? do_sys_open+0x1bd/0x250 do_sys_open+0x1bd/0x250 do_syscall_64+0x5d/0x1d0 entry_SYSCALL_64_after_hwframe+0x65/0xca Commit c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()") introduced call to ima_lsm_copy_rule within a RCU read-side critical section which contains kmalloc with GFP_KERNEL. This implies a possible sleep and violates limitations of RCU read-side critical sections on non-PREEMPT systems. Sleeping within RCU read-side critical section might cause synchronize_rcu() returning early and break RCU protection, allowing a UAF to happen. The root cause of this issue could be described as follows: | Thread A | Thread B | | |ima_match_policy | | | rcu_read_lock | |ima_lsm_update_rule | | | synchronize_rcu | | | | kmalloc(GFP_KERNEL)| | | sleep | ==> synchronize_rcu returns early | kfree(entry) | | | | entry = entry->next| ==> UAF happens and entry now becomes NULL (or could be anything). | | entry->action | ==> Accessing entry might cause panic. To fix this issue, we are converting all kmalloc that is called within RCU read-side critical section to use GFP_ATOMIC. Fixes: c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()") Cc: stable@vger.kernel.org Signed-off-by: GUO Zihua <guozihua@huawei.com> Acked-by: John Johansen <john.johansen@canonical.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> [PM: fixed missing comment, long lines, !CONFIG_IMA_LSM_RULES case] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-11fsnotify: Do not generate events for O_PATH file descriptorsJan Kara1-1/+7
commit 702eb71fd6501b3566283f8c96d7ccc6ddd662e9 upstream. Currently we will not generate FS_OPEN events for O_PATH file descriptors but we will generate FS_CLOSE events for them. This is asymmetry is confusing. Arguably no fsnotify events should be generated for O_PATH file descriptors as they cannot be used to access or modify file content, they are just convenient handles to file objects like paths. So fix the asymmetry by stopping to generate FS_CLOSE for O_PATH file descriptors. Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240617162303.1596-1-jack@suse.cz Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-11locking/mutex: Introduce devm_mutex_init()George Stark1-0/+27
[ Upstream commit 4cd47222e435dec8e3787614924174f53fcfb5ae ] Using of devm API leads to a certain order of releasing resources. So all dependent resources which are not devm-wrapped should be deleted with respect to devm-release order. Mutex is one of such objects that often is bound to other resources and has no own devm wrapping. Since mutex_destroy() actually does nothing in non-debug builds frequently calling mutex_destroy() is just ignored which is safe for now but wrong formally and can lead to a problem if mutex_destroy() will be extended so introduce devm_mutex_init(). Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: George Stark <gnstark@salutedevices.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Marek Behún <kabel@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20240411161032.609544-2-gnstark@salutedevices.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05efi: memmap: Move manipulation routines into x86 arch treeArd Biesheuvel1-9/+1
commit fdc6d38d64a20c542b1867ebeb8dd03b98829336 upstream. The EFI memory map is a description of the memory layout as provided by the firmware, and only x86 manipulates it in various different ways for its own memory bookkeeping. So let's move the memmap routines that are only used by x86 into the x86 arch tree. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-05mm/page_alloc: Separate THP PCP into movable and non-movable categoriesyangge1-5/+4
commit bf14ed81f571f8dba31cd72ab2e50fbcc877cc31 upstream. Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations") no longer differentiates the migration type of pages in THP-sized PCP list, it's possible that non-movable allocation requests may get a CMA page from the list, in some cases, it's not acceptable. If a large number of CMA memory are configured in system (for example, the CMA memory accounts for 50% of the system memory), starting a virtual machine with device passthrough will get stuck. During starting the virtual machine, it will call pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory. Normally if a page is present and in CMA area, pin_user_pages_remote() will migrate the page from CMA area to non-CMA area because of FOLL_LONGTERM flag. But if non-movable allocation requests return CMA memory, migrate_longterm_unpinnable_pages() will migrate a CMA page to another CMA page, which will fail to pass the check in check_and_migrate_movable_pages() and cause migration endless. Call trace: pin_user_pages_remote --__gup_longterm_locked // endless loops in this function ----_get_user_pages_locked ----check_and_migrate_movable_pages ------migrate_longterm_unpinnable_pages --------alloc_migration_target This problem will also have a negative impact on CMA itself. For example, when CMA is borrowed by THP, and we need to reclaim it through cma_alloc() or dma_alloc_coherent(), we must move those pages out to ensure CMA's users can retrieve that contigous memory. Currently, CMA's memory is occupied by non-movable pages, meaning we can't relocate them. As a result, cma_alloc() is more likely to fail. To fix the problem above, we add one PCP list for THP, which will not introduce a new cacheline for struct per_cpu_pages. THP will have 2 PCP lists, one PCP list is used by MOVABLE allocation, and the other PCP list is used by UNMOVABLE allocation. MOVABLE allocation contains GPF_MOVABLE, and UNMOVABLE allocation contains GFP_UNMOVABLE and GFP_RECLAIMABLE. Link: https://lkml.kernel.org/r/1718845190-4456-1-git-send-email-yangge1116@126.com Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations") Signed-off-by: yangge <yangge1116@126.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-05syscalls: fix sys_fanotify_mark prototypeArnd Bergmann1-0/+6
[ Upstream commit 63e2f40c9e3187641afacde4153f54b3ee4dbc8c ] My earlier fix missed an incorrect function prototype that shows up on native 32-bit builds: In file included from fs/notify/fanotify/fanotify_user.c:14: include/linux/syscalls.h:248:25: error: conflicting types for 'sys_fanotify_mark'; have 'long int(int, unsigned int, u32, u32, int, const char *)' {aka 'long int(int, unsigned int, unsigned int, unsigned int, int, const char *)'} 1924 | SYSCALL32_DEFINE6(fanotify_mark, | ^~~~~~~~~~~~~~~~~ include/linux/syscalls.h:862:17: note: previous declaration of 'sys_fanotify_mark' with type 'long int(int, unsigned int, u64, int, const char *)' {aka 'long int(int, unsigned int, long long unsigned int, int, const char *)'} On x86 and powerpc, the prototype is also wrong but hidden in an #ifdef, so it never caused problems. Add another alternative declaration that matches the conditional function definition. Fixes: 403f17a33073 ("parisc: use generic sys_fanotify_mark implementation") Cc: stable@vger.kernel.org Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05ftruncate: pass a signed offsetArnd Bergmann2-2/+2
commit 4b8e88e563b5f666446d002ad0dc1e6e8e7102b0 upstream. The old ftruncate() syscall, using the 32-bit off_t misses a sign extension when called in compat mode on 64-bit architectures. As a result, passing a negative length accidentally succeeds in truncating to file size between 2GiB and 4GiB. Changing the type of the compat syscall to the signed compat_off_t changes the behavior so it instead returns -EINVAL. The native entry point, the truncate() syscall and the corresponding loff_t based variants are all correct already and do not suffer from this mistake. Fixes: 3f6d078d4acc ("fix compat truncate/ftruncate") Reviewed-by: Christian Brauner <brauner@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-05nvme: fixup comment for nvme RDMA Provider TypeHannes Reinecke1-2/+2
[ Upstream commit f80a55fa90fa76d01e3fffaa5d0413e522ab9a00 ] PRTYPE is the provider type, not the QP service type. Fixes: eb793e2c9286 ("nvme.h: add NVMe over Fabrics definitions") Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05wifi: ieee80211: check for NULL in ieee80211_mle_size_ok()Johannes Berg1-1/+1
[ Upstream commit b7793a1a2f370c28b17d9554b58e9dc51afcfcbd ] For simplicity, we may want to pass a NULL element, and while we should then pass also a zero length, just be a bit more careful here. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.4d983653cb8d.Ic3ea99b60c61ac2f7d38cb9fd202a03c97a05601@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05bpf: Take return from set_memory_ro() into account with bpf_prog_lock_ro()Christophe Leroy1-2/+3
[ Upstream commit 7d2cc63eca0c993c99d18893214abf8f85d566d8 ] set_memory_ro() can fail, leaving memory unprotected. Check its return and take it into account as an error. Link: https://github.com/KSPP/linux/issues/7 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: linux-hardening@vger.kernel.org <linux-hardening@vger.kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Message-ID: <286def78955e04382b227cb3e4b6ba272a7442e3.1709850515.git.christophe.leroy@csgroup.eu> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTELTony Luck1-0/+2
[ Upstream commit 93022482b2948a9a7e9b5a2bb685f2e1cb4c3348 ] Code in v6.9 arch/x86/kernel/smpboot.c was changed by commit 4db64279bc2b ("x86/cpu: Switch to new Intel CPU model defines") from: static const struct x86_cpu_id intel_cod_cpu[] = { X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, 0), /* COD */ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, 0), /* COD */ X86_MATCH_INTEL_FAM6_MODEL(ANY, 1), /* SNC */ <--- 443 {} }; static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) { const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu); to: static const struct x86_cpu_id intel_cod_cpu[] = { X86_MATCH_VFM(INTEL_HASWELL_X, 0), /* COD */ X86_MATCH_VFM(INTEL_BROADWELL_X, 0), /* COD */ X86_MATCH_VFM(INTEL_ANY, 1), /* SNC */ {} }; static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) { const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu); On an Intel CPU with SNC enabled this code previously matched the rule on line 443 to avoid printing messages about insane cache configuration. The new code did not match any rules. Expanding the macros for the intel_cod_cpu[] array shows that the old is equivalent to: static const struct x86_cpu_id intel_cod_cpu[] = { [0] = { .vendor = 0, .family = 6, .model = 0x3F, .steppings = 0, .feature = 0, .driver_data = 0 }, [1] = { .vendor = 0, .family = 6, .model = 0x4F, .steppings = 0, .feature = 0, .driver_data = 0 }, [2] = { .vendor = 0, .family = 6, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 1 }, [3] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 0 } } while the new code expands to: static const struct x86_cpu_id intel_cod_cpu[] = { [0] = { .vendor = 0, .family = 6, .model = 0x3F, .steppings = 0, .feature = 0, .driver_data = 0 }, [1] = { .vendor = 0, .family = 6, .model = 0x4F, .steppings = 0, .feature = 0, .driver_data = 0 }, [2] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 1 }, [3] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 0 } } Looking at the code for x86_match_cpu(): const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match) { const struct x86_cpu_id *m; struct cpuinfo_x86 *c = &boot_cpu_data; for (m = match; m->vendor | m->family | m->model | m->steppings | m->feature; m++) { ... } return NULL; it is clear that there was no match because the ANY entry in the table (array index 2) is now the loop termination condition (all of vendor, family, model, steppings, and feature are zero). So this code was working before because the "ANY" check was looking for any Intel CPU in family 6. But fails now because the family is a wild card. So the root cause is that x86_match_cpu() has never been able to match on a rule with just X86_VENDOR_INTEL and all other fields set to wildcards. Add a new flags field to struct x86_cpu_id that has a bit set to indicate that this entry in the array is valid. Update X86_MATCH*() macros to set that bit. Change the end-marker check in x86_match_cpu() to just check the flags field for this bit. Backporter notes: The commit in Fixes is really the one that is broken: you can't have m->vendor as part of the loop termination conditional in x86_match_cpu() because it can happen - as it has happened above - that that whole conditional is 0 albeit vendor == 0 is a valid case - X86_VENDOR_INTEL is 0. However, the only case where the above happens is the SNC check added by 4db64279bc2b1 so you only need this fix if you have backported that other commit 4db64279bc2b ("x86/cpu: Switch to new Intel CPU model defines") Fixes: 644e9cbbe3fc ("Add driver auto probing for x86 features v4") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable+noautosel@kernel.org> # see above Link: https://lore.kernel.org/r/20240517144312.GBZkdtAOuJZCvxhFbJ@fat_crate.local Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27kcov: don't lose track of remote references during softirqsAleksandr Nogikh1-0/+2
commit 01c8f9806bde438ca1c8cbbc439f0a14a6694f6c upstream. In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV metadata of the current task into a per-CPU variable. However, the kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote KCOV objects. If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens to get interrupted and kcov_remote_start() is called, it ultimately leads to kcov_remote_stop() NOT restoring the original KCOV reference. So when the task exits, all registered remote KCOV handles remain active forever. The most uncomfortable effect (at least for syzkaller) is that the bug prevents the reuse of the same /sys/kernel/debug/kcov descriptor. If we obtain it in the parent process and then e.g. drop some capabilities and continuously fork to execute individual programs, at some point current->kcov of the forked process is lost, kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls calls from subsequent forks fail. And, yes, the efficiency is also affected if we keep on losing remote kcov objects. a) kcov_remote_map keeps on growing forever. b) (If I'm not mistaken), we're also not freeing the memory referenced by kcov->area. Fix it by introducing a special kcov_mode that is assigned to the task that owns a KCOV remote object. It makes kcov_mode_enabled() return true and yet does not trigger coverage collection in __sanitizer_cov_trace_pc() and write_comp_data(). [nogikh@google.com: replace WRITE_ONCE() with an ordinary assignment] Link: https://lkml.kernel.org/r/20240614171221.2837584-1-nogikh@google.com Link: https://lkml.kernel.org/r/20240611133229.527822-1-nogikh@google.com Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts") Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27tty: add the option to have a tty reject a new ldiscLinus Torvalds1-0/+8
[ Upstream commit 6bd23e0c2bb6c65d4f5754d1456bc9a4427fc59b ] ... and use it to limit the virtual terminals to just N_TTY. They are kind of special, and in particular, the "con_write()" routine violates the "writes cannot sleep" rule that some ldiscs rely on. This avoids the BUG: sleeping function called from invalid context at kernel/printk/printk.c:2659 when N_GSM has been attached to a virtual console, and gsmld_write() calls con_write() while holding a spinlock, and con_write() then tries to get the console lock. Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daniel Starke <daniel.starke@siemens.com> Reported-by: syzbot <syzbot+dbac96d8e73b61aa559c@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=dbac96d8e73b61aa559c Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240423163339.59780-1-torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-21serial: core: Add UPIO_UNKNOWN constant for unknown port typeAndy Shevchenko1-0/+1
[ Upstream commit 79d713baf63c8f23cc58b304c40be33d64a12aaf ] In some APIs we would like to assign the special value to iotype and compare against it in another places. Introduce UPIO_UNKNOWN for this purpose. Note, we can't use 0, because it's a valid value for IO port access. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240304123035.758700-3-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 87d80bfbd577 ("serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-21net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPPKory Maincent1-2/+2
[ Upstream commit 144ba8580bcb82b2686c3d1a043299d844b9a682 ] ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP as reported by checkpatch script. Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://lore.kernel.org/r/20240610083426.740660-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-21i2c: add fwnode APIsRussell King (Oracle)1-3/+21
[ Upstream commit 373c612d72461ddaea223592df31e62c934aae61 ] Add fwnode APIs for finding and getting I2C adapters, which will be used by the SFP code. These are passed the fwnode corresponding to the adapter, and return the I2C adapter. It is the responsibility of the caller to find the appropriate fwnode. We keep the DT and ACPI interfaces, but where appropriate, recode them to use the fwnode interfaces internally. Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Wolfram Sang <wsa@kernel.org> Stable-dep-of: 3f858bbf04db ("i2c: acpi: Unbind mux adapters before delete") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-16smp: Provide 'setup_max_cpus' definition on UP tooIngo Molnar1-0/+2
commit 3c2f8859ae1ce53f2a89c8e4ca4092101afbff67 upstream. This was already defined locally by init/main.c, but let's make it generic, as arch/x86/kernel/cpu/topology.c is going to make use of it to have more uniform code. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16mmc: core: Add mmc_gpiod_set_cd_config() functionHans de Goede1-0/+1
commit 63a7cd660246aa36af263b85c33ecc6601bf04be upstream. Some mmc host drivers may need to fixup a card-detection GPIO's config to e.g. enable the GPIO controllers builtin pull-up resistor on devices where the firmware description of the GPIO is broken (e.g. GpioInt with PullNone instead of PullUp in ACPI DSDT). Since this is the exception rather then the rule adding a config parameter to mmc_gpiod_request_cd() seems undesirable, so instead add a new mmc_gpiod_set_cd_config() function. This is simply a wrapper to call gpiod_set_config() on the card-detect GPIO acquired through mmc_gpiod_request_cd(). Reviewed-by: Andy Shevchenko <andy@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240410191639.526324-2-hdegoede@redhat.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12fpga: region: add owner module and take its refcountMarco Pagani1-3/+10
[ Upstream commit b7c0e1ecee403a43abc89eb3e75672b01ff2ece9 ] The current implementation of the fpga region assumes that the low-level module registers a driver for the parent device and uses its owner pointer to take the module's refcount. This approach is problematic since it can lead to a null pointer dereference while attempting to get the region during programming if the parent device does not have a driver. To address this problem, add a module owner pointer to the fpga_region struct and use it to take the module's refcount. Modify the functions for registering a region to take an additional owner module parameter and rename them to avoid conflicts. Use the old function names for helper macros that automatically set the module that registers the region as the owner. This ensures compatibility with existing low-level control modules and reduces the chances of registering a region without setting the owner. Also, update the documentation to keep it consistent with the new interface for registering an fpga region. Fixes: 0fa20cdfcc1f ("fpga: fpga-region: device tree control for FPGA") Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Xu Yilun <yilun.xu@intel.com> Reviewed-by: Russ Weight <russ.weight@linux.dev> Signed-off-by: Marco Pagani <marpagan@redhat.com> Acked-by: Xu Yilun <yilun.xu@intel.com> Link: https://lore.kernel.org/r/20240419083601.77403-1-marpagan@redhat.com Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12counter: linux/counter.h: fix Excess kernel-doc description warningRandy Dunlap1-1/+0
[ Upstream commit 416bdb89605d960405178b9bf04df512d1ace1a3 ] Remove the @priv: line to prevent the kernel-doc warning: include/linux/counter.h:400: warning: Excess struct member 'priv' description in 'counter_device' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Fixes: f2ee4759fb70 ("counter: remove old and now unused registration API") Link: https://lore.kernel.org/r/20231223050511.13849-1-rdunlap@infradead.org Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12net: add pskb_may_pull_reason() helperEric Dumazet1-4/+15
[ Upstream commit 1fb2d41501f38192d8a19da585cd441cf8845697 ] pskb_may_pull() can fail for two different reasons. Provide pskb_may_pull_reason() helper to distinguish between these reasons. It returns: SKB_NOT_DROPPED_YET : Success SKB_DROP_REASON_PKT_TOO_SMALL : packet too small SKB_DROP_REASON_NOMEM : skb->head could not be resized Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 8bd67ebb50c0 ("net: bridge: xmit: make sure we have at least eth header len bytes") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12dev_printk: Add and use dev_no_printk()Geert Uytterhoeven1-12/+13
[ Upstream commit c26ec799042a3888935d59b599f33e41efedf5f8 ] When printk-indexing is enabled, each dev_printk() invocation emits a pi_entry structure. This is even true when the dev_printk() is protected by an always-false check, as is typically the case for debug messages: while the actual code to print the message is optimized out by the compiler, the pi_entry structure is still emitted. Avoid emitting pi_entry structures for unavailable dev_printk() kernel messages by: 1. Introducing a dev_no_printk() helper, mimicked after the existing no_printk() helper, which calls _dev_printk() instead of dev_printk(), 2. Replacing all "if (0) dev_printk(...)" constructs by calls to the new helper. This reduces the size of an arm64 defconfig kernel with CONFIG_PRINTK_INDEX=y by 957 KiB. Fixes: ad7d61f159db7397 ("printk: index: Add indexing support to dev_printk") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Chris Down <chris@chrisdown.name> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/8583d54f1687c801c6cda8edddf2cf0344c6e883.1709127473.git.geert+renesas@glider.be Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12printk: Let no_printk() use _printk()Geert Uytterhoeven1-1/+1
[ Upstream commit 8522f6b760ca588928eede740d5d69dd1e936b49 ] When printk-indexing is enabled, each printk() invocation emits a pi_entry structure, containing the format string and other information related to its location in the kernel sources. This is even true for no_printk(): while the actual code to print the message is optimized out by the compiler due to the always-false check, the pi_entry structure is still emitted. As the main purpose of no_printk() is to provide a helper to maintain printf()-style format checking when debugging is disabled, this leads to the inclusion in the index of lots of printk formats that cannot be emitted by the current kernel. Fix this by switching no_printk() from printk() to _printk(). This reduces the size of an arm64 defconfig kernel with CONFIG_PRINTK_INDEX=y by 576 KiB. Fixes: 337015573718b161 ("printk: Userspace format indexing support") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Chris Down <chris@chrisdown.name> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/56cf92edccffea970e1f40a075334dd6cf5bb2a4.1709127473.git.geert+renesas@glider.be Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12net/mlx5: Add a timeout to acquire the command queue semaphoreAkiva Goldberger1-0/+1
[ Upstream commit 485d65e1357123a697c591a5aeb773994b247ad7 ] Prevent forced completion handling on an entry that has not yet been assigned an index, causing an out of bounds access on idx = -22. Instead of waiting indefinitely for the sem, blocking flow now waits for index to be allocated or a sem acquisition timeout before beginning the timer for FW completion. Kernel log example: mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240509112951.590184-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()Robert Richter1-6/+1
[ Upstream commit f9f67e5adc8dc2e1cc51ab2d3d6382fa97f074d4 ] For configurations that have the kconfig option NUMA_KEEP_MEMINFO disabled, numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1). SRAT lookup fails then because an existing SRAT memory range cannot be found for a CFMWS address range. This causes the addition of a duplicate numa_memblk with a different node id and a subsequent page fault and kernel crash during boot. Fix this by making numa_fill_memblks() always available regardless of NUMA_KEEP_MEMINFO. As Dan suggested, the fix is implemented to remove numa_fill_memblks() from sparsemem.h and alos using __weak for the funct