Age | Commit message (Collapse) | Author | Files | Lines |
|
commit dc783ba4b9df3fb3e76e968b2cbeb9960069263c upstream.
Ben Greear reports following splat:
------------[ cut here ]------------
net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
codetag_unload_module+0x19b/0x2a0
? codetag_load_module+0x80/0x80
nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.
Wait for outstanding kfree_rcu operations to complete before checking
resolves this warning.
Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat # will WARN.
[akpm@linux-foundation.org: add comment]
Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candelatech.com/
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
|
|
[ Upstream commit 7424fc6b86c8980a87169e005f5cd4438d18efe6 ]
Currently ARM64 extracts which specific sanitizer has caused a trap via
encoded data in the trap instruction. Clang on x86 currently encodes the
same data in the UD1 instruction but x86 handle_bug() and
is_valid_bugaddr() currently only look at UD2.
Bring x86 to parity with ARM64, similar to commit 25b84002afb9 ("arm64:
Support Clang UBSAN trap codes for better reporting"). See the llvm
links for information about the code generation.
Enable the reporting of UBSAN sanitizer details on x86 compiled with clang
when CONFIG_UBSAN_TRAP=y by analysing UD1 and retrieving the type immediate
which is encoded by the compiler after the UD1.
[ tglx: Simplified it by moving the printk() into handle_bug() ]
Signed-off-by: Gatlin Newhouse <gatlin.newhouse@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20240724000206.451425-1-gatlin.newhouse@gmail.com
Link: https://github.com/llvm/llvm-project/commit/c5978f42ec8e9#diff-bb68d7cd885f41cfc35843998b0f9f534adb60b415f647109e597ce448e92d9f
Link: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86InstrSystem.td#L27
Stable-dep-of: 1db272864ff2 ("x86/traps: move kmsan check after instrumentation_begin")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c749d9b7ebbc5716af7a95f7768634b30d9446ec ]
generic/077 on x86_32 CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y with highmem,
on huge=always tmpfs, issues a warning and then hangs (interruptibly):
WARNING: CPU: 5 PID: 3517 at mm/highmem.c:622 kunmap_local_indexed+0x62/0xc9
CPU: 5 UID: 0 PID: 3517 Comm: cp Not tainted 6.12.0-rc4 #2
...
copy_page_from_iter_atomic+0xa6/0x5ec
generic_perform_write+0xf6/0x1b4
shmem_file_write_iter+0x54/0x67
Fix copy_page_from_iter_atomic() by limiting it in that case
(include/linux/skbuff.h skb_frag_must_loop() does similar).
But going forward, perhaps CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is too
surprising, has outlived its usefulness, and should just be removed?
Fixes: 908a1ad89466 ("iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Link: https://lore.kernel.org/r/dd5f0c89-186e-18e1-4f43-19a60f5a9774@google.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2b059d0d1e624adc6e69a754bc48057f8bf459dc ]
'modprobe slub_kunit' will have a warning as shown below. The root cause
is that __kmalloc_cache_noprof was directly used, which resulted in no
alloc_tag being allocated. This caused current->alloc_tag to be null,
leading to a warning in alloc_tag_add_check.
Let's add an alloc_hook layer to __kmalloc_cache_noprof specifically
within lib/slub_kunit.c, which is the only user of this internal slub
function outside kmalloc implementation itself.
[58162.947016] WARNING: CPU: 2 PID: 6210 at
./include/linux/alloc_tag.h:125 alloc_tagging_slab_alloc_hook+0x268/0x27c
[58162.957721] Call trace:
[58162.957919] alloc_tagging_slab_alloc_hook+0x268/0x27c
[58162.958286] __kmalloc_cache_noprof+0x14c/0x344
[58162.958615] test_kmalloc_redzone_access+0x50/0x10c [slub_kunit]
[58162.959045] kunit_try_run_case+0x74/0x184 [kunit]
[58162.959401] kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit]
[58162.959841] kthread+0x10c/0x118
[58162.960093] ret_from_fork+0x10/0x20
[58162.960363] ---[ end trace 0000000000000000 ]---
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Fixes: a0a44d9175b3 ("mm, slab: don't wrap internal functions with alloc_hooks()")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2865baf54077aa98fcdb478cefe6a42c417b9374 upstream.
The Spectre-v1 mitigations made "access_ok()" much more expensive, since
it has to serialize execution with the test for a valid user address.
All the normal user copy routines avoid this by just masking the user
address with a data-dependent mask instead, but the fast
"unsafe_user_read()" kind of patterms that were supposed to be a fast
case got slowed down.
This introduces a notion of using
src = masked_user_access_begin(src);
to do the user address sanity using a data-dependent mask instead of the
more traditional conditional
if (user_read_access_begin(src, len)) {
model.
This model only works for dense accesses that start at 'src' and on
architectures that have a guard region that is guaranteed to fault in
between the user space and the kernel space area.
With this, the user access doesn't need to be manually checked, because
a bad address is guaranteed to fault (by some architecture masking
trick: on x86-64 this involves just turning an invalid user address into
all ones, since we don't map the top of address space).
This only converts a couple of examples for now. Example x86-64 code
generation for loading two words from user space:
stac
mov %rax,%rcx
sar $0x3f,%rcx
or %rax,%rcx
mov (%rcx),%r13
mov 0x8(%rcx),%r14
clac
where all the error handling and -EFAULT is now purely handled out of
line by the exception path.
Of course, if the micro-architecture does badly at 'clac' and 'stac',
the above is still pitifully slow. But at least we did as well as we
could.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit aff1871bfc81e9dffa7d2a77e67cc5441cc37f81 ]
objpool intends to use vmalloc for default (non-atomic) allocations of
percpu slots and objects. However, the condition checking if GFP flags
set any bit of GFP_ATOMIC is wrong b/c GFP_ATOMIC is a combination of bits
(__GFP_HIGH|__GFP_KSWAPD_RECLAIM) and so `pool->gfp & GFP_ATOMIC` will
be true if either bit is set. Since GFP_ATOMIC and GFP_KERNEL share the
___GFP_KSWAPD_RECLAIM bit, kmalloc will be used in cases when GFP_KERNEL
is specified, i.e. in all current usages of objpool.
This may lead to unexpected OOM errors since kmalloc cannot allocate
large amounts of memory.
For instance, objpool is used by fprobe rethook which in turn is used by
BPF kretprobe.multi and kprobe.session probe types. Trying to attach
these to all kernel functions with libbpf using
SEC("kprobe.session/*")
int kprobe(struct pt_regs *ctx)
{
[...]
}
fails on objpool slot allocation with ENOMEM.
Fix the condition to truly use vmalloc by default.
Link: https://lore.kernel.org/all/20240826060718.267261-1-vmalik@redhat.com/
Fixes: b4edb8d2d464 ("lib: objpool added: ring-array based lockless MPMC")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Matt Wu <wuqiang.matt@bytedance.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ab8851431bef5cc44f0f3f0da112e883fd4d0df5 ]
Just a grammar fix in lib/Kconfig.debug, under the config option
RUST_BUILD_ASSERT_ALLOW.
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://github.com/Rust-for-Linux/linux/issues/1006
Fixes: ecaa6ddff2fd ("rust: add `build_error` crate")
Signed-off-by: Timo Grautstueck <timo.grautstueck@web.de>
Link: https://lore.kernel.org/r/20241006140244.5509-1-timo.grautstueck@web.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit bea07fd63192b61209d48cbb81ef474cc3ee4c62 upstream.
Patch series "maple_tree: correct tree corruption on spanning store", v3.
There has been a nasty yet subtle maple tree corruption bug that appears
to have been in existence since the inception of the algorithm.
This bug seems far more likely to happen since commit f8d112a4e657
("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
at which reports started to be submitted concerning this bug.
We were made definitely aware of the bug thanks to the kind efforts of
Bert Karwatzki who helped enormously in my being able to track this down
and identify the cause of it.
The bug arises when an attempt is made to perform a spanning store across
two leaf nodes, where the right leaf node is the rightmost child of the
shared parent, AND the store completely consumes the right-mode node.
This results in mas_wr_spanning_store() mitakenly duplicating the new and
existing entries at the maximum pivot within the range, and thus maple
tree corruption.
The fix patch corrects this by detecting this scenario and disallowing the
mistaken duplicate copy.
The fix patch commit message goes into great detail as to how this occurs.
This series also includes a test which reliably reproduces the issue, and
asserts that the fix works correctly.
Bert has kindly tested the fix and confirmed it resolved his issues. Also
Mikhail Gavrilov kindly reported what appears to be precisely the same
bug, which this fix should also resolve.
This patch (of 2):
There has been a subtle bug present in the maple tree implementation from
its inception.
This arises from how stores are performed - when a store occurs, it will
overwrite overlapping ranges and adjust the tree as necessary to
accommodate this.
A range may always ultimately span two leaf nodes. In this instance we
walk the two leaf nodes, determine which elements are not overwritten to
the left and to the right of the start and end of the ranges respectively
and then rebalance the tree to contain these entries and the newly
inserted one.
This kind of store is dubbed a 'spanning store' and is implemented by
mas_wr_spanning_store().
In order to reach this stage, mas_store_gfp() invokes
mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
walk the tree and update the object (mas) to traverse to the location
where the write should be performed, determining its store type.
When a spanning store is required, this function returns false stopping at
the parent node which contains the target range, and mas_wr_store_type()
marks the mas->store_type as wr_spanning_store to denote this fact.
When we go to perform the store in mas_wr_spanning_store(), we first
determine the elements AFTER the END of the range we wish to store (that
is, to the right of the entry to be inserted) - we do this by walking to
the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we
have just determined contains the range over which we intend to write.
We then turn our attention to the entries to the left of the entry we are
inserting, whose state is represented by l_mas, and copy these into a 'big
node', which is a special node which contains enough slots to contain two
leaf node's worth of data.
We then copy the entry we wish to store immediately after this - the copy
and the insertion of the new entry is performed by mas_store_b_node().
After this we copy the elements to the right of the end of the range which
we are inserting, if we have not exceeded the length of the node (i.e.
r_mas.offset <= r_mas.end).
Herein lies the bug - under very specific circumstances, this logic can
break and corrupt the maple tree.
Consider the following tree:
Height
0 Root Node
/ \
pivot = 0xffff / \ pivot = ULONG_MAX
/ \
1 A [-----] ...
/ \
pivot = 0x4fff / \ pivot = 0xffff
/ \
2 (LEAVES) B [-----] [-----] C
^--- Last pivot 0xffff.
Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
that all ranges expressed in maple tree code are inclusive):
1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
determines that this is a spanning store across nodes B and C. The mas
state is set such that the current node from which we traverse further
is node A.
2. In mas_wr_spanning_store() we try to find elements to the right of pivot
0xffff by searching for an index of 0x10000:
- mas_wr_walk_index() invokes mas_wr_walk_descend() and
mas_wr_node_walk() in turn.
- mas_wr_node_walk() loops over entries in node A until EITHER it
finds an entry whose pivot equals or exceeds 0x10000 OR it
reaches the final entry.
- Since no entry has a pivot equal to or exceeding 0x10000, pivot
0xffff is selected, leading to node C.
- mas_wr_walk_traverse() resets the mas state to traverse node C. We
loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
in turn once again.
- Again, we reach the last entry in node C, which has a pivot of
0xffff.
3. We then copy the elements to the left of 0x4000 in node B to the big
node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
too.
4. We determine whether we have any entries to copy from the right of the
end of the range via - and with r_mas set up at the entry at pivot
0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at
pivot 0xffff.
5. BUG! The maple tree is corrupted with a duplicate entry.
This requires a very specific set of circumstances - we must be spanning
the last element in a leaf node, which is the last element in the parent
node.
spanning store across two leaf nodes with a range that ends at that shared
pivot.
A potential solution to this problem would simply be to reset the walk
each time we traverse r_mas, however given the rarity of this situation it
seems that would be rather inefficient.
Instead, this patch detects if the right hand node is populated, i.e. has
anything we need to copy.
We do so by only copying elements from the right of the entry being
inserted when the maximum value present exceeds the last, rather than
basing this on offset position.
The patch also updates some comments and eliminates the unused bool return
value in mas_wr_walk_index().
The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma
tree in mmap_region()") seems to have made the probability of this event
much more likely, which is the point at which reports started to be
submitted concerning this bug.
The motivation for this change arose from Bert Karwatzki's report of
encountering mm instability after the release of kernel v6.12-rc1 which,
after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
options, was identified as maple tree corruption.
After Bert very generously provided his time and ability to reproduce this
event consistently, I was able to finally identify that the issue
discussed in this commit message was occurring for him.
Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 905415ff3ffb1d7e5afa62bacabd79776bd24606 upstream.
Harden build ID parsing logic, adding explicit READ_ONCE() where it's
important to have a consistent value read and validated just once.
Also, as pointed out by Andi Kleen, we need to make sure that entire ELF
note is within a page bounds, so move the overflow check up and add an
extra note_size boundaries validation.
Fixes tag below points to the code that moved this code into
lib/buildid.c, and then subsequently was used in perf subsystem, making
this code exposed to perf_event_open() users in v5.12+.
Cc: stable@vger.kernel.org
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Jann Horn <jannh@google.com>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Fixes: bd7525dacd7e ("bpf: Move stack_map_get_build_id into lib")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 684d28feb8546d1e9597aa363c3bfcf52fe250b7 upstream.
fill_pool() uses 'obj_pool_min_free' to decide whether objects should be
handed back to the kmem cache. But 'obj_pool_min_free' records the lowest
historical value of the number of objects in the object pool and not the
minimum number of objects which should be kept in the pool.
Use 'debug_objects_pool_min_level' instead, which holds the minimum number
which was scaled to the number of CPUs at boot time.
[ tglx: Massage change log ]
Fixes: d26bf5056fc0 ("debugobjects: Reduce number of pool_lock acquisitions in fill_pool()")
Fixes: 36c4ead6f6df ("debugobjects: Add global free list and the counter")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240904133944.2124-3-thunder.leizhen@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 65f666c6203600053478ce8e34a1db269a8701c9 ]
When called from sbitmap_queue_get(), sbitmap_deferred_clear() may be run
with preempt disabled. In RT kernel, spin_lock() can sleep, then warning
of "BUG: sleeping function called from invalid context" can be triggered.
Fix it by replacing it with raw_spin_lock.
Cc: Yang Yang <yang.yang@vivo.com>
Fixes: 72d04bdcf3f7 ("sbitmap: fix io hung due to race on sbitmap_word::cleared")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Yang Yang <yang.yang@vivo.com>
Link: https://lore.kernel.org/r/20240919021709.511329-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2ee96abef214550d9e92f5143ee3ac1fd1323e67 ]
In 2018, a dependency on <linux/crc32poly.h> was added to avoid
duplicating the same constant in multiple files. Two months later it was
found to be a bad idea and the definition of CRC32_POLY_LE macro was moved
into xz_private.h to avoid including <linux/crc32poly.h>.
xz_private.h is a wrong place for it too. Revert back to the upstream
version which has the poly in xz_crc32_init() in xz_crc32.c.
Link: https://lkml.kernel.org/r/20240721133633.47721-10-lasse.collin@tukaani.org
Fixes: faa16bc404d7 ("lib: Use existing define with polynomial")
Fixes: 242cdad873a7 ("lib/xz: Put CRC32_POLY_LE in xz_private.h")
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Reviewed-by: Sam James <sam@gentoo.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Jules Maselbas <jmaselbas@zdiv.net>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rui Li <me@lirui.org>
Cc: Simon Glass <sjg@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fix fromShuah Khan:
"One single fix to a use-after-free bug resulting from
kunit_driver_create() failing to copy the driver name leaving it on
the stack or freeing it"
* tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Device wrappers should also manage driver name
|
|
codetag_module_init() is used to initialize sections containing allocation
tags. This function is used to initialize module sections as well as core
kernel sections, in which case the module parameter is set to NULL. This
function has to be called even when CONFIG_MODULES=n to initialize core
kernel allocation tag sections. When CONFIG_MODULES=n, this function is a
NOP, which is wrong. This leads to /proc/allocinfo reported as empty.
Fix this by making it independent of CONFIG_MODULES.
Link: https://lkml.kernel.org/r/20240828231536.1770519-1-surenb@google.com
Fixes: 916cc5167cc6 ("lib: code tagging framework")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The write lock should be held when validating the tree to avoid updates
racing with checks. Holding the rcu read lock during a large tree
validation may also cause a prolonged rcu read window and "rcu_preempt
detected stalls" warnings.
Link: https://lore.kernel.org/all/0000000000001d12d4062005aea1@google.com/
Link: https://lkml.kernel.org/r/20240820175417.2782532-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reported-by: syzbot+036af2f0c7338a33b0cd@syzkaller.appspotmail.com
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fix from Jason Donenfeld:
"Reject invalid flags passed to vgetrandom() in the same way that
getrandom() does, so that the behavior is the same, from Yann.
The flags argument to getrandom() only has a behavioral effect on the
function if the RNG isn't initialized yet, so vgetrandom() falls back
to the syscall in that case. But if the RNG is initialized, all of the
flags behave the same way, so vgetrandom() didn't bother checking
them, and just ignored them entirely.
But that doesn't account for invalid flags passed in, which need to be
rejected so we can use them later"
* tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: vDSO: reject unknown getrandom() flags
|
|
kunit_driver_create() accepts a name for the driver, but does not copy
it, so if that name is either on the stack, or otherwise freed, we end
up with a use-after-free when the driver is cleaned up.
Instead, strdup() the name, and manage it as another KUnit allocation.
As there was no existing kunit_kstrdup(), we add one. Further, add a
kunit_ variant of strdup_const() and kfree_const(), so we don't need to
allocate and manage the string in the majority of cases where it's a
constant.
However, these are inline functions, and is_kernel_rodata() only works
for built-in code. This causes problems in two cases:
- If kunit is built as a module, __{start,end}_rodata is not defined.
- If a kunit test using these functions is built as a module, it will
suffer the same fate.
This fixes a KASAN splat with overflow.overflow_allocation_test, when
built as a module.
Restrict the is_kernel_rodata() case to when KUnit is built as a module,
which fixes the first case, at the cost of losing the optimisation.
Also, make kunit_{kstrdup,kfree}_const non-inline, so that other modules
using them will not accidentally depend on is_kernel_rodata(). If KUnit
is built-in, they'll benefit from the optimisation, if KUnit is not,
they won't, but the string will be properly duplicated.
Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
Reported-by: Nico Pache <npache@redhat.com>
Closes: https://groups.google.com/g/kunit-dev/c/81V9b9QYON0
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Like the getrandom() syscall, vDSO getrandom() must also reject unknown
flags. [1]
It would be possible to return -EINVAL from vDSO itself, but in the
possible case that a new flag is added to getrandom() syscall in the
future, it would be easier to get the behavior from the syscall, instead
of erroring until the vDSO is extended to support the new flag or
explicitly falling back.
[1] Designing the API: Planning for Extension
https://docs.kernel.org/process/adding-syscalls.html#designing-the-api-planning-for-extension
Signed-off-by: Yann Droneaud <yann@droneaud.fr>
[Jason: reworded commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Pull bcachefs fixes from Kent OverstreetL
- New on disk format version, bcachefs_metadata_version_disk_accounting_inum
This adds one more disk accounting counter, which counts disk usage
and number of extents per inode number. This lets us track
fragmentation, for implementing defragmentation later, and it also
counts disk usage per inode in all snapshots, which will be a useful
thing to expose to users.
- One performance issue we've observed is threads spinning when they
should be waiting for dirty keys in the key cache to be flushed by
journal reclaim, so we now have hysteresis for the waiting thread, as
well as improving the tracepoint and a new time_stat, for tracking
time blocked waiting on key cache flushing.
... and various assorted smaller fixes.
* tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix locking in __bch2_trans_mark_dev_sb()
bcachefs: fix incorrect i_state usage
bcachefs: avoid overflowing LRU_TIME_BITS for cached data lru
bcachefs: Fix forgetting to pass trans to fsck_err()
bcachefs: Increase size of cuckoo hash table on too many rehashes
bcachefs: bcachefs_metadata_version_disk_accounting_inum
bcachefs: Kill __bch2_accounting_mem_mod()
bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in
bcachefs: Add a time_stat for blocked on key cache flush
bcachefs: Improve trans_blocked_journal_reclaim tracepoint
bcachefs: Add hysteresis to waiting on btree key cache flush
lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()
bcachefs: Convert for_each_btree_node() to lockrestart_do()
bcachefs: Add missing downgrade table entry
bcachefs: disk accounting: ignore unknown types
bcachefs: bch2_accounting_invalid() fixup
bcachefs: Fix bch2_trigger_alloc when upgrading from old versions
bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
|
|
The 'device_name' array doesn't exist out of the
'overflow_allocation_test' function scope. However, it is being used as
a driver name when calling 'kunit_driver_create' from
'kunit_device_register'. It produces the kernel panic with KASAN
enabled.
Since this variable is used in one place only, remove it and pass the
device name into kunit_device_register directly as an ascii string.
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
If we need to increase the tree depth, allocate a new node, and then
race with another thread that increased the tree depth before us, we'll
still have a preallocated node that might be used later.
If we then use that node for a new non-root node, it'll still have a
pointer to the old root instead of being zeroed - fix this by zeroing it
in the cmpxchg failure path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We only had a couple of array[] declarations, and changing them to just
use 'MAX()' instead of 'max()' fixes the issue.
This will allow us to simplify our min/max macros enormously, since they
can now unconditionally use temporary variables to avoid using the
argument values multiple times.
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This just standardizes the use of MIN() and MAX() macros, with the very
traditional semantics. The goal is to use these for C constant
expressions and for top-level / static initializers, and so be able to
simplify the min()/max() macros.
These macro names were used by various kernel code - they are very
traditional, after all - and all such users have been fixed up, with a
few different approaches:
- trivial duplicated macro definitions have been removed
Note that 'trivial' here means that it's obviously kernel code that
already included all the major kernel headers, and thus gets the new
generic MIN/MAX macros automatically.
- non-trivial duplicated macro definitions are guarded with #ifndef
This is the "yes, they define their own versions, but no, the include
situation is not entirely obvious, and maybe they don't get the
generic version automatically" case.
- strange use case #1
A couple of drivers decided that the way they want to describe their
versioning is with
#define MAJ 1
#define MIN 2
#define DRV_VERSION __stringify(MAJ) "." __stringify(MIN)
which adds zero value and I just did my Alexander the Great
impersonation, and rewrote that pointless Gordian knot as
#define DRV_VERSION "1.2"
instead.
- strange use case #2
A couple of drivers thought that it's a good idea to have a random
'MIN' or 'MAX' define for a value or index into a table, rather than
the traditional macro that takes arguments.
These values were re-written as C enum's instead. The new
function-line macros only expand when followed by an open
parenthesis, and thus don't clash with enum use.
Happily, there weren't really all that many of these cases, and a lot of
users already had the pattern of using '#ifndef' guarding (or in one
case just using '#undef MIN') before defining their own private version
that does the same thing. I left such cases alone.
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull Rust updates from Miguel Ojeda:
"The highlight is the establishment of a minimum version for the Rust
toolchain, including 'rustc' (and bundled tools) and 'bindgen'.
The initial minimum will be the pinned version we currently have, i.e.
we are just widening the allowed versions. That covers three stable
Rust releases: 1.78.0, 1.79.0, 1.80.0 (getting released tomorrow),
plus beta, plus nightly.
This should already be enough for kernel developers in distributions
that provide recent Rust compiler versions routinely, such as Arch
Linux, Debian Unstable (outside the freeze period), Fedora Linux,
Gentoo Linux (especially the testing channel), Nix (unstable) and
openSUSE Slowroll and Tumbleweed.
In addition, the kernel is now being built-tested by Rust's pre-merge
CI. That is, every change that is attempting to land into the Rust
compiler is tested against the kernel, and it is merged only if it
passes. Similarly, the bindgen tool has agreed to build the kernel in
their CI too.
Thus, with the pre-merge CI in place, both projects hope to avoid
unintentional changes to Rust that break the kernel. This means that,
in general, apart from intentional changes on their side (that we will
need to workaround conditionally on our side), the upcoming Rust
compiler versions should generally work.
In addition, the Rust project has proposed getting the kernel into
stable Rust (at least solving the main blockers) as one of its three
flagship goals for 2024H2 [1].
I would like to thank Niko, Sid, Emilio et al. for their help
promoting the collaboration between Rust and the kernel.
Toolchain and infrastructure:
- Support several Rust toolchain versions.
- Support several bindgen versions.
- Remove 'cargo' requirement and simplify 'rusttest', thanks to
'alloc' having been dropped last cycle.
- Provide proper error reporting for the 'rust-analyzer' target.
'kernel' crate:
- Add 'uaccess' module with a safe userspace pointers abstraction.
- Add 'page' module with a 'struct page' abstraction.
- Support more complex generics in workqueue's 'impl_has_work!'
macro.
'macros' crate:
- Add 'firmware' field support to the 'module!' macro.
- Improve 'module!' macro documentation.
Documentation:
- Provide instructions on what packages should be installed to build
the kernel in some popular Linux distributions.
- Introduce the new kernel.org LLVM+Rust toolchains.
- Explain '#[no_std]'.
And a few other small bits"
Link: https://rust-lang.github.io/rust-project-goals/2024h2/index.html#flagship-goals [1]
* tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux: (26 commits)
docs: rust: quick-start: add section on Linux distributions
rust: warn about `bindgen` versions 0.66.0 and 0.66.1
rust: start supporting several `bindgen` versions
rust: work around `bindgen` 0.69.0 issue
rust: avoid assuming a particular `bindgen` build
rust: start supporting several compiler versions
rust: simplify Clippy warning flags set
rust: relax most deny-level lints to warnings
rust: allow `dead_code` for never constructed bindings
rust: init: simplify from `map_err` to `inspect_err`
rust: macros: indent list item in `paste!`'s docs
rust: add abstraction for `struct page`
rust: uaccess: add typed accessors for userspace pointers
uaccess: always export _copy_[from|to]_user with CONFIG_RUST
rust: uaccess: add userspace pointers
kbuild: rust-analyzer: improve comment documentation
kbuild: rust-analyzer: better error handling
docs: rust: no_std is used
rust: alloc: add __GFP_HIGHMEM flag
rust: alloc: fix typo in docs for GFP_NOWAIT
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"11 hotfixes, 7 of which are cc:stable. 7 are MM, 4 are other"
* tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: handle inconsistent state in nilfs_btnode_create_block()
selftests/mm: skip test for non-LPA2 and non-LVA systems
mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_node
alloc_tag: outline and export free_reserved_page()
decompress_bunzip2: fix rare decompression failure
mm/huge_memory: avoid PMD-size page cache if needed
mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines
mm: fix old/young bit handling in the faulting path
dt-bindings: arm: update James Clark's email address
MAINTAINERS: mailmap: update James Clark's email address
|
|
The decompression code parses a huffman tree and counts the number of
symbols for a given bit length. In rare cases, there may be >= 256
symbols with a given bit length, causing the unsigned char to overflow.
This causes a decompression failure later when the code tries and fails to
find the bit length for a given symbol.
Since the maximum number of symbols is 258, use unsigned short instead.
Link: https://lkml.kernel.org/r/20240717162016.1514077-1-ross.lagerwall@citrix.com
Fixes: bc22c17e12c1 ("bzip2/lzma: library support for gzip, bzip2 and lzma decompression")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pull bitmap updates from Yury Norov:
"Random fixes"
* tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux:
riscv: Remove unnecessary int cast in variable_fls()
radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c
bitops: Add a comment explaining the double underscore macros
lib: bitmap: add missing MODULE_DESCRIPTION() macros
cpumask: introduce assign_cpu() macro
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- trivial printk changes
The bigger "real" printk work is still being discussed.
* tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
vsprintf: add missing MODULE_DESCRIPTION() macro
printk: Rename console_replay_all() and update context
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases
to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions.
It's not all that useful for a "write a whole driver in rust" type
of thing, but the firmware bindings do help out the phy rust
drivers, and the driver core bindings give a solid base on which
others can start their work.
There is still a long way to go here before we have a multitude of
rust drivers being added, but it's a great first step.
- driver core const api changes.
This reached across all bus types, and there are some fix-ups for
some not-common bus types that linux-next and 0-day testing shook
out.
This work is being done to help make the rust bindings more safe,
as well as the C code, moving toward the end-goal of allowing us to
put driver structures into read-only memory. We aren't there yet,
but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
ARM: sa1100: make match function take a const pointer
sysfs/cpu: Make crash_hotplug attribute world-readable
dio: Have dio_bus_match() callback take a const *
zorro: make match function take a const pointer
driver core: module: make module_[add|remove]_driver take a const *
driver core: make driver_find_device() take a const *
driver core: make driver_[create|remove]_file take a const *
firmware_loader: fix soundness issue in `request_internal`
firmware_loader: annotate doctests as `no_run`
devres: Correct code style for functions that return a pointer type
devres: Initialize an uninitialized struct member
devres: Fix memory leakage caused by driver API devm_free_percpu()
devres: Fix devm_krealloc() wasting memory
driver core: platform: Switch to use kmemdup_array()
driver core: have match() callback in struct bus_type take a const *
MAINTAINERS: add Rust device abstractions to DRIVER CORE
device: rust: improve safety comments
MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
firmware: rust: improve safety comments
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"This adds getrandom() support to the vDSO.
First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which
lets the kernel zero out pages anytime under memory pressure, which
enables allocating memory that never gets swapped to disk but also
doesn't count as being mlocked.
Then, the vDSO implementation of getrandom() is introduced in a
generic manner and hooked into random.c.
Next, this is implemented on x86. (Also, though it's not ready for
this pull, somebody has begun an arm64 implementation already)
Finally, two vDSO selftests are added.
There are also two housekeeping cleanup commits"
* tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
MAINTAINERS: add random.h headers to RNG subsection
random: note that RNDGETPOOL was removed in 2.6.9-rc2
selftests/vDSO: add tests for vgetrandom
x86: vdso: Wire up getrandom() vDSO implementation
random: introduce generic vDSO getrandom() implementation
mm: add MAP_DROPPABLE for designating always lazily freeable mappings
|
|
Pull more block updates from Jens Axboe:
- MD fixes via Song:
- md-cluster fixes (Heming Zhao)
- raid1 fix (Mateusz Jończyk)
- s390/dasd module description (Jeff)
- Series cleaning up and hardening the blk-mq debugfs flag handling
(John, Christoph)
- blk-cgroup cleanup (Xiu)
- Error polled IO attempts if backend doesn't support it (hexue)
- Fix for an sbitmap hang (Yang)
* tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux: (23 commits)
blk-cgroup: move congestion_count to struct blkcg
sbitmap: fix io hung due to race on sbitmap_word::cleared
block: avoid polling configuration errors
block: Catch possible entries missing from rqf_name[]
block: Simplify definition of RQF_NAME()
block: Use enum to define RQF_x bit indexes
block: Catch possible entries missing from cmd_flag_name[]
block: Catch possible entries missing from alloc_policy_name[]
block: Catch possible entries missing from hctx_flag_name[]
block: Catch possible entries missing from hctx_state_name[]
block: Catch possible entries missing from blk_queue_flag_name[]
block: Make QUEUE_FLAG_x as an enum
block: Relocate BLK_MQ_MAX_DEPTH
block: Relocate BLK_MQ_CPU_WORK_BATCH
block: remove QUEUE_FLAG_STOPPED
block: Add missing entry to hctx_flag_name[]
block: Add zone write plugging entry to rqf_name[]
block: Add missing entries from cmd_flag_name[]
s390/dasd: fix error checks in dasd_copy_pair_store()
s390/dasd: add missing MODULE_DESCRIPTION() macros
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- In the series "treewide: Refactor heap related implementation",
Kuan-Wei Chiu has significantly reworked the min_heap library code
and has taught bcachefs to use the new more generic implementation.
- Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
reworks the cpumask and nodemask headers to make things generally
more rational.
- Kuan-Wei Chiu has sent along some maintenance work against our
sorting library code in the series "lib/sort: Optimizations and
cleanups".
- More library maintainance work from Christophe Jaillet in the series
"Remove usage of the deprecated ida_simple_xx() API".
- Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
series "nilfs2: eliminate the call to inode_attach_wb()".
- Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix
GDB command error".
- Plus the usual shower of singleton patches all over the place. Please
see the relevant changelogs for details.
* tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits)
ia64: scrub ia64 from poison.h
watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
tsacct: replace strncpy() with strscpy()
lib/bch.c: use swap() to improve code
test_bpf: convert comma to semicolon
init/modpost: conditionally check section mismatch to __meminit*
init: remove unused __MEMINIT* macros
nilfs2: Constify struct kobj_type
n |