summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2025-04-14smb: client: sambaXP 2025 modssambaXP-2025Enzo Matsumiya10-117/+616
iakerb + compression patches Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2025-04-07Revert "smb: client: fix TCP timers deadlock after rmmod"Kuniyuki Iwashima1-26/+10
This reverts commit e9f2517a3e18a54a3943c098d2226b245d488801. Commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") is intended to fix a null-ptr-deref in LOCKDEP, which is mentioned as CVE-2024-54680, but is actually did not fix anything; The issue can be reproduced on top of it. [0] Also, it reverted the change by commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") and introduced a real issue by reviving the kernel TCP socket. When a reconnect happens for a CIFS connection, the socket state transitions to FIN_WAIT_1. Then, inet_csk_clear_xmit_timers_sync() in tcp_close() stops all timers for the socket. If an incoming FIN packet is lost, the socket will stay at FIN_WAIT_1 forever, and such sockets could be leaked up to net.ipv4.tcp_max_orphans. Usually, FIN can be retransmitted by the peer, but if the peer aborts the connection, the issue comes into reality. I warned about this privately by pointing out the exact report [1], but the bogus fix was finally merged. So, we should not stop the timers to finally kill the connection on our side in that case, meaning we must not use a kernel socket for TCP whose sk->sk_net_refcnt is 0. The kernel socket does not have a reference to its netns to make it possible to tear down netns without cleaning up every resource in it. For example, tunnel devices use a UDP socket internally, but we can destroy netns without removing such devices and let it complete during exit. Otherwise, netns would be leaked when the last application died. However, this is problematic for TCP sockets because TCP has timers to close the connection gracefully even after the socket is close()d. The lifetime of the socket and its netns is different from the lifetime of the underlying connection. If the socket user does not maintain the netns lifetime, the timer could be fired after the socket is close()d and its netns is freed up, resulting in use-after-free. Actually, we have seen so many similar issues and converted such sockets to have a reference to netns. That's why I converted the CIFS client socket to have a reference to netns (sk->sk_net_refcnt == 1), which is somehow mentioned as out-of-scope of CIFS and technically wrong in e9f2517a3e18, but **is in-scope and right fix**. Regarding the LOCKDEP issue, we can prevent the module unload by bumping the module refcount when switching the LOCKDDEP key in sock_lock_init_class_and_name(). [2] For a while, let's revert the bogus fix. Note that now we can use sk_net_refcnt_upgrade() for the socket conversion, but I'll do so later separately to make backport easy. Link: https://lore.kernel.org/all/20250402020807.28583-1-kuniyu@amazon.com/ #[0] Link: https://lore.kernel.org/netdev/c08bd5378da647a2a4c16698125d180a@huawei.com/ #[1] Link: https://lore.kernel.org/lkml/20250402005841.19846-1-kuniyu@amazon.com/ #[2] Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-07Revert "smb: client: Fix netns refcount imbalance causing leaks and ↵Kuniyuki Iwashima1-8/+8
use-after-free" This reverts commit 4e7f1644f2ac6d01dc584f6301c3b1d5aac4eaef. The commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") is not only a bogus fix for LOCKDEP null-ptr-deref but also introduces a real issue, TCP sockets leak, which will be explained in detail in the next revert. Also, CNA assigned CVE-2024-54680 to it but is rejecting it. [0] Thus, we are reverting the commit and its follow-up commit 4e7f1644f2ac ("smb: client: Fix netns refcount imbalance causing leaks and use-after-free"). Link: https://lore.kernel.org/all/2025040248-tummy-smilingly-4240@gregkh/ #[0] Fixes: 4e7f1644f2ac ("smb: client: Fix netns refcount imbalance causing leaks and use-after-free") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-07smb: client: optimize pathname checkingPaulo Alcantara18-651/+384
The default mapping for CIFS mounts (SFM) allows mapping backslash, so fix it by mapping them to 0xf026 in UCS-2 and then allow files and directories to be created with backslash in their names. This will optimize cifs_check_name() as it won't need to check for backslashes anymore when using default SFM mapping. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504021549.evbuYKS3-lkp@intel.com/ Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-07smb311 client: fix missing tcon check when mounting with linux/posix extensionsSteve French1-0/+2
When mounting the same share twice, once with the "linux" mount parameter (or equivalently "posix") and then once without (or e.g. with "nolinux"), we were incorrectly reusing the same tree connection for both mounts. This meant that the first mount of the share on the client, would cause subsequent mounts of that same share on the same client to ignore that mount parm ("linux" vs. "nolinux") and incorrectly reuse the same tcon. Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-07cifs: Ensure that all non-client-specific reparse points are processed by ↵Pali Rohár2-4/+10
the server Fix regression in mounts to e.g. onedrive shares. Generally, reparse points are processed by the SMB server during the SMB OPEN request, but there are few reparse points which do not have OPEN-like meaning for the SMB server and has to be processed by the SMB client. Those are symlinks and special files (fifo, socket, block, char). For Linux SMB client, it is required to process also name surrogate reparse points as they represent another entity on the SMB server system. Linux client will mark them as separate mount points. Examples of name surrogate reparse points are NTFS junction points (e.g. created by the "mklink" tool on Windows servers). So after processing the name surrogate reparse points, clear the -EOPNOTSUPP error code returned from the parse_reparse_point() to let SMB server to process reparse points. And remove printing misleading error message "unhandled reparse tag:" as reparse points are handled by SMB server and hence unhandled fact is normal operation. Fixes: cad3fc0a4c8c ("cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point()") Fixes: b587fd128660 ("cifs: Treat unhandled directory name surrogate reparse points as mount directory nodes") Cc: stable@vger.kernel.org Reported-by: Junwen Sun <sunjw8888@gmail.com> Tested-by: Junwen Sun <sunjw8888@gmail.com> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-06Merge tag 'sched-urgent-2025-04-06' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix a nonsensical Kconfig combination - Remove an unnecessary rseq-notification * tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Eliminate useless task_work on execve sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner10-11/+11
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-04Merge tag '6.15-rc-part2-smb3-client-fixes' of ↵Linus Torvalds19-86/+438
git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - reconnect fixes: three for updating rsize/wsize and an SMB1 reconnect fix - RFC1001 fixes: fixing connections to nonstandard ports, and negprot retries - fix mfsymlinks to old servers - make mapping of open flags for SMB1 more accurate - permission fixes: adding retry on open for write, and one for stat to workaround unexpected access denied - add two new xattrs, one for retrieving SACL and one for retrieving owner (without having to retrieve the whole ACL) - fix mount parm validation for echo_interval - minor cleanup (including removing now unneeded cifs_truncate_page) * tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal version number cifs: Implement is_network_name_deleted for SMB1 cifs: Remove cifs_truncate_page() as it should be superfluous cifs: Do not add FILE_READ_ATTRIBUTES when using GENERIC_READ/EXECUTE/ALL cifs: Improve SMB2+ stat() to work also without FILE_READ_ATTRIBUTES cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES cifs: Fix querying and creating MF symlinks over SMB1 cifs: Fix access_flags_to_smbopen_mode cifs: Fix negotiate retry functionality cifs: Improve handling of NetBIOS packets cifs: Allow to disable or force initialization of NetBIOS session cifs: Add a new xattr system.smb3_ntsd_owner for getting or setting owner cifs: Add a new xattr system.smb3_ntsd_sacl for getting or setting SACLs smb: client: Update IO sizes after reconnection smb: client: Store original IO parameters and prevent zero IO sizes smb:client: smb: client: Add reverse mapping from tcon to superblocks cifs: remove unreachable code in cifs_get_tcp_session() cifs: fix integer overflow in match_server()
2025-04-03Merge tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds5-20/+55
Pull smb server fixes from Steve French: "Four ksmbd SMB3 server fixes, all also for stable" * tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix null pointer dereference in alloc_preauth_hash() ksmbd: validate zero num_subauth before sub_auth is accessed ksmbd: fix overflow in dacloffset bounds check ksmbd: fix session use-after-free in multichannel connection
2025-04-03fs: actually hold the namespace semaphoreChristian Brauner1-1/+2
Don't use a scoped guard that only protects the next statement. Use a regular guard to make sure that the namespace semaphore is held across the whole function. Signed-off-by: Christian Brauner <brauner@kernel.org> Reported-by: Leon Romanovsky <leon@kernel.org> Link: https://lore.kernel.org/all/20250401170715.GA112019@unreal/ Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-03Merge tag 'bcachefs-2025-04-03' of git://evilpiepirate.org/bcachefsLinus Torvalds58-549/+710
Pull more bcachefs updates from Kent Overstreet: "More notable fixes: - Fix for striping behaviour on tiering filesystems where replicas exceeds durability on destination target - Fix a race in device removal where deleting alloc info races with the discard worker - Some small stack usage improvements: this is just enough for KMSAN builds to not blow the stack, more is queued up for 6.16" * tag 'bcachefs-2025-04-03' of git://evilpiepirate.org/bcachefs: bcachefs: Fix "journal stuck" during recovery bcachefs: backpointer_get_key: check for null from peek_slot() bcachefs: Fix null ptr deref in invalidate_one_bucket() bcachefs: Fix check_snapshot_exists() restart handling bcachefs: use nonblocking variant of print_string_as_lines in error path bcachefs: Fix scheduling while atomic from logging changes bcachefs: Add error handling for zlib_deflateInit2() bcachefs: add missing selection of XARRAY_MULTI bcachefs: bch_dev_usage_full bcachefs: Kill btree_iter.trans bcachefs: do_trace_key_cache_fill() bcachefs: Split up bch_dev.io_ref bcachefs: fix ref leak in btree_node_read_all_replicas bcachefs: Fix null ptr deref in bch2_write_endio() bcachefs: Fix field spanning write warning bcachefs: Fix striping behaviour
2025-04-03Merge tag '9p-for-6.15-rc1' of https://github.com/martinetd/linuxLinus Torvalds1-1/+1
Pull 9p updates from Dominique Martinet: - fix handling of bogus (negative/too long) replies - fix crash on mkdir with ACLs (... looks like nobody is using ACLs with semi-recent kernels...) - ipv6 support for trans=tcp - minor concurrency fix to make syzbot happy - minor cleanup * tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux: docs: fs/9p: Add missing "not" in cache documentation 9p: Use hashtable.h for hash_errmap Documentation/fs/9p: fix broken link 9p/trans_fd: mark concurrent read and writes to p9_conn->err 9p/net: return error on bogus (longer than requested) replies 9p/net: fix improper handling of bogus negative read/write replies fs/9p: fix NULL pointer dereference on mkdir net/9p/fd: support ipv6 for trans=tcp
2025-04-03Merge tag 'mm-hotfixes-stable-2025-04-02-21-57' of ↵Linus Torvalds1-26/+25
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM hotfixes from Andrew Morton: "Five hotfixes. Three are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. All patches are for MM" * tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead() mm/hugetlb: move hugetlb_sysctl_init() to the __init section mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock mm/hugetlb_vmemmap: fix memory loads ordering mm/userfaultfd: fix release hang over concurrent GUP
2025-04-03bcachefs: Fix "journal stuck" during recoveryKent Overstreet1-0/+8
If we crash when the journal pin fifo is completely full - i.e. we're at the maximum number of dirty journal entries - that may put us in a sticky situation in recovery, as journal replay will need to be able to open new journal entries in order to get going. bch2_fs_journal_start() already had provisions for resizing the journal pin fifo if needed, but it needs a fudge factor to ensure there's room for journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03bcachefs: backpointer_get_key: check for null from peek_slot()Kent Overstreet1-0/+12
peek_slot() doesn't normally return bkey_s_c_null - except when we ask for a key at a btree level that doesn't exist, which can happen here. We might want to revisit this, but we'll have to look over all the places where we use peek_slot() on interior nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03bcachefs: Fix null ptr deref in invalidate_one_bucket()Kent Overstreet1-0/+3
bch2_backpointer_get_key() returns bkey_s_c_null when the target isn't found. backpointer_get_key() flags the error, so there's nothing else to do here - just skip it and move on. Link: https://github.com/koverstreet/bcachefs/issues/847 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03bcachefs: Fix check_snapshot_exists() restart handlingKent Overstreet1-3/+0
Codepaths that create entries in the snapshots btree currently call bch2_mark_snapshot(), which updates the in-memory snapshot table, before transaction commit. This is because bch2_mark_snapshot() is an atomic trigger, run with btree write locks held, and isn't allowed to fail - but it might need to reallocate the table, hence we call it early when we're still allowed to fail. This is generally harmless - if we fail, we'll have left an entry in the snapshots table around, but nothing will reference it and it'll get overwritten if reused by another transaction. But check_snapshot_exists(), which reconstructs snapshots when the snapshots btree has been corrupted or lost, was erronously rechecking if the snapshot exists inside the transaction commit loop - so on transaction restart (in this case mem_realloced), the second iteration would return without repairing. This code needs some cleanup: splitting out a "maybe realloc snapshots table" helper would have avoided this, that will be in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03bcachefs: use nonblocking variant of print_string_as_lines in error pathBharadwaj Raju1-2/+2
The inconsistency error path calls print_string_as_lines, which calls console_lock, which is a potentially-sleeping function and so can't be called in an atomic context. Replace calls to it with the nonblocking variant which is safe to call. Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03bcachefs: Fix scheduling while atomic from logging changesKent Overstreet2-0/+4
Two fixes from the recent logging changes: bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt context, or with rcu_read_lock() held. The one syzbot found is in bch2_bkey_pick_read_device bch2_dev_rcu bch2_fs_inconsistent We're starting to switch to lift the printbufs up to higher levels so we can emit better log messages and print them all in one go (avoid garbling), so that conversion will help with spotting these in the future; when we declare a printbuf it must be flagged if we're in an atomic context. Secondly, in btree_node_write_endio: 00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321 00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6 00085 preempt_count: 10001, expected: 0 00085 RCU nest depth: 0, expected: 0 00085 4 locks held by bch-reclaim/fa6/618: 00085 #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198 00085 #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440 00085 #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440 00085 #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130 00085 irq event stamp: 328 00085 hardirqs last enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0 00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60 00085 softirqs last enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118 00085 softirqs last disabled at (0): [<0000000000000000>] 0x0 00085 Preemption disabled at: 00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90 00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798 00085 Hardware name: linux,dummy-virt (DT) 00085 Call trace: 00085 show_stack+0x1c/0x30 (C) 00085 dump_stack_lvl+0x84/0xc0 00085 dump_stack+0x14/0x20 00085 __might_resched+0x180/0x288 00085 __might_sleep+0x4c/0x88 00085 __kmalloc_node_track_caller_noprof+0x34c/0x3e0 00085 krealloc_noprof+0x1a0/0x2d8 00085 bch2_printbuf_make_room+0x9c/0x120 00085 bch2_prt_printf+0x60/0x1b8 00085 btree_node_write_endio+0x1b0/0x2d8 00085 bio_endio+0x138/0x1f0 00085 btree_node_write_endio+0xe8/0x2d8 00085 bio_endio+0x138/0x1f0 00085 blk_update_request+0x220/0x4c0 00085 blk_mq_end_request+0x28/0x148 00085 virtblk_request_done+0x64/0xe8 00085 blk_mq_complete_request+0x34/0x40 00085 virtblk_done+0x78/0x130 00085 vring_interrupt+0x6c/0xb0 00085 __handle_irq_event_percpu+0x8c/0x2e0 00085 handle_irq_event+0x50/0xb0 00085 handle_fasteoi_irq+0xc4/0x250 00085 handle_irq_desc+0x44/0x60 00085 generic_handle_domain_irq+0x20/0x30 00085 gic_handle_irq+0x54/0xc8 00085 call_on_irq_stack+0x24/0x40 Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03bcachefs: Add error handling for zlib_deflateInit2()Wentao Liang1-2/+3
In attempt_compress(), the return value of zlib_deflateInit2() needs to be checked. A proper implementation can be found in pstore_compress(). Add an error check and return 0 immediately if the initialzation fails. Fixes: 986e9842fb68 ("bcachefs: Compression levels") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03rseq: Eliminate useless task_work on execveMathieu Desnoyers1-1/+2
Eliminate a useless task_work on execve by moving the call to rseq_set_notify_resume() from sched_mm_cid_after_execve() to the error path of bprm_execve(). The call to rseq_set_notify_resume() from sched_mm_cid_after_execve() is pointless in the success case, because rseq_execve() will clear the rseq pointer before returning to userspace. sched_mm_cid_after_execve() is called from both the success and error paths of bprm_execve(). The call to rseq_set_notify_resume() is needed on error because the mm_cid may have changed. Also move the rseq_execve() to right after sched_mm_cid_after_execve() in bprm_execve(). [ mingo: Merged to a recent upstream kernel, extended the changelog. ] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250327132945.1558783-1-mathieu.desnoyers@efficios.com
2025-04-02cifs: update internal version numberSteve French1-2/+2
To 2.52 Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02cifs: Implement is_network_name_deleted for SMB1Pali Rohár1-0/+44
This change allows Linux SMB1 client to autoreconnect the share when it is modified on server by admin operation which removes and re-adds it. Implementation is reused from SMB2+ is_network_name_deleted callback. There are just adjusted checks for error codes and access to struct smb_hdr. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02cifs: Remove cifs_truncate_page() as it should be superfluousDavid Howells3-22/+0
The calls to cifs_truncate_page() should be superfluous as the places that call it also call truncate_setsize() or cifs_setsize() and therefore truncate_pagecache() which should also clear the tail part of the folio containing the EOF marker. Further, smb3_simple_falloc() calls both cifs_setsize() and truncate_setsize() in addition to cifs_truncate_page(). Remove the superfluous calls. This gets rid of another function referring to struct page. [Should cifs_setsize() also set inode->i_blocks?] Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02Merge tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds19-58/+537
Pull NFS client updates from Trond Myklebust: "Bugfixes: - Three fixes for looping in the NFSv4 state manager delegation code - Fix for the NFSv4 state XDR code (Neil Brown) - Fix a leaked reference in nfs_lock_and_join_requests() - Fix a use-after-free in the delegation return code Features: - Implement the NFSv4.2 copy offload OFFLOAD_STATUS operation to allow monitoring of an in-progress copy - Add a mount option to force NFSv3/NFSv4 to use READDIRPLUS in a getdents() call - SUNRPC now allows some basic management of an existing RPC client's connections using sysfs - Improvements to the automated teardown of a NFS client when the container it was initiated from gets killed - Improvements to prevent tasks from getting stuck in a killable wait state after calling exit_signals()" * tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits) nfs: Add missing release on error in nfs_lock_and_join_requests() NFSv4: Check for delegation validity in nfs_start_delegation_return_locked() NFS: Don't allow waiting for exiting tasks SUNRPC: Don't allow waiting for exiting tasks NFSv4: Treat ENETUNREACH errors as fatal for state recovery NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client NFSv4: Further cleanups to shutdown loops NFS: Shut down the nfs_client only after all the superblocks SUNRPC: rpc_clnt_set_transport() must not change the autobind setting SUNRPC: rpcbind should never reset the port to the value '0' pNFS/flexfiles: Report ENETDOWN as a connection error pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers NFS: Treat ENETUNREACH errors as fatal in containers NFS: Add a mount option to make ENETUNREACH errors fatal sunrpc: Add a sysfs file for one-step xprt deletion sunrpc: Add a sysfs file for adding a new xprt sunrpc: Add a sysfs files for rpc_clnt information sunrpc: Add a sysfs attr for xprtsec NFS: Add implid to sysfs NFS: Extend rdirplus mount option with "force|none" ...
2025-04-02Merge tag 'fuse-update-6.15' of ↵Linus Torvalds8-38/+322
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Allow connection to server to time out (Joanne Koong) - If server doesn't support creating a hard link, return EPERM rather than ENOSYS (Matt Johnston) - Allow file names longer than 1024 chars (Bernd Schubert) - Fix a possible race if request on io_uring queue is interrupted (Bernd Schubert) - Misc fixes and cleanups * tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: remove unneeded atomic set in uring creation fuse: fix uring race condition for null dereference of fc fuse: Increase FUSE_NAME_MAX to PATH_MAX fuse: Allocate only namelen buf memory in fuse_notify_ fuse: add default_request_timeout and max_request_timeout sysctls fuse: add kernel-enforced timeout option for requests fuse: optmize missing FUSE_LINK support fuse: Return EPERM rather than ENOSYS from link() fuse: removed unused function fuse_uring_create() from header fuse: {io-uring} Fix a possible req cancellation race
2025-04-02Merge tag 'ntfs3_for_6.15' of ↵Linus Torvalds9-181/+96
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: - Fix integer overflows on 32-bit systems and in hdr_first_de() - Fix 'proc_info_root' leak on NTFS initialization failure - Remove unused functions ni_load_attr, ntfs_sb_read, ntfs_flush_inodes - update inode->i_mapping->a_ops on compression state - ensure atomicity of write operations - refactor ntfs_{create/remove}_{procdir,proc_root}() * tag 'ntfs3_for_6.15' of https://github.com/Paragon-Software-Group/linux-ntfs3: fs/ntfs3: Remove unused ntfs_flush_inodes fs/ntfs3: Remove unused ntfs_sb_read fs/ntfs3: Remove unused ni_load_attr fs/ntfs3: Prevent integer overflow in hdr_first_de() fs/ntfs3: Fix a couple integer overflows on 32bit systems fs/ntfs3: Update inode->i_mapping->a_ops on compression state fs/ntfs3: Fix WARNING in ntfs_extend_initialized_size fs/ntfs3: Fix 'proc_info_root' leak when init ntfs failed fs/ntfs3: Factor out ntfs_{create/remove}_proc_root() fs/ntfs3: Factor out ntfs_{create/remove}_procdir() fs/ntfs3: Keep write operations atomic
2025-04-02Merge tag 'vfs-6.15-rc1.fixes' of ↵Linus Torvalds4-10/+17
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Add a new maintainer for configfs - Fix exportfs module description - Place flexible array memeber at the end of an internal struct in the mount code - Add new maintainer for netfslib as Jeff Layton is stepping down as current co-maintainer - Fix error handling in cachefiles_get_directory() - Cleanup do_notify_pidfd() - Fix syscall number definitions in pidfd selftests - Fix racy usage of fs_struct->in exec during multi-threaded exec - Ensure correct exit code is reported when pidfs_exit() is called from release_task() for a delayed thread-group leader exit - Fix conflicting iomap flag definitions * tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: Fix conflicting values of iomap flags fs: namespace: Avoid -Wflex-array-member-not-at-end warning MAINTAINERS: configfs: add Andreas Hindborg as maintainer exportfs: add module description exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit() netfs: add Paulo as maintainer and remove myself as Reviewer cachefiles: Fix oops in vfs_mkdir from cachefiles_get_directory exec: fix the racy usage of fs_struct->in_exec selftests/pidfd: fixes syscall number defines pidfs: cleanup the usage of do_notify_pidfd()
2025-04-02Merge tag 'uml-for-linux-6.15-rc1' of ↵Linus Torvalds3-27/+41
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Johannes Berg: - proper nofault accesses and read-only rodata - hostfs fix for host inode number reuse - fixes for host errno handling - various cleanups/small fixes * tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: Rewrite the sigio workaround based on epoll and tgkill um: Prohibit the VM_CLONE flag in run_helper_thread() um: Switch to the pthread-based helper in sigio workaround um: ubd: Switch to the pthread-based helper um: Add pthread-based helper support um: x86: clean up elf specific definitions um: Store full CSGSFS and SS register from mcontext um: virt-pci: Refactor virtio_pcidev into its own module um: work around sched_yield not yielding in time-travel mode um/locking: Remove semicolon from "lock" prefix um: Update min_low_pfn to match changes in uml_reserved um: use str_yes_no() to remove hardcoded "yes" and "no" um: hostfs: avoid issues on inode number reuse by host um: Allocate vdso page pointer statically um: remove copy_from_kernel_nofault_allowed um: mark rodata read-only and implement _nofault accesses um: Pass the correct Rust target and options with gcc
2025-04-02bcachefs: add missing selection of XARRAY_MULTIEric Biggers1-0/+1
When CONFIG_XARRAY_MULTI is not set, reading from a bcachefs file hits the 'BUG_ON(order > 0);' in xas_set_order(), because it tries to insert a large folio in the page cache. Fix this by making bcachefs select XARRAY_MULTI. Fixes: be212d86b19c ("bcachefs: bs > ps support") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: bch_dev_usage_fullKent Overstreet8-25/+51
All the fastpaths that need device usage don't need the sector totals or fragmentation, just bucket counts. Split bch_dev_usage up into two different versions, the normal one with just bucket counts. This is also a stack usage improvement, since we have a bch_dev_usage on the stack in the allocation path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: Kill btree_iter.transKent Overstreet41-402/+405
This was planned to be done ages ago, now finally completed; there are places where we have quite a few btree_trans objects on the stack, so this reduces stack usage somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: do_trace_key_cache_fill()Kent Overstreet1-9/+15
Reducing stack frame usage; this moves the printbuf out of the main stack frame. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: Split up bch_dev.io_refKent Overstreet19-87/+142
We now have separate per device io_refs for read and write access. This fixes a device removal bug where the discard workers were still running while we're removing alloc info for that device. It's also a bit of hardening; we no longer allow writes to devices that are read-only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02nfs: Add missing release on error in nfs_lock_and_join_requests()Dan Carpenter1-1/+3
Call nfs_release_request() on this error path before returning. Fixes: c3f2235782c3 ("nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/3aaaa3d5-1c8a-41e4-98c7-717801ddd171@stanley.mountain Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-01ksmbd: fix null pointer dereference in alloc_preauth_hash()Namjae Jeon3-5/+24
The Client send malformed smb2 negotiate request. ksmbd return error response. Subsequently, the client can send smb2 session setup even thought conn->preauth_info is not allocated. This patch add KSMBD_SESS_NEED_SETUP status of connection to ignore session setup request if smb2 negotiate phase is not complete. Cc: stable@vger.kernel.org Tested-by: Steve French <stfrench@microsoft.com> Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-26505 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01mm/userfaultfd: fix release hang over concurrent GUPPeter Xu1-26/+25
This patch should fix a possible userfaultfd release() hang during concurrent GUP. This problem was initially reported by Dimitris Siakavaras in July 2023 [1] in a firecracker use case. Firecracker has a separate process handling page faults remotely, and when the process releases the userfaultfd it can race with a concurrent GUP from KVM trying to fault in a guest page during the secondary MMU page fault process. A similar problem was reported recently again by Jinjiang Tu in March 2025 [2], even though the race happened this time with a mlockall() operation, which does GUP in a similar fashion. In 2017, commit 656710a60e36 ("userfaultfd: non-cooperative: closing the uffd without triggering SIGBUS") was trying to fix this issue. AFAIU, that fixes well the fault paths but may not work yet for GUP. In GUP, the issue is NOPAGE will be almost treated the same as "page fault resolved" in faultin_page(), then the GUP will follow page again, seeing page missing, and it'll keep going into a live lock situation as reported. This change makes core mm return RETRY instead of NOPAGE for both the GUP and fault paths, proactively releasing the mmap read lock. This should guarantee the other release thread make progress on taking the write lock and avoid the live lock even for GUP. When at it, rearrange the comments to make sure it's uptodate. [1] https://lore.kernel.org/r/79375b71-db2e-3e66-346b-254c90d915e2@cslab.ece.ntua.gr [2] https://lore.kernel.org/r/20250307072133.3522652-1-tujinjiang@huawei.com Link: https://lkml.kernel.org/r/20250312145131.1143062-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Jinjiang Tu <tujinjiang@huawei.com> Cc: Dimitris Siakavaras <jimsiak@cslab.ece.ntua.gr> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01Merge tag 'driver-core-6.15-rc1' of ↵Linus Torvalds7-126/+226
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updatesk from Greg KH: "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff happened this development cycle, including: - kernfs scaling changes to make it even faster thanks to rcu - bin_attribute constify work in many subsystems - faux bus minor tweaks for the rust bindings - rust binding updates for driver core, pci, and platform busses, making more functionaliy available to rust drivers. These are all due to people actually trying to use the bindings that were in 6.14. - make Rafael and Danilo full co-maintainers of the driver core codebase - other minor fixes and updates" * tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits) rust: platform: require Send for Driver trait implementers rust: pci: require Send for Driver trait implementers rust: platform: impl Send + Sync for platform::Device rust: pci: impl Send + Sync for pci::Device rust: platform: fix unrestricted &mut platform::Device rust: pci: fix unrestricted &mut pci::Device rust: device: implement device context marker rust: pci: use to_result() in enable_device_mem() MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers rust/kernel/faux: mark Registration methods inline driver core: faux: only create the device if probe() succeeds rust/faux: Add missing parent argument to Registration::new() rust/faux: Drop #[repr(transparent)] from faux::Registration rust: io: fix devres test with new io accessor functions rust: io: rename `io::Io` accessors kernfs: Move dput() outside of the RCU section. efi: rci2: mark bin_attribute as __ro_after_init rapidio: constify 'struct bin_attribute' firmware: qemu_fw_cfg: constify 'struct bin_attribute' powerpc/perf/hv-24x7: Constify 'struct bin_attribute' ...
2025-04-01Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of ↵Linus Torvalds7-22/+23
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "powerpc/crash: use generic crashkernel reservation" from Sourabh Jain changes powerpc's kexec code to use more of the generic layers. - The series "get_maintainer: report subsystem status separately" from Vlastimil Babka makes some long-requested improvements to the get_maintainer output. - The series "ucount: Simplify refcounting with rcuref_t" from Sebastian Siewior cleans up and optimizing the refcounting in the ucount code. - The series "reboot: support runtime configuration of emergency hw_protection action" from Ahmad Fatoum improves the ability for a driver to perform an emergency system shutdown or reboot. - The series "Converge on using secs_to_jiffies() part two" from Easwar Hariharan performs further migrations from msecs_to_jiffies() to secs_to_jiffies(). - The series "lib/interval_tree: add some test cases and cleanup" from Wei Yang permits more userspace testing of kernel library code, adds some more tests and performs some cleanups. - The series "hung_task: Dump the blocking task stacktrace" from Masami Hiramatsu arranges for the hung_task detector to dump the stack of the blocking task and not just that of the blocked task. - The series "resource: Split and use DEFINE_RES*() macros" from Andy Shevchenko provides some cleanups to the resource definition macros. - Plus the usual shower of singleton patches - please see the individual changelogs for details. * tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) mailmap: consolidate email addresses of Alexander Sverdlin fs/procfs: fix the comment above proc_pid_wchan() relay: use kasprintf() instead of fixed buffer formatting resource: replace open coded variant of DEFINE_RES() resource: replace open coded variants of DEFINE_RES_*_NAMED() resource: replace open coded variant of DEFINE_RES_NAMED_DESC() resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED() samples: add hung_task detector mutex blocking sample hung_task: show the blocker task if the task is hung on mutex kexec_core: accept unaccepted kexec segments' destination addresses watchdog/perf: optimize bytes copied and remove manual NUL-termination lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap() lib/interval_tree: skip the check before go to the right subtree lib/interval_tree: add test case for span iteration lib/interval_tree: add test case for interval_tree_iter_xxx() helpers lib/rbtree: add random seed lib/rbtree: split tests lib/rbtree: enable userland test suite for rbtree related data structure checkpatch: describe --min-conf-desc-length scripts/gdb/symbols: determine KASLR offset on s390 ...
2025-04-01Merge tag 'mm-stable-2025-03-30-16-52' of ↵Linus Torvalds23-209/+433
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some co