| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 2fc9feff45d92a92cd5f96487655d5be23fb7e2b upstream.
The sess->user object can currently be in use by another thread, for
example if another connection has sent a session setup request to
bind to the session being free'd. The handler for that connection could
be in the smb2_sess_setup function which makes use of sess->user.
Signed-off-by: Sean Heelan <seanheelan@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Nazar Kalashnikov <sivartiwe@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A recent change to clamp_t() in 6.1.y caused fs/nfsd/nfs4state.c to fail
to compile with gcc-9. The code in nfsd4_get_drc_mem() was written with
the assumption that when "max < min",
clamp(val, min, max)
would return max. This assumption is not documented as an API promise
and the change caused a compile failure if it could be statically
determined that "max < min".
The relevant code was no longer present upstream when commit 1519fbc8832b
("minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()")
landed there, so there is no upstream change to nfsd4_get_drc_mem() to
backport.
There is no clear case that the existing code in nfsd4_get_drc_mem()
is functioning incorrectly. The goal of this patch is to permit the clean
application of commit 1519fbc8832b ("minmax.h: use BUILD_BUG_ON_MSG() for
the lo < hi test in clamp()"), and any commits that depend on it, to LTS
kernels without affecting the ability to compile those kernels. This is
done by open-coding the __clamp() macro sans the built-in type checking.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220745#c0
Signed-off-by: NeilBrown <neil@brown.name>
Stable-dep-of: 1519fbc8832b ("minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed_by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3184b6a5a24ec9ee74087b2a550476f386df7dc2 upstream.
When having a multiuser mount with domain= specified and using
cifscreds, cifs_set_cifscreds() will end up setting @ctx->domainname,
so it needs to be freed before leaving cifs_construct_tcon().
This fixes the following memory leak reported by kmemleak:
mount.cifs //srv/share /mnt -o domain=ZELDA,multiuser,...
su - testuser
cifscreds add -d ZELDA -u testuser
...
ls /mnt/1
...
umount /mnt
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8881203c3f08 (size 8):
comm "ls", pid 5060, jiffies 4307222943
hex dump (first 8 bytes):
5a 45 4c 44 41 00 cc cc ZELDA...
backtrace (crc d109a8cf):
__kmalloc_node_track_caller_noprof+0x572/0x710
kstrdup+0x3a/0x70
cifs_sb_tlink+0x1209/0x1770 [cifs]
cifs_get_fattr+0xe1/0xf50 [cifs]
cifs_get_inode_info+0xb5/0x240 [cifs]
cifs_revalidate_dentry_attr+0x2d1/0x470 [cifs]
cifs_getattr+0x28e/0x450 [cifs]
vfs_getattr_nosec+0x126/0x180
vfs_statx+0xf6/0x220
do_statx+0xab/0x110
__x64_sys_statx+0xd5/0x130
do_syscall_64+0xbb/0x380
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: f2aee329a68f ("cifs: set domainName when a domain-key is used in multiuser")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Jay Shin <jaeshin@redhat.com>
Cc: stable@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f765fdfcd8b5bce92c6aa1a517ff549529ddf590 ]
Fix typo in description of enable_gcm_256 module parameter
Suggested-by: Thomas Spear <speeddymon@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f2c1f631630e01821fe4c3fdf6077bc7a8284f82 upstream.
sb_min_blocksize() may return 0. Check its return value to avoid
accessing the filesystem super block when sb->s_blocksize is 0.
Cc: stable@vger.kernel.org # v6.15
Fixes: 719c1e1829166d ("exfat: add super block operations")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Link: https://patch.msgid.link/20251104125009.2111925-3-yangyongpeng.storage@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0c43094f8cc9d3d99d835c0ac9c4fe1ccc62babd ]
The ready event list of an epoll object is protected by read-write
semaphore:
- The consumer (waiter) acquires the write lock and takes items.
- the producer (waker) takes the read lock and adds items.
The point of this design is enabling epoll to scale well with large number
of producers, as multiple producers can hold the read lock at the same
time.
Unfortunately, this implementation may cause scheduling priority inversion
problem. Suppose the consumer has higher scheduling priority than the
producer. The consumer needs to acquire the write lock, but may be blocked
by the producer holding the read lock. Since read-write semaphore does not
support priority-boosting for the readers (even with CONFIG_PREEMPT_RT=y),
we have a case of priority inversion: a higher priority consumer is blocked
by a lower priority producer. This problem was reported in [1].
Furthermore, this could also cause stall problem, as described in [2].
Fix this problem by replacing rwlock with spinlock.
This reduces the event bandwidth, as the producers now have to contend with
each other for the spinlock. According to the benchmark from
https://github.com/rouming/test-tools/blob/master/stress-epoll.c:
On 12 x86 CPUs:
Before After Diff
threads events/ms events/ms
8 7162 4956 -31%
16 8733 5383 -38%
32 7968 5572 -30%
64 10652 5739 -46%
128 11236 5931 -47%
On 4 riscv CPUs:
Before After Diff
threads events/ms events/ms
8 2958 2833 -4%
16 3323 3097 -7%
32 3451 3240 -6%
64 3554 3178 -11%
128 3601 3235 -10%
Although the numbers look bad, it should be noted that this benchmark
creates multiple threads who do nothing except constantly generating new
epoll events, thus contention on the spinlock is high. For real workload,
the event rate is likely much lower, and the performance drop is not as
bad.
Using another benchmark (perf bench epoll wait) where spinlock contention
is lower, improvement is even observed on x86:
On 12 x86 CPUs:
Before: Averaged 110279 operations/sec (+- 1.09%), total secs = 8
After: Averaged 114577 operations/sec (+- 2.25%), total secs = 8
On 4 riscv CPUs:
Before: Averaged 175767 operations/sec (+- 0.62%), total secs = 8
After: Averaged 167396 operations/sec (+- 0.23%), total secs = 8
In conclusion, no one is likely to be upset over this change. After all,
spinlock was used originally for years, and the commit which converted to
rwlock didn't mention a real workload, just that the benchmark numbers are
nice.
This patch is not exactly the revert of commit a218cc491420 ("epoll: use
rwlock in order to reduce ep_poll_callback() contention"), because git
revert conflicts in some places which are not obvious on the resolution.
This patch is intended to be backported, therefore go with the obvious
approach:
- Replace rwlock_t with spinlock_t one to one
- Delete list_add_tail_lockless() and chain_epi_lockless(). These were
introduced to allow producers to concurrently add items to the list.
But now that spinlock no longer allows producers to touch the event
list concurrently, these two functions are not necessary anymore.
Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/ec92458ea357ec503c737ead0f10b2c6e4c37d47.1752581388.git.namcao@linutronix.de
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: stable@vger.kernel.org
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/ [1]
Reported-by: Valentin Schneider <vschneid@redhat.com>
Closes: https://lore.kernel.org/linux-rt-users/xhsmhttqvnall.mognet@vschneid.remote.csb/ [2]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bfe3d755ef7cec71aac6ecda34a107624735aac7 upstream.
When logging that a new name exists, we skip updating the inode's
last_log_commit field to prevent a later explicit fsync against the inode
from doing nothing (as updating last_log_commit makes btrfs_inode_in_log()
return true). We are detecting, at btrfs_log_inode(), that logging a new
name is happening by checking the logging mode is not LOG_INODE_EXISTS,
but that is not enough because we may log parent directories when logging
a new name of a file in LOG_INODE_ALL mode - we need to check that the
logging_new_name field of the log context too.
An example scenario where this results in an explicit fsync against a
directory not persisting changes to the directory is the following:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ touch /mnt/foo
$ sync
$ mkdir /mnt/dir
# Write some data to our file and fsync it.
$ xfs_io -c "pwrite -S 0xab 0 64K" -c "fsync" /mnt/foo
# Add a new link to our file. Since the file was logged before, we
# update it in the log tree by calling btrfs_log_new_name().
$ ln /mnt/foo /mnt/dir/bar
# fsync the root directory - we expect it to persist the dentry for
# the new directory "dir".
$ xfs_io -c "fsync" /mnt
<power fail>
After mounting the fs the entry for directory "dir" does not exists,
despite the explicit fsync on the root directory.
Here's why this happens:
1) When we fsync the file we log the inode, so that it's present in the
log tree;
2) When adding the new link we enter btrfs_log_new_name(), and since the
inode is in the log tree we proceed to updating the inode in the log
tree;
3) We first set the inode's last_unlink_trans to the current transaction
(early in btrfs_log_new_name());
4) We then eventually enter btrfs_log_inode_parent(), and after logging
the file's inode, we call btrfs_log_all_parents() because the inode's
last_unlink_trans matches the current transaction's ID (updated in the
previous step);
5) So btrfs_log_all_parents() logs the root directory by calling
btrfs_log_inode() for the root's inode with a log mode of LOG_INODE_ALL
so that new dentries are logged;
6) At btrfs_log_inode(), because the log mode is LOG_INODE_ALL, we
update root inode's last_log_commit to the last transaction that
changed the inode (->last_sub_trans field of the inode), which
corresponds to the current transaction's ID;
7) Then later when user space explicitly calls fsync against the root
directory, we enter btrfs_sync_file(), which calls skip_inode_logging()
and that returns true, since its call to btrfs_inode_in_log() returns
true and there are no ordered extents (it's a directory, never has
ordered extents). This results in btrfs_sync_file() returning without
syncing the log or committing the current transaction, so all the
updates we did when logging the new name, including logging the root
directory, are not persisted.
So fix this by but updating the inode's last_log_commit if we are sure
we are not logging a new name (if ctx->logging_new_name is false).
A test case for fstests will follow soon.
Reported-by: Vyacheslav Kovalevsky <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/03c5d7ec-5b3d-49d1-95bc-8970a7f82d87@gmail.com/
Fixes: 130341be7ffa ("btrfs: always update the logged transaction when logging new names")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 895b4c0c79b092d732544011c3cecaf7322c36a1 upstream.
Pde is erased from subdir rbtree through rb_erase(), but not set the node
to EMPTY, which may result in uaf access. We should use RB_CLEAR_NODE()
set the erased node to EMPTY, then pde_subdir_next() will return NULL to
avoid uaf access.
We found an uaf issue while using stress-ng testing, need to run testcase
getdent and tun in the same time. The steps of the issue is as follows:
1) use getdent to traverse dir /proc/pid/net/dev_snmp6/, and current
pde is tun3;
2) in the [time windows] unregister netdevice tun3 and tun2, and erase
them from rbtree. erase tun3 first, and then erase tun2. the
pde(tun2) will be released to slab;
3) continue to getdent process, then pde_subdir_next() will return
pde(tun2) which is released, it will case uaf access.
CPU 0 | CPU 1
-------------------------------------------------------------------------
traverse dir /proc/pid/net/dev_snmp6/ | unregister_netdevice(tun->dev) //tun3 tun2
sys_getdents64() |
iterate_dir() |
proc_readdir() |
proc_readdir_de() | snmp6_unregister_dev()
pde_get(de); | proc_remove()
read_unlock(&proc_subdir_lock); | remove_proc_subtree()
| write_lock(&proc_subdir_lock);
[time window] | rb_erase(&root->subdir_node, &parent->subdir);
| write_unlock(&proc_subdir_lock);
read_lock(&proc_subdir_lock); |
next = pde_subdir_next(de); |
pde_put(de); |
de = next; //UAF |
rbtree of dev_snmp6
|
pde(tun3)
/ \
NULL pde(tun2)
Link: https://lkml.kernel.org/r/20251025024233.158363-1-albin_yang@163.com
Signed-off-by: Wei Yang <albinwyang@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: wangzijie <wangzijie1@honor.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 98a5fd31cbf72d46bf18e50b3ab0ce86d5f319a9 upstream.
When the per-IP connection limit is exceeded in ksmbd_kthread_fn(),
the code sets ret = -EAGAIN and continues the accept loop without
closing the just-accepted socket. That leaks one socket per rejected
attempt from a single IP and enables a trivial remote DoS.
Release client_sk before continuing.
This bug was found with ZeroPath.
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4aa17144d5abc3c756883e3a010246f0dba8b468 upstream.
Typically copynotify stateid is freed either when parent's stateid
is being close/freed or in nfsd4_laundromat if the stateid hasn't
been used in a lease period.
However, in case when the server got an OPEN (which created
a parent stateid), followed by a COPY_NOTIFY using that stateid,
followed by a client reboot. New client instance while doing
CREATE_SESSION would force expire previous state of this client.
It leads to the open state being freed thru release_openowner->
nfs4_free_ol_stateid() and it finds that it still has copynotify
stateid associated with it. We currently print a warning and is
triggerred
WARNING: CPU: 1 PID: 8858 at fs/nfsd/nfs4state.c:1550 nfs4_free_ol_stateid+0xb0/0x100 [nfsd]
This patch, instead, frees the associated copynotify stateid here.
If the parent stateid is freed (without freeing the copynotify
stateids associated with it), it leads to the list corruption
when laundromat ends up freeing the copynotify state later.
[ 1626.839430] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 1626.842828] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth cfg80211 rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace nfs_localio ext4 crc16 mbcache jbd2 overlay uinput snd_seq_dummy snd_hrtimer qrtr rfkill vfat fat uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops snd_hda_intel uvc snd_intel_dspcfg videobuf2_v4l2 videobuf2_common snd_hda_codec snd_hda_core videodev snd_hwdep snd_seq mc snd_seq_device snd_pcm snd_timer snd soundcore sg loop auth_rpcgss vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs 8021q garp stp llc mrp nvme ghash_ce e1000e nvme_core sr_mod nvme_keyring nvme_auth cdrom vmwgfx drm_ttm_helper ttm sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse dm_multipath dm_mod nfnetlink
[ 1626.855594] CPU: 2 UID: 0 PID: 199 Comm: kworker/u24:33 Kdump: loaded Tainted: G B W 6.17.0-rc7+ #22 PREEMPT(voluntary)
[ 1626.857075] Tainted: [B]=BAD_PAGE, [W]=WARN
[ 1626.857573] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024
[ 1626.858724] Workqueue: nfsd4 laundromat_main [nfsd]
[ 1626.859304] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1626.860010] pc : __list_del_entry_valid_or_report+0x148/0x200
[ 1626.860601] lr : __list_del_entry_valid_or_report+0x148/0x200
[ 1626.861182] sp : ffff8000881d7a40
[ 1626.861521] x29: ffff8000881d7a40 x28: 0000000000000018 x27: ffff0000c2a98200
[ 1626.862260] x26: 0000000000000600 x25: 0000000000000000 x24: ffff8000881d7b20
[ 1626.862986] x23: ffff0000c2a981e8 x22: 1fffe00012410e7d x21: ffff0000920873e8
[ 1626.863701] x20: ffff0000920873e8 x19: ffff000086f22998 x18: 0000000000000000
[ 1626.864421] x17: 20747562202c3839 x16: 3932326636383030 x15: 3030666666662065
[ 1626.865092] x14: 6220646c756f6873 x13: 0000000000000001 x12: ffff60004fd9e4a3
[ 1626.865713] x11: 1fffe0004fd9e4a2 x10: ffff60004fd9e4a2 x9 : dfff800000000000
[ 1626.866320] x8 : 00009fffb0261b5e x7 : ffff00027ecf2513 x6 : 0000000000000001
[ 1626.866938] x5 : ffff00027ecf2510 x4 : ffff60004fd9e4a3 x3 : 0000000000000000
[ 1626.867553] x2 : 0000000000000000 x1 : ffff000096069640 x0 : 000000000000006d
[ 1626.868167] Call trace:
[ 1626.868382] __list_del_entry_valid_or_report+0x148/0x200 (P)
[ 1626.868876] _free_cpntf_state_locked+0xd0/0x268 [nfsd]
[ 1626.869368] nfs4_laundromat+0x6f8/0x1058 [nfsd]
[ 1626.869813] laundromat_main+0x24/0x60 [nfsd]
[ 1626.870231] process_one_work+0x584/0x1050
[ 1626.870595] worker_thread+0x4c4/0xc60
[ 1626.870893] kthread+0x2f8/0x398
[ 1626.871146] ret_from_fork+0x10/0x20
[ 1626.871422] Code: aa1303e1 aa1403e3 910e8000 97bc55d7 (d4210000)
[ 1626.871892] SMP: stopping secondary CPUs
Reported-by: rtm@csail.mit.edu
Closes: https://lore.kernel.org/linux-nfs/d8f064c1-a26f-4eed-b4f0-1f7f608f415f@oracle.com/T/#t
Fixes: 624322f1adc5 ("NFSD add COPY_NOTIFY operation")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1f214e9c3aef2d0936be971072e991d78a174d71 ]
The Smatch static checker noted that in _nfs4_proc_lookupp(), the flag
RPC_TASK_TIMEOUT is being passed as an argument to nfs4_init_sequence(),
which is clearly incorrect.
Since LOOKUPP is an idempotent operation, nfs4_init_sequence() should
not ask the server to cache the result. The RPC_TASK_TIMEOUT flag needs
to be passed down to the RPC layer.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Fixes: 76998ebb9158 ("NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 379510a815cb2e64eb0a379cb62295d6ade65df0 ]
Reference count of ksmbd_session will leak when session need reconnect.
Fix this by adding the missing ksmbd_user_session_put().
Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6fced056d2cc8d01b326e6fcfabaacb9850b71a4 ]
Memory leak occurs when ksmbd_vfs_read() fails.
Fix this by adding the missing kvfree().
Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9ff022f3820a31507cb93be6661bf5f3ca0609a4 ]
I noticed xfstests generic/193 and generic/355 started failing against
knfsd after commit e7a8ebc305f2 ("NFSD: Offer write delegation for OPEN
with OPEN4_SHARE_ACCESS_WRITE").
I ran those same tests against ONTAP (which has had write delegation
support for a lot longer than knfsd) and they fail there too... so
while it's a new failure against knfsd, it isn't an entirely new
failure.
Add the NFS_INO_REVAL_FORCED flag so that the presence of a delegation
doesn't keep the inode from being revalidated to fetch the updated mode.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9bb3baa9d1604cd20f49ae7dac9306b4037a0e7a ]
Since the last renewal time was initialized to 0 and jiffies start
counting at -5 minutes, any clients connected in the first 5 minutes
after a reboot would have their renewal timer set to a very long
interval. If the connection was idle, this would result in the client
state timing out on the server and the next call to the server would
return NFS4ERR_BADSESSION.
Fix this by initializing the last renewal time to the current jiffies
instead of 0.
Signed-off-by: Joshua Watt <jpewhacker@gmail.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b540de9e3b4fab3b9e10f30714a6f5c1b2a50ec3 ]
Fix refcount leak in `smb2_set_path_attr` when path conversion fails.
Function `cifs_get_writable_path` returns `cfile` with its reference
counter `cfile->count` increased on success. Function `smb2_compound_op`
would decrease the reference counter for `cfile`, as stated in its
comment. By calling `smb2_rename_path`, the reference counter of `cfile`
would leak if `cifs_convert_path_to_utf16` fails in `smb2_set_path_attr`.
Fixes: 8de9e86c67ba ("cifs: create a helper to find a writeable handle by path name")
Acked-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 4012abe8a78fbb8869634130024266eaef7081fe upstream.
SMB2_change_notify called smb2_validate_iov() but ignored the return
code, then kmemdup()ed using server provided OutputBufferOffset/Length.
Check the return of smb2_validate_iov() and bail out on error.
Discovered with help from the ZeroPath security tooling.
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: stable@vger.kernel.org
Fixes: e3e9463414f61 ("smb3: improve SMB3 change notification support")
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b7ed1e29cfe773d648ca09895b92856bd3a2092d ]
The Coverity Scan service has detected the calling of
wait_for_completion_killable() without checking the return
value in ceph_lock_wait_for_completion() [1]. The CID 1636232
defect contains explanation: "If the function returns an error
value, the error value may be mistaken for a normal value.
In ceph_lock_wait_for_completion(): Value returned from
a function is not checked for errors before being used. (CWE-252)".
The patch adds the checking of wait_for_completion_killable()
return value and return the error code from
ceph_lock_wait_for_completion().
[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1636232
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 528f218b31aac4bbfc58914d43766a22ab545d48 ]
v9fs_sysfs_init() always returned -ENOMEM on failure;
return the actual sysfs_create_group() error instead.
Signed-off-by: Randall P. Embry <rpembry@gmail.com>
Message-ID: <20250926-v9fs_misc-v1-3-a8b3907fc04d@codewreck.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 86db0c32f16c5538ddb740f54669ace8f3a1f3d7 ]
caches_show() overwrote its buffer on each iteration,
so only the last cache tag was visible in sysfs output.
Properly append with snprintf(buf + count, …).
Signed-off-by: Randall P. Embry <rpembry@gmail.com>
Message-ID: <20250926-v9fs_misc-v1-2-a8b3907fc04d@codewreck.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 32058c38d3b79a28963a59ac0353644dc24775cd ]
The function call new_inode() is a primitive for allocating an inode in memory,
rather than planning disk space for it. Therefore, -ENOMEM should be returned
as the error code rather than -ENOSPC.
To be specific, new_inode()'s call path looks like this:
new_inode
new_inode_pseudo
alloc_inode
ops->alloc_inode (hpfs_alloc_inode)
alloc_inode_sb
kmem_cache_alloc_lru
Therefore, the failure of new_inode() indicates a memory presure issue (-ENOMEM),
not a lack of disk space. However, the current implementation of
hpfs_mkdir/create/mknod/symlink incorrectly returns -ENOSPC when new_inode() fails.
This patch fix this by set err to -ENOMEM before the goto statement.
BTW, we also noticed that other nested calls within these four functions,
like hpfs_alloc_f/dnode and hpfs_add_dirent, might also fail due to memory presure.
But similarly, only -ENOSPC is returned. Addressing these will involve code
modifications in other functions, and we plan to submit dedicated patches for these
issues in the future. For this patch, we focus on new_inode().
Signed-off-by: Yikang Yue <yikangy2@illinois.edu>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3b1a4a59a2086badab391687a6a0b86e03048393 ]
In btrfs_fallocate(), when the allocated range overlaps with a prealloc
extent and the extent starts after i_size, the range doesn't get marked
dirty in file_extent_tree. This results in persisting an incorrect
disk_i_size for the inode when not using the no-holes feature.
This is reproducible since commit 41a2ee75aab0 ("btrfs: introduce
per-inode file extent tree"), then became hidden since commit 3d7db6e8bd22
("btrfs: don't allocate file extent tree for non regular files") and then
visible again after commit 8679d2687c35 ("btrfs: initialize
inode::file_extent_tree after i_mode has been set"), which fixes the
previous commit.
The following reproducer triggers the problem:
$ cat test.sh
MNT=/mnt/test
DEV=/dev/vdb
mkdir -p $MNT
mkfs.btrfs -f -O ^no-holes $DEV
mount $DEV $MNT
touch $MNT/file1
fallocate -n -o 1M -l 2M $MNT/file1
umount $MNT
mount $DEV $MNT
len=$((1 * 1024 * 1024))
fallocate -o 1M -l $len $MNT/file1
du --bytes $MNT/file1
umount $MNT
mount $DEV $MNT
du --bytes $MNT/file1
umount $MNT
Running the reproducer gives the following result:
$ ./test.sh
(...)
2097152 /mnt/test/file1
1048576 /mnt/test/file1
The difference is exactly 1048576 as we assigned.
Fix by adding a call to btrfs_inode_set_file_extent_range() in
btrfs_fallocate_update_isize().
Fixes: 41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
Signed-off-by: austinchang <austinchang@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 00be6f26a2a7c671f1402d74c4d3c30a5844660a ]
When io_uring is used in the same task as CIFS, there might be
unnecessary reconnects, causing issues in user-space applications
like QEMU with a log like:
> CIFS: VFS: \\10.10.100.81 Error -512 sending data on socket to server
Certain io_uring completions might be added to task_work with
notify_method being TWA_SIGNAL and thus TIF_NOTIFY_SIGNAL is set for
the task.
In __smb_send_rqst(), signals are masked before calling
smb_send_kvec(), but the masking does not apply to TIF_NOTIFY_SIGNAL.
If sk_stream_wait_memory() is reached via sock_sendmsg() while
TIF_NOTIFY_SIGNAL is set, signal_pending(current) will evaluate to
true there, and -EINTR will be propagated all the way from
sk_stream_wait_memory() to sock_sendmsg() in smb_send_kvec().
Afterwards, __smb_send_rqst() will see that not everything was written
and reconnect.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3677ca67b9791481af16d86e47c3c7d1f2442f95 ]
we should use sock_create_kern() if the socket resides in kernel space.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 025e880759c279ec64d0f754fe65bf45961da864 ]
Willy Tarreau <w@1wt.eu> forwarded me a message from
Disclosure <disclosure@aisle.com> with the following
warning:
> The helper `xattr_key()` uses the pointer variable in the loop condition
> rather than dereferencing it. As `key` is incremented, it remains non-NULL
> (until it runs into unmapped memory), so the loop does not terminate on
> valid C strings and will walk memory indefinitely, consuming CPU or hanging
> the thread.
I easily reproduced this with setfattr and getfattr, causing a kernel
oops, hung user processes and corrupted orangefs files. Disclosure
sent along a diff (not a patch) with a suggested fix, which I based
this patch on.
After xattr_key started working right, xfstest generic/069 exposed an
xattr related memory leak that lead to OOM. xattr_key returns
a hashed key. When adding xattrs to the orangefs xattr cache, orangefs
used hash_add, a kernel hashing macro. hash_add also hashes the key using
hash_log which resulted in additions to the xattr cache going to the wrong
hash bucket. generic/069 tortures a single file and orangefs does a
getattr for the xattr "security.capability" every time. Orangefs
negative caches on xattrs which includes a kmalloc. Since adds to the
xattr cache were going to the wrong bucket, every getattr for
"security.capability" resulted in another kmalloc, none of which were
ever freed.
I changed the two uses of hash_add to hlist_add_head instead
and the memory leak ceased and generic/069 quit throwing furniture.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: Stanislav Fort of Aisle Research <stanislav.fort@aisle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6dfba108387bf4e71411b3da90b2d5cce48ba054 ]
For exFAT filesystems with 4MB read_ahead_size, removing the storage device
when the read operation is in progress, which cause the last read syscall
spent 150s [1]. The main reason is that exFAT generates excessive log
messages [2].
After applying this patch, approximately 300,000 lines of log messages
were suppressed, and the delay of the last read() syscall was reduced
to about 4 seconds.
[1]:
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000120>
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000032>
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000119>
read(4, 0x7fccf28ae000, 131072) = -1 EIO (Input/output error) <150.186215>
[2]:
[ 333.696603] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)
[ 333.697378] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)
[ 333.698156] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 46e75c56dfeafb6756773b71cabe187a6886859a ]
The following code paths may result in high latency or even task hangs:
1. fastcommit io is throttled by wbt.
2. jbd2_fc_wait_bufs() might wait for a long time while
JBD2_FAST_COMMIT_ONGOING is set in journal->flags, and then
jbd2_journal_commit_transaction() waits for the
JBD2_FAST_COMMIT_ONGOING bit for a long time while holding the write
lock of j_state_lock.
3. start_this_handle() waits for read lock of j_state_lock which
results in high latency or task hang.
Given the fact that ext4_fc_commit() already modifies the current
process' IO priority to match that of the jbd2 thread, it should be
reasonable to match jbd2's IO submission flags as well.
Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20250827121812.1477634-1-sunjunchao@bytedance.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1534f72dc2a11ded38b0e0268fbcc0ca24e9fd4a ]
The parent function ext4_xattr_inode_lookup_create already uses GFP_NOFS for memory alloction, so the function ext4_xattr_inode_cache_find should use same gfp_flag.
Signed-off-by: chuguangqing <chuguangqing@inspur.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a890a2e339b929dbd843328f9a92a1625404fe63 ]
Theoretically it's an oopsable race, but I don't believe one can manage
to hit it on real hardware; might become doable on a KVM, but it still
won't be easy to attack.
Anyway, it's easy to deal with - since xdr_encode_hyper() is just a call of
put_unaligned_be64(), we can put that under ->d_lock and be done with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bf75ad096820fee5da40e671ebb32de725a1c417 ]
When client initialization goes through server trunking discovery, it
schedules the state manager and then sleeps waiting for nfs_client
initialization completion.
The state manager can fail during state recovery, and specifically in
lease establishment as nfs41_init_clientid() will bail out in case of
errors returned from nfs4_proc_create_session(), without ever marking
the client ready. The session creation can fail for a variety of reasons
e.g. during backchannel parameter negotiation, with status -EINVAL.
The error status will propagate all the way to the nfs4_state_manager
but the client status will not be marked, and thus the mount process
will remain blocked waiting.
Fix it by adding -EINVAL error handling to nfs4_state_manager().
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit be390f95242785adbf37d7b8a5101dd2f2ba891b ]
RFC7530 states that clients should be prepared for the return of
NFS4ERR_GRACE errors for non-reclaim lock and I/O requests.
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 300b072df72694ea330c4c673c035253e07827b8 ]
The transaction manager initialization in txInit() was not properly
initializing TxBlock[0].waitor waitqueue, causing a crash when
txEnd(0) is called on read-only filesystems.
When a filesystem is mounted read-only, txBegin() returns tid=0 to
indicate no transaction. However, txEnd(0) still gets called and
tries to access TxBlock[0].waitor via tid_to_tblock(0), but this
waitqueue was never initialized because the initialization loop
started at index 1 instead of 0.
This causes a 'non-static key' lockdep warning and system crash:
INFO: trying to register non-static key in txEnd
Fix by ensuring all transaction blocks including TxBlock[0] have
their waitqueues properly initialized during txInit().
Reported-by: syzbot+c4f3462d8b2ad7977bea@syzkaller.appspotmail.com
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7a5aa54fba2bd591b22b9b624e6baa9037276986 ]
The inode mode loaded from corrupted disk can be invalid. Do like what
commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk")
does.
Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fe91e078b60d1beabf5cef4a37c848457a6d2dfb ]
... allowing any ->lookup() return value to be passed to it.
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4e8011ffec79717e5fdac43a7e79faf811a384b7 ]
Since commit af153bb63a33 ("vfs: catch invalid modes in may_open()")
requires any inode be one of S_IFDIR/S_IFLNK/S_IFREG/S_IFCHR/S_IFBLK/
S_IFIFO/S_IFSOCK type, use S_IFREG for $Extend records.
Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit fb881cd7604536b17a1927fb0533f9a6982ffcc5 upstream.
After commit c0e473a0d226 ("block: fix race between set_blocksize and read
paths") was merged, set_blocksize() called by sb_set_blocksize() now locks
the inode of the backing device file. As a result of this change, syzbot
started reporting deadlock warnings due to a circular dependency involving
the semaphore "ns_sem" of the nilfs object, the inode lock of the backing
device file, and the locks that this inode lock is transitively dependent
on.
This is caused by a new lock dependency added by the above change, since
init_nilfs() calls sb_set_blocksize() in the lock section of "ns_sem".
However, these warnings are false positives because init_nilfs() is called
in the early stage of the mount operation and the filesystem has not yet
started.
The reason why "ns_sem" is locked in init_nilfs() was to avoid a race
condition in nilfs_fill_super() caused by sharing a nilfs object among
multiple filesystem instances (super block structures) in the early
implementation. However, nilfs objects and super block structures have
long ago become one-to-one, and there is no longer any need to use the
semaphore there.
So, fix this issue by removing the use of the semaphore "ns_sem" in
init_nilfs().
Link: https://lkml.kernel.org/r/20250503053327.12294-1-konishi.ryusuke@gmail.com
Fixes: c0e473a0d226 ("block: fix race between set_blocksize and read paths")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+00f7f5b884b117ee6773@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00f7f5b884b117ee6773
Tested-by: syzbot+00f7f5b884b117ee6773@syzkaller.appspotmail.com
Reported-by: syzbot+f30591e72bfc24d4715b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f30591e72bfc24d4715b
Tested-by: syzbot+f30591e72bfc24d4715b@syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8287474aa5ffb41df52552c4ae4748e791d2faf2 upstream.
If we fail filemap_write_and_wait_range() on the range the buffered write went
into, we only report the "number of bytes which we direct-written", to quote
the comment in there. Which is fine, but buffered write has already advanced
iocb->ki_pos, so we need to roll that back. Otherwise we end up with e.g.
write(2) advancing position by more than the amount it reports having written.
Fixes: 182c25e9c157 "filemap: update ki_pos in generic_perform_write"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Message-Id: <20230827214518.GU3390869@ZenIV>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 44fff0fa08ec5a6d9d5fb05443a36d854d0ece4d upstream.
Add a helper dealing with handling the syncing of a buffered write
fallback for direct I/O.
Link: https://lkml.kernel.org/r/20230601145904.1385409-10-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[backing_dev_info still being used here. do small changes to the patch
to keep the out label. Which means replacing all returns to goto out.]
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 182c25e9c157f37bd0ab5a82fe2417e2223df459 upstream.
All callers of generic_perform_write need to updated ki_pos, move it into
common code.
Link: https://lkml.kernel.org/r/20230601145904.1385409-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c402a9a9430b670926decbb284b756ee6f47c1ec upstream.
Add a helper to invalidate page cache after a dio write.
Link: https://lkml.kernel.org/r/20230601145904.1385409-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit abb1f08a2121dd270193746e43b2a9373db9ad84 upstream.
When tracing is enabled, the trace_nfsd_read_done trace point
crashes during the pynfs read.testNoFh test.
Fixes: 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns an error")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 45c222468d33202c07c41c113301a4b9c8451b8f ]
After setting the BTRFS_ROOT_FORCE_COW flag on the root we are doing a
full write barrier, smp_wmb(), but we don't need to, all we need is a
smp_mb__after_atomic(). The use of the smp_wmb() is from the old days
when we didn't use a bit and used instead an int field in the root to
signal if cow is forced. After the int field was changed to a bit in
the root's state (flags field), we forgot to update the memory barrier
in create_pending_snapshot() to smp_mb__after_atomic(), but we did the
change in commit_fs_roots() after clearing BTRFS_ROOT_FORCE_COW. That
happened in commit 27cdeb7096b8 ("Btrfs: use bitfield instead of integer
data type for the some variants in btrfs_root"). On the reader side, in
should_cow_block(), we also use the counterpart smp_mb__before_atomic()
which generates further confusion.
So change the smp_wmb() to smp_mb__after_atomic(). In fact we don't
even need any barrier at all since create_pending_snapshot() is called
in the critical section of a transaction commit and therefore no one
can concurrently join/attach the transaction, or start a new one, until
the transaction is unblocked. By the time someone starts a new transaction
and enters should_cow_block(), a lot of implicit memory barriers already
took place by having acquired several locks such as fs_info->trans_lock
and extent buffer locks on the root node at least. Nevertlheless, for
consistency use smp_mb__after_atomic() after setting the force cow bit
in create_pending_snapshot().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2f5b8095ea47b142c56c09755a8b1e14145a2d30 ]
Currently we have this odd behaviour:
1) At btrfs_replay_log() we drop the reference of the log root tree if
the call to btrfs_recover_log_trees() failed;
2) But if the call to btrfs_recover_log_trees() did not fail, we don't
drop the reference in btrfs_replay_log() - we expect that
btrfs_recover_log_trees() does it in case it returns success.
Let's simplify this and make btrfs_replay_log() always drop the reference
on the log root tree, not only this simplifies code as it's what makes
sense since it's btrfs_replay_log() who grabbed the reference in the first
place.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a7f3dfb8293c4cee99743132d69863a92e8f4875 ]
Replace max_t() followed by min_t() with a single clamp().
As was pointed by David Laight in
https://lore.kernel.org/linux-btrfs/20250906122458.75dfc8f0@pumpkin/
the calculation may overflow u32 when the input value is too large, so
clamp_t() is not used. In practice the expected values are in range of
megabytes to gigabytes (throughput limit) so the bug would not happen.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: David Sterba <dsterba@suse.com>
[ Use clamp() and add explanation. ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0d703963d297964451783e1a0688ebdf74cd6151 ]
The hint block group selection in the extent allocator is wrong in the
first place, as it can select the dedicated data relocation block group for
the normal data allocation.
Since we separated the normal data space_info and the data relocation
space_info, we can easily identify a block group is for data relocation or
not. Do not choose it for the normal data allocation.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6f40e50ceb99fc8ef37e5c56e2ec1d162733fef0 upstream.
handle_response() dereferences the payload as a 4-byte handle without
verifying that the declared payload size is at least 4 bytes. A malformed
or truncated message from ksmbd.mountd can lead to a 4-byte read past the
declared payload size. Validate the size before dereferencing.
This is a minimal fix to guard the initial handle read.
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org
Reported-by: Qianchang Zhao <pioooooooooip@gmail.com>
Signed-off-by: Qianchang Zhao <pioooooooooip@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b2d99376c5d61eb60ffdb6c503e4b6c8f9712ddd ]
ksmbd.mount will give each interfaces list and bind_interfaces_only flags
to ksmbd server. Previously, the interfaces list was sent only
when bind_interfaces_only was enabled.
ksmbd server browse only interfaces list given from ksmbd.conf on
FSCTL_QUERY_INTERFACE_INFO IOCTL.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 630785bfbe12c3ee3ebccd8b530a98d632b7e39d ]
The deprecation of the 'attr2' mount option in 6.18 wasn't entirely
successful because nobody noticed that the kernel never printed a
warning about attr2 being set in fstab if the only xfs filesystem is the
root fs; the initramfs mounts the root fs with no mount options; and the
init scripts only conveyed the fstab options by remounting the root fs.
Fix this by making it complain all the time.
Cc: stable@vger.kernel.org # v5.13
Fixes: 92cf7d36384b99 ("xfs: Skip repetitive warnings about mount options")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
[ Update existing xfs_fs_warn_deprecated() callers ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9d5c4f5c7a2c7677e1b3942772122b032c265aae ]
Assuming the disk layout as below,
disk0: 0 --- 0x00035abfff
disk1: 0x00035ac000 --- 0x00037abfff
disk2: 0x00037ac000 --- 0x00037ebfff
and we want to read data from offset=13568 having len=128 across the block
devices, we can illustrate the block addresses like below.
0 .. 0x00037ac000 ------------------- 0x00037ebfff, 0x00037ec000 -------
| ^ ^ ^
| fofs 0 13568 13568+128
| ------------------------------------------------------
| LBA 0x37e8aa9 0x37ebfa9 0x37ec029
--- map 0x3caa9 0x3ffa9
In this example, we should give the relative map of the target block device
ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by
0x37ebfff + 1 - 0x37ebfa9.
In the below equation, however, map->m_pblk was supposed to be the original
address instead of the one from the target block address.
- map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);
Cc: stable@vger.kernel.org
Fixes: 71f2c8206202 ("f2fs: multidevice: support direct IO")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0094e98bd1477a6b7d97c25b47b19a7317c35279 ]
Add a helper to deal with everything needed to return a f2fs_map_blocks
structure based on a lookup in the extent cache.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: 9d5c4f5c7a2c ("f2fs: fix wrong block mapping for multi-devices")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|