Age | Commit message (Collapse) | Author | Files | Lines |
|
After installing the anonymous fd, we can now see it in userland and close
it. However, at this point we may not have gotten the reference count of
the cache, but we will put it during colse fd, so this may cause a cache
UAF.
So grab the cache reference count before fd_install(). In addition, by
kernel convention, fd is taken over by the user land after fd_install(),
and the kernel should not call close_fd() after that, i.e., it should call
fd_install() after everything is ready, thus fd_install() is called after
copy_to_user() succeeds.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-10-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now every time the daemon reads an open request, it gets a new anonymous fd
and ondemand_id. With the introduction of "restore", it is possible to read
the same open request more than once, and therefore an object can have more
than one anonymous fd.
If the anonymous fd is not unique, the following concurrencies will result
in an fd leak:
t1 | t2 | t3
------------------------------------------------------------
cachefiles_ondemand_init_object
cachefiles_ondemand_send_req
REQ_A = kzalloc(sizeof(*req) + data_len)
wait_for_completion(&REQ_A->done)
cachefiles_daemon_read
cachefiles_ondemand_daemon_read
REQ_A = cachefiles_ondemand_select_req
cachefiles_ondemand_get_fd
load->fd = fd0
ondemand_id = object_id0
------ restore ------
cachefiles_ondemand_restore
// restore REQ_A
cachefiles_daemon_read
cachefiles_ondemand_daemon_read
REQ_A = cachefiles_ondemand_select_req
cachefiles_ondemand_get_fd
load->fd = fd1
ondemand_id = object_id1
process_open_req(REQ_A)
write(devfd, ("copen %u,%llu", msg->msg_id, size))
cachefiles_ondemand_copen
xa_erase(&cache->reqs, id)
complete(&REQ_A->done)
kfree(REQ_A)
process_open_req(REQ_A)
// copen fails due to no req
// daemon close(fd1)
cachefiles_ondemand_fd_release
// set object closed
-- umount --
cachefiles_withdraw_cookie
cachefiles_ondemand_clean_object
cachefiles_ondemand_init_close_req
if (!cachefiles_ondemand_object_is_open(object))
return -ENOENT;
// The fd0 is not closed until the daemon exits.
However, the anonymous fd holds the reference count of the object and the
object holds the reference count of the cookie. So even though the cookie
has been relinquished, it will not be unhashed and freed until the daemon
exits.
In fscache_hash_cookie(), when the same cookie is found in the hash list,
if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then the new
cookie waits for the old cookie to be unhashed, while the old cookie is
waiting for the leaked fd to be closed, if the daemon does not exit in time
it will trigger a hung task.
To avoid this, allocate a new anonymous fd only if no anonymous fd has
been allocated (ondemand_id == 0) or if the previously allocated anonymous
fd has been closed (ondemand_id == -1). Moreover, returns an error if
ondemand_id is valid, letting the daemon know that the current userland
restore logic is abnormal and needs to be checked.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-9-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The following concurrency may cause a read request to fail to be completed
and result in a hung:
t1 | t2
---------------------------------------------------------
cachefiles_ondemand_copen
req = xa_erase(&cache->reqs, id)
// Anon fd is maliciously closed.
cachefiles_ondemand_fd_release
xa_lock(&cache->reqs)
cachefiles_ondemand_set_object_close(object)
xa_unlock(&cache->reqs)
cachefiles_ondemand_set_object_open
// No one will ever close it again.
cachefiles_ondemand_daemon_read
cachefiles_ondemand_select_req
// Get a read req but its fd is already closed.
// The daemon can't issue a cread ioctl with an closed fd, then hung.
So add spin_lock for cachefiles_ondemand_info to protect ondemand_id and
state, thus we can avoid the above problem in cachefiles_ondemand_copen()
by using ondemand_id to determine if fd has been closed.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-8-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This prevents malicious processes from completing random copen/cread
requests and crashing the system. Added checks are listed below:
* Generic, copen can only complete open requests, and cread can only
complete read requests.
* For copen, ondemand_id must not be 0, because this indicates that the
request has not been read by the daemon.
* For cread, the object corresponding to fd and req should be the same.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-7-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The err_put_fd label is only used once, so remove it to make the code
more readable. In addition, the logic for deleting error request and
CLOSE request is merged to simplify the code.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-6-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We got the following issue in a fuzz test of randomly issuing the restore
command:
==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60
Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963
CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564
Call Trace:
kasan_report+0x93/0xc0
cachefiles_ondemand_daemon_read+0xb41/0xb60
vfs_read+0x169/0xb50
ksys_read+0xf5/0x1e0
Allocated by task 116:
kmem_cache_alloc+0x140/0x3a0
cachefiles_lookup_cookie+0x140/0xcd0
fscache_cookie_state_machine+0x43c/0x1230
[...]
Freed by task 792:
kmem_cache_free+0xfe/0x390
cachefiles_put_object+0x241/0x480
fscache_cookie_state_machine+0x5c8/0x1230
[...]
==================================================================
Following is the process that triggers the issue:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------
cachefiles_withdraw_cookie
cachefiles_ondemand_clean_object(object)
cachefiles_ondemand_send_req
REQ_A = kzalloc(sizeof(*req) + data_len)
wait_for_completion(&REQ_A->done)
cachefiles_daemon_read
cachefiles_ondemand_daemon_read
REQ_A = cachefiles_ondemand_select_req
msg->object_id = req->object->ondemand->ondemand_id
------ restore ------
cachefiles_ondemand_restore
xas_for_each(&xas, req, ULONG_MAX)
xas_set_mark(&xas, CACHEFILES_REQ_NEW)
cachefiles_daemon_read
cachefiles_ondemand_daemon_read
REQ_A = cachefiles_ondemand_select_req
copy_to_user(_buffer, msg, n)
xa_erase(&cache->reqs, id)
complete(&REQ_A->done)
------ close(fd) ------
cachefiles_ondemand_fd_release
cachefiles_put_object
cachefiles_put_object
kmem_cache_free(cachefiles_object_jar, object)
REQ_A->object->ondemand->ondemand_id
// object UAF !!!
When we see the request within xa_lock, req->object must not have been
freed yet, so grab the reference count of object before xa_unlock to
avoid the above issue.
Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-5-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We got the following issue in a fuzz test of randomly issuing the restore
command:
==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
Call Trace:
kasan_report+0x94/0xc0
cachefiles_ondemand_daemon_read+0x609/0xab0
vfs_read+0x169/0xb50
ksys_read+0xf5/0x1e0
Allocated by task 626:
__kmalloc+0x1df/0x4b0
cachefiles_ondemand_send_req+0x24d/0x690
cachefiles_create_tmpfile+0x249/0xb30
cachefiles_create_file+0x6f/0x140
cachefiles_look_up_object+0x29c/0xa60
cachefiles_lookup_cookie+0x37d/0xca0
fscache_cookie_state_machine+0x43c/0x1230
[...]
Freed by task 626:
kfree+0xf1/0x2c0
cachefiles_ondemand_send_req+0x568/0x690
cachefiles_create_tmpfile+0x249/0xb30
cachefiles_create_file+0x6f/0x140
cachefiles_look_up_object+0x29c/0xa60
cachefiles_lookup_cookie+0x37d/0xca0
fscache_cookie_state_machine+0x43c/0x1230
[...]
==================================================================
Following is the process that triggers the issue:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------
cachefiles_ondemand_init_object
cachefiles_ondemand_send_req
REQ_A = kzalloc(sizeof(*req) + data_len)
wait_for_completion(&REQ_A->done)
cachefiles_daemon_read
cachefiles_ondemand_daemon_read
REQ_A = cachefiles_ondemand_select_req
cachefiles_ondemand_get_fd
copy_to_user(_buffer, msg, n)
process_open_req(REQ_A)
------ restore ------
cachefiles_ondemand_restore
xas_for_each(&xas, req, ULONG_MAX)
xas_set_mark(&xas, CACHEFILES_REQ_NEW);
cachefiles_daemon_read
cachefiles_ondemand_daemon_read
REQ_A = cachefiles_ondemand_select_req
write(devfd, ("copen %u,%llu", msg->msg_id, size));
cachefiles_ondemand_copen
xa_erase(&cache->reqs, id)
complete(&REQ_A->done)
kfree(REQ_A)
cachefiles_ondemand_get_fd(REQ_A)
fd = get_unused_fd_flags
file = anon_inode_getfile
fd_install(fd, file)
load = (void *)REQ_A->msg.data;
load->fd = fd;
// load UAF !!!
This issue is caused by issuing a restore command when the daemon is still
alive, which results in a request being processed multiple times thus
triggering a UAF. So to avoid this problem, add an additional reference
count to cachefiles_req, which is held while waiting and reading, and then
released when the waiting and reading is over.
Note that since there is only one reference count for waiting, we need to
avoid the same request being completed multiple times, so we can only
complete the request if it is successfully removed from the xarray.
Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-4-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Even with CACHEFILES_DEAD set, we can still read the requests, so in the
following concurrency the request may be used after it has been freed:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------
cachefiles_ondemand_init_object
cachefiles_ondemand_send_req
REQ_A = kzalloc(sizeof(*req) + data_len)
wait_for_completion(&REQ_A->done)
cachefiles_daemon_read
cachefiles_ondemand_daemon_read
// close dev fd
cachefiles_flush_reqs
complete(&REQ_A->done)
kfree(REQ_A)
xa_lock(&cache->reqs);
cachefiles_ondemand_select_req
req->msg.opcode != CACHEFILES_OP_READ
// req use-after-free !!!
xa_unlock(&cache->reqs);
xa_destroy(&cache->reqs)
Hence remove requests from cache->reqs when flushing them to avoid
accessing freed requests.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240522114308.2402121-3-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We were accidentally returning -EROFS during recovery on filesystem
inconsistency - since this is what the journal returns on emergency
shutdown.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Can't actually be used uninitialized, but gcc was being silly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Compatibility fix - we no longer have a separate table for which order
gc walks btrees in, and special case the stripes btree directly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/mean_and_variance_test.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_check_version_downgrade() was setting c->sb.version, which
bch2_sb_set_downgrade() expects to be at the previous version; and it
shouldn't even have been set directly because c->sb.version is updated
by write_super().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
delete_dead_snapshots now runs before the main fsck.c passes which check
for keys for invalid snapshots; thus, it needs those checks as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Consolidate per-key work into delete_dead_snapshots_process_key(), so we
now walk all keys once, not twice.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We now track whether a transaction is locked, and verify that we don't
have nodes locked when the transaction isn't locked; reorder relocks to
not pop the new assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This function is used for finding the hash seed (which is the same in
all versions of an inode in different snapshots): ff an inode has been
deleted in a child snapshot we need to iterate until we find a live
version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It can be useful to know the exact byte offset within a btree node where
an error occured.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If a write path in COW mode fails, either before submitting a bio for the
new extents or an actual IO error happens, we can end up allowing a fast
fsync to log file extent items that point to unwritten extents.
This is because dropping the extent maps happens when completing ordered
extents, at btrfs_finish_one_ordered(), and the completion of an ordered
extent is executed in a work queue.
This can result in a fast fsync to start logging file extent items based
on existing extent maps before the ordered extents complete, therefore
resulting in a log that has file extent items that point to unwritten
extents, resulting in a corrupt file if a crash happens after and the log
tree is replayed the next time the fs is mounted.
This can happen for both direct IO writes and buffered writes.
For example consider a direct IO write, in COW mode, that fails at
btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
error:
1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
set to false, meaning an error happened;
2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
flag;
3) btrfs_finish_ordered_extent() queues the completion of the ordered
extent - so that btrfs_finish_one_ordered() will be executed later in
a work queue. That function will drop extent maps in the range when
it's executed, since the extent maps point to unwritten locations
(signaled by the BTRFS_ORDERED_IOERR flag);
4) After calling btrfs_finish_ordered_extent() we keep going down the
write path and unlock the inode;
5) After that a fast fsync starts and locks the inode;
6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
task sees the extent maps that point to the unwritten locations and
logs file extent items based on them - it does not know they are
unwritten, and the fast fsync path does not wait for ordered extents
to complete, which is an intentional behaviour in order to reduce
latency.
For the buffered write case, here's one example:
1) A fast fsync begins, and it starts by flushing delalloc and waiting for
the writeback to complete by calling filemap_fdatawait_range();
2) Flushing the dellaloc created a new extent map X;
3) During the writeback some IO error happened, and at the end io callback
(end_bbio_data_write()) we call btrfs_finish_ordered_extent(), which
sets the BTRFS_ORDERED_IOERR flag in the ordered extent and queues its
completion;
4) After queuing the ordered extent completion, the end io callback clears
the writeback flag from all pages (or folios), and from that moment the
fast fsync can proceed;
5) The fast fsync proceeds sees extent map X and logs a file extent item
based on extent map X, resulting in a log that points to an unwritten
data extent - because the ordered extent completion hasn't run yet, it
happens only after the logging.
To fix this make btrfs_finish_ordered_extent() set the inode flag
BTRFS_INODE_NEEDS_FULL_SYNC in case an error happened for a COW write,
so that a fast fsync will wait for ordered extent completion.
Note that this issues of using extent maps that point to unwritten
locations can not happen for reads, because in read paths we start by
locking the extent range and wait for any ordered extents in the range
to complete before looking for extent maps.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Wolfram reported that debugfs remained empty on some of his boards
triggering the message "debugfs: Unknown parameter 'auto'".
The root of the issue is that we ignored unknown mount options in the
old mount api but we started rejecting unknown mount options in the new
mount api. Continue to ignore unknown mount options to not regress
userspace.
Fixes: a20971c18752 ("vfs: Convert debugfs to use the new mount API")
Link: https://lore.kernel.org/r/20240527100618.np2wqiw5mz7as3vk@ninjato
Reported-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Move ovl_copy_up() call outside of ovl_want_write()/ovl_drop_write()
region, since copy up may also call ovl_want_write() resulting in recursive
locking on sb->s_writers.
Reported-and-tested-by: syzbot+85e58cdf5b3136471d4b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f6865106191c3e58@google.com/
Fixes: 9a87907de359 ("ovl: implement tmpfile")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
An async dio write to a sparse file can generate a lot of extents
and when we unlink this file (using rm), the kernel can be busy in umapping
and freeing those extents as part of transaction processing.
Similarly xfs reflink remapping path can also iterate over a million
extent entries in xfs_reflink_remap_blocks().
Since we can busy loop in these two functions, so let's add cond_resched()
to avoid softlockup messages like these.
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1
Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
Call Trace:
xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
xfs_alloc_compute_aligned+0x5c/0x144
xfs_alloc_ag_vextent_size+0x238/0x8d4
xfs_alloc_fix_freelist+0x540/0x694
xfs_free_extent_fix_freelist+0x84/0xe0
__xfs_free_extent+0x74/0x1ec
xfs_extent_free_finish_item+0xcc/0x214
xfs_defer_finish_one+0x194/0x388
xfs_defer_finish_noroll+0x1b4/0x5c8
xfs_defer_finish+0x2c/0xc4
xfs_bunmapi_range+0xa4/0x100
xfs_itruncate_extents_flags+0x1b8/0x2f4
xfs_inactive_truncate+0xe0/0x124
xfs_inactive+0x30c/0x3e0
xfs_inodegc_worker+0x140/0x234
process_scheduled_works+0x240/0x57c
worker_thread+0x198/0x468
kthread+0x138/0x140
start_kernel_thread+0x14/0x18
run fstests generic/175 at 2024-02-02 04:40:21
[ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0
NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
Call Trace:
0xc000000014107c00 (unreliable)
__xfs_btree_updkeys+0x8c/0x2c0 [xfs]
xfs_btree_update_keys+0x150/0x170 [xfs]
xfs_btree_lshift+0x534/0x660 [xfs]
xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
xfs_btree_insrec+0x4e4/0x630 [xfs]
xfs_btree_insert+0x104/0x2d0 [xfs]
xfs_rmap_insert+0xc4/0x260 [xfs]
xfs_rmap_map_shared+0x228/0x630 [xfs]
xfs_rmap_finish_one+0x2d4/0x350 [xfs]
xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
__xfs_trans_commit+0x1f4/0x400 [xfs]
xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
xfs_reflink_remap_blocks+0x154/0x320 [xfs]
xfs_file_remap_range+0x138/0x3a0 [xfs]
do_clone_file_range+0x11c/0x2f0
vfs_clone_file_range+0x60/0x1c0
ioctl_file_clone+0x78/0x140
sys_ioctl+0x934/0x1270
system_call_exception+0x158/0x320
system_call_vectored_common+0x15c/0x2ec
Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Disha Goel<disgoel@linux.ibm.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix io_uring based write-through after converting cifs to use the
netfs library
- Fix aio error handling when doing write-through via netfs library
- Fix performance regression in iomap when used with non-large folio
mappings
- Fix signalfd error code
- Remove obsolete comment in signalfd code
- Fix async request indication in netfs_perform_write() by raising
BDP_ASYNC when IOCB_NOWAIT is set
- Yield swap device immediately to prevent spurious EBUSY errors
- Don't cross a .backup mountpoint from backup volumes in afs to avoid
infinite loops
- Fix a race between umount and async request completion in 9p after 9p
was converted to use the netfs library
* tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs, 9p: Fix race between umount and async request completion
afs: Don't cross .backup mountpoint from backup volume
swap: yield device immediately
netfs: Fix setting of BDP_ASYNC from iocb flags
signalfd: drop an obsolete comment
signalfd: fix error return code
iomap: fault in smaller chunks for non-large folio mappings
filemap: add helper mapping_max_folio_size()
netfs: Fix AIO error handling when doing write-through
netfs: Fix io_uring based write-through
|
|
There's a problem in 9p's interaction with netfslib whereby a crash occurs
because the 9p_fid structs get forcibly destroyed during client teardown
(without paying attention to their refcounts) before netfslib has finished
with them. However, it's not a simple case of deferring the clunking that
p9_fid_put() does as that requires the p9_client record to still be
present.
The problem is that netfslib has to unlock pages and clear the IN_PROGRESS
flag before destroying the objects involved - including the fid - and, in
any case, nothing checks to see if writeback completed barring looking at
the page flags.
Fix this by keeping a count of outstanding I/O requests (of any type) and
waiting for it to quiesce during inode eviction.
Reported-by: syzbot+df038d463cca332e8414@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000005be0aa061846f8d6@google.com/
Reported-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000b86c5e06130da9c6@google.com/
Reported-by: syzbot+1527696d41a634cc1819@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000041f960618206d7e@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/755891.1716560771@warthog.procyon.org.uk
Tested-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Hillf Danton <hdanton@sina.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Reported-and-tested-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Don't open-code what the kernel already provides.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
An internal user complained about log recovery failing on a symlink
("Bad dinode after recovery") with the following (excerpted) format:
core.magic = 0x494e
core.mode = 0120777
core.version = 3
core.format = 2 (extents)
core.nlinkv2 = 1
core.nextents = 1
core.size = 297
core.nblocks = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,12,1,0]
This is a symbolic link with a 297-byte target stored in a disk block,
which is to say this is a symlink with a remote target. The forkoff is
0, which is to say that there's 512 - 176 == 336 bytes in the inode core
to store the data fork.
Eventually, testing of generic/388 failed with the same inode corruption
message during inode recovery. In writing a debugging patch to call
xfs_dinode_verify on dirty inode log items when we're committing
transactions, I observed that xfs/298 can reproduce the problem quite
quickly.
xfs/298 creates a symbolic link, adds some extended attributes, then
deletes them all. The test failure occurs when the final removexattr
also deletes the attr fork because that does not convert the remote
symlink back into a shortform symlink. That is how we trip this test.
The only reason why xfs/298 only triggers with the debug patch added is
that it deletes the symlink, so the final iflush shows the inode as
free.
I wrote a quick fstest to emulate the behavior of xfs/298, except that
it leaves the symlinks on the filesystem after inducing the "corrupt"
state. Kernels going back at least as far as 4.18 have written out
symlink inodes in this manner and prior to 1eb70f54c445f they did not
object to reading them back in.
Because we've been writing out inodes this way for quite some time, the
only way to fix this is to relax the check for symbolic links.
Directories don't have this problem because di_size is bumped to
blocksize during the sf->data conversion.
Fixes: 1eb70f54c445f ("xfs: validate inode fork size against fork format")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
When we were converting the attr code to use an explicit operation code
instead of keying off of attr->value being null, we forgot to change the
code that initializes the transaction reservation. Split the function
into two helpers that handle the !remove and remove cases, then fix both
callsites to handle this correctly.
Fixes: c27411d4c640 ("xfs: make attr removal an explicit operation")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Chandan Babu reports the following livelock in xfs/708:
run fstests xfs/708 at 2024-05-04 15:35:29
XFS (loop16): EXPERIMENTAL online scrub feature in use. Use at your own risk!
XFS (loop5): Mounting V5 Filesystem e96086f0-a2f9-4424-a1d5-c75d53d823be
XFS (loop5): Ending clean mount
XFS (loop5): Quotacheck needed: Please wait.
XFS (loop5): Quotacheck: Done.
XFS (loop5): EXPERIMENTAL online scrub feature in use. Use at your own risk!
INFO: task xfs_io:143725 blocked for more than 122 seconds.
Not tainted 6.9.0-rc4+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:xfs_io state:D stack:0 pid:143725 tgid:143725 ppid:117661 flags:0x00004006
Call Trace:
<TASK>
__schedule+0x69c/0x17a0
schedule+0x74/0x1b0
io_schedule+0xc4/0x140
folio_wait_bit_common+0x254/0x650
shmem_undo_range+0x9d5/0xb40
shmem_evict_inode+0x322/0x8f0
evict+0x24e/0x560
__dentry_kill+0x17d/0x4d0
dput+0x263/0x430
__fput+0x2fc/0xaa0
task_work_run+0x132/0x210
get_signal+0x1a8/0x1910
arch_do_signal_or_restart+0x7b/0x2f0
syscall_exit_to_user_mode+0x1c2/0x200
do_syscall_64+0x72/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The shmem code is trying to drop all the folios attached to a shmem
file and gets stuck on a locked folio after a bnobt repair. It looks
like the process has a signal pending, so I started looking for places
where we lock an xfile folio and then deal with a fatal signal.
I found a bug in xfarray_sort_scan via code inspection. This function
is called to set up the scanning phase of a quicksort operation, which
may involve grabbing a locked xfile folio. If we exit the function with
an error code, the caller does not call xfarray_sort_scan_done to put
the xfile folio. If _sort_scan returns an error code while si->folio is
set, we leak the reference and never unlock the folio.
Therefore, change xfarray_sort to call _scan_done on exit. This is safe
to call multiple times because it sets si->folio to NULL and ignores a
NULL si->folio. Also change _sort_scan to use an intermediate variable
so that we never pollute si->folio with an errptr.
Fixes: 232ea052775f9 ("xfs: enable sorting of xfile-backed arrays")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
In both xfs_alloc_cur_finish() and xfs_alloc_ag_vextent_exact(), local
variable @afg is tagged as __maybe_unused. Otherwise an unused variable
warning would be generated for when building with W=1 and CONFIG_XFS_DEBUG
unset. In both cases, the variable is unused as it is only referenced in
an ASSERT() call, which is compiled out (in this config).
It is generally a poor programming style to use __maybe_unused for
variables.
The ASSERT() call is to verify that agbno of the end of the extent is
within bounds for both functions. @afg is used as an intermediate variable
to find the AG length.
However xfs_verify_agbext() already exists to verify a valid extent range.
The arguments for calling xfs_verify_agbext() are already available, so use
that instead.
An advantage of using xfs_verify_agbext() is that it verifies that both the
start and the end of the extent are within the bounds of the AG and
catches overflows.
Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
For CONFIG_XFS_DEBUG unset, xfs_iwalk_run_callbacks() generates the
following warning for when building with W=1:
fs/xfs/xfs_iwalk.c: In function ‘xfs_iwalk_run_callbacks’:
fs/xfs/xfs_iwalk.c:354:42: error: variable ‘irec’ set but not used [-Werror=unused-but-set-variable]
354 | struct xfs_inobt_rec_incore *irec;
| ^~~~
cc1: all warnings being treated as errors
Drop @irec, as it is only an intermediate variable.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Fix the 'make W=1' warnings:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/smb/common/cifs_arc4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/smb/common/cifs_md4.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
copy_page_from_iter_atomic() will be removed at some point.
Also fixup a comment for folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Superblock downgrade entries are only two byte aligned, but section
sizes are 8 byte aligned, which means we have to be careful about
overrun checks; an entry that crosses the end of the section is allowed
(and ignored) as long as it has zero errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: syzbot+a8074a75b8d73328751e@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull smb client fixes from Steve French:
- two important netfs integration fixes - including for a data
corruption and also fixes for multiple xfstests
- reenable swap support over SMB3
* tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix missing set of remote_i_size
cifs: Fix smb3_insert_range() to move the zero_point
cifs: update internal version number
smb3: reenable swapfiles over SMB3 mounts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes, 11 of which are cc:stable.
A few nilfs2 fixes, the remainder are for MM: a couple of selftests
fixes, various singletons fixing various issues in various parts"
* tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/ksm: fix possible UAF of stable_node
mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
nilfs2: fix potential hang in nilfs_detach_log_writer()
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
nilfs2: fix use-after-free of timer for log writer thread
selftests/mm: fix build warnings on ppc64
arm64: patching: fix handling of execmem addresses
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
selftests/mm: compaction_test: fix bogus test success on Aarch64
mailmap: update email address for Satya Priya
mm/huge_memory: don't unpoison huge_zero_folio
kasan, fortify: properly rename memintrinsics
lib: add version into /proc/allocinfo output
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
|
|
Pull ceph updates from Ilya Dryomov:
"A series from Xiubo that adds support for additional access checks
based on MDS auth caps which were recently made available to clients.
This is needed to prevent scenarios where the MDS quietly discards
updates that a UID-restricted client previously (wrongfully) acked to
the user.
Other than that, just a documentation fixup"
* tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
doc: ceph: update userspace command to get CephFS metadata
ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
ceph: check the cephx mds auth access for async dirop
ceph: check the cephx mds auth access for open
ceph: check the cephx mds auth access for setattr
ceph: add ceph_mds_check_access() helper
ceph: save cap_auths in MDS client when session is opened
|
|
https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Fixes:
- reusing of the file index (could cause the file to be trimmed)
- infinite dir enumeration
- taking DOS names into account during link counting
- le32_to_cpu conversion, 32 bit overflow, NULL check
- some code was refactored
Changes:
- removed max link count info display during driver init
Remove:
- atomic_open has been removed for lack of use"
* tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Break dir enumeration if directory contents error
fs/ntfs3: Fix case when index is reused during tree transformation
fs/ntfs3: Mark volume as dirty if xattr is broken
fs/ntfs3: Always make file nonresident on fallocate call
fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
fs/ntfs3: Use variable length array instead of fixed size
fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
fs/ntfs3: Check 'folio' pointer for NULL
fs/ntfs3: Missed le32_to_cpu conversion
fs/ntfs3: Remove max link count info display du |