Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit e739b12042b6b079a397a3c234f96c09d1de0b40 ]
This patch fixes Dan Carpenter's report that the static checker
found a problem where memcpy() was copying into too small of a buffer.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc5805a ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit eb162e1772f85231dabc789fb4bfea63d2d9df79 ]
linux/fs/nfsd/nfs4proc.c:1542:24: warning: incorrect type in assignment (different base types)
linux/fs/nfsd/nfs4proc.c:1542:24: expected restricted __be32 [assigned] [usertype] status
linux/fs/nfsd/nfs4proc.c:1542:24: got int
Clean-up: The dup_copy_fields() function returns only zero, so make
it return void for now, and get rid of the return code check.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 7b279bbfd2b230c7a210ff8f405799c7e46bbf48 upstream.
Smatch complains about missing that the ovl_override_creds() doesn't
have a matching revert_creds() if the dentry is disconnected. Fix this
by moving the ovl_override_creds() until after the disconnected check.
Fixes: aa3ff3c152ff ("ovl: copy up of disconnected dentries")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d1f82808877bb10d3deee7cf3374a4eb3fb582db upstream.
Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on
that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS.
Truncate those lengths when doing io_add_buffers, so buffer addresses still
use the uncapped length.
Also, take the chance and change struct io_buffer len member to __u32, so
it matches struct io_provide_buffer len member.
This fixes CVE-2021-3491, also reported as ZDI-CAN-13546.
Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Reported-by: Billy Jheng Bing-Jhong (@st424204)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5899593f51e63dde2f07c67358bd65a641585abb upstream.
Eric has noticed that after pagecache read rework, generic/418 is
occasionally failing for ext4 when blocksize < pagesize. In fact, the
pagecache rework just made hard to hit race in ext4 more likely. The
problem is that since ext4 conversion of direct IO writes to iomap
framework (commit 378f32bab371), we update inode size after direct IO
write only after invalidating page cache. Thus if buffered read sneaks
at unfortunate moment like:
CPU1 - write at offset 1k CPU2 - read from offset 0
iomap_dio_rw(..., IOMAP_DIO_FORCE_WAIT);
ext4_readpage();
ext4_handle_inode_extension()
the read will zero out tail of the page as it still sees smaller inode
size and thus page cache becomes inconsistent with on-disk contents with
all the consequences.
Fix the problem by moving inode size update into end_io handler which
gets called before the page cache is invalidated.
Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com>
Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20210415155417.4734-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4811d9929cdae4238baf5b2522247bd2f9fa7b50 upstream.
This is needed to allow generic/607 to pass for file systems with the
inline data_feature enabled, and it allows the use of file systems
where the directories use inline_data, while the files are accessed
via DAX.
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e1262cd2e68a0870fb9fc95eb202d22e8f0074b7 upstream.
In case of if not ext4_fc_add_tlv branch, an error return code is missing.
Cc: stable@kernel.org
Fixes: aa75f4d3daae ("ext4: main fast-commit commit path")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Yihang <xuyihang@huawei.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210408070033.123047-1-xuyihang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6810fad956df9e5467e8e8a5ac66fda0836c71fa upstream.
Fix As write_mmp_block() so that it returns -EIO instead of 1, so that
the correct error gets saved into the superblock.
Cc: stable@kernel.org
Fixes: 54d3adbc29f0 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()")
Reported-by: Liu Zhi Qiang <liuzhiqiang26@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210406025331.148343-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f88f1466e2a2e5ca17dfada436d3efa1b03a3972 upstream.
We should set the error code when ext4_commit_super check argument failed.
Found in code review.
Fixes: c4be0c1dc4cdc ("filesystem freeze: add error handling of write_super_lockfs/unlockfs").
Cc: stable@kernel.org
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 72ffb49a7b623c92a37657eda7cc46a06d3e8398 upstream.
When CONFIG_QUOTA is enabled, if we failed to mount the filesystem due
to some error happens behind ext4_orphan_cleanup(), it will end up
triggering a after free issue of super_block. The problem is that
ext4_orphan_cleanup() will set SB_ACTIVE flag if CONFIG_QUOTA is
enabled, after we cleanup the truncated inodes, the last iput() will put
them into the lru list, and these inodes' pages may probably dirty and
will be write back by the writeback thread, so it could be raced by
freeing super_block in the error path of mount_bdev().
After check the setting of SB_ACTIVE flag in ext4_orphan_cleanup(), it
was used to ensure updating the quota file properly, but evict inode and
trash data immediately in the last iput does not affect the quotafile,
so setting the SB_ACTIVE flag seems not required[1]. Fix this issue by
just remove the SB_ACTIVE setting.
[1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210331033138.918975-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a149d2a5cabbf6507a7832a1c4fd2593c55fd450 upstream.
Commit <50122847007> ("ext4: fix check to prevent initializing reserved
inodes") check the block group zero and prevent initializing reserved
inodes. But in some special cases, the reserved inode may not all belong
to the group zero, it may exist into the second group if we format
filesystem below.
mkfs.ext4 -b 4096 -g 8192 -N 1024 -I 4096 /dev/sda
So, it will end up triggering a false positive report of a corrupted
file system. This patch fix it by avoid check reserved inodes if no free
inode blocks will be zeroed.
Cc: stable@kernel.org
Fixes: 50122847007 ("ext4: fix check to prevent initializing reserved inodes")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210331121516.2243099-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 83fe6b18b8d04c6c849379005e1679bac9752466 upstream.
Assertion checks in jbd2_journal_dirty_metadata() are known to be racy
but we don't want to be grabbing locks just for them. We thus recheck
them under b_state_lock only if it looks like they would fail. Annotate
the checks with data_race().
Cc: stable@kernel.org
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210406161804.20150-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3b1833e92baba135923af4a07e73fe6e54be5a2f upstream.
Access to journal->j_running_transaction is not protected by appropriate
lock and thus is racy. We are well aware of that and the code handles
the race properly. Just add a comment and data_race() annotation.
Cc: stable@kernel.org
Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210406161804.20150-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9c2dc11df50d1c8537075ff6b98472198e24438e upstream.
We were ignoring CAP_MULTI_CHANNEL in the server response - if the
server doesn't support multichannel we should not be attempting it.
See MS-SMB2 section 3.2.5.2
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 679971e7213174efb56abc8fab1299d0a88db0e8 upstream.
In the SMB3/SMB3.1.1 negotiate protocol request, we are supposed to
advertise CAP_MULTICHANNEL capability when establishing multiple
channels has been requested by the user doing the mount. See MS-SMB2
sections 2.2.3 and 3.2.5.2
Without setting it there is some risk that multichannel could fail
if the server interpreted the field strictly.
Reviewed-By: Tom Talpey <tom@talpey.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90ada91f4610c5ef11bc52576516d96c496fc3f1 upstream.
KASAN reports a BUG when download file in jffs2 filesystem.It is
because when dstlen == 1, cpage_out will write array out of bounds.
Actually, data will not be compressed in jffs2_zlib_compress() if
data's length less than 4.
[ 393.799778] BUG: KASAN: slab-out-of-bounds in jffs2_rtime_compress+0x214/0x2f0 at addr ffff800062e3b281
[ 393.809166] Write of size 1 by task tftp/2918
[ 393.813526] CPU: 3 PID: 2918 Comm: tftp Tainted: G B 4.9.115-rt93-EMBSYS-CGEL-6.1.R6-dirty #1
[ 393.823173] Hardware name: LS1043A RDB Board (DT)
[ 393.827870] Call trace:
[ 393.830322] [<ffff20000808c700>] dump_backtrace+0x0/0x2f0
[ 393.835721] [<ffff20000808ca04>] show_stack+0x14/0x20
[ 393.840774] [<ffff2000086ef700>] dump_stack+0x90/0xb0
[ 393.845829] [<ffff20000827b19c>] kasan_object_err+0x24/0x80
[ 393.851402] [<ffff20000827b404>] kasan_report_error+0x1b4/0x4d8
[ 393.857323] [<ffff20000827bae8>] kasan_report+0x38/0x40
[ 393.862548] [<ffff200008279d44>] __asan_store1+0x4c/0x58
[ 393.867859] [<ffff2000084ce2ec>] jffs2_rtime_compress+0x214/0x2f0
[ 393.873955] [<ffff2000084bb3b0>] jffs2_selected_compress+0x178/0x2a0
[ 393.880308] [<ffff2000084bb530>] jffs2_compress+0x58/0x478
[ 393.885796] [<ffff2000084c5b34>] jffs2_write_inode_range+0x13c/0x450
[ 393.892150] [<ffff2000084be0b8>] jffs2_write_end+0x2a8/0x4a0
[ 393.897811] [<ffff2000081f3008>] generic_perform_write+0x1c0/0x280
[ 393.903990] [<ffff2000081f5074>] __generic_file_write_iter+0x1c4/0x228
[ 393.910517] [<ffff2000081f5210>] generic_file_write_iter+0x138/0x288
[ 393.916870] [<ffff20000829ec1c>] __vfs_write+0x1b4/0x238
[ 393.922181] [<ffff20000829ff00>] vfs_write+0xd0/0x238
[ 393.927232] [<ffff2000082a1ba8>] SyS_write+0xa0/0x110
[ 393.932283] [<ffff20000808429c>] __sys_trace_return+0x0/0x4
[ 393.937851] Object at ffff800062e3b280, in cache kmalloc-64 size: 64
[ 393.944197] Allocated:
[ 393.946552] PID = 2918
[ 393.948913] save_stack_trace_tsk+0x0/0x220
[ 393.953096] save_stack_trace+0x18/0x20
[ 393.956932] kasan_kmalloc+0xd8/0x188
[ 393.960594] __kmalloc+0x144/0x238
[ 393.963994] jffs2_selected_compress+0x48/0x2a0
[ 393.968524] jffs2_compress+0x58/0x478
[ 393.972273] jffs2_write_inode_range+0x13c/0x450
[ 393.976889] jffs2_write_end+0x2a8/0x4a0
[ 393.980810] generic_perform_write+0x1c0/0x280
[ 393.985251] __generic_file_write_iter+0x1c4/0x228
[ 393.990040] generic_file_write_iter+0x138/0x288
[ 393.994655] __vfs_write+0x1b4/0x238
[ 393.998228] vfs_write+0xd0/0x238
[ 394.001543] SyS_write+0xa0/0x110
[ 394.004856] __sys_trace_return+0x0/0x4
[ 394.008684] Freed:
[ 394.010691] PID = 2918
[ 394.013051] save_stack_trace_tsk+0x0/0x220
[ 394.017233] save_stack_trace+0x18/0x20
[ 394.021069] kasan_slab_free+0x88/0x188
[ 394.024902] kfree+0x6c/0x1d8
[ 394.027868] jffs2_sum_write_sumnode+0x2c4/0x880
[ 394.032486] jffs2_do_reserve_space+0x198/0x598
[ 394.037016] jffs2_reserve_space+0x3f8/0x4d8
[ 394.041286] jffs2_write_inode_range+0xf0/0x450
[ 394.045816] jffs2_write_end+0x2a8/0x4a0
[ 394.049737] generic_perform_write+0x1c0/0x280
[ 394.054179] __generic_file_write_iter+0x1c4/0x228
[ 394.058968] generic_file_write_iter+0x138/0x288
[ 394.063583] __vfs_write+0x1b4/0x238
[ 394.067157] vfs_write+0xd0/0x238
[ 394.070470] SyS_write+0xa0/0x110
[ 394.073783] __sys_trace_return+0x0/0x4
[ 394.077612] Memory state around the buggy address:
[ 394.082404] ffff800062e3b180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 394.089623] ffff800062e3b200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 394.096842] >ffff800062e3b280: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 394.104056] ^
[ 394.107283] ffff800062e3b300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 394.114502] ffff800062e3b380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 394.121718] ==================================================================
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 77edfc6e51055b61cae2f54c8e6c3bb7c762e4fe upstream.
If mounted with discard option, exFAT issues discard command when clear
cluster bit to remove file. But the input parameter of cluster-to-sector
calculation is abnormally added by reserved cluster size which is 2,
leading to discard unrelated sectors included in target+2 cluster.
With fixing this, remove the wrong comments in set/clear/find bitmap
functions.
Fixes: 1e49a94cf707 ("exfat: add bitmap operations")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 upstream.
There are two modes for write(2) and friends in fuse:
a) write through (update page cache, send sync WRITE request to userspace)
b) buffered write (update page cache, async writeout later)
The write through method kept all the page cache pages locked that were
used for the request. Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.
The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.
For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.
Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.
If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data. After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data. While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.
In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.
Partial write of a not uptodate page still needs to be handled. One way
would be to read the complete page before doing the write. This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.
The other solution is to serialize the synchronous write with reads from
the partial pages. The easiest way to do this is to keep the partial pages
locked. The problem is that a write() may involve two such pages (one head
and one tail). This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw/
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org> # v2.6.26
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 42984af09afc414d540fcc8247f42894b0378a91 upstream.
overlayfs using jffs2 as the upper filesystem would fail in some cases
since moving to v5.10. The test case used was to run 'touch' on a file
that exists in the lower fs, causing the modification time to be
updated. It returns EINVAL when the bug is triggered.
A bisection showed this was introduced in v5.9-rc1, with commit
36e2c7421f02 ("fs: don't allow splice read/write without explicit ops").
Reverting that commit restores the expected behaviour.
Some digging showed that this was due to jffs2 lacking an implementation
of splice_write. (For unknown reasons the warn_unsupported that should
trigger was not displaying any output).
Adding this patch resolved the issue and the test now passes.
Cc: stable@vger.kernel.org
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Lei YU <yulei.sh@bytedance.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 960b9a8a7676b9054d8b46a2c7db52a0c8766b56 upstream.
KASAN report a slab-out-of-bounds problem. The logs are listed below.
It is because in function jffs2_scan_dirent_node, we alloc "checkedlen+1"
bytes for fd->name and we check crc with length rd->nsize. If checkedlen
is less than rd->nsize, it will cause the slab-out-of-bounds problem.
jffs2: Dirent at *** has zeroes in name. Truncating to %d char
==================================================================
BUG: KASAN: slab-out-of-bounds in crc32_le+0x1ce/0x260 at addr ffff8800842cf2d1
Read of size 1 by task test_JFFS2/915
=============================================================================
BUG kmalloc-64 (Tainted: G B O ): kasan: bad access detected
-----------------------------------------------------------------------------
INFO: Allocated in jffs2_alloc_full_dirent+0x2a/0x40 age=0 cpu=1 pid=915
___slab_alloc+0x580/0x5f0
__slab_alloc.isra.24+0x4e/0x64
__kmalloc+0x170/0x300
jffs2_alloc_full_dirent+0x2a/0x40
jffs2_scan_eraseblock+0x1ca4/0x3b64
jffs2_scan_medium+0x285/0xfe0
jffs2_do_mount_fs+0x5fb/0x1bbc
jffs2_do_fill_super+0x245/0x6f0
jffs2_fill_super+0x287/0x2e0
mount_mtd_aux.isra.0+0x9a/0x144
mount_mtd+0x222/0x2f0
jffs2_mount+0x41/0x60
mount_fs+0x63/0x230
vfs_kern_mount.part.6+0x6c/0x1f4
do_mount+0xae8/0x1940
SyS_mount+0x105/0x1d0
INFO: Freed in jffs2_free_full_dirent+0x22/0x40 age=27 cpu=1 pid=915
__slab_free+0x372/0x4e4
kfree+0x1d4/0x20c
jffs2_free_full_dirent+0x22/0x40
jffs2_build_remove_unlinked_inode+0x17a/0x1e4
jffs2_do_mount_fs+0x1646/0x1bbc
jffs2_do_fill_super+0x245/0x6f0
jffs2_fill_super+0x287/0x2e0
mount_mtd_aux.isra.0+0x9a/0x144
mount_mtd+0x222/0x2f0
jffs2_mount+0x41/0x60
mount_fs+0x63/0x230
vfs_kern_mount.part.6+0x6c/0x1f4
do_mount+0xae8/0x1940
SyS_mount+0x105/0x1d0
entry_SYSCALL_64_fastpath+0x1e/0x97
Call Trace:
[<ffffffff815befef>] dump_stack+0x59/0x7e
[<ffffffff812d1d65>] print_trailer+0x125/0x1b0
[<ffffffff812d82c8>] object_err+0x34/0x40
[<ffffffff812dadef>] kasan_report.part.1+0x21f/0x534
[<ffffffff81132401>] ? vprintk+0x2d/0x40
[<ffffffff815f1ee2>] ? crc32_le+0x1ce/0x260
[<ffffffff812db41a>] kasan_report+0x26/0x30
[<ffffffff812d9fc1>] __asan_load1+0x3d/0x50
[<ffffffff815f1ee2>] crc32_le+0x1ce/0x260
[<ffffffff814764ae>] ? jffs2_alloc_full_dirent+0x2a/0x40
[<ffffffff81485cec>] jffs2_scan_eraseblock+0x1d0c/0x3b64
[<ffffffff81488813>] ? jffs2_scan_medium+0xccf/0xfe0
[<ffffffff81483fe0>] ? jffs2_scan_make_ino_cache+0x14c/0x14c
[<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
[<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
[<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
[<ffffffff812d5d90>] ? kmem_cache_alloc_trace+0x10c/0x2cc
[<ffffffff818169fb>] ? mtd_point+0xf7/0x130
[<ffffffff81487dc9>] jffs2_scan_medium+0x285/0xfe0
[<ffffffff81487b44>] ? jffs2_scan_eraseblock+0x3b64/0x3b64
[<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
[<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
[<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
[<ffffffff812d57df>] ? __kmalloc+0x12b/0x300
[<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
[<ffffffff814a2753>] ? jffs2_sum_init+0x9f/0x240
[<ffffffff8148b2ff>] jffs2_do_mount_fs+0x5fb/0x1bbc
[<ffffffff8148ad04>] ? jffs2_del_noinode_dirent+0x640/0x640
[<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
[<ffffffff81127c5b>] ? __init_rwsem+0x97/0xac
[<ffffffff81492349>] jffs2_do_fill_super+0x245/0x6f0
[<ffffffff81493c5b>] jffs2_fill_super+0x287/0x2e0
[<ffffffff814939d4>] ? jffs2_parse_options+0x594/0x594
[<ffffffff81819bea>] mount_mtd_aux.isra.0+0x9a/0x144
[<ffffffff81819eb6>] mount_mtd+0x222/0x2f0
[<ffffffff814939d4>] ? jffs2_parse_options+0x594/0x594
[<ffffffff81819c94>] ? mount_mtd_aux.isra.0+0x144/0x144
[<ffffffff81258757>] ? free_pages+0x13/0x1c
[<ffffffff814fa0ac>] ? selinux_sb_copy_data+0x278/0x2e0
[<ffffffff81492b35>] jffs2_mount+0x41/0x60
[<ffffffff81302fb7>] mount_fs+0x63/0x230
[<ffffffff8133755f>] ? alloc_vfsmnt+0x32f/0x3b0
[<ffffffff81337f2c>] vfs_kern_mount.part.6+0x6c/0x1f4
[<ffffffff8133ceec>] do_mount+0xae8/0x1940
[<ffffffff811b94e0>] ? audit_filter_rules.constprop.6+0x1d10/0x1d10
[<ffffffff8133c404>] ? copy_mount_string+0x40/0x40
[<ffffffff812cbf78>] ? alloc_pages_current+0xa4/0x1bc
[<ffffffff81253a89>] ? __get_free_pages+0x25/0x50
[<ffffffff81338993>] ? copy_mount_options.part.17+0x183/0x264
[<ffffffff8133e3a9>] SyS_mount+0x105/0x1d0
[<ffffffff8133e2a4>] ? copy_mnt_ns+0x560/0x560
[<ffffffff810e8391>] ? msa_space_switch_handler+0x13d/0x190
[<ffffffff81be184a>] entry_SYSCALL_64_fastpath+0x1e/0x97
[<ffffffff810e9274>] ? msa_space_switch+0xb0/0xe0
Memory state around the buggy address:
ffff8800842cf180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800842cf200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8800842cf280: fc fc fc fc fc fc 00 00 00 00 01 fc fc fc fc fc
^
ffff8800842cf300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800842cf380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Cc: stable@vger.kernel.org
Reported-by: Kunkun Xu <xukunkun1@huawei.com>
Signed-off-by: lizhe <lizhe67@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit de144ff4234f935bd2150108019b5d87a90a8a96 upstream.
If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).
Fixes: 6d597e175012 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 39fd01863616964f009599e50ca5c6ea9ebf88d6 upstream.
If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).
Fixes: e0b7d420f72a ("pNFS: Don't discard layout segments that are marked for return")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c09f11ef35955785f92369e25819bf0629df2e59 upstream.
Fix shift out-of-bounds in xprt_calc_majortimeo(). This is caused
by a garbage timeout (retrans) mount option being passed to nfs mount,
in this case from syzkaller.
If the protocol is XPRT_TRANSPORT_UDP, then 'retrans' is a shift
value for a 64-bit long integer, so 'retrans' cannot be >= 64.
If it is >= 64, fail the mount and return an error.
Fixes: 9954bf92c0cd ("NFS: Move mount parameterisation bits into their own file")
Reported-by: syzbot+ba2e91df8f74809417fa@syzkaller.appspotmail.com
Reported-by: syzbot+f3a0fa110fd630ab56c8@syzkaller.appspotmail.com
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b862676e371715456c9dade7990c8004996d0d9e upstream.
butt3rflyh4ck <butterflyhuangxx@gmail.com> reported a bug found by
syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]:
dump_stack+0xfa/0x151 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline]
current_nat_addr fs/f2fs/node.h:213 [inline]
get_next_nat_page fs/f2fs/node.c:123 [inline]
__flush_nat_entry_set fs/f2fs/node.c:2888 [inline]
f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991
f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640
f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807
f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454
__sync_filesystem fs/sync.c:39 [inline]
sync_filesystem fs/sync.c:67 [inline]
sync_filesystem+0x1b5/0x260 fs/sync.c:48
generic_shutdown_super+0x70/0x370 fs/super.c:448
kill_block_super+0x97/0xf0 fs/super.c:1394
The root cause is, if nat entry in checkpoint journal area is corrupted,
e.g. nid of journalled nat entry exceeds max nid value, during checkpoint,
once it tries to flush nat journal to NAT area, get_next_nat_page() may
access out-of-bounds memory on nat_bitmap due to it uses wrong nid value
as bitmap offset.
[1] https://lore.kernel.org/lkml/CAFcO6XOMWdr8pObek6eN6-fs58KG9doRFadgJj-FnF-1x43s2g@mail.gmail.com/T/#u
Reported-and-tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3c0315424f5e3d2a4113c7272367bee1e8e6a174 upstream.
f2fs didn't properly clean up if verity failed to be enabled on a file:
- It left verity metadata (pages past EOF) in the page cache, which
would be exposed to userspace if the file was later extended.
- It didn't truncate the verity metadata at all (either from cache or
from disk) if an error occurred while setting the verity bit.
Fix these bugs by adding a call to truncate_inode_pages() and ensuring
that we truncate the verity metadata (both from cache and from disk) in
all error paths. Also rework the code to cleanly separate the success
path from the error paths, which makes it much easier to understand.
Finally, log a message if f2fs_truncate() fails, since it might
otherwise fail silently.
Reported-by: Yunlei He <heyunlei@hihonor.com>
Fixes: 95ae251fe828 ("f2fs: add fs-verity support")
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3e903315790baf4a966436e7f32e9c97864570ac upstream.
Conside the following case, it just write a big file into flash,
when complete writing, delete the file, and then power off promptly.
Next time power on, we'll get a replay list like:
...
LEB 1105:211344 len 4144 deletion 0 sqnum 428783 key type 1 inode 80
LEB 15:233544 len 160 deletion 1 sqnum 428785 key type 0 inode 80
LEB 1105:215488 len 4144 deletion 0 sqnum 428787 key type 1 inode 80
...
In the replay list, data nodes' deletion are 0, and the inode node's
deletion is 1. In current logic, the file's dentry will be removed,
but inode and the flash space it occupied will be reserved.
User will see that much free space been disappeared.
We only need to check the deletion value of the following inode type
node of the replay entry.
Fixes: e58725d51fa8 ("ubifs: Handle re-linking of inodes correctly while recovery")
Cc: stable@vger.kernel.org
Signed-off-by: Guochun Mao <guochun.mao@mediatek.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c79c5e0178922a9e092ec8fed026750f39dcaef4 upstream.
When accidentally passing twice the same tag to qemu, kmemleak ended up
reporting a memory leak in virtiofs. Also, looking at the log I saw the
following error (that's when I realised the duplicated tag):
virtiofs: probe of virtio5 failed with error -17
Here's the kmemleak log for reference:
unreferenced object 0xffff888103d47800 (size 1024):
comm "systemd-udevd", pid 118, jiffies 4294893780 (age 18.340s)
hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
ff ff ff ff ff ff ff ff 80 90 02 a0 ff ff ff ff ................
backtrace:
[<000000000ebb87c1>] virtio_fs_probe+0x171/0x7ae [virtiofs]
[<00000000f8aca419>] virtio_dev_probe+0x15f/0x210
[<000000004d6baf3c>] really_probe+0xea/0x430
[<00000000a6ceeac8>] device_driver_attach+0xa8/0xb0
[<00000000196f47a7>] __driver_attach+0x98/0x140
[<000000000b20601d>] bus_for_each_dev+0x7b/0xc0
[<00000000399c7b7f>] bus_add_driver+0x11b/0x1f0
[<0000000032b09ba7>] driver_register+0x8f/0xe0
[<00000000cdd55998>] 0xffffffffa002c013
[<000000000ea196a2>] do_one_initcall+0x64/0x2e0
[<0000000008f727ce>] do_init_module+0x5c/0x260
[<000000003cdedab6>] __do_sys_finit_module+0xb5/0x120
[<00000000ad2f48c6>] do_syscall_64+0x33/0x40
[<00000000809526b5>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5afa7e8b70d65819245fece61a65fd753b4aae33 upstream.
statx(2) notes that any attribute that is not indicated as supported
by stx_attributes_mask has no usable value. Commits 801e523796004
("fs: move generic stat response attr handling to vfs_getattr_nosec")
and 712b2698e4c02 ("fs/stat: Define DAX statx attribute") sets
STATX_ATTR_AUTOMOUNT and STATX_ATTR_DAX, respectively, without setting
stx_attributes_mask, which can cause xfstests generic/532 to fail.
Fix this in the same way as commit 1b9598c8fb99 ("xfs: fix reporting
supported extra file attributes for statx()")
Fixes: 801e523796004 ("fs: move generic stat response attr handling to vfs_getattr_nosec")
Fixes: 712b2698e4c02 ("fs/stat: Define DAX statx attribute")
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f9690f426b2134cc3e74bfc5d9dfd6a4b2ca5281 ]
Commit dbcc7d57bffc0c ("btrfs: fix race when cloning extent buffer during
rewind of an old root"), fixed a race when we need to rewind the extent
buffer of an old root. It was caused by picking a new mod log operation
for the extent buffer while getting a cloned extent buffer with an outdated
number of items (off by -1), because we cloned the extent buffer without
locking it first.
However there is still another similar race, but in the opposite direction.
The cloned extent buffer has a number of items that does not match the
number of tree mod log operations that are going to be replayed. This is
because right after we got the last (most recent) tree mod log operation to
replay and before locking and cloning the extent buffer, another task adds
a new pointer to the extent buffer, which results in adding a new tree mod
log operation and incrementing the number of items in the extent buffer.
So after cloning we have mismatch between the number of items in the extent
buffer and the number of mod log operations we are going to apply to it.
This results in hitting a BUG_ON() that produces the following stack trace:
------------[ cut here ]------------
kernel BUG at fs/btrfs/tree-mod-log.c:675!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 3 PID: 4811 Comm: crawl_1215 Tainted: G W 5.12.0-7d1efdf501f8-misc-next+ #99
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:tree_mod_log_rewind+0x3b1/0x3c0
Code: 05 48 8d 74 10 (...)
RSP: 0018:ffffc90001027090 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880a8514600 RCX: ffffffffaa9e59b6
RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8880a851462c
RBP: ffffc900010270e0 R08: 00000000000000c0 R09: ffffed1004333417
R10: ffff88802199a0b7 R11: ffffed1004333416 R12: 000000000000000e
R13: ffff888135af8748 R14: ffff88818766ff00 R15: ffff8880a851462c
FS: 00007f29acf62700(0000) GS:ffff8881f2200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0e6013f718 CR3: 000000010d42e003 CR4: 0000000000170ee0
Call Trace:
btrfs_get_old_root+0x16a/0x5c0
? lock_downgrade+0x400/0x400
btrfs_search_old_slot+0x192/0x520
? btrfs_search_slot+0x1090/0x1090
? free_extent_buffer.part.61+0xd7/0x140
? free_extent_buffer+0x13/0x20
resolve_indirect_refs+0x3e9/0xfc0
? lock_downgrade+0x400/0x400
? __kasan_check_read+0x11/0x20
? add_prelim_ref.part.11+0x150/0x150
? lock_downgrade+0x400/0x400
? __kasan_check_read+0x11/0x20
? lock_acquired+0xbb/0x620
? __kasan_check_write+0x14/0x20
? do_raw_spin_unlock+0xa8/0x140
? rb_insert_color+0x340/0x360
? prelim_ref_insert+0x12d/0x430
find_parent_nodes+0x5c3/0x1830
? stack_trace_save+0x87/0xb0
? resolve_indirect_refs+0xfc0/0xfc0
? fs_reclaim_acquire+0x67/0xf0
? __kasan_check_read+0x11/0x20
? lockdep_hardirqs_on_prepare+0x210/0x210
? fs_reclaim_acquire+0x67/0xf0
? __kasan_check_read+0x11/0x20
? ___might_sleep+0x10f/0x1e0
? __kasan_kmalloc+0x9d/0xd0
? trace_hardirqs_on+0x55/0x120
btrfs_find_all_roots_safe+0x142/0x1e0
? find_parent_nodes+0x1830/0x1830
? trace_hardirqs_on+0x55/0x120
? ulist_free+0x1f/0x30
? btrfs_inode_flags_to_xflags+0x50/0x50
iterate_extent_inodes+0x20e/0x580
? tree_backref_for_extent+0x230/0x230
? release_extent_buffer+0x225/0x280
? read_extent_buffer+0xdd/0x110
? lock_downgrade+0x400/0x400
? __kasan_check_read+0x11/0x20
? lock_acquired+0xbb/0x620
? __kasan_check_write+0x14/0x20
? do_raw_spin_unlock+0xa8/0x140
? _raw_spin_unlock+0x22/0x30
? release_extent_buffer+0x225/0x280
iterate_inodes_from_logical+0x129/0x170
? iterate_inodes_from_logical+0x129/0x170
? btrfs_inode_flags_to_xflags+0x50/0x50
? iterate_extent_inodes+0x580/0x580
? __vmalloc_node+0x92/0xb0
? init_data_container+0x34/0xb0
? init_data_container+0x34/0xb0
? kvmalloc_node+0x60/0x80
btrfs_ioctl_logical_to_ino+0x158/0x230
btrfs_ioctl+0x2038/0x4360
? __kasan_check_write+0x14/0x20
? mmput+0x3b/0x220
? btrfs_ioctl_get_supported_features+0x30/0x30
? __kasan_check_read+0x11/0x20
? __kasan_check_read+0x11/0x20
? lock_release+0xc8/0x650
? __might_fault+0x64/0xd0
? __kasan_check_read+0x11/0x20
? lock_downgrade+0x400/0x400
? lockdep_hardirqs_on_prepare+0x210/0x210
? lockdep_hardirqs_on_prepare+0x13/0x210
? _raw_spin_unlock_irqrestore+0x51/0x63
? __kasan_check_read+0x11/0x20
? do_vfs_ioctl+0xfc/0x9d0
? ioctl_file_clone+0xe0/0xe0
? lock_downgrade+0x400/0x400
? lockdep_hardirqs_on_prepare+0x210/0x210
? __kasan_check_read+0x11/0x20
? lock_release+0xc8/0x650
? __task_pid_nr_ns+0xd3/0x250
? __kasan_check_read+0x11/0x20
? __fget_files+0x160/0x230
? __fget_light+0xf2/0x110
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f29ae85b427
Code: 00 00 90 48 8b (...)
RSP: 002b:00007f29acf5fcf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f29acf5ff40 RCX: 00007f29ae85b427
RDX: 00007f29acf5ff48 RSI: 00000000c038943b RDI: 0000000000000003
RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f29acf60120
R10: 00005640d5fc7b00 R11: 0000000000000246 R12: 0000000000000003
R13: 00007f29acf5ff48 R14: 00007f29acf5ff40 R15: 00007f29acf5fef8
Modules linked in:
---[ end trace 85e5fce078dfbe04 ]---
(gdb) l *(tree_mod_log_rewind+0x3b1)
0xffffffff819e5b21 is in tree_mod_log_rewind (fs/btrfs/tree-mod-log.c:675).
670 * the modification. As we're going backwards, we do the
671 * opposite of each operation here.
672 */
673 switch (tm->op) {
674 case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
675 BUG_ON(tm->slot < n);
676 fallthrough;
677 case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING:
678 case BTRFS_MOD_LOG_KEY_REMOVE:
679 btrfs_set_node_key(eb, &tm->key, tm->slot);
(gdb) quit
The following steps explain in more detail how it happens:
1) We have one tree mod log user (through fiemap or the logical ino ioctl),
with a sequence number of 1, so we have fs_info->tree_mod_seq == 1.
This is task A;
2) Another task is at ctree.c:balance_level() and we have eb X currently as
the root of the tree, and we promote its single child, eb Y, as the new
root.
Then, at ctree.c:balance_level(), we call:
ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
3) At btrfs_tree_mod_log_insert_root() we create a tree mod log operation
of type BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING, with a ->logical field
pointing to ebX->start. We only have one item in eb X, so we create
only one tree mod log operation, and store in the "tm_list" array;
4) Then, still at btrfs_tree_mod_log_insert_root(), we create a tree mod
log element of operation type BTRFS_MOD_LOG_ROOT_REPLACE, ->logical set
to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level
set to the level of eb X and ->generation set to the generation of eb X;
5) Then btrfs_tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
"tm_list" as argument. After that, tree_mod_log_free_eb() calls
tree_mod_log_insert(). This inserts the mod log operation of type
BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING from step 3 into the rbtree
with a sequence number of 2 (and fs_info->tree_mod_seq set to 2);
6) Then, after inserting the "tm_list" single element into the tree mod
log rbtree, the BTRFS_MOD_LOG_ROOT_REPLACE element is inserted, which
gets the sequence number 3 (and fs_info->tree_mod_seq set to 3);
7) Back to ctree.c:balance_level(), we free eb X by calling
btrfs_free_tree_block() on it. Because eb X was created in the current
transaction, has no other references and writeback did not happen for
it, we add it back to the free space cache/tree;
8) Later some other task B allocates the metadata extent from eb X, since
it is marked as free space in the space cache/tree, and uses it as a
node for some other btree;
9) The tree mod log user task calls btrfs_search_old_slot(), which calls
btrfs_get_old_root(), and finally that calls tree_mod_log_oldest_root()
with time_seq == 1 and eb_root == eb Y;
10) The first iteration of the while loop finds the tree mod log element
with sequence number 3, for the logical address of eb Y and of type
BTRFS_MOD_LOG_ROOT_REPLACE;
11) Because the operation type is BTRFS_MOD_LOG_ROOT_REPLACE, we don't
break out of the loop, and set root_logical to point to
tm->old_root.logical, which corresponds to the logical address of
eb X;
12) On the next iteration of the while loop, the call to
tree_mod_log_search_oldest() returns the smallest tree mod log element
for the logical address of eb X, which has a sequence number of 2, an
operation type of BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and
corresponds to the old slot 0 of eb X (eb X had only 1 item in it
before being freed at step 7);
13) We then break out of the while loop and return the tree mod log
operation of type BTRFS_MOD_LOG_ROOT_REPLACE (eb Y), and not the one
for slot 0 of eb X, to btrfs_get_old_root();
14) At btrfs_get_old_root(), we process the BTRFS_MOD_LOG_ROOT_REPLACE
operation and set "logical" to the logical address of eb X, which was
the old root. We then call tree_mod_log_search() passing it the logical
address of eb X and time_seq == 1;
15) But before calling tree_mod_log_search(), task B locks eb X, adds a
key to eb X, which results in adding a tree mod log operation of type
BTRFS_MOD_LOG_KEY_ADD, with a sequence number of 4, to the tree mod
log, and increments the number of items in eb X from 0 to 1.
Now fs_info->tree_mod_seq has a value of 4;
16) Task A then calls tree_mod_log_search(), which returns the most recent
tree mod log operation for eb X, which is the one just added by task B
at the previous step, with a sequence number of 4, a type of
BTRFS_MOD_LOG_KEY_ADD and for slot 0;
17) Before task A locks and clones eb X, task A adds another key to eb X,
which results in adding a new BTRFS_MOD_LOG_KEY_ADD mod log operation,
with a sequence number of 5, for slot 1 of eb X, increments the
number of items in eb X from 1 to 2, and unlocks eb X.
Now fs_info->tree_mod_seq has a value of 5;
18) Task A then locks eb X and clones it. The clone has a value of 2 for
the number of items and the pointer "tm" points to the tree mod log
operation with sequence number 4, not the most recent one with a
sequence number of 5, so there is mismatch between the number of
mod log operations that are going to be applied to the cloned version
of eb X and the number of items in the clone;
19) Task A then calls tree_mod_log_rewind() with the clone of eb X, the
tree mod log operation with sequence number 4 and a type of
BTRFS_MOD_LOG_KEY_ADD, and time_seq == 1;
20) At tree_mod_log_rewind(), we set the local variable "n" with a value
of 2, which is the number of items in the clone of eb X.
Then in the first iteration of the while loop, we process the mod log
operation with sequence number 4, which is targeted at slot 0 and has
a type of BTRFS_MOD_LOG_KEY_ADD. This results in decrementing "n" from
2 to 1.
Then we pick the next tree mod log operation for eb X, which is the
tree mod log operation with a sequence number of 2, a type of
BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and for slot 0, it is the one
added in step 5 to the tree mod log tree.
We go back to the top of the loop to process this mod log operation,
and because its slot is 0 and "n" has a value of 1, we hit the BUG_ON:
(...)
switch (tm->op) {
case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
BUG_ON(tm->slot < n);
fallthrough;
(...)
Fix this by checking for a more recent tree mod log operation after locking
and cloning the extent buffer of the old root node, and use it as the first
operation to apply to the cloned extent buffer when rewinding it.
Stable backport notes: due to moved code and renames, in =< 5.11 the
change should be applied to ctree.c:get_old_root.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/20210404040732.GZ32440@hungrycats.org/
Fixes: 834328a8493079 ("Btrfs: tree mod log's old roots could still be part of the tree")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7a9213a93546e7eaef90e6e153af6b8fc7553f10 ]
A few BUG_ON()'s in replace_path are purely to keep us from making
logical mistakes, so replace them with ASSERT()'s.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Davi |