path: root/fs/jbd2/journal.c
Age | Commit message | Author | Files | Lines
2024-10-10 | jbd2: correctly compare tids with tid_geq function in jbd2_fc_begin_commit | Kemeng Shi | 1 | -1/+1
commit f0e3c14802515f60a47e6ef347ea59c2733402aa upstream. Use tid_geq to compare tids to work over sequence number wraps. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Cc: stable@kernel.org Link: https://patch.msgid.link/20240801013815.2393869-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
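The wrap-safe comparison this fix relies on can be illustrated in user space. The sketch below is a re-creation under stated assumptions, not the kernel source: it assumes a 32-bit unsigned tid and a helper modelled on tid_geq(); naive_geq() is a hypothetical name for the plain >= comparison and is there only for contrast.

```c
#include <stdint.h>

/* Assumption: transaction IDs are 32-bit unsigned sequence numbers. */
typedef uint32_t tid_t;

/*
 * Wrap-safe "x is at or after y": subtract in unsigned arithmetic (which
 * wraps modulo 2^32) and read the result as a signed distance. The
 * unsigned-to-signed conversion assumes two's complement, which holds on
 * mainstream compilers.
 */
static inline int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference >= 0;
}

/* Hypothetical helper for contrast: gives the wrong answer after a wrap. */
static inline int naive_geq(tid_t x, tid_t y)
{
	return x >= y;
}
```

With x = 5 and y = 0xfffffff0, x was allocated 21 transactions after y, across the wrap. tid_geq(x, y) reports that correctly, while naive_geq(x, y) does not.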
2024-10-10 | ext4: fix fast commit inode enqueueing during a full journal commit | Luis Henriques (SUSE) | 1 | -1/+1
commit 6db3c1575a750fd417a70e0178bdf6efa0dd5037 upstream. When a full journal commit is ongoing, any fast commit has to be enqueued into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing is done only once, i.e. if an inode is already queued in a previous fast commit entry it won't be enqueued again. However, if a full commit starts _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to be done into FC_Q_STAGING. And this is not being done in function ext4_fc_track_template(). This patch fixes the issue by re-enqueuing an inode into the STAGING queue during the fast commit clean-up callback when doing a full commit. However, to prevent a race with a fast-commit, the clean-up callback has to be called with the journal locked. This bug was found using fstest generic/047. This test creates several 32k-byte files, syncing each of them after its creation, and then shuts down the filesystem. Some data may be lost in this operation; for example, a file may have its size truncated to zero. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240717172220.14201-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-08 | jbd2: increase maximum transaction size | Jan Kara | 1 | -1/+1
Originally, we were quite conservative in limiting maximum transaction size to a quarter of the journal because we were not accounting transaction descriptor and revoke blocks. These days we do properly account them and reserve space for them from the total transaction credits. Thus there's no need to be so conservative and we can increase the maximum transaction size to one third of the journal (even half should work fine in principle but the performance will likely suffer in that case). This also fixes failures to grow filesystems with tiny journals. Link: CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240701132800.7158-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
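The effect of moving from a quarter to a third of the journal is plain arithmetic. The helper below is a hypothetical sketch (max_txn_blocks() is not a kernel function, and it ignores the descriptor and revoke block accounting the message mentions); it only shows how the cap scales with the divisor.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: cap a single transaction at 1/divisor of the
 * journal, measured in blocks. Integer division rounds down.
 */
static uint32_t max_txn_blocks(uint32_t journal_blocks, uint32_t divisor)
{
	assert(divisor != 0);
	return journal_blocks / divisor;
}
```

For a 1024-block journal this gives 256 blocks with a divisor of 4 and 341 blocks with a divisor of 3, which is the kind of headroom that matters for tiny journals.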
2024-07-08 | jbd2: drop pointless shrinker batch initialization | Jan Kara | 1 | -1/+0
In jbd2_journal_init_common() we set batch size of a shrinker shrinking checkpointed buffers to journal->j_max_transaction_buffers. But that is guaranteed to be 0 at that point so we effectively stay with the default shrinker batch size of 128. It has been like this since introduction of jbd2 shrinkers so just drop the pointless initialization. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 | jbd2: precompute number of transaction descriptor blocks | Jan Kara | 1 | -15/+46
Instead of computing the number of descriptor blocks a transaction can have each time we need it (which is currently when starting each transaction but will become more frequent later) precompute the number once during journal initialization together with maximum transaction size. We perform the precomputation whenever journal feature set is updated similarly as for computation of journal->j_revoke_records_per_block. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 | jbd2: make jbd2_journal_get_max_txn_bufs() internal | Jan Kara | 1 | -0/+5
There's no reason to have jbd2_journal_get_max_txn_bufs() public function. Currently all users are internal and can use journal->j_max_transaction_buffers instead. This saves some unnecessary recomputations of the limit as a bonus which becomes important as this function gets more complex in the following patch. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-05 | jbd2: add missing MODULE_DESCRIPTION() | Jeff Johnson | 1 | -0/+1
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/jbd2/jbd2.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://patch.msgid.link/20240526-md-fs-jbd2-v1-1-7bba6665327d@quicinc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-05 | jbd2: speed up jbd2_transaction_committed() | Zhang Yi | 1 | -11/+1
jbd2_transaction_committed() is used to check whether a transaction with the given tid has already committed. It holds j_state_lock in read mode and checks the tids of the current running transaction and the committing transaction, but holding j_state_lock is expensive. We already store the sequence number of the most recently committed transaction in journal->j_commit_sequence, so we can do this check by comparing it with the given tid instead. If the given tid is not greater than j_commit_sequence, we can be sure that the given transaction has been committed. That way we can drop the expensive lock and achieve about 10% ~ 20% performance gains in concurrent DIOs on my virtual machine with a 100G ramdisk.

  fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
      -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
      -group_reporting

  Before:
    overwrite       IOPS=88.2k, BW=344MiB/s
    read            IOPS=95.7k, BW=374MiB/s
    rand overwrite  IOPS=98.7k, BW=386MiB/s
    randread        IOPS=102k,  BW=397MiB/s

  After:
    overwrite       IOPS=105k,  BW=410MiB/s
    read            IOPS=112k,  BW=436MiB/s
    rand overwrite  IOPS=104k,  BW=404MiB/s
    randread        IOPS=111k,  BW=432MiB/s

CC: Dave Chinner <david@fromorbit.com> Suggested-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/linux-ext4/ZjILCPNZRHeazSqV@dread.disaster.area/ Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/20240520131831.2910790-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
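A lock-free check of this shape might look like the following user-space sketch. It is an illustration under assumptions, not the kernel function: struct journal_sketch and transaction_committed() are hypothetical names, the tid is assumed to be 32 bits wide, and a plain load stands in for whatever tear-free read a real implementation would need.

```c
#include <stdint.h>

typedef uint32_t tid_t;

/* Hypothetical stand-in for the journal; only the field the check needs. */
struct journal_sketch {
	tid_t commit_sequence;	/* tid of the most recently committed transaction */
};

/* Wrap-safe x >= y on sequence numbers (assumes two's complement). */
static int tid_geq(tid_t x, tid_t y)
{
	return (int32_t)(x - y) >= 0;
}

/*
 * A tid counts as committed once the commit sequence has reached or
 * passed it. No lock is taken; only one field is read.
 */
static int transaction_committed(const struct journal_sketch *journal,
				 tid_t tid)
{
	return tid_geq(journal->commit_sequence, tid);
}
```

With commit_sequence at 100, tids 99 and 100 count as committed and 101 does not. The wrap-safe comparison keeps that answer correct when the counter rolls over.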
2024-06-27 | jbd2: remove unnecessary "should_sleep" in kjournald2 | Kemeng Shi | 1 | -5/+2
We only need to sleep if no running transaction is expired. Simply remove unnecessary "should_sleep". Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-10-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove dead check of JBD2_UNMOUNT in kjournald2 | Kemeng Shi | 1 | -2/+0
We always set JBD2_UNMOUNT with j_state_lock held in journal_kill_thread. In kjournald2, we check JBD2_UNMOUNT flag two times under the same j_state_lock. Then the second check is unnecessary. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-9-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove dead equality check of j_commit_[sequence/request] in kjournald2 | Kemeng Shi | 1 | -2/+0
The j_commit_[sequence/request] are updated with j_state_lock held during runtime. In kjournald2, two equality checks of j_commit_[sequence/request] are under the same j_state_lock, then the second check is unnecessary. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-8-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: use bh_in instead of jh2bh(jh_in) to simplify code | Kemeng Shi | 1 | -2/+2
We save jh2bh(jh_in) to bh_in, so use bh_in directly instead of jh2bh(jh_in) to simplify the code. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove unneeded kmap to do escape in jbd2_journal_write_metadata_buffer | Kemeng Shi | 1 | -5/+4
The data to do escape could be accessed directly from b_frozen_data, just remove unneeded kmap. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: jump to new copy_done tag when b_frozen_data is created concurrently | Kemeng Shi | 1 | -7/+6
If b_frozen_data is created concurrently, we can update new_folio and new_offset with b_frozen_data and then move forward. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove unnedded "need_copy_out" in jbd2_journal_write_metadata_buffer | Kemeng Shi | 1 | -5/+2
As we only need to copy out when we should do escape, need_copy_out could be simply replaced by "do_escape". Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove unused return info from jbd2_journal_write_metadata_buffer | Kemeng Shi | 1 | -6/+3
The done_copy_out info from jbd2_journal_write_metadata_buffer is not used. Simply remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: avoid memleak in jbd2_journal_write_metadata_buffer | Kemeng Shi | 1 | -0/+1
The new_bh is from alloc_buffer_head, we should call free_buffer_head to free it in error case. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-03 | use ->bd_mapping instead of ->bd_inode->i_mapping | Al Viro | 1 | -1/+1
Just the low-hanging fruit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240411145346.2516848-2-viro@zeniv.linux.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-04 | jbd2: replace journal state flag by checking errseq | Zhihao Cheng | 1 | -5/+5
Now JBD2 detects metadata writeback error of fs dev according to errseq. Replace journal state flag by checking errseq. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-3-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-04 | jbd2: add errseq to detect client fs's bdev writeback error | Zhihao Cheng | 1 | -0/+1
Add errseq in journal, so that JBD2 can detect whether metadata is successfully written to fs bdev. This patch adds detection in recovery process to replace original solution(using local variable wb_err). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30 | jbd2: increase the journal IO's priority | Zhang Yi | 1 | -9/+11
Currently jbd2 only adds REQ_SYNC for the descriptor block, metadata log buffer, commit buffer and superblock buffer, so the submitted IO can be throttled by writeback throttling (WBT) in the block layer, which can lead to priority inversion in some cases. The log IO is a kind of high-priority metadata IO, so it should not be throttled by WBT-like QoS policies in the block layer. Add REQ_SYNC | REQ_IDLE to exempt it from writeback throttling, and also add REQ_META to indicate that it is metadata IO. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231129114740.2686201-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30 | jbd2: correct the printing of write_flags in jbd2_write_superblock() | Zhang Yi | 1 | -1/+3
The write_flags printed in the trace of jbd2_write_superblock() are not the flags actually used, so move the modification before the trace. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231129114740.2686201-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-10-04 | jbd2,ext4: dynamically allocate the jbd2-journal shrinker | Qi Zheng | 1 | -11/+18
In preparation for implementing lockless slab shrink, use new APIs to dynamically allocate the jbd2-journal shrinker, so that it can be freed asynchronously via RCU. Then it doesn't need to wait for RCU read-side critical section when releasing the struct journal_s. Link: https://lkml.kernel.org/r/20230911094444.68966-32-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-17 | Merge tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 1 | -0/+2
Pull ext4 fixes from Ted Ts'o:
 "Regression and bug fixes for ext4"

* tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix rec_len verify error
  ext4: do not let fstrim block system suspend
  ext4: move setting of trimmed bit into ext4_try_to_trim_range()
  jbd2: Fix memory leak in journal_init_common()
  jbd2: Remove page size assumptions
  buffer: Make bh_offset() work for compound pages
2023-09-14 | jbd2: Fix memory leak in journal_init_common() | Li Zetao | 1 | -0/+2
There is a memory leak reported by kmemleak:

  unreferenced object 0xff11000105903b80 (size 64):
    comm "mount", pid 3382, jiffies 4295032021 (age 27.826s)
    hex dump (first 32 bytes):
      04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
      ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<ffffffffae86ac40>] __kmalloc_node+0x50/0x160
      [<ffffffffaf2486d8>] crypto_alloc_tfmmem.isra.0+0x38/0x110
      [<ffffffffaf2498e5>] crypto_create_tfm_node+0x85/0x2f0
      [<ffffffffaf24a92c>] crypto_alloc_tfm_node+0xfc/0x210
      [<ffffffffaedde777>] journal_init_common+0x727/0x1ad0
      [<ffffffffaede1715>] jbd2_journal_init_inode+0x2b5/0x500
      [<ffffffffaed786b5>] ext4_load_and_init_journal+0x255/0x2440
      [<ffffffffaed8b423>] ext4_fill_super+0x8823/0xa330
      ...

The root cause was traced to an error handling path in journal_init_common() that is taken when memory allocation fails in register_shrinker(). The checksum driver is used to reference the checksum algorithm via cryptoapi, and the user should release the memory when the driver is no longer needed or the journal initialization has failed. Fix it by calling crypto_free_shash() on the "err_cleanup" error handling path in journal_init_common(). Fixes: c30713084ba5 ("jbd2: move load_superblock() into journal_init_common()") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230911025138.983101-1-lizetao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
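The shape of this fix, releasing an earlier allocation on a shared error label, can be sketched in user space. Everything below is hypothetical scaffolding (journal_init_sketch(), tracked_alloc(), the field names and sizes); it stands in for the kernel's checksum driver and shrinker only to show why the cleanup label must free what was allocated before the failing step.

```c
#include <stdlib.h>

/* Count of live allocations, so a leak shows up as a non-zero value. */
static int live_allocs;

static void *tracked_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical stand-in for the journal and the two resources involved. */
struct journal_sketch {
	void *chksum_driver;
	void *shrinker;
};

static struct journal_sketch *journal_init_sketch(int shrinker_fails)
{
	struct journal_sketch *journal = tracked_alloc(sizeof(*journal));

	if (!journal)
		return NULL;

	journal->chksum_driver = tracked_alloc(64);
	if (!journal->chksum_driver)
		goto err_cleanup;

	/* Simulate the late failure that exposed the leak. */
	journal->shrinker = shrinker_fails ? NULL : tracked_alloc(32);
	if (!journal->shrinker)
		goto err_cleanup;

	return journal;

err_cleanup:
	/* Without this release, the driver allocated above would leak. */
	tracked_free(journal->chksum_driver);
	tracked_free(journal);
	return NULL;
}
```

Calling journal_init_sketch(1) returns NULL and leaves live_allocs at zero. Removing the first tracked_free() under err_cleanup would leave it at one, which is the user-space analogue of the kmemleak report.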
2023-08-31 | Merge tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 1 | -254/+230
Pull ext4 updates from Ted Ts'o:
 "Many ext4 and jbd2 cleanups and bug fixes:
   - Cleanups in the ext4 remount code when going to and from read-only
   - Cleanups in ext4's multiblock allocator
   - Cleanups in the jbd2 setup/mounting code paths
   - Performance improvements when appending to a delayed allocation file
   - Miscellaneous syzbot and other bug fixes"

* tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
  ext4: fix slab-use-after-free in ext4_es_insert_extent()
  libfs: remove redundant checks of s_encoding
  ext4: remove redundant checks of s_encoding
  ext4: reject casefold inode flag without casefold feature
  ext4: use LIST_HEAD() to initialize the list_head in mballoc.c
  ext4: do not mark inode dirty every time when appending using delalloc
  ext4: rename s_error_work to s_sb_upd_work
  ext4: add periodic superblock update check
  ext4: drop dio overwrite only flag and associated warning
  ext4: add correct group descriptors and reserved GDT blocks to system zone
  ext4: remove unused function declaration
  ext4: mballoc: avoid garbage value from err
  ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple()
  ext4: change the type of blocksize in ext4_mb_init_cache()
  ext4: fix unttached inode after power cut with orphan file feature enabled
  jbd2: correct the end of the journal recovery scan range
  ext4: ext4_get_{dev}_journal return proper error value
  ext4: cleanup ext4_get_dev_journal() and ext4_get_journal()
  jbd2: jbd2_journal_init_{dev,inode} return proper error return value
  jbd2: drop useless error tag in jbd2_journal_wipe()
  ...
2023-08-23 | jbd2: jbd2_journal_init_{dev,inode} return proper error return value | Zhang Yi | 1 | -10/+9
Currently jbd2_journal_init_{dev,inode} return NULL if an error happens; make them pass out a proper error return value instead. [ Fix from Yang Yingliang folded in. ] Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-11-yi.zhang@huaweicloud.com Link: https://lore.kernel.org/r/20230822030018.644419-1-yangyingliang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
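Returning an error code through a pointer, rather than a bare NULL, is commonly done by encoding small negative errno values at the top of the address range. The sketch below is a user-space imitation of that idiom, modelled on the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers; the lowercase names, struct journal_sketch and journal_init_sketch() are hypothetical and exist only for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: error codes fit in the range [1, 4095]. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top MAX_ERRNO bytes of the address space. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct journal_sketch {
	int unused;
};

/* Hypothetical initializer: reports why it failed instead of returning NULL. */
static struct journal_sketch *journal_init_sketch(int simulate_enomem)
{
	static struct journal_sketch journal;

	if (simulate_enomem)
		return err_ptr(-ENOMEM);
	return &journal;
}
```

A caller would test the result with is_err() and, on failure, pass ptr_err() up the stack, so the reason for the failure survives instead of collapsing into a single NULL.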
2023-08-18 | jbd2: use a folio in jbd2_journal_write_metadata_buffer() | Matthew Wilcox (Oracle) | 1 | -19/+16
The primary goal here is removing the use of set_bh_page(). Take the opportunity to switch from kmap_atomic() to kmap_local(). This simplifies the function as the offset is already added to the pointer. Link: https://lkml.kernel.org/r/20230713035512.4139457-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-15 | jbd2: drop useless error tag in jbd2_journal_wipe() | Zhang Yi | 1 | -3/+2
no_recovery is redundant, just drop it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-15 | jbd2: cleanup journal_init_common() | Zhang Yi | 1 | -21/+24
Adjust the initialization sequence and error handling of journal_t, moving the superblock load to the beginning and grouping the other initialization steps. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-15 | jbd2: add fast_commit space check | Zhang Yi | 1 | -4/+12
If the JBD2_FEATURE_INCOMPAT_FAST_COMMIT bit is set, the journal has fast commit records that need to be recovered, so the fast commit size should not be too large, and the leftover normal journal size should never be less than JBD2_MIN_JOURNAL_BLOCKS. If it is, journal->j_last is likely to be wrong and will probably lead to incorrect journal recovery. So add a check to journal_check_superblock(), and drop the pointless check when initializing the fast commit parameters. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-8-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
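The constraint described here reduces to a single inequality: the journal length minus the fast commit area must still leave at least the minimum number of normal journal blocks. The sketch below is a hypothetical user-space rendering; fc_space_ok() is not a kernel function, and the value 1024 for MIN_JOURNAL_BLOCKS is an assumption chosen for illustration.

```c
#include <stdint.h>

/* Assumed minimum size of the normal journal area, in blocks. */
#define MIN_JOURNAL_BLOCKS 1024u

/*
 * Return 1 if a journal of journal_blocks blocks can hold a fast commit
 * area of fc_blocks and still keep MIN_JOURNAL_BLOCKS for normal
 * transactions. The first test also guards the unsigned subtraction.
 */
static int fc_space_ok(uint32_t journal_blocks, uint32_t fc_blocks)
{
	if (journal_blocks < MIN_JOURNAL_BLOCKS)
		return 0;
	return journal_blocks - MIN_JOURNAL_BLOCKS >= fc_blocks;
}
```

Under these assumptions a 4096-block journal with a 256-block fast commit area passes, while a 1200-block journal with the same area fails, because only 176 blocks remain above the minimum.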
2023-08-15 | jbd2: cleanup load_superblock() | Zhang Yi | 1 | -50/+35
Rename load_superblock() to journal_load_superblock(), move getting and reading superblock from journal_init_common() and journal_get_superblock() to this function, and also rename journal_get_superblock() to journal_check_superblock(), make it a pure check helper to check superblock validity from disk. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-15 | jbd2: open code jbd2_verify_csum_type() helper | Zhang Yi | 1 | -13/+5
The jbd2_verify_csum_type() helper checks the checksum type in the superblock for the v2 or v3 checksum feature. It always returns true if these features are not enabled, and it has only one user, so open coding it is clearer. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-15 | jbd2: checking valid features early in journal_get_superblock() | Zhang Yi | 1 | -15/+15
journal_get_superblock() is used to check the validity of the journal superblock, so move the feature checks from jbd2_journal_load() to journal_get_superblock(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-15 | jbd2: don't load superblock in jbd2_journal_check_used_features() | Zhang Yi | 1 | -5/+0
Since load_superblock() has been moved to journal_init_common(), the in-memory superblock structure is initialized and contains valid data once the file system has a journal_t object, so it is safe to access it. Drop the call to journal_get_superblock() from jbd2_journal_check_used_features() and also drop the setting/clearing of the verified bit of the superblock buffer. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-15 | jbd2: move load_superblock() into journal_init_common() | Zhang Yi | 1 | -11/+5
Move the call to load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() early into journal_init_common(), the journal superblock gets read and the in-memory journal_t structure gets initialised after calling jbd2_journal_init_{dev,inode}, it's safe to do following initialization according to it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-15 | jbd2: move load_superblock() dependent functions | Zhang Yi | 1 | -169/+168
Move load_superblock() declaration and the functions it calls before journal_init_common(). This is a preparation for moving a call to load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() to journal_init_common(). No functional changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-07-29 | fs: jbd2: fix an incorrect warn log | Guoqing Cai | 1 | -8/+10
In jbd2_journal_load(), when journal_reset fails, it prints an incorrect warn log. Fix this by changing the goto statement to return statement. Also, return actual error code from jbd2_journal_recover() and journal_reset(). Signed-off-by: Guoqing Cai <u202112087@hust.edu.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230413095740.2222066-1-u202112087@hust.edu.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-29 | Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 1 | -39/+39
Pull ext4 updates from Ted Ts'o:
 "Various cleanups and bug fixes in ext4's extent status tree, journalling, and block allocator subsystems. Also improve performance for parallel DIO overwrites"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (55 commits)
  ext4: avoid updating the superblock on a r/o mount if not needed
  jbd2: skip reading super block if it has been verified
  ext4: fix to check return value of freeze_bdev() in ext4_shutdown()
  ext4: refactoring to use the unified helper ext4_quotas_off()
  ext4: turn quotas off if mount failed after enabling quotas
  ext4: update doc about journal superblock description
  ext4: add journal cycled recording support
  jbd2: continue to record log between each mount
  jbd2: remove j_format_version
  jbd2: factor out journal initialization from journal_get_superblock()
  jbd2: switch to check format version in superblock directly
  jbd2: remove unused feature macros
  ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev'
  ext4: Fix reusing stale buffer heads from last failed mounting
  ext4: allow concurrent unaligned dio overwrites
  ext4: clean up mballoc criteria comments
  ext4: make ext4_zeroout_es() return void
  ext4: make ext4_es_insert_extent() return void
  ext4: make ext4_es_insert_delayed_block() return void
  ext4: make ext4_es_remove_extent() return void
  ...
2023-06-26 | jbd2: skip reading super block if it has been verified | Zhang Yi | 1 | -4/+3
We got the NULL pointer dereference below while running the generic/475 I/O failure pressure test.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
  RIP: 0010:jbd2_journal_set_features+0x13d/0x430
  ...
  Call Trace:
   <TASK>
   ? __die+0x23/0x60
   ? page_fault_oops+0xa4/0x170
   ? exc_page_fault+0x67/0x170
   ? asm_exc_page_fault+0x26/0x30
   ? jbd2_journal_set_features+0x13d/0x430
   jbd2_journal_revoke+0x47/0x1e0
   __ext4_forget+0xc3/0x1b0
   ext4_free_blocks+0x214/0x2f0
   ext4_free_branches+0xeb/0x270
   ext4_ind_truncate+0x2bf/0x320
   ext4_truncate+0x1e4/0x490
   ext4_handle_inode_extension+0x1bd/0x2a0
   ? iomap_dio_complete+0xaf/0x1d0

The root cause is that the journal superblock failed to be written out due to I/O fault injection; its uptodate bit was cleared by end_buffer_write_sync() and had not yet been reset in jbd2_write_superblock(). This raced with journal_get_superblock()->bh_read(); unfortunately, the read IO also failed, so the error handling in journal_fail_superblock() unexpectedly cleared journal->j_sb_buffer, finally leading to the above NULL pointer dereference. If the journal superblock has already been read and verified, there is no need to call bh_read() to read it again, even if writing it out has failed. So the fix is simply to move the buffer_verified(bh) check in front of bh_read(). Also remove a stale comment left in jbd2_journal_check_used_features(). Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()") Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230616015547.3155195-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26 | jbd2: continue to record log between each mount | Zhang Yi | 1 | -2/+16
For a newly mounted file system, the journal commit thread always records new transactions from the start of the journal area, no matter whether the journal was clean or has just been recovered. So the logdump code in debugfs cannot dump continuous logs across mounts, which is a disadvantage when analysing a corrupted file system image and locating file system inconsistency bugs.

If we get a corrupted file system in a running product and want to find out what happened, one effective way, besides looking at the system log, is to backtrack through the journal log. But we may not always run e2fsck before each mount, and the default fsck -a mode cannot always detect all inconsistencies, so some inconsistencies may be carried over into the next mount until we detect them. In the end, the transactions in the journal may be discontinuous and some relatively new transactions may have been overwritten, which makes analysis hard. If we could record transactions continuously across mounts, we could get more useful information from the journal. Like this:

 |Previous mount checkpointed/recovered logs|Current mount logs         |
 |{------}{---}{--------} ... {------}| ... |{======}{========}...000000|

And yes, the journal area is limited and cannot record everything; the problematic transaction may still be overwritten even if we do this, but it is still useful for fuzzy tests and short-running products.

This patch saves the head blocknr in the superblock after flushing the journal or unmounting the file system, so that the next mount can continue to record new transactions after it. This change is backward compatible because old kernels do not care about the head blocknr of the journal. It is also fine if we mount a clean old image without a valid head blocknr: we fall back to setting it to s_first, just like before.

Finally, when mounting an unclean file system, we can also get the journal head easily after scanning/replaying the journal, and new transactions will continue to be recorded after the recovered transactions.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230322013353.1843306-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
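A minimal sketch of the fallback rule described above, in userspace C. The function name toy_log_start and its parameters are hypothetical, and treating a zero or out-of-range saved head as "no valid head" is an assumption of this sketch rather than something the commit message states; the real patch may handle an invalid value differently.

```c
#include <assert.h>

/*
 * Hypothetical helper: choose the block at which new transactions start.
 * 'first' and 'last' bound the journal area; 'saved_head' is the head
 * blocknr stored by the previous mount (0 on an old image, by assumption).
 */
static unsigned long toy_log_start(unsigned long first, unsigned long last,
                                   unsigned long saved_head)
{
    assert(first <= last);

    /* A usable saved head lets the new mount continue after the old logs. */
    if (saved_head >= first && saved_head <= last)
        return saved_head;

    /* Otherwise behave as before: start from the first journal block. */
    return first;
}
```

An old image without a recorded head yields the first block, which matches the "just like before" behaviour the message describes.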
2023-06-26 | jbd2: remove j_format_version | Zhang Yi | 1 | -9/+0
journal->j_format_version is no longer used, remove it.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-7-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26 | jbd2: factor out journal initialization from journal_get_superblock() | Zhang Yi | 1 | -24/+22
Currently, journal_get_superblock() couples journal superblock checking with partial journal initialization. Factor the initialization part out of it to make things clearer.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-6-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26 | jbd2: switch to check format version in superblock directly | Zhang Yi | 1 | -9/+7
We should only check and set extended features if the journal format version is 2. Currently we check the in-memory copy of the superblock, 'journal->j_format_version', which relies on the parameter initialization sequence. Switching to use the h_blocktype in the superblock is clearer.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-5-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-05 | jbd2: Avoid printing outside the boundary of the buffer | Andy Shevchenko | 1 | -4/+2
It is theoretically possible that "%pg" will take all the room in j_devname, so the "-%lu" will go outside the boundary because of the unconditional sprintf() in use. To make this code more robust, replace the two sequential s*printf() calls with a single call and then replace the forbidden character. This is possible because '/' will never appear in the result of "-%lu".

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605170553.7835-2-andriy.shevchenko@linux.intel.com
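The same pattern can be sketched in userspace C. This is an illustration only: toy_format_devname is a hypothetical helper, "%s" stands in for the kernel-only "%pg" specifier, and replacing '/' with '!' is an assumption of this sketch, since the message only says the forbidden character is replaced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dev>-<ino>" with one bounded call, then replace '/' in place.
 * Returns nonzero if the result was truncated or formatting failed.
 */
static int toy_format_devname(char *buf, size_t size, const char *dev,
                              unsigned long ino)
{
    assert(size > 0);

    /* A single snprintf() can never write past buf[size - 1]. */
    int n = snprintf(buf, size, "%s-%lu", dev, ino);
    if (n < 0) {
        buf[0] = '\0';
        return 1;
    }

    /* Safe to do afterwards: the "-%lu" part never contains '/'. */
    for (char *p = buf; *p != '\0'; p++)
        if (*p == '/')
            *p = '!';

    return (size_t)n >= size;
}
```

Because the replacement runs over the whole formatted string, the device part and the suffix no longer need to be written by separate calls.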
2023-03-12 | Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 1 | -3/+6
Pull ext4 fixes from Ted Ts'o:
 "Bug fixes and regressions for ext4, the most serious of which is a potential deadlock during directory renames that was introduced during the merge window discovered by a combination of syzbot and lockdep"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: zero i_disksize when initializing the bootloader inode
  ext4: make sure fs error flag setted before clear journal error
  ext4: commit super block if fs record error when journal record without error
  ext4, jbd2: add an optimized bmap for the journal inode
  ext4: fix WARNING in ext4_update_inline_data
  ext4: move where set the MAY_INLINE_DATA flag is set
  ext4: Fix deadlock during directory rename
  ext4: Fix comment about the 64BIT feature
  docs: ext4: modify the group desc size to 64
  ext4: fix another off-by-one fsmap error on 1k block filesystems
  ext4: fix RENAME_WHITEOUT handling for inline directories
  ext4: make kobj_type structures constant
  ext4: fix cgroup writeback accounting with fs-layer encryption
2023-03-11 | ext4, jbd2: add an optimized bmap for the journal inode | Theodore Ts'o | 1 | -3/+6
The generic bmap() function exported by the VFS takes locks and does checks that are not necessary for the journal inode. So allow the file system to set a journal-optimized bmap function in journal->j_bmap.

Reported-by: syzbot+9543479984ae9e576000@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=e4aaa78795e490421c79f76ec3679006c8ff4cf0
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
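An optional per-journal callback of this kind can be sketched as follows. This is a userspace model under stated assumptions: struct toy_journal, toy_bmap_fn and the toy_* functions are hypothetical, and the "mapping" is just an offset addition so the dispatch logic is visible.

```c
#include <assert.h>
#include <stddef.h>

struct toy_journal;

/* Hypothetical callback type: maps a logical block to a physical block in place. */
typedef int (*toy_bmap_fn)(struct toy_journal *j, unsigned long long *block);

struct toy_journal {
    toy_bmap_fn bmap;         /* plays the role of journal->j_bmap; may be NULL */
    unsigned long long base;  /* toy mapping: physical = logical + base */
    int generic_calls;        /* counts uses of the generic path */
};

/* Stand-in for the generic path, which does extra locking and checks in the kernel. */
static int toy_generic_bmap(struct toy_journal *j, unsigned long long *block)
{
    j->generic_calls++;
    *block += j->base;
    return 0;
}

/* Stand-in for a file-system-provided, journal-optimized mapping. */
static int toy_fast_bmap(struct toy_journal *j, unsigned long long *block)
{
    *block += j->base;
    return 0;
}

/* Prefer the file system's callback when it is set, else fall back. */
static int toy_journal_bmap(struct toy_journal *j, unsigned long long *block)
{
    assert(j != NULL && block != NULL);
    if (j->bmap != NULL)
        return j->bmap(j, block);
    return toy_generic_bmap(j, block);
}
```

Leaving the pointer NULL keeps the previous behaviour, so only file systems that install a callback see any change.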
2023-01-18 | jbd2,ocfs2: move jbd2_journal_submit_inode_data_buffers to ocfs2 | Christoph Hellwig | 1 | -1/+0
jbd2_journal_submit_inode_data_buffers is only used by ocfs2, so move it there to prepare for removing generic_writepages.

Link: https://lkml.kernel.org/r/20221229161031.391878-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18 | jbd2: replace obvious uses of b_page with b_folio | Matthew Wilcox (Oracle) | 1 | -1/+1
These places just use b_page to get to the buffer's address_space or have already been converted to folio.

Link: https://lkml.kernel.org/r/20221215214402.3522366-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-10 | Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds | 1 | -9/+6
Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It is apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention.

   Liam has identified a number of other tree users in the kernel which could be beneficially converted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-uninitialized bugs down to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc clea