summaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)AuthorFilesLines
2024-05-21Merge tag 'pull-bd_inode-1' of ↵Linus Torvalds1-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull bdev bd_inode updates from Al Viro: "Replacement of bdev->bd_inode with sane(r) set of primitives by me and Yu Kuai" * tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RIP ->bd_inode dasd_format(): killing the last remaining user of ->bd_inode nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode block/bdev.c: use the knowledge of inode/bdev coallocation gfs2: more obvious initializations of mapping->host fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here... grow_dev_folio(): we only want ->bd_inode->i_mapping there use ->bd_mapping instead of ->bd_inode->i_mapping block_device: add a pointer to struct address_space (page cache of bdev) missing helpers: bdev_unhash(), bdev_drop() block: move two helpers into bdev.c block2mtd: prevent direct access of bd_inode dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode) blkdev_write_iter(): saner way to get inode and bdev bcachefs: remove dead function bdev_sectors() ext4: remove block_device_ejected() erofs_buf: store address_space instead of inode erofs: switch erofs_bread() to passing offset instead of block number
2024-05-20bcachefs: Check for subvolues with bogus snapshot/inode fieldsKent Overstreet2-1/+12
This fixes an assertion pop in btree_iter.c that checks for forgetting to pass a snapshot ID when iterating over snapshots btrees. Reported-by: syzbot+0dfe05235e38653e2aee@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: bch2_checksum() returns 0 for unknown checksum typeKent Overstreet1-2/+2
This fixes missing guards on trying to calculate a checksum with an invalid/unknown checksum type; moving the guards up to e.g. btree_io.c might be "more correct", but doesn't buy us anything - an unknown checksum type will always be flagged as at least a checksum error so we aren't losing any safety doing it this way and it makes it less likely to accidentally pop an assert we don't want. Reported-by: syzbot+e951ad5349f3a34a715a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Fix bch2_alloc_ciphers()Kent Overstreet1-12/+13
Don't put error pointers in bch_fs, that's gross. This fixes (?) the check in bch2_checksum_type_valid() - depending on our error paths, or depending on what our error paths are doing it at least makes the code saner. Reported-by: syzbot+2e3cb81b5d1fe18a374b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Add missing guard in bch2_snapshot_has_children()Kent Overstreet1-5/+2
We additionally need to be going inconsistent if passed an invalid snapshot ID; that patch will need more thorough testing. Reported-by: syzbot+1c9fca23fe478633b305@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Fix missing parens in drop_locks_do()Kent Overstreet1-1/+1
Reported-by: syzbot+95db43b0a06f157ee865@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Improve bch2_assert_pos_locked()Kent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Fix shift overflows in replicas.cKent Overstreet1-8/+21
We can't disallow unknown data_types in verify() - we have to preserve them unchanged for backwards compat; that means we have to add a few more guards. Reported-by: syzbot+249018ea545364f78d04@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Fix shift overflow in btree_lost_data()Kent Overstreet2-0/+9
Reported-by: syzbot+29f65db1a5fe427b5c56@syzkaller.appspotmail.com Fixes: 55936afe1107 ("bcachefs: Flag btrees with missing data") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Fix ref in trans_mark_dev_sbs() error pathKent Overstreet1-1/+1
Reported-by: syzbot+5c7f715a7107a608a544@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO methodYouling Tang1-1/+2
Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag") file systems can just set the FMODE_CAN_ODIRECT flag at open time instead of wiring up a dummy direct_IO method to indicate support for direct I/O. Do that for bcachefs so that noop_direct_IO can eventually be removed. Similar to commit b29434999371 ("xfs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method"). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Fix rcu splat in check_fix_ptrs()Kent Overstreet1-5/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-19Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefsLinus Torvalds118-4324/+4395
Pull bcachefs updates from Kent Overstreet: - More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions * tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits) bcachefs: add no_invalid_checks flag bcachefs: add counters for failed shrinker reclaim bcachefs: Fix sb_field_downgrade validation bcachefs: Plumb bch_validate_flags to sb_field_ops.validate() bcachefs: s/bkey_invalid_flags/bch_validate_flags bcachefs: fsync() should not return -EROFS bcachefs: Invalid devices are now checked for by fsck, not .invalid methods bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs() bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio() bcachefs: bch2_dev_get_ioref() checks for device not present bcachefs: bch2_dev_get_ioref2(); io_read.c bcachefs: bch2_dev_get_ioref2(); debug.c bcachefs: bch2_dev_get_ioref2(); journal_io.c bcachefs: bch2_dev_get_ioref2(); io_write.c bcachefs: bch2_dev_get_ioref2(); btree_io.c bcachefs: bch2_dev_get_ioref2(); backpointers.c bcachefs: bch2_dev_get_ioref2(); alloc_background.c bcachefs: for_each_bset() declares loop iter bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h bcachefs: Improve sysfs internal/btree_cache ...
2024-05-13Merge tag 'vfs-6.10.misc' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That means we now have six free FMODE_* bits in total (but bit #6 already got used for FMODE_WRITE_RESTRICTED) - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup) - Add fd_raw cleanup class so we can make use of automatic cleanup provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well - Optimize seq_puts() - Simplify __seq_puts() - Add new anon_inode_getfile_fmode() api to allow specifying f_mode instead of open-coding it in multiple places - Annotate struct file_handle with __counted_by() and use struct_size() - Warn in get_file() whether f_count resurrection from zero is attempted (epoll/drm discussion) - Folio-sophize aio - Export the subvolume id in statx() for both btrfs and bcachefs - Relax linkat(AT_EMPTY_PATH) requirements - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors for dup*() equality replacing kcmp() Cleanups: - Compile out swapfile inode checks when swap isn't enabled - Use (1 << n) notation for FMODE_* bitshifts for clarity - Remove redundant variable assignment in fs/direct-io - Cleanup uses of strncpy in orangefs - Speed up and cleanup writeback - Move fsparam_string_empty() helper into header since it's currently open-coded in multiple places - Add kernel-doc comments to proc_create_net_data_write() - Don't needlessly read dentry->d_flags twice Fixes: - Fix out-of-range warning in nilfs2 - Fix ecryptfs overflow due to wrong encryption packet size calculation - Fix overly long line in xfs file_operations (follow-up to FMODE_* cleanup) - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up to FMODE_* cleanup) - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_* cleanup) - Fix stable offset api to prevent endless loops - Fix afs file server rotations - Prevent xattr node from overflowing the eraseblock in jffs2 - Move fdinfo PTRACE_MODE_READ procfs check into the .permission() operation instead of .open() operation since this caused userspace regressions" * tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) afs: Fix fileserver rotation getting stuck selftests: add F_DUPDFD_QUERY selftests fcntl: add F_DUPFD_QUERY fcntl() file: add fd_raw cleanup class fs: WARN when f_count resurrection is attempted seq_file: Simplify __seq_puts() seq_file: Optimize seq_puts() proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation fs: Create anon_inode_getfile_fmode() xfs: don't call xfs_file_open from xfs_dir_open xfs: drop fop_flags for directories xfs: fix overly long line in the file_operations shmem: Fix shmem_rename2() libfs: Add simple_offset_rename() API libfs: Fix simple_offset_rename_exchange() jffs2: prevent xattr node from overflowing the eraseblock vfs, swap: compile out IS_SWAPFILE() on swapless configs vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements fs/direct-io: remove redundant assignment to variable retval fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading ...
2024-05-09bcachefs: add no_invalid_checks flagThomas Bertschinger2-1/+8
Setting this flag on a filesystem results in validity checks being skipped when writing bkeys. This flag will be used by tooling that deliberately injects corruption into a filesystem in order to exercise fsck. It shouldn't be set outside of testing/debugging code. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: add counters for failed shrinker reclaimDaniel Hill4-22/+80
This adds distinct counters for every reason the btree node shrinker can fail to free an object - if our shrinker isn't making progress, this will tell us why. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: Fix sb_field_downgrade validationKent Overstreet1-2/+12
- bch2_sb_downgrade_validate() wasn't checking for a downgrade entry extending past the end of the superblock section - for_each_downgrade_entry() is used in to_text() and needs to work on malformed input; it also was missing a check for a field extending past the end of the section Reported-by: syzbot+e49ccab73449180bc9be@syzkaller.appspotmail.com Fixes: 84f1638795da ("bcachefs: bch_sb_field_downgrade") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: Plumb bch_validate_flags to sb_field_ops.validate()Kent Overstreet13-47/+38
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: s/bkey_invalid_flags/bch_validate_flagsKent Overstreet31-110/+110
We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: fsync() should not return -EROFSKent Overstreet1-1/+4
fsync has a slightly odd usage of -EROFS, where it means "does not support fsync". I didn't choose it... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: Invalid devices are now checked for by fsck, not .invalid methodsKent Overstreet2-25/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs()Kent Overstreet2-14/+24
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio()Kent Overstreet1-14/+19
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref() checks for device not presentKent Overstreet9-37/+23
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref2(); io_read.cKent Overstreet1-5/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref2(); debug.cKent Overstreet1-4/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref2(); journal_io.cKent Overstreet1-4/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref2(); io_write.cKent Overstreet1-10/+11
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref2(); btree_io.cKent Overstreet1-15/+18
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref2(); backpointers.cKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: bch2_dev_get_ioref2(); alloc_background.cKent Overstreet2-11/+26
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: for_each_bset() declares loop iterKent Overstreet6-34/+14
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.hPetr Vorel1-1/+2
Move BCACHEFS_STATFS_MAGIC value to UAPI <linux/magic.h> under BCACHEFS_SUPER_MAGIC definition (use common approach for name) and reuse the definition in bcachefs_format.h BCACHEFS_STATFS_MAGIC. There are other bcachefs magic definitions: BCACHE_MAGIC, BCHFS_MAGIC, which use UUID_INIT() and are used only in libbcachefs. Therefore move only BCACHEFS_STATFS_MAGIC value, which can be used outside of libbcachefs for f_type field in struct statfs in statfs() or fstatfs(). Suggested-by: Su Yue <glass.su@suse.com> Signed-off-by: Petr Vorel <pvorel@suse.cz> Acked-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Improve sysfs internal/btree_cacheKent Overstreet2-5/+30
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Allocator prefers not to expand mi.btree_allocated bitmapKent Overstreet3-11/+66
We now have a small bitmap in the member info section of the superblock for "regions that have btree nodes", so that if we ever have to scan for btree nodes in repair we don't have to scan the whole device(s). This tweaks the allocator to prefer allocating from regions that are already marked in this bitmap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Better bucket alloc tracepointsKent Overstreet2-111/+51
Tracepoints are garbage, and perf trace even cuts off some of our fields. Much nicer to just trace a string, and then we can build nicely formatted output with printbufs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Move nocow unlock to bch2_write_endio()Kent Overstreet2-19/+8
This fixes a lifetime issue; bch2_nocow_write_unlock() uses PTR_BUCKET_POS(), which needs the device - but we drop our ref to the device in bch2_write_endio(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in journal_ptrs_to_text()Kent Overstreet1-5/+0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in discard_one_bucket_fast()Kent Overstreet1-2/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in check_alloc_info()Kent Overstreet2-40/+44
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_have_ref()Kent Overstreet3-7/+7
bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that we're looking up a device without checking if it's present because we have a reference to it already. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in data_update_init()Kent Overstreet1-2/+10
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in bkey_pick_read_device()Kent Overstreet1-15/+22
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: pass bch_dev to read_from_stale_dirty_pointer()Kent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_dev_bucket_exists() uses bch2_dev_rcu()Kent Overstreet1-7/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: kill bch2_dev_bkey_exists() in btree_gc.cKent Overstreet1-10/+31
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_extent_normalize() -> bch2_dev_rcu()Kent Overstreet1-1/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_bkey_has_target() -> bch2_dev_rcu()Kent Overstreet1-3/+10
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: extent_ptr_invalid() -> bch2_dev_rcu()Kent Overstreet1-11/+19
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: ptr_stale() -> dev_ptr_stale()Kent Overstreet6-16/+18
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>