From 62ed1b58224636185fa689db81224b8c8af46473 Mon Sep 17 00:00:00 2001 From: Li Nan Date: Mon, 3 Nov 2025 20:57:57 +0800 Subject: md: allow configuring logical block size Previously, raid array used the maximum logical block size (LBS) of all member disks. Adding a larger LBS disk at runtime could unexpectedly increase RAID's LBS, risking corruption of existing partitions. This can be reproduced by: ``` # LBS of sd[de] is 512 bytes, sdf is 4096 bytes. mdadm -CRq /dev/md0 -l1 -n3 /dev/sd[de] missing --assume-clean # LBS is 512 cat /sys/block/md0/queue/logical_block_size # create partition md0p1 parted -s /dev/md0 mklabel gpt mkpart primary 1MiB 100% lsblk | grep md0p1 # LBS becomes 4096 after adding sdf mdadm --add -q /dev/md0 /dev/sdf cat /sys/block/md0/queue/logical_block_size # partition lost partprobe /dev/md0 lsblk | grep md0p1 ``` Simply restricting larger-LBS disks is inflexible. In some scenarios, only disks with 512 bytes LBS are available currently, but later, disks with 4KB LBS may be added to the array. Making LBS configurable is the best way to solve this scenario. After this patch, the raid will: - store LBS in disk metadata - add a read-write sysfs 'mdX/logical_block_size' Future mdadm should support setting LBS via metadata field during RAID creation and the new sysfs. Though the kernel allows runtime LBS changes, users should avoid modifying it after creating partitions or filesystems to prevent compatibility issues. Only 1.x metadata supports configurable LBS. 0.90 metadata inits all fields to default values at auto-detect. Supporting 0.90 would require more extensive changes and no such use case has been observed. Note that many RAID paths rely on PAGE_SIZE alignment, including for metadata I/O. A larger LBS than PAGE_SIZE will result in metadata read/write failures. So this config should be prevented. Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-6-linan666@huaweicloud.com Signed-off-by: Li Nan Reviewed-by: Xiao Ni Signed-off-by: Yu Kuai --- Documentation/admin-guide/md.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst index deed823eab01..dc7eab191caa 100644 --- a/Documentation/admin-guide/md.rst +++ b/Documentation/admin-guide/md.rst @@ -238,6 +238,16 @@ All md devices contain: the number of devices in a raid4/5/6, or to support external metadata formats which mandate such clipping. + logical_block_size + Configure the array's logical block size in bytes. This attribute + is only supported for 1.x meta. Write the value before starting + array. The final array LBS uses the maximum between this + configuration and LBS of all combined devices. Note that + LBS cannot exceed PAGE_SIZE before RAID supports folio. + WARNING: Arrays created on new kernel cannot be assembled at old + kernel due to padding check, Set module parameter 'check_new_feature' + to false to bypass, but data loss may occur. + reshape_position This is either ``none`` or a sector number within the devices of the array where ``reshape`` is up to. If this is set, the three -- cgit v1.2.3 From 7bf90cd740bf87dd1692cf74d49bb1dc849dcd11 Mon Sep 17 00:00:00 2001 From: Coly Li Date: Thu, 13 Nov 2025 13:36:25 +0800 Subject: bcache: remove discard sysfs interface document This patch removes documents of bcache discard sysfs interface, it drops discard related sections from, - Documentation/ABI/testing/sysfs-block-bcache - Documentation/admin-guide/bcache.rst Signed-off-by: Coly Li Signed-off-by: Jens Axboe --- Documentation/admin-guide/bcache.rst | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/bcache.rst b/Documentation/admin-guide/bcache.rst index 6fdb495ac466..f71f349553e4 100644 --- a/Documentation/admin-guide/bcache.rst +++ b/Documentation/admin-guide/bcache.rst @@ -17,8 +17,7 @@ The latest bcache kernel code can be found from mainline Linux kernel: It's designed around the performance characteristics of SSDs - it only allocates in erase block sized buckets, and it uses a hybrid btree/log to track cached extents (which can be anywhere from a single sector to the bucket size). It's -designed to avoid random writes at all costs; it fills up an erase block -sequentially, then issues a discard before reusing it. +designed to avoid random writes at all costs. Both writethrough and writeback caching are supported. Writeback defaults to off, but can be switched on and off arbitrarily at runtime. Bcache goes to @@ -618,19 +617,11 @@ bucket_size cache_replacement_policy One of either lru, fifo or random. -discard - Boolean; if on a discard/TRIM will be issued to each bucket before it is - reused. Defaults to off, since SATA TRIM is an unqueued command (and thus - slow). - freelist_percent Size of the freelist as a percentage of nbuckets. Can be written to to increase the number of buckets kept on the freelist, which lets you artificially reduce the size of the cache at runtime. Mostly for testing - purposes (i.e. testing how different size caches affect your hit rate), but - since buckets are discarded when they move on to the freelist will also make - the SSD's garbage collection easier by effectively giving it more reserved - space. + purposes (i.e. testing how different size caches affect your hit rate). io_errors Number of errors that have occurred, decayed by io_error_halflife. -- cgit v1.2.3 From ade260ca858627b21be87711b1e12a7bf80c0261 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Sat, 15 Nov 2025 21:15:56 +0900 Subject: Documentation: admin-guide: blockdev: update zloop parameters In Documentation/admin-guide/blockdev/zoned_loop.rst, add the description of the zone_append and ordered_zone_append configuration arguments of zloop "add" command (device creation). Signed-off-by: Damien Le Moal Signed-off-by: Jens Axboe --- Documentation/admin-guide/blockdev/zoned_loop.rst | 61 ++++++++++++++--------- 1 file changed, 37 insertions(+), 24 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/blockdev/zoned_loop.rst b/Documentation/admin-guide/blockdev/zoned_loop.rst index 64dcfde7450a..806adde664db 100644 --- a/Documentation/admin-guide/blockdev/zoned_loop.rst +++ b/Documentation/admin-guide/blockdev/zoned_loop.rst @@ -68,30 +68,43 @@ The options available for the add command can be listed by reading the In more details, the options that can be used with the "add" command are as follows. -================ =========================================================== -id Device number (the X in /dev/zloopX). - Default: automatically assigned. -capacity_mb Device total capacity in MiB. This is always rounded up to - the nearest higher multiple of the zone size. - Default: 16384 MiB (16 GiB). -zone_size_mb Device zone size in MiB. Default: 256 MiB. -zone_capacity_mb Device zone capacity (must always be equal to or lower than - the zone size. Default: zone size. -conv_zones Total number of conventioanl zones starting from sector 0. - Default: 8. -base_dir Path to the base directory where to create the directory - containing the zone files of the device. - Default=/var/local/zloop. - The device directory containing the zone files is always - named with the device ID. E.g. the default zone file - directory for /dev/zloop0 is /var/local/zloop/0. -nr_queues Number of I/O queues of the zoned block device. This value is - always capped by the number of online CPUs - Default: 1 -queue_depth Maximum I/O queue depth per I/O queue. - Default: 64 -buffered_io Do buffered IOs instead of direct IOs (default: false) -================ =========================================================== +=================== ========================================================= +id Device number (the X in /dev/zloopX). + Default: automatically assigned. +capacity_mb Device total capacity in MiB. This is always rounded up + to the nearest higher multiple of the zone size. + Default: 16384 MiB (16 GiB). +zone_size_mb Device zone size in MiB. Default: 256 MiB. +zone_capacity_mb Device zone capacity (must always be equal to or lower + than the zone size. Default: zone size. +conv_zones Total number of conventioanl zones starting from + sector 0 + Default: 8 +base_dir Path to the base directory where to create the directory + containing the zone files of the device. + Default=/var/local/zloop. + The device directory containing the zone files is always + named with the device ID. E.g. the default zone file + directory for /dev/zloop0 is /var/local/zloop/0. +nr_queues Number of I/O queues of the zoned block device. This + value is always capped by the number of online CPUs + Default: 1 +queue_depth Maximum I/O queue depth per I/O queue. + Default: 64 +buffered_io Do buffered IOs instead of direct IOs (default: false) +zone_append Enable or disable a zloop device native zone append + support. + Default: 1 (enabled). + If native zone append support is disabled, the block layer + will emulate this operation using regular write + operations. +ordered_zone_append Enable zloop mitigation of zone append reordering. + Default: disabled. + This is useful for testing file systems file data mapping + (extents), as when enabled, this can significantly reduce + the number of data extents needed to for a file data + mapping. +=================== ========================================================= 3) Deleting a Zoned Device -------------------------- -- cgit v1.2.3