path: root/fs/f2fs
Age | Commit message | Author | Files | Lines
2020-09-11f2fs: support age threshold based garbage collectionChao Yu10-60/+623
There are several issues in the current background GC algorithm:
- The valid block count is one of the key factors in the cost calculation, so if a segment has few valid blocks, the CB algorithm will still choose it as a victim even if it is young or sits in a hot segment, which is not appropriate.
- GCed data/nodes go to existing logs regardless of whether the data already there has the same update frequency, so hot and cold data may be mixed again.
- The GC allocator mainly uses LFS type segments, which consumes free segments more quickly.

This patch introduces a new algorithm named age threshold based garbage collection to solve the above issues. There are three main steps:

1. select a source victim:
- set an age threshold, and select candidates based on the threshold: e.g. 0 means youngest, 100 means oldest; if we set the age threshold to 80, then select dirty segments whose age is in the range [80, 100] as candidates;
- set a candidate_ratio threshold, and select candidates based on the ratio, so that we can shrink the candidates to the oldest segments;
- select the target segment with the fewest valid blocks in order to migrate blocks at minimum cost;

2. select a target victim:
- select candidates based on the age threshold;
- set a candidate_radius threshold, and search for candidates whose age is around the source victim's; the searching radius should be less than the radius threshold;
- select the target segment with the most valid blocks in order to avoid migrating the current target segment.

3. merge valid blocks from the source victim into the target victim with the SSR allocator.

Test steps:
- create 160 dirty segments:
 * half of them have 128 valid blocks per segment
 * the rest have 384 valid blocks per segment
- run background GC

Benefit: GC count and block movement count both decrease noticeably:

- Before:
  - Valid: 86
  - Dirty: 1
  - Prefree: 11
  - Free: 6001 (6001)

GC calls: 162 (BG: 220)
  - data segments : 160 (160)
  - node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
  - data blocks : 40960 (40960)
  - node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments

- After:
  - Valid: 87
  - Dirty: 0
  - Prefree: 4
  - Free: 6008 (6008)

GC calls: 75 (BG: 76)
  - data segments : 74 (74)
  - node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
  - data blocks : 12544 (12544)
  - node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
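The source-victim selection in step 1 of the age threshold based GC commit can be illustrated with a small user-space sketch. This is not the kernel implementation: `struct seg`, `pick_source_victim()` and the 0..100 age scale are hypothetical names derived only from the commit description, and the candidate_ratio step is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a dirty segment (not the f2fs struct). */
struct seg {
	unsigned int age;          /* 0 = youngest, 100 = oldest */
	unsigned int valid_blocks; /* blocks that would have to be migrated */
};

/*
 * Return the index of the source victim: among segments whose age is in
 * [age_threshold, 100], pick the one with the fewest valid blocks.
 * Returns -1 if no segment is old enough.
 */
static int pick_source_victim(const struct seg *segs, size_t n,
			      unsigned int age_threshold)
{
	int victim = -1;

	assert(age_threshold <= 100);
	for (size_t i = 0; i < n; i++) {
		if (segs[i].age < age_threshold)
			continue; /* too young: skipped even if nearly empty */
		if (victim < 0 ||
		    segs[i].valid_blocks < segs[victim].valid_blocks)
			victim = (int)i;
	}
	return victim;
}
```

The point of the sketch is the ordering of the two criteria: age filters first, and the valid block count only ranks the segments that survive the filter, so a young segment with very few valid blocks is no longer picked.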
2020-09-10f2fs: Use generic casefolding supportDaniel Rosenberg4-88/+20
This switches f2fs over to the generic support provided in the previous patch. Since casefolded dentries behave the same in ext4 and f2fs, we decrease the maintenance burden by unifying them, and any optimizations will immediately apply to both. Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10f2fs: compress: use more readable atomic_t type for {cic,dic}.refChao Yu3-10/+10
refcount_t type variable should never be less than one, so it's a little bit hard to understand when we use it to indicate pending compressed page count, let's change to use atomic_t for better readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10f2fs: fix compile warningChao Yu1-2/+5
This patch fixes the compile warning below, reported by LKP (kernel test robot).

cppcheck warnings: (new ones prefixed by >>)

>> fs/f2fs/file.c:761:9: warning: Identical condition 'err', second condition is always false [identicalConditionAfterEarlyExit]
    return err;
           ^
   fs/f2fs/file.c:753:6: note: first condition
    if (err)
        ^
   fs/f2fs/file.c:761:9: note: second condition
    return err;

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10f2fs: support 64-bits key in f2fs rb-tree node entryChao Yu3-7/+49
then, we can add specified entry into rb-tree with 64-bits segment time as key. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10f2fs: inherit mtime of original block during GCChao Yu4-17/+50
Don't let f2fs's internal GC ruin the original aging degree of a segment. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10f2fs: record average update time of segmentChao Yu1-3/+18
Previously, whenever one block in a segment was updated, the segment's mtime was set to the latest time, making an aged segment look like the freshest one, so GC with the cost-benefit algorithm would miss such segments. This patch changes mtime to record the average block update time instead of the last update time.

It's not necessary to reset mtime for a prefree segment: since se->valid_blocks is zero, the old se->mtime carries no weight in the calculation below:

	se->mtime = div_u64(se->mtime * se->valid_blocks + mtime,
				se->valid_blocks + 1);

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
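A worked example of the averaging formula, as a user-space sketch. `avg_mtime()` is a hypothetical stand-in for the update quoted in the commit message; plain 64-bit division replaces the kernel's div_u64().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Running average of block update times for one segment:
 * (old average * number of valid blocks + new block time) / (count + 1).
 */
static uint64_t avg_mtime(uint64_t seg_mtime, uint32_t valid_blocks,
			  uint64_t block_mtime)
{
	return (seg_mtime * valid_blocks + block_mtime) /
	       ((uint64_t)valid_blocks + 1);
}
```

With three valid blocks averaging 1000 and a new block written at time 2000, the segment mtime becomes (3 * 1000 + 2000) / 4 = 1250 rather than jumping to 2000. With zero valid blocks (the prefree case) the old mtime is multiplied by zero, so the result is just the new block's time.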
2020-09-10f2fs: introduce inmem cursegChao Yu8-52/+113
The previous implementation of aligned pinfile allocation will:
- allocate a new segment on the cold data log no matter whether the last used segment is partially used or not, which makes IOs more random;
- force concurrent cold data/GCed IO to go into the warm data area, which can hurt hot/cold data separation.

In this patch, we introduce a new type of log named 'inmem curseg'. The differences from a normal curseg are:
- it reuses an existing segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory; its segno, blkofs and summary will not be persisted into the checkpoint area.

With this new feature, we can enhance the scalability of logs, and special allocators can be created for specific purposes:
- a pure lfs allocator for aligned pinfile allocation or file defragmentation
- a pure ssr allocator for a later feature

So let's update aligned pinfile allocation to use this new inmem curseg framework.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10f2fs: compress: remove unneeded codeChao Yu1-4/+0
- f2fs_write_multi_pages
 - f2fs_compress_pages
  - init_compress_ctx
  - compress_pages
  - destroy_compress_ctx		--- 1
 - f2fs_write_compressed_pages
 - destroy_compress_ctx		--- 2

destroy_compress_ctx() in f2fs_write_multi_pages() is redundant, remove it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10f2fs: remove duplicated type castingXiaojun Wang3-4/+4
Since DUMMY_WRITTEN_PAGE and ATOMIC_WRITTEN_PAGE have already been converted to unsigned long type, we don't need to cast them again. Signed-off-by: Xiaojun Wang <wangxiaojun11@huawei.com> Reported-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-10f2fs: support zone capacity less than zone sizeAravind Ramesh6-37/+260
NVMe Zoned Namespace devices can have a zone-capacity less than the zone-size. Zone-capacity indicates the maximum number of sectors that are usable in a zone, beginning from the first sector of the zone. This makes the sectors after the zone-capacity, up to the zone-size, unusable. This patch set tracks zone-size and zone-capacity in zoned devices and calculates the usable blocks per segment and usable segments per section.

If zone-capacity is less than zone-size, mark only those segments which start before zone-capacity as free segments. All segments at and beyond zone-capacity are treated as permanently used segments. In cases where zone-capacity does not align with the segment size, the last segment will start before zone-capacity and end beyond the zone-capacity of the zone. For such spanning segments, only sectors within the zone-capacity are used.

During writes and GC, manage the usable segments in a section and the usable blocks per segment. Segments which are beyond zone-capacity are never allocated and do not need to be garbage collected; only the segments which are before zone-capacity need to be garbage collected. For spanning segments, based on the number of usable blocks in that segment, write to blocks only up to zone-capacity.

Zone-capacity is device specific and cannot be configured by the user. Since NVMe ZNS device zones are sequential-write-only, a block device with conventional zones or any normal block device is needed along with the ZNS device for the metadata operations of F2FS.

A typical nvme-cli output of a zoned device shows zone start, capacity and write pointer as below:

SLBA: 0x0     WP: 0x0     Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ

Here the zone size is 64MB, the capacity is 49MB, and WP is at the zone start as the zones are in EMPTY state. For each zone, only the area from zone start to zone start + 49MB is usable; any lba/sector after 49MB cannot be read or written, and the drive will fail any attempt to read/write it. So, the second zone starts at 64MB and is usable until 113MB (64 + 49), and the range between 113MB and 128MB is again unusable. The next zone starts at 128MB, and so on.

Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
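The usable-segment and usable-block arithmetic from the zone-capacity commit can be sketched for the 64MB / 49MB example. This is an illustration only: `usable_segs()` and `usable_blocks()` are hypothetical helpers, and the 2MB segment size (512 blocks of 4KB) is an assumption that the commit message does not state.

```c
#include <assert.h>

#define BLOCKS_PER_SEG 512u /* assumption: 2MB segments of 4KB blocks */

/* Segments that start before zone-capacity (the last one may span it). */
static unsigned int usable_segs(unsigned int zone_cap_blocks)
{
	return (zone_cap_blocks + BLOCKS_PER_SEG - 1) / BLOCKS_PER_SEG;
}

/* Usable blocks in the seg_idx-th segment of a zone. */
static unsigned int usable_blocks(unsigned int seg_idx,
				  unsigned int zone_cap_blocks)
{
	unsigned int seg_start = seg_idx * BLOCKS_PER_SEG;

	if (seg_start >= zone_cap_blocks)
		return 0; /* at or beyond zone-capacity: never allocated */
	if (zone_cap_blocks - seg_start >= BLOCKS_PER_SEG)
		return BLOCKS_PER_SEG; /* fully inside zone-capacity */
	return zone_cap_blocks - seg_start; /* spanning segment */
}
```

Under these assumptions a 49MB capacity is 12544 blocks, i.e. 24.5 segments: segments 0 to 23 are fully usable, segment 24 spans the capacity boundary and offers 256 blocks, and segments 25 to 31 of the 64MB zone are treated as permanently used.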
2020-09-08f2fs: Return EOF on unaligned end of file DIO readGabriel Krisman Bertazi1-0/+3
Reading past the end of a file returns EOF for aligned reads but -EINVAL for unaligned reads on f2fs. While the documentation is not strict about this corner case, most filesystems return EOF in this case, like iomap filesystems. This patch consolidates the behavior for f2fs by making it return EOF (0).

It can be verified by a read loop on a file that does a partial read before EOF (a file that doesn't end at an aligned address). The following code fails on an unaligned file on f2fs, but not on btrfs, ext4, and xfs.

  while (done < total) {
    ssize_t delta = pread(fd, buf + done, total - done, off + done);
    if (!delta)
      break;
    ...
  }

It is arguable whether filesystems should actually return EOF or -EINVAL, but since iomap filesystems support it, and so does the original DIO code, it seems reasonable to consolidate on that.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
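A fuller version of the read loop from the DIO commit message, as a sketch. `read_until_eof()` is a hypothetical helper name, and the error handling is an addition that the commit message elides. Opening the file with O_DIRECT and supplying a suitably aligned buffer is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to `total` bytes starting at `off`, stopping early at EOF.
 * Returns the number of bytes read, or -1 on error (errno set by pread).
 */
static ssize_t read_until_eof(int fd, char *buf, size_t total, off_t off)
{
	size_t done = 0;

	while (done < total) {
		ssize_t delta = pread(fd, buf + done, total - done,
				      off + (off_t)done);

		if (delta < 0) {
			/* the unaligned DIO case described above fails here */
			perror("pread");
			return -1;
		}
		if (delta == 0)
			break; /* EOF */
		done += (size_t)delta;
	}
	return (ssize_t)done;
}
```

On a filesystem that returns 0 at EOF, the loop ends cleanly after the final partial read. When the last pread() instead fails with EINVAL, the caller sees an error even though all the data was delivered.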
2020-09-08f2fs: fix indefinite loop scanning for free nidSahitya Tummala1-0/+3
If sbi->ckpt->next_free_nid is not NAT block aligned and there are free nids in that NAT block between the start of the block and next_free_nid, then those free nids will not be scanned in scan_nat_page(). This results in a mismatch between nm_i->available_nids and the sum of nm_i->free_nid_count of all NAT blocks scanned, and nm_i->available_nids will always be greater than the sum of free nids in all the blocks. Under this condition, if we use all the currently scanned free nids, f2fs_alloc_nid() will loop forever, as nm_i->available_nids is still not zero but nm_i->free_nid_count of that partially scanned NAT block is zero. Fix this by aligning nm_i->next_scan_nid to the first nid of the corresponding NAT block. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
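The alignment step in the free nid fix amounts to rounding a nid down to the first nid of its NAT block. A minimal sketch follows; `nat_block_start_nid()` is a hypothetical helper rather than the kernel function, and the value 455 for entries per NAT block is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 455 NAT entries fit in one 4KB NAT block. */
#define NAT_ENTRY_PER_BLOCK 455u

/* First nid of the NAT block that contains `nid`. */
static uint32_t nat_block_start_nid(uint32_t nid)
{
	return nid - nid % NAT_ENTRY_PER_BLOCK;
}
```

Starting the scan at the rounded-down nid means every free nid in the block is counted, so the per-block free count and the global available count stay consistent.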
2020-09-08f2fs: Fix type of section block count variablesShin'ichiro Kawasaki1-4/+4
Commit da52f8ade40b ("f2fs: get the right gc victim section when section has several segments") added code to count blocks of each section using variables of type 'unsigned short', which is 2 bytes wide on many systems. However, the counts can exceed the 2-byte range, and the type conversion then results in wrong values. In particular, when an f2fs section has USHRT_MAX + 1 blocks, the count is handled as 0. This triggers an endless loop in init_dirty_segmap() during the mount system call. Fix this by changing the type of the variables to block_t. Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
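The truncation behind the section block count bug is easy to reproduce in user space. In this sketch the two helper names are hypothetical, and `uint32_t` stands in for f2fs's block_t.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for f2fs's block_t, a 32-bit unsigned type. */
typedef uint32_t block_t;

/* Count stored in the old, too-narrow type: reduced modulo USHRT_MAX + 1. */
static unsigned short count_as_ushort(uint32_t blocks)
{
	return (unsigned short)blocks;
}

/* Count stored in the type the fix switches to: value is preserved. */
static block_t count_as_block_t(uint32_t blocks)
{
	return (block_t)blocks;
}
```

A section with exactly USHRT_MAX + 1 blocks is stored as 0 in the narrow type, which is the value that keeps the mount-time loop from making progress. The wider type keeps the real count.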
2020-09-07fscrypt: drop unused inode argument from fscrypt_fname_alloc_bufferJeff Layton1-1/+1
Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200810142139.487631-1-jlayton@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva2-3/+3
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-10Merge tag 'f2fs-for-5.9-rc1' of ↵Linus Torvalds20-289/+745
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added two small interfaces: (a) GC_URGENT_LOW mode for performance and (b) F2FS_IOC_SEC_TRIM_FILE ioctl for security. The new GC mode allows Android to run some lower priority GCs in background, while new ioctl discards user information without race condition when the account is removed. In addition, some patches were merged to address latency-related issues. We've fixed some compression-related bug fixes as well as edge race conditions.

Enhancements:
 - add GC_URGENT_LOW mode in gc_urgent
 - introduce F2FS_IOC_SEC_TRIM_FILE ioctl
 - bypass racy readahead to improve read latencies
 - shrink node_write lock coverage to avoid long latency

Bug fixes:
 - fix missing compression flag control, i_size, and mount option
 - fix deadlock between quota writes and checkpoint
 - remove inode eviction path in synchronous path to avoid deadlock
 - fix to wait GCed compressed page writeback
 - fix a kernel panic in f2fs_is_compressed_page
 - check page dirty status before writeback
 - wait page writeback before update in node page write flow
 - fix a race condition between f2fs_write_end_io and f2fs_del_fsync_node_entry

We've added some minor sanity checks and refactored trivial code blocks for better readability and debugging information"

* tag 'f2fs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
  f2fs: prepare a waiter before entering io_schedule
  f2fs: update_sit_entry: Make the judgment condition of f2fs_bug_on more intuitive
  f2fs: replace test_and_set/clear_bit() with set/clear_bit()
  f2fs: make file immutable even if releasing zero compression block
  f2fs: compress: disable compression mount option if compression is off
  f2fs: compress: add sanity check during compressed cluster read
  f2fs: use macro instead of f2fs verity version
  f2fs: fix deadlock between quota writes and checkpoint
  f2fs: correct comment of f2fs_exist_written_data
  f2fs: compress: delay temp page allocation
  f2fs: compress: fix to update isize when overwriting compressed file
  f2fs: space related cleanup
  f2fs: fix use-after-free issue
  f2fs: Change the type of f2fs_flush_inline_data() to void
  f2fs: add F2FS_IOC_SEC_TRIM_FILE ioctl
  f2fs: should avoid inode eviction in synchronous path
  f2fs: segment.h: delete a duplicated word
  f2fs: compress: fix to avoid memory leak on cc->cpages
  f2fs: use generic names for generic ioctls
  f2fs: don't keep meta inode pages used for compressed block migration
  ...
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull uninitialized_var() macro removal from Kees Cook:
"This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement.

 - Clean up non-trivial uses of uninitialized_var()
 - Update documentation and checkpatch for uninitialized_var() removal
 - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-08-03f2fs: prepare a waiter before entering io_scheduleJaegeuk Kim1-2/+2
This is to avoid sleep() in the waiter thread.

[ 20.157753] ------------[ cut here ]------------
[ 20.158393] do not call blocking ops when !TASK_RUNNING; state=2 set at [<0000000096354225>] prepare_to_wait+0xcd/0x430
[ 20.159858] WARNING: CPU: 1 PID: 1152 at kernel/sched/core.c:7142 __might_sleep+0x149/0x1a0
...
[ 20.176110] __submit_merged_write_cond+0x191/0x310
[ 20.176739] f2fs_submit_merged_write+0x18/0x20
[ 20.177323] f2fs_wait_on_all_pages+0x269/0x2d0
[ 20.177899] ? block_operations+0x980/0x980
[ 20.178441] ? __kasan_check_read+0x11/0x20
[ 20.178975] ? finish_wait+0x260/0x260
[ 20.179488] ? percpu_counter_set+0x147/0x230
[ 20.180049] do_checkpoint+0x1757/0x2a50
[ 20.180558] f2fs_write_checkpoint+0x840/0xaf0
[ 20.181126] f2fs_sync_fs+0x287/0x4a0

Reported-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-03f2fs: update_sit_entry: Make the judgment condition of f2fs_bug_on more ↵Zhihao Cheng1-1/+1
intuitive

Current judgment condition of f2fs_bug_on in function update_sit_entry():

	new_vblocks >> (sizeof(unsigned short) << 3) ||
		new_vblocks > sbi->blocks_per_seg

which is equivalent to:

	new_vblocks < 0 || new_vblocks > sbi->blocks_per_seg

The latter is more intuitive.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reported-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
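The equivalence claimed for the two update_sit_entry() conditions can be checked with a small user-space sketch. Several things here are assumptions: `new_vblocks` is taken to be a signed long, `BLOCKS_PER_SEG` is fixed at 512 in place of sbi->blocks_per_seg, and right-shifting a negative value (implementation-defined in C) is assumed to leave a non-zero result, as it does on common compilers.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_SEG 512L /* assumption: stand-in for sbi->blocks_per_seg */

/* Old form: any bit at or above bit 16 set, or count above the segment size. */
static bool old_check(long new_vblocks)
{
	return (new_vblocks >> (sizeof(unsigned short) << 3)) ||
	       new_vblocks > BLOCKS_PER_SEG;
}

/* New form: negative, or count above the segment size. */
static bool new_check(long new_vblocks)
{
	return new_vblocks < 0 || new_vblocks > BLOCKS_PER_SEG;
}
```

The shift only differs from the sign test for values of 65536 and above, and those already fail the `> BLOCKS_PER_SEG` half, so the two predicates agree as long as the segment size fits in 16 bits.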
2020-08-03f2fs: replace test_and_set/clear_bit() with set/clear_bit()Yufen Yu1-2/+2
Since set/clear_inode_flag() don't need to return value to show if flag is set, we can just call set/clear_bit() here. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-03f2fs: make file immutable even if releasing zero compression blockDaeho Jeong1-3/+3
When we use F2FS_IOC_RELEASE_COMPRESS_BLOCKS ioctl, if we can't find any compressed blocks in the file even with large file size, the ioctl just ends up without changing the file's status as immutable. It makes the user, who expects that the file is immutable when it returns successfully, confused. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-03f2fs: compress: disable compression mount option if compression is offChao Yu1-1/+14
If CONFIG_F2FS_FS_COMPRESSION is off, don't allow to configure or show compression related mount option. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-03f2fs: compress: add sanity check during compressed cluster readChao Yu1-3/+1
In f2fs_read_multi_pages(), we don't have to check cluster's type again, since overwrite or partial truncation need page lock in cluster which has already been held by reader, so cluster's type is stable, let's change check condition to sanity check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-03f2fs: use macro instead of f2fs verity versionJack Qiu1-2/+4
Because fsverity_descriptor_location.version is constant, use a macro for better readability. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-03f2fs: fix deadlock between quota writes and checkpointJaegeuk Kim1-0/+2
f2fs_write_data_pages(quota_mapping)
 __f2fs_write_data_pages             f2fs_write_checkpoint
  * blk_start_plug(&plug);
  * add bio in write_io[DATA]
                                      - block_operations
                                      - skip syncing quota by
                                                >DEFAULT_RETRY_QUOTA_FLUSH_COUNT
                                      - down_write(&sbi->node_write);
  - f2fs_write_single_data_page
    - down_read(node_write)
                                      - f2fs_wait_on_all_pages(F2FS_WB_CP_DATA);

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-03f2fs: correct comment of f2fs_exist_written_dataJack Qiu1-1/+1
Function parameter mode could be TRANS_DIR_INO. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-26f2fs: compress: delay temp page allocationChao Yu1-16/+21
Currently, we allocate temp pages which is used to pad hole in cluster during read IO submission, it may take long time before releasing them in f2fs_decompress_pages(), since they are only used as temp output buffer in decompression context, so let's just do the allocation in that context to reduce time of memory pool resource occupation. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-26f2fs: compress: fix to update isize when overwriting compressed fileChao Yu1-0/+4
We missed updating the isize of a compressed file in write_end() in the case below:

cluster size is 16KB
- write 14KB data from offset 0
- overwrite 16KB data from offset 0

Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-26f2fs: space related cleanupJack Qiu7-15/+15
Just for code style, no logic change
1. delete useless space
2. change spaces into tab

Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-23f2fs: fix use-after-free issueLi Guifu1-2/+3
During umount, f2fs_put_super() unregisters procfs entries after f2fs_destroy_segment_manager(), it may cause use-after-free issue when umount races with procfs accessing, fix it by relocating f2fs_unregister_sysfs(). [Chao Yu: change commit title/message a bit] Signed-off-by: Li Guifu <bluce.liguifu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-23f2fs: Change the type of f2fs_flush_inline_data() to voidJia Yang2-4/+2
The return value of f2fs_flush_inline_data() is not used, so delete it. Signed-off-by: Jia Yang <jiayang5@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-21f2fs: add F2FS_IOC_SEC_TRIM_FILE ioctlDaeho Jeong2-0/+206
Added a new ioctl to send discard commands and/or zero out a selected data area of a regular file for security reasons.

The way range.len of F2FS_IOC_SEC_TRIM_FILE is handled:
1. Added support for a range.len value of -1 to secure trim all blocks starting from range.start, regardless of i_size.
2. If the end of the range passes the end of the file, the range extends to the end of the file (i_size).
3. A range.len of zero is ignored, to prevent the function from making end_addr zero and triggering different behaviour of the function.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
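The three range.len rules for F2FS_IOC_SEC_TRIM_FILE can be sketched as a small range-normalisation helper. This is not the kernel code: `trim_range_end()` and its `max_bytes` parameter are hypothetical, and treating a start at or beyond EOF as "nothing to do" is an assumption of the sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns false when there is nothing to trim; otherwise stores the
 * exclusive end offset of the range in *end. `max_bytes` stands for the
 * largest offset at which the file could still own blocks.
 */
static bool trim_range_end(uint64_t start, uint64_t len, uint64_t i_size,
			   uint64_t max_bytes, uint64_t *end)
{
	if (len == 0)
		return false;              /* rule 3: zero length is ignored */
	if (start >= i_size)
		return false;              /* assumption of this sketch */
	if (len == UINT64_MAX)
		*end = max_bytes;          /* rule 1: len == -1, ignore i_size */
	else if (len >= i_size - start)
		*end = i_size;             /* rule 2: clamp to end of file */
	else
		*end = start + len;
	return true;
}
```

Checking `len >= i_size - start` instead of `start + len >= i_size` avoids an overflow when a caller passes a very large length that is not exactly -1.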
2020-07-21f2fs: should avoid inode eviction in synchronous pathJaegeuk Kim1-3/+7
https://bugzilla.kernel.org/show_bug.cgi?id=208565

PID: 257 TASK: ecdd0000 CPU: 0 COMMAND: "init"
  #0 [<c0b420ec>] (__schedule) from [<c0b423c8>]
  #1 [<c0b423c8>] (schedule) from [<c0b459d4>]
  #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>]
  #3 [<c0b44fa0>] (down_read) from [<c044233c>]
  #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>]
  #5 [<c0442890>] (f2fs_truncate) from [<c044d408>]
  #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>]
  #7 [<c030be18>] (evict) from [<c030a558>]
  #8 [<c030a558>] (iput) from [<c047c600>]
  #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>]
 #10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>]
 #11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>]
 #12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>]
 #13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>]
 #14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>]
 #15 [<c0324294>] (do_fsync) from [<c0324014>]
 #16 [<c0324014>] (sys_fsync) from [<c0108bc0>]

This can be caused by flush_dirty_inode() in f2fs_sync_node_pages() where iput() requires f2fs_lock_op() again resulting in livelock.

Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-20f2fs: segment.h: delete a duplicated wordRandy Dunlap1-1/+1
Drop the repeated word "the" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: linux-f2fs-devel@lists.sourceforge.net Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-20f2fs: compress: fix to avoid memory leak on cc->cpagesChao Yu1-0/+2
Memory allocated for storing compressed pages' pointers should be released after f2fs_write_compressed_pages(), otherwise it will cause a memory leak. Signed-off-by: Chao Yu <yuchao0@huawei.com> Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-20f2fs: use generic names for generic ioctlsEric Biggers2-51/+28
Don't define F2FS_IOC_* aliases to ioctls that already have a generic FS_IOC_* name. These aliases are unnecessary, and they make it unclear which ioctls are f2fs-specific and which are generic. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16f2fs: Eliminate usage of uninitialized_var() macroJason Yan1-3/+1
This is an effort to eliminate the uninitialized_var() macro[1].

The use of this macro is the wrong solution because it forces off ANY analysis by the compiler for a given variable. It even masks "unused variable" warnings.

Quoted from Linus[2]:

"It's a horrible thing to use, in that it adds extra cruft to the source code, and then shuts up a compiler warning (even the _reliable_ warnings from gcc)."

Fix it by removing this variable since it is not needed at all.

[1] https://github.com/KSPP/linux/issues/81
[2] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200615085132.166470-1-yanaijie@huawei.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08f2fs: don't keep meta inode pages used for compressed block migrationChao Yu1-2/+3
meta inode's pages are used for encrypted, verity and compressed blocks, so the meta inode's cache invalidation condition in do_checkpoint() should consider compression as well, not just for verity and encryption, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-08f2fs: add inline encryption supportSatya Tangirala3-14/+102
Wire up f2fs to support inline encryption via the helper functions which fs/crypto/ now provides. This includes:

- Adding a mount option 'inlinecrypt' which enables inline encryption on encrypted files where it can be used.
- Setting the bio_crypt_ctx on bios that will be submitted to an inline-encrypted file.
- Not adding logically discontiguous data to bios that will be submitted to an inline-encrypted file.
- Not doing filesystem-layer crypto on inline-encrypted files.

This patch includes a fix for a race during IPU by Sahitya Tummala <stummala@codeaurora.org>

Signed-off-by: Satya Tangirala <satyat@google.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200702015607.1215430-4-satyat@google.com
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-08f2fs: fix error path in do_recover_data()Chao Yu4-13/+26
- don't panic kernel if f2fs_get_node_page() fails in f2fs_recover_inline_data() or f2fs_recover_inline_xattr();
- return error number of f2fs_truncate_blocks() to f2fs_recover_inline_data()'s caller;

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-08f2fs: fix to wait GCed compressed page writebackChao Yu1-0/+7
like we did for encrypted page. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-08f2fs: remove write attribute of main_blkaddr sysfs nodeDehe Gu1-1/+8
Fuzzing main_blkaddr sysfs node will corrupt this field's value, causing kernel panic, remove its write attribute to avoid potential security risk. [Chao Yu: add description] Signed-off-by: Dehe Gu <gudehe@huawei.com> Signed-off-by: Daiyue Zhang <zhangdaiyue1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-07f2fs: add GC_URGENT_LOW mode in gc_urgentDaeho Jeong4-10/+20
Added a new gc_urgent mode, GC_URGENT_LOW, in which mode F2FS will lower the bar of checking idle in order to process outstanding discard commands and GC a little bit aggressively. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-07f2fs: avoid readahead race conditionJaegeuk Kim3-0/+23
If two readahead threads with the same offset enter readpages, every read IO is split and issued to the disk separately, which lowers bandwidth. This patch tries to avoid redundant readahead calls.

Fixes one build error reported by Randy: fix the build error when F2FS_FS_COMPRESSION is not set/enabled. This label is needed in either case.

../fs/f2fs/data.c: In function ‘f2fs_mpage_readpages’:
../fs/f2fs/data.c:2327:5: error: label ‘next_page’ used but not defined
     goto next_page;

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-07f2fs: fix return value of move_data_block()Chao Yu1-1/+3
If f2fs_grab_cache_page() fails, it needs to return -ENOMEM. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-07f2fs: add parameter op_flag in f2fs_submit_page_read()Jia Yang1-4/+6
The parameter op_flag is not used in f2fs_get_read_data_page(), but it is used in f2fs_grab_read_bio(). Obviously, op_flag is not passed to f2fs_grab_read_bio() successfully. We need to add parameter in f2fs_submit_page_read() to pass it.

The case:
- gc_data_segment
 - f2fs_get_read_data_page(.., op_flag = REQ_RAHEAD,..)
  - f2fs_submit_page_read
   - f2fs_grab_read_bio(.., op_flag = 0, ..)

Signed-off-by: Jia Yang <jiayang5@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-07f2fs: split f2fs_allocate_new_segments()Chao Yu4-19/+27
to two independent functions:
- f2fs_allocate_new_segment() for specified type segment allocation
- f2fs_allocate_new_segments() for all data type segments allocation

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-07f2fs: lost matching-pair of trace in f2fs_truncate_inode_blocksYubo Feng1-1/+3
If get_node_path() returns -E2BIG while the f2fs_truncate_inode_blocks_enter/exit tracepoints are enabled, the matching trace_exit event is lost from the log. Signed-off-by: Yubo Feng <fengyubo3@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-07f2fs: fix an oops in f2fs_is_compressed_pageYu Changchun2-0/+15
This patch is to fix a crash:

 #3 [ffffb6580689f898] oops_end at ffffffffa2835bc2
 #4 [ffffb6580689f8b8] no_context at ffffffffa28766e7
 #5 [ffffb6580689f920] async_page_fault at ffffffffa320135e
    [exception RIP: f2fs_is_compressed_page+34]
    RIP: ffffffffa2ba83a2 RSP: ffffb6580689f9d8 RFLAGS: 00010213
    RAX: 0000000000000001 RBX: fffffc0f50b34bc0 RCX: 0000000000002122
    RDX: 0000000000002123 RSI: 0000000000000c00 RDI: fffffc0f50b34bc0
    RBP: ffff97e815a40178 R8: 0000000000000000 R9: ffff97e83ffc9000
    R10: 0000000000032300 R11: 0000000000032380 R12: ffffb6580689fa38
    R13: fffffc0f50b34bc0 R14: ffff97e825cbd000 R15: 0000000000000c00
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #6 [ffffb6580689f9d8] __is_cp_guaranteed at ffffffffa2b7ea98
 #7 [ffffb6580689f9f0] f2fs_submit_page_write at ffffffffa2b81a69
 #8 [ffffb6580689fa30] f2fs_do_write_meta_page at ffffffffa2b99777
 #9 [ffffb6580689fae0] __f2fs_write_meta_page at ffffffffa2b75f1a
 #10 [ffffb6580689fb18] f2fs_sync_meta_pages at ffffffffa2b77466
 #11 [ffffb6580689fc98] do_checkpoint at ffffffffa2b78e46
 #12 [ffffb6580689fd88] f2fs_write_checkpoint at ffffffffa2b79c29
 #13 [ffffb6580689fdd0] f2fs_sync_fs at ffffffffa2b69d95
 #14 [ffffb6580689fe20] sync_filesystem at ffffffffa2ad2574
 #15 [ffffb6580689fe30] generic_shutdown_super at ffffffffa2a9b582
 #16 [ffffb6580689fe48] kill_block_super at ffffffffa2a9b6d1
 #17 [ffffb6580689fe60] kill_f2fs_super at ffffffffa2b6abe1
 #18 [ffffb6580689fea0] deactivate_locked_super at ffffffffa2a9afb6
 #19 [ffffb6580689feb8] cleanup_mnt at ffffffffa2abcad4
 #20 [ffffb6580689fee0] task_work_run at ffffffffa28bca28
 #21 [ffffb6580689ff00] exit_to_usermode_loop at ffffffffa28050b7
 #22 [ffffb6580689ff38] do_syscall_64 at ffffffffa280560e
 #23 [ffffb6580689ff50] entry_SYSCALL_64_after_hwframe at ffffffffa320008c

This occurred when unmounting f2fs with both F2FS_FS_COMPRESSION and F2FS_IO_TRACE enabled. Fix it by adding IS_IO_TRACED_PAGE to check the validity of the pid in page_private.
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>