summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-06-06Linux 5.10.120v5.10.120Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20220603173818.716010877@linuxfoundation.org Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Fox Chen <foxhlchen@gmail.com> Tested-by: Hulk Robot <hulkrobot@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytesLiu Jian1-2/+2
commit 45969b4152c1752089351cd6836a42a566d49bcf upstream. The data length of skb frags + frag_list may be greater than 0xffff, and skb_header_pointer can not handle negative offset. So, here INT_MAX is used to check the validity of offset. Add the same change to the related function skb_store_bytes. Fixes: 05c74e5e53f6 ("bpf: add bpf_skb_load_bytes helper") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220416105801.88708-2-liujian56@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06bpf: Fix potential array overflow in bpf_trampoline_get_progs()Yuntao Wang1-6/+12
commit a2aa95b71c9bbec793b5c5fa50f0a80d882b3e8d upstream. The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline can exceed BPF_MAX_TRAMP_PROGS. When this happens, the assignment '*progs++ = aux->prog' in bpf_trampoline_get_progs() will cause progs array overflow as the progs field in the bpf_tramp_progs struct can only hold at most BPF_MAX_TRAMP_PROGS bpf programs. Fixes: 88fd9e5352fe ("bpf: Refactor trampoline update code") Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Link: https://lore.kernel.org/r/20220430130803.210624-1-ytcoode@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06NFSD: Fix possible sleep during nfsd4_release_lockowner()Chuck Lever1-8/+4
commit ce3c4ad7f4ce5db7b4f08a1e237d8dd94b39180b upstream. nfsd4_release_lockowner() holds clp->cl_lock when it calls check_for_locks(). However, check_for_locks() calls nfsd_file_get() / nfsd_file_put() to access the backing inode's flc_posix list, and nfsd_file_put() can sleep if the inode was recently removed. Let's instead rely on the stateowner's reference count to gate whether the release is permitted. This should be a reliable indication of locks-in-use since file lock operations and ->lm_get_owner take appropriate references, which are released appropriately when file locks are removed. Reported-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06NFS: Memory allocation failures are not server fatal errorsTrond Myklebust1-0/+1
commit 452284407c18d8a522c3039339b1860afa0025a8 upstream. We need to filter out ENOMEM in nfs_error_is_fatal_on_server(), because running out of memory on our client is not a server error. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 2dc23afffbca ("NFS: ENOMEM should also be a fatal error.") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06docs: submitting-patches: Fix crossref to 'The canonical patch format'Akira Yokosawa1-1/+1
commit 6d5aa418b3bd42cdccc36e94ee199af423ef7c84 upstream. The reference to `explicit_in_reply_to` is pointless as when the reference was added in the form of "#15" [1], Section 15) was "The canonical patch format". The reference of "#15" had not been properly updated in a couple of reorganizations during the plain-text SubmittingPatches era. Fix it by using `the_canonical_patch_format`. [1]: 2ae19acaa50a ("Documentation: Add "how to write a good patch summary" to SubmittingPatches") Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Fixes: 5903019b2a5e ("Documentation/SubmittingPatches: convert it to ReST markup") Fixes: 9b2c76777acc ("Documentation/SubmittingPatches: enrich the Sphinx output") Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: stable@vger.kernel.org # v4.9+ Link: https://lore.kernel.org/r/64e105a5-50be-23f2-6cae-903a2ea98e18@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()Xiu Jianfeng1-0/+1
commit d0dc1a7100f19121f6e7450f9cdda11926aa3838 upstream. Currently it returns zero when CRQ response timed out, it should return an error code instead. Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06tpm: Fix buffer access in tpm2_get_tpm_pt()Stefan Mahnke-Hartmann1-1/+10
commit e57b2523bd37e6434f4e64c7a685e3715ad21e9a upstream. Under certain conditions uninitialized memory will be accessed. As described by TCG Trusted Platform Module Library Specification, rev. 1.59 (Part 3: Commands), if a TPM2_GetCapability is received, requesting a capability, the TPM in field upgrade mode may return a zero length list. Check the property count in tpm2_get_tpm_pt(). Fixes: 2ab3241161b3 ("tpm: migrate tpm2_get_tpm_pt() to use struct tpm_buf") Cc: stable@vger.kernel.org Signed-off-by: Stefan Mahnke-Hartmann <stefan.mahnke-hartmann@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06HID: multitouch: add quirks to enable Lenovo X12 trackpointTao Jin2-0/+7
commit 95cd2cdc88c755dcd0a58b951faeb77742c733a4 upstream. This applies the similar quirks used by previous generation devices such as X1 tablet for X12 tablet, so that the trackpoint and buttons can work. This patch was applied and tested working on 5.17.1 . Cc: stable@vger.kernel.org # 5.8+ given that it relies on 40d5bb87377a Signed-off-by: Tao Jin <tao-j@outlook.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/CO6PR03MB6241CB276FCDC7F4CEDC34F6E1E29@CO6PR03MB6241.namprd03.prod.outlook.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06HID: multitouch: Add support for Google Whiskers TouchpadMarek Maślanka1-0/+3
commit 1d07cef7fd7599450b3d03e1915efc2a96e1f03f upstream. The Google Whiskers touchpad does not work properly with the default multitouch configuration. Instead, use the same configuration as Google Rose. Signed-off-by: Marek Maslanka <mm@semihalf.com> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06raid5: introduce MD_BROKENMariusz Tkaczyk1-25/+22
commit 57668f0a4cc4083a120cc8c517ca0055c4543b59 upstream. Raid456 module had allowed to achieve failed state. It was fixed by fb73b357fb9 ("raid5: block failing device if raid will be failed"). This fix introduces a bug, now if raid5 fails during IO, it may result with a hung task without completion. Faulty flag on the device is necessary to process all requests and is checked many times, mainly in analyze_stripe(). Allow to set faulty on drive again and set MD_BROKEN if raid is failed. As a result, this level is allowed to achieve failed state again, but communication with userspace (via -EBUSY status) will be preserved. This restores possibility to fail array via #mdadm --set-faulty command and will be fixed by additional verification on mdadm side. Reproduction steps: mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1 mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean mkfs.xfs /dev/md126 -f mount /dev/md126 /mnt/root/ fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 & echo 1 > /sys/block/nvme2n1/device/device/remove echo 1 > /sys/block/nvme1n1/device/device/remove [ 1475.787779] Call Trace: [ 1475.793111] __schedule+0x2a6/0x700 [ 1475.799460] schedule+0x38/0xa0 [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456] [ 1475.813856] ? finish_wait+0x80/0x80 [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456] [ 1475.828281] ? finish_wait+0x80/0x80 [ 1475.834727] ? finish_wait+0x80/0x80 [ 1475.841127] ? finish_wait+0x80/0x80 [ 1475.847480] md_handle_request+0x119/0x190 [ 1475.854390] md_make_request+0x8a/0x190 [ 1475.861041] generic_make_request+0xcf/0x310 [ 1475.868145] submit_bio+0x3c/0x160 [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60 [ 1475.882070] iomap_dio_bio_actor+0x175/0x390 [ 1475.889149] iomap_apply+0xff/0x310 [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.909974] iomap_dio_rw+0x2f2/0x490 [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.923680] ? atime_needs_update+0x77/0xe0 [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs] [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs] [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs] [ 1475.953403] aio_read+0xd5/0x180 [ 1475.959395] ? _cond_resched+0x15/0x30 [ 1475.965907] io_submit_one+0x20b/0x3c0 [ 1475.972398] __x64_sys_io_submit+0xa2/0x180 [ 1475.979335] ? do_io_getevents+0x7c/0xc0 [ 1475.986009] do_syscall_64+0x5b/0x1a0 [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 1476.000255] RIP: 0033:0x7f11fc27978d [ 1476.006631] Code: Bad RIP value. [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds. Cc: stable@vger.kernel.org Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed") Reviewd-by: Xiao Ni <xni@redhat.com> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06dm verity: set DM_TARGET_IMMUTABLE feature flagSarthak Kukreti1-0/+1
commit 4caae58406f8ceb741603eee460d79bacca9b1b5 upstream. The device-mapper framework provides a mechanism to mark targets as immutable (and hence fail table reloads that try to change the target type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's feature flags to prevent switching the verity target with a different target type. Fixes: a4ffc152198e ("dm: add verity target") Cc: stable@vger.kernel.org Signed-off-by: Sarthak Kukreti <sarthakkukreti@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06dm stats: add cond_resched when looping over entriesMikulas Patocka1-0/+8
commit bfe2b0146c4d0230b68f5c71a64380ff8d361f8b upstream. dm-stats can be used with a very large number of entries (it is only limited by 1/4 of total system memory), so add rescheduling points to the loops that iterate over the entries. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06dm crypt: make printing of the key constant-timeMikulas Patocka1-3/+11
commit 567dd8f34560fa221a6343729474536aa7ede4fd upstream. The device mapper dm-crypt target is using scnprintf("%02x", cc->key[i]) to report the current key to userspace. However, this is not a constant-time operation and it may leak information about the key via timing, via cache access patterns or via the branch predictor. Change dm-crypt's key printing to use "%c" instead of "%02x". Also introduce hex2asc() that carefully avoids any branching or memory accesses when converting a number in the range 0 ... 15 to an ascii character. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06dm integrity: fix error code in dm_integrity_ctr()Dan Carpenter1-2/+0
commit d3f2a14b8906df913cb04a706367b012db94a6e8 upstream. The "r" variable shadows an earlier "r" that has function scope. It means that we accidentally return success instead of an error code. Smatch has a warning for this: drivers/md/dm-integrity.c:4503 dm_integrity_ctr() warn: missing error code 'r' Fixes: 7eada909bfd7 ("dm: add integrity target") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06ARM: dts: s5pv210: Correct interrupt name for bluetooth in AriesJonathan Bakker1-1/+1
commit 3f5e3d3a8b895c8a11da8b0063ba2022dd9e2045 upstream. Correct the name of the bluetooth interrupt from host-wake to host-wakeup. Fixes: 1c65b6184441b ("ARM: dts: s5pv210: Correct BCM4329 bluetooth node") Cc: <stable@vger.kernel.org> Signed-off-by: Jonathan Bakker <xc-racer2@live.ca> Link: https://lore.kernel.org/r/CY4PR04MB0567495CFCBDC8D408D44199CB1C9@CY4PR04MB0567.namprd04.prod.outlook.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06Bluetooth: hci_qca: Use del_timer_sync() before freeingSteven Rostedt1-2/+2
commit 72ef98445aca568a81c2da050532500a8345ad3a upstream. While looking at a crash report on a timer list being corrupted, which usually happens when a timer is freed while still active. This is commonly triggered by code calling del_timer() instead of del_timer_sync() just before freeing. One possible culprit is the hci_qca driver, which does exactly that. Eric mentioned that wake_retrans_timer could be rearmed via the work queue, so also move the destruction of the work queue before del_timer_sync(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Fixes: 0ff252c1976da ("Bluetooth: hciuart: Add support QCA chipset for UART") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06zsmalloc: fix races between asynchronous zspage free and page migrationSultan Alsawaf1-4/+33
commit 2505a981114dcb715f8977b8433f7540854851d8 upstream. The asynchronous zspage free worker tries to lock a zspage's entire page list without defending against page migration. Since pages which haven't yet been locked can concurrently migrate off the zspage page list while lock_zspage() churns away, lock_zspage() can suffer from a few different lethal races. It can lock a page which no longer belongs to the zspage and unsafely dereference page_private(), it can unsafely dereference a torn pointer to the next page (since there's a data race), and it can observe a spurious NULL pointer to the next page and thus not lock all of the zspage's pages (since a single page migration will reconstruct the entire page list, and create_page_chain() unconditionally zeroes out each list pointer in the process). Fix the races by using migrate_read_lock() in lock_zspage() to synchronize with page migration. Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com Fixes: 77ff465799c602 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse") Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: ecrdsa - Fix incorrect use of vli_cmpVitaly Chikunov1-4/+4
commit 7cc7ab73f83ee6d50dc9536bc3355495d8600fad upstream. Correctly compare values that shall be greater-or-equal and not just greater. Fixes: 0d7a78643f69 ("crypto: ecrdsa - add EC-RDSA (GOST 34.10) algorithm") Cc: <stable@vger.kernel.org> Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: caam - fix i.MX6SX entropy delay valueFabio Estevam1-0/+18
commit 4ee4cdad368a26de3967f2975806a9ee2fa245df upstream. Since commit 358ba762d9f1 ("crypto: caam - enable prediction resistance in HRWNG") the following CAAM errors can be seen on i.MX6SX: caam_jr 2101000.jr: 20003c5b: CCB: desc idx 60: RNG: Hardware error hwrng: no data available This error is due to an incorrect entropy delay for i.MX6SX. Fix it by increasing the minimum entropy delay for i.MX6SX as done in U-Boot: https://patchwork.ozlabs.org/project/uboot/patch/20220415111049.2565744-1-gaurav.jain@nxp.com/ As explained in the U-Boot patch: "RNG self tests are run to determine the correct entropy delay. Such tests are executed with different voltages and temperatures to identify the worst case value for the entropy delay. For i.MX6SX, it was determined that after adding a margin value of 1000 the minimum entropy delay should be at least 12000." Cc: <stable@vger.kernel.org> Fixes: 358ba762d9f1 ("crypto: caam - enable prediction resistance in HRWNG") Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Reviewed-by: Vabhav Sharma <vabhav.sharma@nxp.com> Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06KVM: x86: avoid calling x86 emulator without a decoded instructionSean Christopherson1-12/+19
commit fee060cd52d69c114b62d1a2948ea9648b5131f9 upstream. Whenever x86_decode_emulated_instruction() detects a breakpoint, it returns the value that kvm_vcpu_check_breakpoint() writes into its pass-by-reference second argument. Unfortunately this is completely bogus because the expected outcome of x86_decode_emulated_instruction is an EMULATION_* value. Then, if kvm_vcpu_check_breakpoint() does "*r = 0" (corresponding to a KVM_EXIT_DEBUG userspace exit), it is misunderstood as EMULATION_OK and x86_emulate_instruction() is called without having decoded the instruction. This causes various havoc from running with a stale emulation context. The fix is to move the call to kvm_vcpu_check_breakpoint() where it was before commit 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding") introduced x86_decode_emulated_instruction(). The other caller of the function does not need breakpoint checks, because it is invoked as part of a vmexit and the processor has already checked those before executing the instruction that #GP'd. This fixes CVE-2022-1852. Reported-by: Qiuhao Li <qiuhao@sysec.org> Reported-by: Gaoning Pan <pgn@zju.edu.cn> Reported-by: Yongkang Jia <kangel@zju.edu.cn> Fixes: 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220311032801.3467418-2-seanjc@google.com> [Rewrote commit message according to Qiuhao's report, since a patch already existed to fix the bug. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06x86, kvm: use correct GFP flags for preemption disabledPaolo Bonzini1-1/+1
commit baec4f5a018fe2d708fc1022330dba04b38b5fe3 upstream. Commit ddd7ed842627 ("x86/kvm: Alloc dummy async #PF token outside of raw spinlock") leads to the following Smatch static checker warning: arch/x86/kernel/kvm.c:212 kvm_async_pf_task_wake() warn: sleeping in atomic context arch/x86/kernel/kvm.c 202 raw_spin_lock(&b->lock); 203 n = _find_apf_task(b, token); 204 if (!n) { 205 /* 206 * Async #PF not yet handled, add a dummy entry for the token. 207 * Allocating the token must be down outside of the raw lock 208 * as the allocator is preemptible on PREEMPT_RT kernels. 209 */ 210 if (!dummy) { 211 raw_spin_unlock(&b->lock); --> 212 dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); ^^^^^^^^^^ Smatch thinks the caller has preempt disabled. The `smdb.py preempt kvm_async_pf_task_wake` output call tree is: sysvec_kvm_asyncpf_interrupt() <- disables preempt -> __sysvec_kvm_asyncpf_interrupt() -> kvm_async_pf_task_wake() The caller is this: arch/x86/kernel/kvm.c 290 DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt) 291 { 292 struct pt_regs *old_regs = set_irq_regs(regs); 293 u32 token; 294 295 ack_APIC_irq(); 296 297 inc_irq_stat(irq_hv_callback_count); 298 299 if (__this_cpu_read(apf_reason.enabled)) { 300 token = __this_cpu_read(apf_reason.token); 301 kvm_async_pf_task_wake(token); 302 __this_cpu_write(apf_reason.token, 0); 303 wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1); 304 } 305 306 set_irq_regs(old_regs); 307 } The DEFINE_IDTENTRY_SYSVEC() is a wrapper that calls this function from the call_on_irqstack_cond(). It's inside the call_on_irqstack_cond() where preempt is disabled (unless it's already disabled). The irq_enter/exit_rcu() functions disable/enable preempt. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06x86/kvm: Alloc dummy async #PF token outside of raw spinlockSean Christopherson1-14/+27
commit 0547758a6de3cc71a0cfdd031a3621a30db6a68b upstream. Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the the dummy async #PF token, the allocator is preemptible on PREEMPT_RT kernels and must not be called from truly atomic contexts. Opportunistically document why it's ok to loop on allocation failure, i.e. why the function won't get stuck in an infinite loop. Reported-by: Yajun Deng <yajun.deng@linux.dev> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06KVM: PPC: Book3S HV: fix incorrect NULL check on list iteratorXiaomeng Tong1-3/+5
commit 300981abddcb13f8f06ad58f52358b53a8096775 upstream. The bug is here: if (!p) return ret; The list iterator value 'p' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found. To fix the bug, Use a new value 'iter' as the list iterator, while use the old value 'p' as a dedicated variable to point to the found element. Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06netfilter: conntrack: re-fetch conntrack after insertionFlorian Westphal1-1/+6
commit 56b14ecec97f39118bf85c9ac2438c5a949509ed upstream. In case the conntrack is clashing, insertion can free skb->_nfct and set skb->_nfct to the already-confirmed entry. This wasn't found before because the conntrack entry and the extension space used to free'd after an rcu grace period, plus the race needs events enabled to trigger. Reported-by: <syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com> Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06netfilter: nf_tables: sanitize nft_set_desc_concat_parse()Pablo Neira Ayuso1-4/+13
commit fecf31ee395b0295f2d7260aa29946b7605f7c85 upstream. Add several sanity checks for nft_set_desc_concat_parse(): - validate desc->field_count not larger than desc->field_len array. - field length cannot be larger than desc->field_len (ie. U8_MAX) - total length of the concatenation cannot be larger than register array. Joint work with Florian Westphal. Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields") Reported-by: <zhangziming.zzm@antgroup.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: drbg - make reseeding from get_random_bytes() synchronousNicolai Stange3-54/+11
commit 074bcd4000e0d812bc253f86fedc40f81ed59ccc upstream. get_random_bytes() usually hasn't full entropy available by the time DRBG instances are first getting seeded from it during boot. Thus, the DRBG implementation registers random_ready_callbacks which would in turn schedule some work for reseeding the DRBGs once get_random_bytes() has sufficient entropy available. For reference, the relevant history around handling DRBG (re)seeding in the context of a not yet fully seeded get_random_bytes() is: commit 16b369a91d0d ("random: Blocking API for accessing nonblocking_pool") commit 4c7879907edd ("crypto: drbg - add async seeding operation") commit 205a525c3342 ("random: Add callback API for random pool readiness") commit 57225e679788 ("crypto: drbg - Use callback API for random readiness") commit c2719503f5e1 ("random: Remove kernel blocking API") However, some time later, the initialization state of get_random_bytes() has been made queryable via rng_is_initialized() introduced with commit 9a47249d444d ("random: Make crng state queryable"). This primitive now allows for streamlining the DRBG reseeding from get_random_bytes() by replacing that aforementioned asynchronous work scheduling from random_ready_callbacks with some simpler, synchronous code in drbg_generate() next to the related logic already present therein. Apart from improving overall code readability, this change will also enable DRBG users to rely on wait_for_random_bytes() for ensuring that the initial seeding has completed, if desired. The previous patches already laid the grounds by making drbg_seed() to record at each DRBG instance whether it was being seeded at a time when rng_is_initialized() still had been false as indicated by ->seeded == DRBG_SEED_STATE_PARTIAL. All that remains to be done now is to make drbg_generate() check for this condition, determine whether rng_is_initialized() has flipped to true in the meanwhile and invoke a reseed from get_random_bytes() if so. Make this move: - rename the former drbg_async_seed() work handler, i.e. the one in charge of reseeding a DRBG instance from get_random_bytes(), to "drbg_seed_from_random()", - change its signature as appropriate, i.e. make it take a struct drbg_state rather than a work_struct and change its return type from "void" to "int" in order to allow for passing error information from e.g. its __drbg_seed() invocation onwards to callers, - make drbg_generate() invoke this drbg_seed_from_random() once it encounters a DRBG instance with ->seeded == DRBG_SEED_STATE_PARTIAL by the time rng_is_initialized() has flipped to true and - prune everything related to the former, random_ready_callback based mechanism. As drbg_seed_from_random() is now getting invoked from drbg_generate() with the ->drbg_mutex being held, it must not attempt to recursively grab it once again. Remove the corresponding mutex operations from what is now drbg_seed_from_random(). Furthermore, as drbg_seed_from_random() can now report errors directly to its caller, there's no need for it to temporarily switch the DRBG's ->seeded state to DRBG_SEED_STATE_UNSEEDED so that a failure of the subsequently invoked __drbg_seed() will get signaled to drbg_generate(). Don't do it then. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [Jason: for stable, undid the modifications for the backport of 5acd3548.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: drbg - move dynamic ->reseed_threshold adjustments to __drbg_seed()Nicolai Stange1-9/+21
commit 262d83a4290c331cd4f617a457408bdb82fbb738 upstream. Since commit 42ea507fae1a ("crypto: drbg - reseed often if seedsource is degraded"), the maximum seed lifetime represented by ->reseed_threshold gets temporarily lowered if the get_random_bytes() source cannot provide sufficient entropy yet, as is common during boot, and restored back to the original value again once that has changed. More specifically, if the add_random_ready_callback() invoked from drbg_prepare_hrng() in the course of DRBG instantiation does not return -EALREADY, that is, if get_random_bytes() has not been fully initialized at this point yet, drbg_prepare_hrng() will lower ->reseed_threshold to a value of 50. The drbg_async_seed() scheduled from said random_ready_callback will eventually restore the original value. A future patch will replace the random_ready_callback based notification mechanism and thus, there will be no add_random_ready_callback() return value anymore which could get compared to -EALREADY. However, there's __drbg_seed() which gets invoked in the course of both, the DRBG instantiation as well as the eventual reseeding from get_random_bytes() in aforementioned drbg_async_seed(), if any. Moreover, it knows about the get_random_bytes() initialization state by the time the seed data had been obtained from it: the new_seed_state argument introduced with the previous patch would get set to DRBG_SEED_STATE_PARTIAL in case get_random_bytes() had not been fully initialized yet and to DRBG_SEED_STATE_FULL otherwise. Thus, __drbg_seed() provides a convenient alternative for managing that ->reseed_threshold lowering and restoring at a central place. Move all ->reseed_threshold adjustment code from drbg_prepare_hrng() and drbg_async_seed() respectively to __drbg_seed(). Make __drbg_seed() lower the ->reseed_threshold to 50 in case its new_seed_state argument equals DRBG_SEED_STATE_PARTIAL and let it restore the original value otherwise. There is no change in behaviour. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Stephan Müller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: drbg - track whether DRBG was seeded with !rng_is_initialized()Nicolai Stange2-4/+9
commit 2bcd25443868aa8863779a6ebc6c9319633025d2 upstream. Currently, the DRBG implementation schedules asynchronous works from random_ready_callbacks for reseeding the DRBG instances with output from get_random_bytes() once the latter has sufficient entropy available. However, as the get_random_bytes() initialization state can get queried by means of rng_is_initialized() now, there is no real need for this asynchronous reseeding logic anymore and it's better to keep things simple by doing it synchronously when needed instead, i.e. from drbg_generate() once rng_is_initialized() has flipped to true. Of course, for this to work, drbg_generate() would need some means by which it can tell whether or not rng_is_initialized() has flipped to true since the last seeding from get_random_bytes(). Or equivalently, whether or not the last seed from get_random_bytes() has happened when rng_is_initialized() was still evaluating to false. As it currently stands, enum drbg_seed_state allows for the representation of two different DRBG seeding states: DRBG_SEED_STATE_UNSEEDED and DRBG_SEED_STATE_FULL. The former makes drbg_generate() to invoke a full reseeding operation involving both, the rather expensive jitterentropy as well as the get_random_bytes() randomness sources. The DRBG_SEED_STATE_FULL state on the other hand implies that no reseeding at all is required for a !->pr DRBG variant. Introduce the new DRBG_SEED_STATE_PARTIAL state to enum drbg_seed_state for representing the condition that a DRBG was being seeded when rng_is_initialized() had still been false. In particular, this new state implies that - the given DRBG instance has been fully seeded from the jitterentropy source (if enabled) - and drbg_generate() is supposed to reseed from get_random_bytes() *only* once rng_is_initialized() turns to true. Up to now, the __drbg_seed() helper used to set the given DRBG instance's ->seeded state to constant DRBG_SEED_STATE_FULL. Introduce a new argument allowing for the specification of the to be written ->seeded value instead. Make the first of its two callers, drbg_seed(), determine the appropriate value based on rng_is_initialized(). The remaining caller, drbg_async_seed(), is known to get invoked only once rng_is_initialized() is true, hence let it pass constant DRBG_SEED_STATE_FULL for the new argument to __drbg_seed(). There is no change in behaviour, except for that the pr_devel() in drbg_generate() would now report "unseeded" for ->pr DRBG instances which had last been seeded when rng_is_initialized() was still evaluating to false. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Stephan Müller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: drbg - prepare for more fine-grained tracking of seeding stateNicolai Stange2-10/+16
commit ce8ce31b2c5c8b18667784b8c515650c65d57b4e upstream. There are two different randomness sources the DRBGs are getting seeded from, namely the jitterentropy source (if enabled) and get_random_bytes(). At initial DRBG seeding time during boot, the latter might not have collected sufficient entropy for seeding itself yet and thus, the DRBG implementation schedules a reseed work from a random_ready_callback once that has happened. This is particularly important for the !->pr DRBG instances, for which (almost) no further reseeds are getting triggered during their lifetime. Because collecting data from the jitterentropy source is a rather expensive operation, the aforementioned asynchronously scheduled reseed work restricts itself to get_random_bytes() only. That is, it in some sense amends the initial DRBG seed derived from jitterentropy output at full (estimated) entropy with fresh randomness obtained from get_random_bytes() once that has been seeded with sufficient entropy itself. With the advent of rng_is_initialized(), there is no real need for doing the reseed operation from an asynchronously scheduled work anymore and a subsequent patch will make it synchronous by moving it next to related logic already present in drbg_generate(). However, for tracking whether a full reseed including the jitterentropy source is required or a "partial" reseed involving only get_random_bytes() would be sufficient already, the boolean struct drbg_state's ->seeded member must become a tristate value. Prepare for this by introducing the new enum drbg_seed_state and change struct drbg_state's ->seeded member's type from bool to that type. For facilitating review, enum drbg_seed_state is made to only contain two members corresponding to the former ->seeded values of false and true resp. at this point: DRBG_SEED_STATE_UNSEEDED and DRBG_SEED_STATE_FULL. A third one for tracking the intermediate state of "seeded from jitterentropy only" will be introduced with a subsequent patch. There is no change in behaviour at this point. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Stephan Müller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06lib/crypto: add prompts back to crypto librariesJustin M. Forbes3-7/+14
commit e56e18985596617ae426ed5997fb2e737cffb58b upstream. Commit 6048fdcc5f269 ("lib/crypto: blake2s: include as built-in") took away a number of prompt texts from other crypto libraries. This makes values flip from built-in to module when oldconfig runs, and causes problems when these crypto libs need to be built in for thingslike BIG_KEYS. Fixes: 6048fdcc5f269 ("lib/crypto: blake2s: include as built-in") Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: linux-crypto@vger.kernel.org Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org> [Jason: - moved menu into submenu of lib/ instead of root menu - fixed chacha sub-dependencies for CONFIG_CRYPTO] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06exfat: check if cluster num is validTadeusz Struk3-10/+14
commit 64ba4b15e5c045f8b746c6da5fc9be9a6b00b61d upstream. Syzbot reported slab-out-of-bounds read in exfat_clear_bitmap. This was triggered by reproducer calling truncute with size 0, which causes the following trace: BUG: KASAN: slab-out-of-bounds in exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 Read of size 8 at addr ffff888115aa9508 by task syz-executor251/365 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack_lvl+0x1e2/0x24b lib/dump_stack.c:118 print_address_description+0x81/0x3c0 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report+0x1a4/0x1f0 mm/kasan/report.c:436 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:309 exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 exfat_free_cluster+0x25a/0x4a0 fs/exfat/fatent.c:181 __exfat_truncate+0x99e/0xe00 fs/exfat/file.c:217 exfat_truncate+0x11b/0x4f0 fs/exfat/file.c:243 exfat_setattr+0xa03/0xd40 fs/exfat/file.c:339 notify_change+0xb76/0xe10 fs/attr.c:336 do_truncate+0x1ea/0x2d0 fs/open.c:65 Move the is_valid_cluster() helper from fatent.c to a common header to make it reusable in other *.c files. And add is_valid_cluster() to validate if cluster number is within valid range in exfat_clear_bitmap() and exfat_set_bitmap(). Link: https://syzkaller.appspot.com/bug?id=50381fc73821ecae743b8cf24b4c9a04776f767c Reported-by: syzbot+a4087e40b9c13aad7892@syzkaller.appspotmail.com Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()Gustavo A. R. Silva1-1/+1
commit 336feb502a715909a8136eb6a62a83d7268a353b upstream. Fix the following -Wstringop-overflow warnings when building with GCC-11: drivers/gpu/drm/i915/intel_pm.c:3106:9: warning: ‘intel_read_wm_latency’ accessing 16 bytes in a region of size 10 [-Wstringop-overflow=] 3106 | intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/i915/intel_pm.c:3106:9: note: referencing argument 2 of type ‘u16 *’ {aka ‘short unsigned int *’} drivers/gpu/drm/i915/intel_pm.c:2861:13: note: in a call to function ‘intel_read_wm_latency’ 2861 | static void intel_read_wm_latency(struct drm_i915_private *dev_priv, | ^~~~~~~~~~~~~~~~~~~~~ by removing the over-specified array size from the argument declarations. It seems that this code is actually safe because the size of the array depends on the hardware generation, and the function checks for that. Notice that wm can be an array of 5 elements: drivers/gpu/drm/i915/intel_pm.c:3109: intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency); or an array of 8 elements: drivers/gpu/drm/i915/intel_pm.c:3131: intel_read_wm_latency(dev_priv, dev_priv->wm.skl_latency); and the compiler legitimately complains about that. This helps with the ongoing efforts to globally enable -Wstringop-overflow. Link: https://github.com/KSPP/linux/issues/181 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06xfs: Fix CIL throttle hang when CIL space used going backwardsDave Chinner3-24/+49
commit 19f4e7cc819771812a7f527d7897c2deffbf7a00 upstream. A hang with tasks stuck on the CIL hard throttle was reported and largely diagnosed by Donald Buczek, who discovered that it was a result of the CIL context space usage decrementing in committed transactions once the hard throttle limit had been hit and processes were already blocked. This resulted in the CIL push not waking up those waiters because the CIL context was no longer over the hard throttle limit. The surprising aspect of this was the CIL space usage going backwards regularly enough to trigger this situation. Assumptions had been made in design that the relogging process would only increase the size of the objects in the CIL, and so that space would only increase. This change and commit message fixes the issue and documents the result of an audit of the triggers that can cause the CIL space to go backwards, how large the backwards steps tend to be, the frequency in which they occur, and what the impact on the CIL accounting code is. Even though the CIL ctx->space_used can go backwards, it will only do so if the log item is already logged to the CIL and contains a space reservation for it's entire logged state. This is tracked by the shadow buffer state on the log item. If the item is not previously logged in the CIL it has no shadow buffer nor log vector, and hence the entire size of the logged item copied to the log vector is accounted to the CIL space usage. i.e. it will always go up in this case. If the item has a log vector (i.e. already in the CIL) and the size decreases, then the existing log vector will be overwritten and the space usage will go down. This is the only condition where the space usage reduces, and it can only occur when an item is already tracked in the CIL. Hence we are safe from CIL space usage underruns as a result of log items decreasing in size when they are relogged. Typically this reduction in CIL usage occurs from metadata blocks being free, such as when a btree block merge occurs or a directory enter/xattr entry is removed and the da-tree is reduced in size. This generally results in a reduction in size of around a single block in the CIL, but also tends to increase the number of log vectors because the parent and sibling nodes in the tree needs to be updated when a btree block is removed. If a multi-level merge occurs, then we see reduction in size of 2+ blocks, but again the log vector count goes up. The other vector is inode fork size changes, which only log the current size of the fork and ignore the previously logged size when the fork is relogged. Hence if we are removing items from the inode fork (dir/xattr removal in shortform, extent record removal in extent form, etc) the relogged size of the inode for can decrease. No other log items can decrease in size either because they are a fixed size (e.g. dquots) or they cannot be relogged (e.g. relogging an intent actually creates a new intent log item and doesn't relog the old item at all.) Hence the only two vectors for CIL context size reduction are relogging inode forks and marking buffers active in the CIL as stale. Long story short: the majority of the code does the right thing and handles the reduction in log item size correctly, and only the CIL hard throttle implementation is problematic and needs fixing. This patch makes that fix, as well as adds comments in the log item code that result in items shrinking in size when they are relogged as a clear reminder that this can and does happen frequently. The throttle fix is based upon the change Donald proposed, though it goes further to ensure that once the throttle is activated, it captures all tasks until the CIL push issues a wakeup, regardless of whether the CIL space used has gone back under the throttle threshold. This ensures that we prevent tasks reducing the CIL slightly under the throttle threshold and then making more changes that push it well over the throttle limit. This is acheived by checking if the throttle wait queue is already active as a condition of throttling. Hence once we start throttling, we continue to apply the throttle until the CIL context push wakes everything on the wait queue. We can use waitqueue_active() for the waitqueue manipulations and checks as they are all done under the ctx->xc_push_lock. Hence the waitqueue has external serialisation and we can safely peek inside the wait queue without holding the internal waitqueue locks. Many thanks to Donald for his diagnostic and analysis work to isolate the cause of this hang. Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06xfs: fix an ABBA deadlock in xfs_renameDarrick J. Wong3-20/+26
commit 6da1b4b1ab36d80a3994fd4811c8381de10af604 upstream. When overlayfs is running on top of xfs and the user unlinks a file in the overlay, overlayfs will create a whiteout inode and ask xfs to "rename" the whiteout file atop the one being unlinked. If the file being unlinked loses its one nlink, we then have to put the inode on the unlinked list. This requires us to grab the AGI buffer of the whiteout inode to take it off the unlinked list (which is where whiteouts are created) and to grab the AGI buffer of the file being deleted. If the whiteout was created in a higher numbered AG than the file being deleted, we'll lock the AGIs in the wrong order and deadlock. Therefore, grab all the AGI locks we think we'll need ahead of time, and i