summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-04-20Linux 5.10.112v5.10.112Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20220418121145.140991388@linuxfoundation.org Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Hulk Robot <hulkrobot@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: Fix UAF bugs in ax25 timersDuoming Zhou1-0/+5
commit 82e31755e55fbcea6a9dfaae5fe4860ade17cbc0 upstream. There are race conditions that may lead to UAF bugs in ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(), ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call ax25_release() to deallocate ax25_dev. One of the UAF bugs caused by ax25_release() is shown below: (Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | ... | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25_std_establish_data_link() | ax25_start_t1timer() | ax25_dev_device_down() //(3) mod_timer(&ax25->t1timer,..) | | ax25_release() (wait a time) | ... | ax25_dev_put(ax25_dev) //(4)FREE ax25_t1timer_expiry() | ax25->ax25_dev->values[..] //USE| ... ... | We increase the refcount of ax25_dev in position (1) and (2), and decrease the refcount of ax25_dev in position (3) and (4). The ax25_dev will be freed in position (4) and be used in ax25_t1timer_expiry(). The fail log is shown below: ============================================================== [ 106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0 [ 106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574 [ 106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14 [ 106.116942] Call Trace: ... [ 106.116942] ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] call_timer_fn+0x122/0x3d0 [ 106.116942] __run_timers.part.0+0x3f6/0x520 [ 106.116942] run_timer_softirq+0x4f/0xb0 [ 106.116942] __do_softirq+0x1c2/0x651 ... This patch adds del_timer_sync() in ax25_release(), which could ensure that all timers stop before we deallocate ax25_dev. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com> [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: Fix NULL pointer dereferences in ax25 timersDuoming Zhou2-8/+16
commit fc6d01ff9ef03b66d4a3a23b46fc3c3d8cf92009 upstream. The previous commit 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect") move ax25_disconnect into lock_sock() in order to prevent NPD bugs. But there are race conditions that may lead to null pointer dereferences in ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(), ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we use ax25_kill_by_device() to detach the ax25 device. One of the race conditions that cause null pointer dereferences can be shown as below: (Thread 1) | (Thread 2) ax25_connect() | ax25_std_establish_data_link() | ax25_start_t1timer() | mod_timer(&ax25->t1timer,..) | | ax25_kill_by_device() (wait a time) | ... | s->ax25_dev = NULL; //(1) ax25_t1timer_expiry() | ax25->ax25_dev->values[..] //(2)| ... ... | We set null to ax25_cb->ax25_dev in position (1) and dereference the null pointer in position (2). The corresponding fail log is shown below: =============================================================== BUG: kernel NULL pointer dereference, address: 0000000000000050 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc6-00794-g45690b7d0 RIP: 0010:ax25_t1timer_expiry+0x12/0x40 ... Call Trace: call_timer_fn+0x21/0x120 __run_timers.part.0+0x1ca/0x250 run_timer_softirq+0x2c/0x60 __do_softirq+0xef/0x2f3 irq_exit_rcu+0xb6/0x100 sysvec_apic_timer_interrupt+0xa2/0xd0 ... This patch moves ax25_disconnect() before s->ax25_dev = NULL and uses del_timer_sync() to delete timers in ax25_disconnect(). If ax25_disconnect() is called by ax25_kill_by_device() or ax25->ax25_dev is NULL, the reason in ax25_disconnect() will be equal to ENETUNREACH, it will wait all timers to stop before we set null to s->ax25_dev in ax25_kill_by_device(). Fixes: 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: fix NPD bug in ax25_disconnectDuoming Zhou1-1/+1
commit 7ec02f5ac8a5be5a3f20611731243dc5e1d9ba10 upstream. The ax25_disconnect() in ax25_kill_by_device() is not protected by any locks, thus there is a race condition between ax25_disconnect() and ax25_destroy_socket(). when ax25->sk is assigned as NULL by ax25_destroy_socket(), a NULL pointer dereference bug will occur if site (1) or (2) dereferences ax25->sk. ax25_kill_by_device() | ax25_release() ax25_disconnect() | ax25_destroy_socket() ... | if(ax25->sk != NULL) | ... ... | ax25->sk = NULL; bh_lock_sock(ax25->sk); //(1) | ... ... | bh_unlock_sock(ax25->sk); //(2)| This patch moves ax25_disconnect() into lock_sock(), which can synchronize with ax25_destroy_socket() in ax25_release(). Fail log: =============================================================== BUG: kernel NULL pointer dereference, address: 0000000000000088 ... RIP: 0010:_raw_spin_lock+0x7e/0xd0 ... Call Trace: ax25_disconnect+0xf6/0x220 ax25_device_event+0x187/0x250 raw_notifier_call_chain+0x5e/0x70 dev_close_many+0x17d/0x230 rollback_registered_many+0x1f1/0x950 unregister_netdevice_queue+0x133/0x200 unregister_netdev+0x13/0x20 ... Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: fix UAF bug in ax25_send_control()Duoming Zhou1-4/+4
commit 5352a761308397a0e6250fdc629bb3f615b94747 upstream. There are UAF bugs in ax25_send_control(), when we call ax25_release() to deallocate ax25_dev. The possible race condition is shown below: (Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25->state = AX25_STATE_1 | ... | ax25_dev_device_down() //(3) (Thread 3) ax25_release() | ax25_dev_put() //(4) FREE | case AX25_STATE_1: | ax25_send_control() | alloc_skb() //USE | The refcount of ax25_dev increases in position (1) and (2), and decreases in position (3) and (4). The ax25_dev will be freed before dereference sites in ax25_send_control(). The following is part of the report: [ 102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210 [ 102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602 [ 102.297448] Call Trace: [ 102.303751] ax25_send_control+0x33/0x210 [ 102.303751] ax25_release+0x356/0x450 [ 102.305431] __sock_release+0x6d/0x120 [ 102.305431] sock_close+0xf/0x20 [ 102.305431] __fput+0x11f/0x420 [ 102.305431] task_work_run+0x86/0xd0 [ 102.307130] get_signal+0x1075/0x1220 [ 102.308253] arch_do_signal_or_restart+0x1df/0xc00 [ 102.308253] exit_to_user_mode_prepare+0x150/0x1e0 [ 102.308253] syscall_exit_to_user_mode+0x19/0x50 [ 102.308253] do_syscall_64+0x48/0x90 [ 102.308253] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 102.308253] RIP: 0033:0x405ae7 This patch defers the free operation of ax25_dev and net_device after all corresponding dereference sites in ax25_release() to avoid UAF. Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com> [OP: backport to 5.10: adjust dev_put_track()->dev_put()] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: Fix refcount leaks caused by ax25_cb_del()Duoming Zhou1-3/+11
commit 9fd75b66b8f68498454d685dc4ba13192ae069b0 upstream. The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") and commit feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation") increase the refcounts of ax25_dev and net_device in ax25_bind() and decrease the matching refcounts in ax25_kill_by_device() in order to prevent UAF bugs, but there are reference count leaks. The root cause of refcount leaks is shown below: (Thread 1) | (Thread 2) ax25_bind() | ... | ax25_addr_ax25dev() | ax25_dev_hold() //(1) | ... | dev_hold_track() //(2) | ... | ax25_destroy_socket() | ax25_cb_del() | ... | hlist_del_init() //(3) | | (Thread 3) | ax25_kill_by_device() | ... | ax25_for_each(s, &ax25_list) { | if (s->ax25_dev == ax25_dev) //(4) | ... | Firstly, we use ax25_bind() to increase the refcount of ax25_dev in position (1) and increase the refcount of net_device in position (2). Then, we use ax25_cb_del() invoked by ax25_destroy_socket() to delete ax25_cb in hlist in position (3) before calling ax25_kill_by_device(). Finally, the decrements of refcounts in ax25_kill_by_device() will not be executed, because no s->ax25_dev equals to ax25_dev in position (4). This patch adds decrements of refcounts in ax25_release() and use lock_sock() to do synchronization. If refcounts decrease in ax25_release(), the decrements of refcounts in ax25_kill_by_device() will not be executed and vice versa. Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Fixes: 87563a043cef ("ax25: fix reference count leaks of ax25_dev") Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation") Reported-by: Thomas Osterried <thomas@osterried.de> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> [OP: backport to 5.10: adjust dev_put_track()->dev_put()] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: fix UAF bugs of net_device caused by rebinding operationDuoming Zhou1-1/+4
commit feef318c855a361a1eccd880f33e88c460eb63b4 upstream. The ax25_kill_by_device() will set s->ax25_dev = NULL and call ax25_disconnect() to change states of ax25_cb and sock, if we call ax25_bind() before ax25_kill_by_device(). However, if we call ax25_bind() again between the window of ax25_kill_by_device() and ax25_dev_device_down(), the values and states changed by ax25_kill_by_device() will be reassigned. Finally, ax25_dev_device_down() will deallocate net_device. If we dereference net_device in syscall functions such as ax25_release(), ax25_sendmsg(), ax25_getsockopt(), ax25_getname() and ax25_info_show(), a UAF bug will occur. One of the possible race conditions is shown below: (USE) | (FREE) ax25_bind() | | ax25_kill_by_device() ax25_bind() | ax25_connect() | ... | ax25_dev_device_down() | ... | dev_put_track(dev, ...) //FREE ax25_release() | ... ax25_send_control() | alloc_skb() //USE | the corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in ax25_send_control+0x43/0x210 ... Call Trace: ... ax25_send_control+0x43/0x210 ax25_release+0x2db/0x3b0 __sock_release+0x6d/0x120 sock_close+0xf/0x20 __fput+0x11f/0x420 ... Allocated by task 1283: ... __kasan_kmalloc+0x81/0xa0 alloc_netdev_mqs+0x5a/0x680 mkiss_open+0x6c/0x380 tty_ldisc_open+0x55/0x90 ... Freed by task 1969: ... kfree+0xa3/0x2c0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 tty_ldisc_kill+0x3e/0x80 ... In order to fix these UAF bugs caused by rebinding operation, this patch adds dev_hold_track() into ax25_bind() and corresponding dev_put_track() into ax25_kill_by_device(). Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> [OP: backport to 5.10: adjust dev_put_track()->dev_put() and dev_hold_track()->dev_hold()] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: fix reference count leaks of ax25_devDuoming Zhou4-19/+41
commit 87563a043cef044fed5db7967a75741cc16ad2b1 upstream. The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") introduces refcount into ax25_dev, but there are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(), ax25_rt_add(), ax25_rt_del() and ax25_rt_opt(). This patch uses ax25_dev_put() and adjusts the position of ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev. Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: add refcount in ax25_dev to avoid UAF bugsDuoming Zhou4-2/+25
commit d01ffb9eee4af165d83b08dd73ebdf9fe94a519b upstream. If we dereference ax25_dev after we call kfree(ax25_dev) in ax25_dev_device_down(), it will lead to concurrency UAF bugs. There are eight syscall functions suffer from UAF bugs, include ax25_bind(), ax25_release(), ax25_connect(), ax25_ioctl(), ax25_getname(), ax25_sendmsg(), ax25_getsockopt() and ax25_info_show(). One of the concurrency UAF can be shown as below: (USE) | (FREE) | ax25_device_event | ax25_dev_device_down ax25_bind | ... ... | kfree(ax25_dev) ax25_fillin_cb() | ... ax25_fillin_cb_from_dev() | ... | The root cause of UAF bugs is that kfree(ax25_dev) in ax25_dev_device_down() is not protected by any locks. When ax25_dev, which there are still pointers point to, is released, the concurrency UAF bug will happen. This patch introduces refcount into ax25_dev in order to guarantee that there are no pointers point to it when ax25_dev is released. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> [OP: backport to 5.10: adjusted context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20scsi: iscsi: Fix unbound endpoint error handlingMike Christie1-29/+36
commit 03690d81974535f228e892a14f0d2d44404fe555 upstream. If a driver raises a connection error before the connection is bound, we can leave a cleanup_work queued that can later run and disconnect/stop a connection that is logged in. The problem is that drivers can call iscsi_conn_error_event for endpoints that are connected but not yet bound when something like the network port they are using is brought down. iscsi_cleanup_conn_work_fn will check for this and exit early, but if the cleanup_work is stuck behind other works, it might not get run until after userspace has done ep_disconnect. Because the endpoint is not yet bound there was no way for ep_disconnect to flush the work. The bug of leaving stop_conns queued was added in: Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") and: Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") was supposed to fix it, but left this case. This patch moves the conn state check to before we even queue the work so we can avoid queueing. Link: https://lore.kernel.org/r/20220408001314.5014-7-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20scsi: iscsi: Fix endpoint reuse regressionMike Christie1-1/+11
commit 0aadafb5c34403a7cced1a8d61877048dc059f70 upstream. This patch fixes a bug where when using iSCSI offload we can free an endpoint while userspace still thinks it's active. That then causes the endpoint ID to be reused for a new connection's endpoint while userspace still thinks the ID is for the original connection. Userspace will then end up disconnecting a running connection's endpoint or trying to bind to another connection's endpoint. This bug is a regression added in: Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") where we added a in kernel ep_disconnect call to fix a bug in: Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") where we would call stop_conn without having done ep_disconnect. This early ep_disconnect call will then free the endpoint and it's ID while userspace still thinks the ID is valid. Fix the early release of the ID by having the in kernel recovery code keep a reference to the endpoint until userspace has called into the kernel to finish cleaning up the endpoint/connection. It requires the previous commit "scsi: iscsi: Release endpoint ID when its freed" which moved the freeing of the ID until when the endpoint is released. Link: https://lore.kernel.org/r/20220408001314.5014-5-michael.christie@oracle.com Fixes: 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20dma-direct: avoid redundant memory sync for swiotlbChao Gao1-1/+2
commit 9e02977bfad006af328add9434c8bffa40e053bb upstream. When we looked into FIO performance with swiotlb enabled in VM, we found swiotlb_bounce() is always called one more time than expected for each DMA read request. It turns out that the bounce buffer is copied to original DMA buffer twice after the completion of a DMA request (one is done by in dma_direct_sync_single_for_cpu(), the other by swiotlb_tbl_unmap_single()). But the content in bounce buffer actually doesn't change between the two rounds of copy. So, one round of copy is redundant. Pass DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip the memory copy in it. This fix increases FIO 64KB sequential read throughput in a guest with swiotlb=force by 5.6%. Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code") Reported-by: Wang Zhaoyang1 <zhaoyang1.wang@intel.com> Reported-by: Gao Liang <liang.gao@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20timers: Fix warning condition in __run_timers()Anna-Maria Behnsen1-4/+7
commit c54bc0fc84214b203f7a0ebfd1bd308ce2abe920 upstream. When the timer base is empty, base::next_expiry is set to base::clk + NEXT_TIMER_MAX_DELTA and base::next_expiry_recalc is false. When no timer is queued until jiffies reaches base::next_expiry value, the warning for not finding any expired timer and base::next_expiry_recalc is false in __run_timers() triggers. To prevent triggering the warning in this valid scenario base::timers_pending needs to be added to the warning condition. Fixes: 31cd0e119d50 ("timers: Recalculate next timer interrupt only when necessary") Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20220405191732.7438-3-anna-maria@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20i2c: pasemi: Wait for write xfers to finishMartin Povišer1-0/+6
commit bd8963e602c77adc76dbbbfc3417c3cf14fed76b upstream. Wait for completion of write transfers before returning from the driver. At first sight it may seem advantageous to leave write transfers queued for the controller to carry out on its own time, but there's a couple of issues with it: * Driver doesn't check for FIFO space. * The queued writes can complete while the driver is in its I2C read transfer path which means it will get confused by the raising of XEN (the 'transaction ended' signal). This can cause a spurious ENODATA error due to premature reading of the MRXFIFO register. Adding the wait fixes some unreliability issues with the driver. There's some efficiency cost to it (especially with pasemi_smb_waitready doing its polling), but that will be alleviated once the driver receives interrupt support. Fixes: beb58aa39e6e ("i2c: PA Semi SMBus driver") Signed-off-by: Martin Povišer <povik+lin@cutebit.org> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20smp: Fix offline cpu check in flush_smp_call_function_queue()Nadav Amit1-1/+1
commit 9e949a3886356fe9112c6f6f34a6e23d1d35407f upstream. The check in flush_smp_call_function_queue() for callbacks that are sent to offline CPUs currently checks whether the queue is empty. However, flush_smp_call_function_queue() has just deleted all the callbacks from the queue and moved all the entries into a local list. This checks would only be positive if some callbacks were added in the short time after llist_del_all() was called. This does not seem to be the intention of this check. Change the check to look at the local list to which the entries were moved instead of the queue from which all the callbacks were just removed. Fixes: 8d056c48e4862 ("CPU hotplug, smp: flush any pending IPI callbacks before CPU offline") Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220319072015.1495036-1-namit@vmware.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20dm integrity: fix memory corruption when tag_size is less than digest sizeMikulas Patocka1-2/+5
commit 08c1af8f1c13bbf210f1760132f4df24d0ed46d6 upstream. It is possible to set up dm-integrity in such a way that the "tag_size" parameter is less than the actual digest size. In this situation, a part of the digest beyond tag_size is ignored. In this case, dm-integrity would write beyond the end of the ic->recalc_tags array and corrupt memory. The corruption happened in integrity_recalc->integrity_sector_checksum->crypto_shash_final. Fix this corruption by increasing the tags array so that it has enough padding at the end to accomodate the loop in integrity_recalc() being able to write a full digest size for the last member of the tags array. Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ARM: davinci: da850-evm: Avoid NULL pointer dereferenceNathan Chancellor1-1/+3
commit 83a1cde5c74bfb44b49cb2a940d044bb2380f4ea upstream. With newer versions of GCC, there is a panic in da850_evm_config_emac() when booting multi_v5_defconfig in QEMU under the palmetto-bmc machine: Unable to handle kernel NULL pointer dereference at virtual address 00000020 pgd = (ptrval) [00000020] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0 #1 Hardware name: Generic DT based system PC is at da850_evm_config_emac+0x1c/0x120 LR is at do_one_initcall+0x50/0x1e0 The emac_pdata pointer in soc_info is NULL because davinci_soc_info only gets populated on davinci machines but da850_evm_config_emac() is called on all machines via device_initcall(). Move the rmii_en assignment below the machine check so that it is only dereferenced when running on a supported SoC. Fixes: bae105879f2f ("davinci: DA850/OMAP-L138 EVM: implement autodetect of RMII PHY") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/YcS4xVWs6bQlQSPC@archlinux-ax161/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20tick/nohz: Use WARN_ON_ONCE() to prevent console saturationPaul Gortmaker1-1/+1
commit 40e97e42961f8c6cc7bd5fe67cc18417e02d78f1 upstream. While running some testing on code that happened to allow the variable tick_nohz_full_running to get set but with no "possible" NOHZ cores to back up that setting, this warning triggered: if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) WARN_ON(tick_nohz_full_running); The console was overwhemled with an endless stream of one WARN per tick per core and there was no way to even see what was going on w/o using a serial console to capture it and then trace it back to this. Change it to WARN_ON_ONCE(). Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211206145950.10927-3-paul.gortmaker@windriver.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20genirq/affinity: Consider that CPUs on nodes can be unbalancedRei Yamamoto1-2/+3
commit 08d835dff916bfe8f45acc7b92c7af6c4081c8a7 upstream. If CPUs on a node are offline at boot time, the number of nodes is different when building affinity masks for present cpus and when building affinity masks for possible cpus. This causes the following problem: In the case that the number of vectors is less than the number of nodes there are cases where bits of masks for present cpus are overwritten when building masks for possible cpus. Fix this by excluding CPUs, which are not part of the current build mask (present/possible). [ tglx: Massaged changelog and added comment ] Fixes: b82592199032 ("genirq/affinity: Spread IRQs to all available NUMA nodes") Signed-off-by: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220331003309.10891-1-yamamoto.rei@jp.fujitsu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20drm/amdgpu: Enable gfxoff quirk on MacBook ProTomasz Moń1-0/+2
commit 4593c1b6d159f1e5c35c07a7f125e79e5a864302 upstream. Enabling gfxoff quirk results in perfectly usable graphical user interface on MacBook Pro (15-inch, 2019) with Radeon Pro Vega 20 4 GB. Without the quirk, X server is completely unusable as every few seconds there is gpu reset due to ring gfx timeout. Signed-off-by: Tomasz Moń <desowin@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20drm/amd/display: don't ignore alpha property on pre-multiplied modeMelissa Wen2-10/+18
commit e4f1541caf60fcbe5a59e9d25805c0b5865e546a upstream. "Pre-multiplied" is the default pixel blend mode for KMS/DRM, as documented in supported_modes of drm_plane_create_blend_mode_property(): https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/drm_blend.c In this mode, both 'pixel alpha' and 'plane alpha' participate in the calculation, as described by the pixel blend mode formula in KMS/DRM documentation: out.rgb = plane_alpha * fg.rgb + (1 - (plane_alpha * fg.alpha)) * bg.rgb Considering the blend config mechanisms we have in the driver so far, the alpha mode that better fits this blend mode is the _PER_PIXEL_ALPHA_COMBINED_GLOBAL_GAIN, where the value for global_gain is the plane alpha (global_alpha). With this change, alpha property stops to be ignored. It also addresses Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1734 v2: * keep the 8-bit value for global_alpha_value (Nicholas) * correct the logical ordering for combined global gain (Nicholas) * apply to dcn10 too (Nicholas) Signed-off-by: Melissa Wen <mwen@igalia.com> Tested-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Tested-by: Simon Ser <contact@emersion.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ipv6: fix panic when forwarding a pkt with no in6 devNicolas Dichtel1-1/+1
commit e3fa461d8b0e185b7da8a101fe94dfe6dd500ac0 upstream. kongweibin reported a kernel panic in ip6_forward() when input interface has no in6 dev associated. The following tc commands were used to reproduce this panic: tc qdisc del dev vxlan100 root tc qdisc add dev vxlan100 root netem corrupt 5% CC: stable@vger.kernel.org Fixes: ccd27f05ae7b ("ipv6: fix 'disable_policy' for fwd packets") Reported-by: kongweibin <kongweibin2@huawei.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20nl80211: correctly check NL80211_ATTR_REG_ALPHA2 sizeJohannes Berg1-1/+2
commit 6624bb34b4eb19f715db9908cca00122748765d7 upstream. We need this to be at least two bytes, so we can access alpha2[0] and alpha2[1]. It may be three in case some userspace used NUL-termination since it was NLA_STRING (and we also push it out with NUL-termination). Cc: stable@vger.kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20220411114201.fd4a31f06541.Ie7ff4be2cf348d8cc28ed0d626fc54becf7ea799@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ALSA: pcm: Test for "silence" field in struct "pcm_format_data"Fabio M. De Francesco1-1/+1
commit 2f7a26abb8241a0208c68d22815aa247c5ddacab upstream. Syzbot reports "KASAN: null-ptr-deref Write in snd_pcm_format_set_silence".[1] It is due to missing validation of the "silence" field of struct "pcm_format_data" in "pcm_formats" array. Add a test for valid "pat" and, if it is not so, return -EINVAL. [1] https://lore.kernel.org/lkml/000000000000d188ef05dc2c7279@google.com/ Reported-and-tested-by: syzbot+205eb15961852c2c5974@syzkaller.appspotmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220409012655.9399-1-fmdefrancesco@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ALSA: hda/realtek: add quirk for Lenovo Thinkpad X12 speakersTao Jin1-0/+1
commit 264fb03497ec1c7841bba872571bcd11beed57a7 upstream. For this specific device on Lenovo Thinkpad X12 tablet, the verbs were dumped by qemu running a guest OS that init this codec properly. After studying the dump, it turns out that the same quirk used by the other Lenovo devices can be reused. The patch was tested working against the mainline kernel. Cc: <stable@vger.kernel.org> Signed-off-by: Tao Jin <tao-j@outlook.com> Link: https://lore.kernel.org/r/CO6PR03MB6241CD73310B37858FE64C85E1E89@CO6PR03MB6241.namprd03.prod.outlook.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ALSA: hda/realtek: Add quirk for Clevo PD50PNTTim Crawford1-0/+1
commit 9eb6f5c388060d8cef3c8b616cc31b765e022359 upstream. Fixes speaker output and headset detection on Clevo PD50PNT. Signed-off-by: Tim Crawford <tcrawford@system76.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220405182029.27431-1-tcrawford@system76.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20btrfs: mark resumed async balance as writingNaohiro Aota1-0/+2
commit a690e5f2db4d1dca742ce734aaff9f3112d63764 upstream. When btrfs balance is interrupted with umount, the background balance resumes on the next mount. There is a potential deadlock with FS freezing here like as described in commit 26559780b953 ("btrfs: zoned: mark relocation as writing"). Mark the process as sb_writing to avoid it. Reviewed-by: Filipe Manana <fdmanana@suse.com> CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20btrfs: fix root ref counts in error handling in btrfs_get_root_refJia-Ju Bai1-2/+3
commit 168a2f776b9762f4021421008512dd7ab7474df1 upstream. In btrfs_get_root_ref(), when btrfs_insert_fs_root() fails, btrfs_put_root() can happen for two reasons: - the root already exists in the tree, in that case it returns the reference obtained in btrfs_lookup_fs_root() - another error so the cleanup is done in the fail label Calling btrfs_put_root() unconditionally would lead to double decrement of the root reference possibly freeing it in the second case. Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Fixes: bc44d7c4b2b1 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ath9k: Fix usage of driver-private space in tx_infoToke Høiland-Jørgensen2-11/+21
commit 5a6b06f5927c940fa44026695779c30b7536474c upstream. The ieee80211_tx_info_clear_status() helper also clears the rate counts and the driver-private part of struct ieee80211_tx_info, so using it breaks quite a few other things. So back out of using it, and instead define a ath-internal helper that only clears the area between the status_driver_data and the rates info. Combined with moving the ath_frame_info struct to status_driver_data, this avoids clearing anything we shouldn't be, and so we can keep the existing code for handling the rate information. While fixing this I also noticed that the setting of tx_info->status.rates[tx_rateindex].count on hardware underrun errors was always immediately overridden by the normal setting of the same fields, so rearrange the code so that the underrun detection actually takes effect. The new helper could be generalised to a 'memset_between()' helper, but leave it as a driver-internal helper for now since this needs to go to stable. Cc: stable@vger.kernel.org Reported-by: Peter Seiderer <ps.report@gmx.net> Fixes: 037250f0a45c ("ath9k: Properly clear TX status area before reporting to mac80211") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Peter Seiderer <ps.report@gmx.net> Tested-by: Peter Seiderer <ps.report@gmx.net> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220404204800.2681133-1-toke@toke.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ath9k: Properly clear TX status area before reporting to mac80211Toke Høiland-Jørgensen1-3/+2
commit 037250f0a45cf9ecf5b52d4b9ff8eadeb609c800 upstream. The ath9k driver was not properly clearing the status area in the ieee80211_tx_info struct before reporting TX status to mac80211. Instead, it was manually filling in fields, which meant that fields introduced later were left as-is. Conveniently, mac80211 actually provides a helper to zero out the status area, so use that to make sure we zero everything. The last commit touching the driver function writing the status information seems to have actually been fixing an issue that was also caused by the area being uninitialised; but it only added clearing of a single field instead of the whole struct. That is now redundant, though, so revert that commit and use it as a convenient Fixes tag. Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status") Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220330164409.16645-1-toke@toke.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20gcc-plugins: latent_entropy: use /dev/urandomJason A. Donenfeld1-17/+27
commit c40160f2998c897231f8454bf797558d30a20375 upstream. While the latent entropy plugin mostly doesn't derive entropy from get_random_const() for measuring the call graph, when __latent_entropy is applied to a constant, then it's initialized statically to output from get_random_const(). In that case, this data is derived from a 64-bit seed, which means a buffer of 512 bits doesn't really have that amount of compile-time entropy. This patch fixes that shortcoming by just buffering chunks of /dev/urandom output and doling it out as requested. At the same time, it's important that we don't break the use of -frandom-seed, for people who want the runtime benefits of the latent entropy plugin, while still having compile-time determinism. In that case, we detect whether gcc's set_random_seed() has been called by making a call to get_random_seed(noinit=true) in the plugin init function, which is called after set_random_seed() is called but before anything that calls get_random_seed(noinit=false), and seeing if it's zero or not. If it's not zero, we're in deterministic mode, and so we just generate numbers with a basic xorshift prng. Note that we don't detect if -frandom-seed is being used using the documented local_tick variable, because it's assigned via: local_tick = (unsigned) tv.tv_sec * 1000 + tv.tv_usec / 1000; which may well overflow and become -1 on its own, and so isn't reliable: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105171 [kees: The 256 byte rnd_buf size was chosen based on average (250), median (64), and std deviation (575) bytes of used entropy for a defconfig x86_64 build] Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Cc: stable@vger.kernel.org Cc: PaX Team <pageexec@freemail.hu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220405222815.21155-1-Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20memory: renesas-rpc-if: fix platform-device leak in error pathJohan Hovold1-1/+9
commit b452dbf24d7d9a990d70118462925f6ee287d135 upstream. Make sure to free the flash platform device in the event that registration fails during probe. Fixes: ca7d8b980b67 ("memory: add Renesas RPC-IF driver") Cc: stable@vger.kernel.org # 5.8 Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20220303180632.3194-1-johan@kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loadedSean Christopherson3-8/+37
commit 1d0e84806047f38027d7572adb4702ef7c09b317 upstream. Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as -1 is technically undefined behavior when its value is read out by param_get_bool(), as boolean values are supposed to be '0' or '1'. Alternatively, KVM could define a custom getter for the param, but the auto value doesn't depend on the vendor module in any way, and printing "auto" would be unnecessarily unfriendly to the user. In addition to fixing the undefined behavior, resolving the auto value also fixes the scenario where the auto value resolves to N and no vendor module is loaded. Previously, -1 would result in Y being printed even though KVM would ultimately disable the mitigation. Rename the existing MMU module init/exit helpers to clarify that they're invoked with respect to the vendor module, and add comments to document why KVM has two separate "module init" flows. ========================================================================= UBSAN: invalid-load in kernel/params.c:320:33 load of value 255 is not a valid value for type '_Bool' CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 ubsan_epilogue+0x5/0x40 __ubsan_handle_load_invalid_value.cold+0x43/0x48 param_get_bool.cold+0xf/0x14 param_attr_show+0x55/0x80 module_attr_show+0x1c/0x30 sysfs_kf_seq_show+0x93/0xc0 seq_read_iter+0x11c/0x450 new_sync_read+0x11b/0x1a0 vfs_read+0xf0/0x190 ksys_read+0x5f/0xe0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ========================================================================= Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220331221359.3912754-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20mm: kmemleak: take a full lowmem check in kmemleak_*_phys()Patrick Wang1-4/+4
commit 23c2d497de21f25898fbea70aeb292ab8acc8c94 upstream. The kmemleak_*_phys() apis do not check the address for lowmem's min boundary, while the caller may pass an address below lowmem, which will trigger an oops: # echo scan > /sys/kernel/debug/kmemleak Unable to handle kernel paging request at virtual address ff5fffffffe00000 Oops [#1] Modules linked in: CPU: 2 PID: 134 Comm: bash Not tainted 5.18.0-rc1-next-20220407 #33 Hardware name: riscv-virtio,qemu (DT) epc : scan_block+0x74/0x15c ra : scan_block+0x72/0x15c epc : ffffffff801e5806 ra : ffffffff801e5804 sp : ff200000104abc30 gp : ffffffff815cd4e8 tp : ff60000004cfa340 t0 : 0000000000000200 t1 : 00aaaaaac23954cc t2 : 00000000000003ff s0 : ff200000104abc90 s1 : ffffffff81b0ff28 a0 : 0000000000000000 a1 : ff5fffffffe01000 a2 : ffffffff81b0ff28 a3 : 0000000000000002 a4 : 0000000000000001 a5 : 0000000000000000 a6 : ff200000104abd7c a7 : 0000000000000005 s2 : ff5fffffffe00ff9 s3 : ffffffff815cd998 s4 : ffffffff815d0e90 s5 : ffffffff81b0ff28 s6 : 0000000000000020 s7 : ffffffff815d0eb0 s8 : ffffffffffffffff s9 : ff5fffffffe00000 s10: ff5fffffffe01000 s11: 0000000000000022 t3 : 00ffffffaa17db4c t4 : 000000000000000f t5 : 0000000000000001 t6 : 0000000000000000 status: 0000000000000100 badaddr: ff5fffffffe00000 cause: 000000000000000d scan_gray_list+0x12e/0x1a6 kmemleak_scan+0x2aa/0x57e kmemleak_write+0x32a/0x40c full_proxy_write+0x56/0x82 vfs_write+0xa6/0x2a6 ksys_write+0x6c/0xe2 sys_write+0x22/0x2a ret_from_syscall+0x0/0x2 The callers may not quite know the actual address they pass(e.g. from devicetree). So the kmemleak_*_phys() apis should guarantee the address they finally use is in lowmem range, so check the address for lowmem's min boundary. Link: https://lkml.kernel.org/r/20220413122925.33856-1-patrick.wang.shcn@gmail.com Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20mm: fix unexpected zeroed page mapping with zram swapMinchan Kim1-54/+0
commit e914d8f00391520ecc4495dd0ca0124538ab7119 upstream. Two processes under CLONE_VM cloning, user process can be corrupted by seeing zeroed page unexpectedly. CPU A CPU B do_swap_page do_swap_page SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path swap_readpage valid data swap_slot_free_notify delete zram entry swap_readpage zeroed(invalid) data pte_lock map the *zero data* to userspace pte_unlock pte_lock if (!pte_same) goto out_nomap; pte_unlock return and next refault will read zeroed data The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't increase the refcount of swap slot at copy_mm so it couldn't catch up whether it's safe or not to discard data from backing device. In the case, only the lock it could rely on to synchronize swap slot freeing is page table lock. Thus, this patch gets rid of the swap_slot_free_notify function. With this patch, CPU A will see correct data. CPU A CPU B do_swap_page do_swap_page SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path swap_readpage original data pte_lock map the origi