summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
2025-04-10platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()Dan Carpenter1-11/+25
commit 5b1122fc4995f308b21d7cfc64ef9880ac834d20 upstream. There are a few problems in this code: First, if amd_pmf_tee_init() fails then the function returns directly instead of cleaning up. We cannot simply do a "goto error;" because the amd_pmf_tee_init() cleanup calls tee_shm_free(dev->fw_shm_pool); and amd_pmf_tee_deinit() calls it as well leading to a double free. I have re-written this code to use an unwind ladder to free the allocations. Second, if amd_pmf_start_policy_engine() fails on every iteration though the loop then the code calls amd_pmf_tee_deinit() twice which is also a double free. Call amd_pmf_tee_deinit() inside the loop for each failed iteration. Also on that path the error codes are not necessarily negative kernel error codes. Set the error code to -EINVAL. There is a very subtle third bug which is that if the call to input_register_device() in amd_pmf_register_input_device() fails then we call input_unregister_device() on an input device that wasn't registered. This will lead to a reference counting underflow because of the device_del(&dev->dev) in __input_unregister_device(). It's unlikely that anyone would ever hit this bug in real life. Fixes: 376a8c2a1443 ("platform/x86/amd/pmf: Update PMF Driver for Compatibility with new PMF-TA") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/232231fc-6a71-495e-971b-be2a76f6db4c@stanley.mountain Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10media: streamzap: fix race between device disconnection and urb callbackMurad Masimov1-1/+1
commit f656cfbc7a293a039d6a0c7100e1c846845148c1 upstream. Syzkaller has reported a general protection fault at function ir_raw_event_store_with_filter(). This crash is caused by a NULL pointer dereference of dev->raw pointer, even though it is checked for NULL in the same function, which means there is a race condition. It occurs due to the incorrect order of actions in the streamzap_disconnect() function: rc_unregister_device() is called before usb_kill_urb(). The dev->raw pointer is freed and set to NULL in rc_unregister_device(), and only after that usb_kill_urb() waits for in-progress requests to finish. If rc_unregister_device() is called while streamzap_callback() handler is not finished, this can lead to accessing freed resources. Thus rc_unregister_device() should be called after usb_kill_urb(). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 8e9e60640067 ("V4L/DVB: staging/lirc: port lirc_streamzap to ir-core") Cc: stable@vger.kernel.org Reported-by: syzbot+34008406ee9a31b13c73@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=34008406ee9a31b13c73 Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10media: vimc: skip .s_stream() for stopped entitiesNikita Zhandarovich1-0/+6
commit 36cef585e2a31e4ddf33a004b0584a7a572246de upstream. Syzbot reported [1] a warning prompted by a check in call_s_stream() that checks whether .s_stream() operation is warranted for unstarted or stopped subdevs. Add a simple fix in vimc_streamer_pipeline_terminate() ensuring that entities skip a call to .s_stream() unless they have been previously properly started. [1] Syzbot report: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 5933 at drivers/media/v4l2-core/v4l2-subdev.c:460 call_s_stream+0x2df/0x350 drivers/media/v4l2-core/v4l2-subdev.c:460 Modules linked in: CPU: 0 UID: 0 PID: 5933 Comm: syz-executor330 Not tainted 6.13.0-rc2-syzkaller-00362-g2d8308bf5b67 #0 ... Call Trace: <TASK> vimc_streamer_pipeline_terminate+0x218/0x320 drivers/media/test-drivers/vimc/vimc-streamer.c:62 vimc_streamer_pipeline_init drivers/media/test-drivers/vimc/vimc-streamer.c:101 [inline] vimc_streamer_s_stream+0x650/0x9a0 drivers/media/test-drivers/vimc/vimc-streamer.c:203 vimc_capture_start_streaming+0xa1/0x130 drivers/media/test-drivers/vimc/vimc-capture.c:256 vb2_start_streaming+0x15f/0x5a0 drivers/media/common/videobuf2/videobuf2-core.c:1789 vb2_core_streamon+0x2a7/0x450 drivers/media/common/videobuf2/videobuf2-core.c:2348 vb2_streamon drivers/media/common/videobuf2/videobuf2-v4l2.c:875 [inline] vb2_ioctl_streamon+0xf4/0x170 drivers/media/common/videobuf2/videobuf2-v4l2.c:1118 __video_do_ioctl+0xaf0/0xf00 drivers/media/v4l2-core/v4l2-ioctl.c:3122 video_usercopy+0x4d2/0x1620 drivers/media/v4l2-core/v4l2-ioctl.c:3463 v4l2_ioctl+0x1ba/0x250 drivers/media/v4l2-core/v4l2-dev.c:366 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl fs/ioctl.c:892 [inline] __x64_sys_ioctl+0x190/0x200 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f2b85c01b19 ... Reported-by: syzbot+5bcd7c809d365e14c4df@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5bcd7c809d365e14c4df Fixes: adc589d2a208 ("media: vimc: Add vimc-streamer for stream control") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustionLukas Wunner1-0/+4
commit 667f053b05f00a007738cd7ed6fa1901de19dc7e upstream. When BIOS neglects to assign bus numbers to PCI bridges, the kernel attempts to correct that during PCI device enumeration. If it runs out of bus numbers, no pci_bus is allocated and the "subordinate" pointer in the bridge's pci_dev remains NULL. The PCIe bandwidth controller erroneously does not check for a NULL subordinate pointer and dereferences it on probe. Bandwidth control of unusable devices below the bridge is of questionable utility, so simply error out instead. This mirrors what PCIe hotplug does since commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp probe"). The PCI core emits a message with KERN_INFO severity if it has run out of bus numbers. PCIe hotplug emits an additional message with KERN_ERR severity to inform the user that hotplug functionality is disabled at the bridge. A similar message for bandwidth control does not seem merited, given that its only purpose so far is to expose an up-to-date link speed in sysfs and throttle the link speed on certain laptops with limited Thermal Design Power. So error out silently. User-visible messages: pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring [...] pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74 pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them [...] pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring [...] BUG: kernel NULL pointer dereference RIP: pcie_update_link_speed pcie_bwnotif_enable pcie_bwnotif_probe pcie_port_probe_service really_probe Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Wouter Bijlsma <wouter@wouterbijlsma.nl> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219906 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl> Cc: stable@vger.kernel.org # v6.13+ Link: https://lore.kernel.org/r/3b6c8d973aedc48860640a9d75d20528336f1f3c.1742669372.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10wifi: mt76: mt7921: fix kernel panic due to null pointer dereferenceMing Yen Hsieh1-0/+1
commit adc3fd2a2277b7cc0b61692463771bf9bd298036 upstream. Address a kernel panic caused by a null pointer dereference in the `mt792x_rx_get_wcid` function. The issue arises because the `deflink` structure is not properly initialized with the `sta` context. This patch ensures that the `deflink` structure is correctly linked to the `sta` context, preventing the null pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000400 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 UID: 0 PID: 470 Comm: mt76-usb-rx phy Not tainted 6.12.13-gentoo-dist #1 Hardware name: /AMD HUDSON-M1, BIOS 4.6.4 11/15/2011 RIP: 0010:mt792x_rx_get_wcid+0x48/0x140 [mt792x_lib] RSP: 0018:ffffa147c055fd98 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8e9ecb652000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e9ecb652000 RBP: 0000000000000685 R08: ffff8e9ec6570000 R09: 0000000000000000 R10: ffff8e9ecd2ca000 R11: ffff8e9f22a217c0 R12: 0000000038010119 R13: 0000000080843801 R14: ffff8e9ec6570000 R15: ffff8e9ecb652000 FS: 0000000000000000(0000) GS:ffff8e9f22a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000400 CR3: 000000000d2ea000 CR4: 00000000000006f0 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? mt792x_rx_get_wcid+0x48/0x140 [mt792x_lib] mt7921_queue_rx_skb+0x1c6/0xaa0 [mt7921_common] mt76u_alloc_queues+0x784/0x810 [mt76_usb] ? __pfx___mt76_worker_fn+0x10/0x10 [mt76] __mt76_worker_fn+0x4f/0x80 [mt76] kthread+0xd2/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- Reported-by: Nick Morrow <usbwifi2024@gmail.com> Closes: https://github.com/morrownr/USB-WiFi/issues/577 Cc: stable@vger.kernel.org Fixes: 90c10286b176 ("wifi: mt76: mt7925: Update mt792x_rx_get_wcid for per-link STA") Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Tested-by: Salah Coronya <salah.coronya@gmail.com> Link: https://patch.msgid.link/20250218033343.1999648-1-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10mmc: sdhci-omap: Disable MMC_CAP_AGGRESSIVE_PM for eMMC/SDUlf Hansson1-2/+2
commit 49d162635151d0dd04935070d7cf67137ab863aa upstream. We have received reports about cards can become corrupt related to the aggressive PM support. Let's make a partial revert of the change that enabled the feature. Reported-by: David Owens <daowens01@gmail.com> Reported-by: Romain Naour <romain.naour@smile.fr> Reported-by: Robert Nelson <robertcnelson@gmail.com> Tested-by: Robert Nelson <robertcnelson@gmail.com> Fixes: 3edf588e7fe0 ("mmc: sdhci-omap: Allow SDIO card power off and enable aggressive PM") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20250312121712.1168007-1-ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10mmc: sdhci-pxav3: set NEED_RSP_BUSY capabilityKarel Balej1-0/+1
commit a41fcca4b342811b473bbaa4b44f1d34d87fcce6 upstream. Set the MMC_CAP_NEED_RSP_BUSY capability for the sdhci-pxav3 host to prevent conversion of R1B responses to R1. Without this, the eMMC card in the samsung,coreprimevelte smartphone using the Marvell PXA1908 SoC with this mmc host doesn't probe with the ETIMEDOUT error originating in __mmc_poll_for_busy. Note that the other issues reported for this phone and host, namely floods of "Tuning failed, falling back to fixed sampling clock" dmesg messages for the eMMC and unstable SDIO are not mitigated by this change. Link: https://lore.kernel.org/r/20200310153340.5593-1-ulf.hansson@linaro.org/ Link: https://lore.kernel.org/r/D7204PWIGQGI.1FRFQPPIEE2P9@matfyz.cz/ Link: https://lore.kernel.org/r/20250115-pxa1908-lkml-v14-0-847d24f3665a@skole.hr/ Cc: stable@vger.kernel.org Signed-off-by: Karel Balej <balejk@matfyz.cz> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Duje Mihanović <duje.mihanovic@skole.hr> Link: https://lore.kernel.org/r/20250310140707.23459-1-balejk@matfyz.cz Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10mmc: omap: Fix memory leak in mmc_omap_new_slotMiaoqian Lin1-6/+13
commit 3834a759afb817e23a7a2f09c2c9911b0ce5c588 upstream. Add err_free_host label to properly pair mmc_alloc_host() with mmc_free_host() in GPIO error paths. The allocated host memory was leaked when GPIO lookups failed. Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250318140226.19650-1-linmq006@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10Remove unnecessary firmware version check for gc v9_4_2Candice Li1-0/+1
commit 5b3c08ae9ed324743f5f7286940d45caeb656e6e upstream. GC v9_4_2 uses a new versioning scheme for CP firmware, making the warning ("CP firmware version too old, please update!") irrelevant. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10media: omap3isp: Handle ARM dma_iommu_mappingRobin Murphy1-0/+7
commit 6bc076eec6f85f778f33a8242b438e1bd9fcdd59 upstream. It's no longer practical for the OMAP IOMMU driver to trick arm_setup_iommu_dma_ops() into ignoring its presence, so let's use the same tactic as other IOMMU API users on 32-bit ARM and explicitly kick the arch code's dma_iommu_mapping out of the way to avoid problems. Fixes: 4720287c7bf7 ("iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()") Cc: stable@vger.kernel.org Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Sicelo A. Mhlongo <absicsz@gmail.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10ACPI: video: Handle fetching EDID as ACPI_TYPE_PACKAGEGergo Koteles1-1/+8
commit ebca08fef88febdb0a898cefa7c99b9e25b3a984 upstream. The _DDC method should return a buffer, or an integer in case of an error. But some Lenovo laptops incorrectly return EDID as buffer in ACPI package. Calling _DDC generates this ACPI Warning: ACPI Warning: \_SB.PCI0.GP17.VGA.LCD._DDC: Return type mismatch - \ found Package, expected Integer/Buffer (20240827/nspredef-254) Use the first element of the package to get the EDID buffer. The DSDT: Name (AUOP, Package (0x01) { Buffer (0x80) { ... } }) ... Method (_DDC, 1, NotSerialized) // _DDC: Display Data Current { If ((PAID == AUID)) { Return (AUOP) /* \_SB_.PCI0.GP17.VGA_.LCD_.AUOP */ } ElseIf ((PAID == IVID)) { Return (IVOP) /* \_SB_.PCI0.GP17.VGA_.LCD_.IVOP */ } ElseIf ((PAID == BOID)) { Return (BOEP) /* \_SB_.PCI0.GP17.VGA_.LCD_.BOEP */ } ElseIf ((PAID == SAID)) { Return (SUNG) /* \_SB_.PCI0.GP17.VGA_.LCD_.SUNG */ } Return (Zero) } Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device Cc: stable@vger.kernel.org Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4085 Signed-off-by: Gergo Koteles <soyer@irl.hu> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/61c3df83ab73aba0bc7a941a443cd7faf4cf7fb0.1743195250.git.soyer@irl.hu Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10ACPI: resource: Skip IRQ override on ASUS Vivobook 14 X1404VAPPaul Menzel1-0/+7
commit 2da31ea2a085cd189857f2db0f7b78d0162db87a upstream. Like the ASUS Vivobook X1504VAP and Vivobook X1704VAP, the ASUS Vivobook 14 X1404VAP has its keyboard IRQ (1) described as ActiveLow in the DSDT, which the kernel overrides to EdgeHigh breaking the keyboard. $ sudo dmidecode […] System Information Manufacturer: ASUSTeK COMPUTER INC. Product Name: ASUS Vivobook 14 X1404VAP_X1404VA […] $ grep -A 30 PS2K dsdt.dsl | grep IRQ -A 1 IRQ (Level, ActiveLow, Exclusive, ) {1} Add the X1404VAP to the irq1_level_low_skip_override[] quirk table to fix this. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219224 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Anton Shyndin <mrcold.il@gmail.com> Link: https://patch.msgid.link/20250318160903.77107-1-pmenzel@molgen.mpg.de Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10acpi: nfit: fix narrowing conversion in acpi_nfit_ctlMurad Masimov1-1/+1
commit 2ff0e408db36c21ed3fa5e3c1e0e687c82cf132f upstream. Syzkaller has reported a warning in to_nfit_bus_uuid(): "only secondary bus families can be translated". This warning is emited if the argument is equal to NVDIMM_BUS_FAMILY_NFIT == 0. Function acpi_nfit_ctl() first verifies that a user-provided value call_pkg->nd_family of type u64 is not equal to 0. Then the value is converted to int, and only after that is compared to NVDIMM_BUS_FAMILY_MAX. This can lead to passing an invalid argument to acpi_nfit_ctl(), if call_pkg->nd_family is non-zero, while the lower 32 bits are zero. Furthermore, it is best to return EINVAL immediately upon seeing the invalid user input. The WARNING is insufficient to prevent further undefined behavior based on other invalid user input. All checks of the input value should be applied to the original variable call_pkg->nd_family. [iweiny: update commit message] Fixes: 6450ddbd5d8e ("ACPI: NFIT: Define runtime firmware activation commands") Cc: stable@vger.kernel.org Reported-by: syzbot+c80d8dc0d9fa81a3cd8c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c80d8dc0d9fa81a3cd8c Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Link: https://patch.msgid.link/20250123163945.251-1-m.masimov@mt-integration.ru Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10wifi: mt76: mt7925: remove unused acpi function for clcMing Yen Hsieh1-1/+0
commit b4ea6fdfc08375aae59c7e7059653b9877171fe4 upstream. The code for handling ACPI configuration in CLC was copied from the mt7921 driver but is not utilized in the mt7925 implementation. So removes the unused functionality to clean up the codebase. Cc: stable@vger.kernel.org Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips") Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20250304113649.867387-4-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10ntb_perf: Delete duplicate dmaengine_unmap_put() call in perf_copy_chunk()Markus Elfring1-3/+1
commit 4279e72cab31dd3eb8c89591eb9d2affa90ab6aa upstream. The function call “dmaengine_unmap_put(unmap)” was used in an if branch. The same call was immediately triggered by a subsequent goto statement. Thus avoid such a call repetition. This issue was detected by using the Coccinelle software. Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support") Cc: stable@vger.kernel.org Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10platform/x86: ISST: Correct command storage data lengthSrinivas Pandruvada1-1/+1
commit 9462e74c5c983cce34019bfb27f734552bebe59f upstream. After resume/online turbo limit ratio (TRL) is restored partially if the admin explicitly changed TRL from user space. A hash table is used to store SST mail box and MSR settings when modified to restore those settings after resume or online. This uses a struct isst_cmd field "data" to store these settings. This is a 64 bit field. But isst_store_new_cmd() is only assigning as u32. This results in truncation of 32 bits. Change the argument to u64 from u32. Fixes: f607874f35cb ("platform/x86: ISST: Restore state on resume") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250328224749.2691272-1-srinivas.pandruvada@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10platform/x86: thinkpad_acpi: disable ACPI fan access for T495* and E560Eduard Christian Dumitrescu1-0/+11
commit 2b9f84e7dc863afd63357b867cea246aeedda036 upstream. T495, T495s, and E560 laptops have the FANG+FANW ACPI methods (therefore fang_handle and fanw_handle are not NULL) but they do not actually work, which results in a "No such device or address" error. The DSDT table code for the FANG+FANW methods doesn't seem to do anything special regarding the fan being secondary. The bug was introduced in commit 57d0557dfa49 ("platform/x86: thinkpad_acpi: Add Thinkpad Edge E531 fan support"), which added a new fan control method via the FANG+FANW ACPI methods. Add a quirk for T495, T495s, and E560 to avoid the FANG+FANW methods. Fan access and control is restored after forcing the legacy non-ACPI fan control method by setting both fang_handle and fanw_handle to NULL. Reported-by: Vlastimil Holer <vlastimil.holer@gmail.com> Fixes: 57d0557dfa49 ("platform/x86: thinkpad_acpi: Add Thinkpad Edge E531 fan support") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219643 Cc: stable@vger.kernel.org Tested-by: Alireza Elikahi <scr0lll0ck1s4b0v3h0m3k3y@gmail.com> Reviewed-by: Kurt Borja <kuurtb@gmail.com> Signed-off-by: Eduard Christian Dumitrescu <eduard.c.dumitrescu@gmail.com> Co-developed-by: Seyediman Seyedarab <ImanDevel@gmail.com> Signed-off-by: Seyediman Seyedarab <ImanDevel@gmail.com> Link: https://lore.kernel.org/r/20250324152442.106113-1-ImanDevel@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10ACPI: x86: Extend Lenovo Yoga Tab 3 quirk with skip GPIO event-handlersHans de Goede1-1/+2
commit 2fa87c71d2adb4b82c105f9191e6120340feff00 upstream. Depending on the secureboot signature on EFI\BOOT\BOOTX86.EFI the Lenovo Yoga Tab 3 UEFI will switch its OSID ACPI variable between 1 (Windows) and 4 (Android(GMIN)). In Windows mode a GPIO event handler gets installed for GPO1 pin 5, causing Linux' x86-android-tables code which deals with the general brokenness of this device's ACPI tables to fail to probe with: [ 17.853705] x86_android_tablets: error -16 getting GPIO INT33FF:01 5 [ 17.859623] x86_android_tablets x86_android_tablets: probe with driver which renders sound, the touchscreen, charging-management, battery-monitoring and more non functional. Add ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS to the existing quirks for this device to fix this. Reported-by: Agoston Lorincz <pipacsba@gmail.com> Closes: https://lore.kernel.org/platform-driver-x86/CAMEzqD+DNXrAvUOHviB2O2bjtcbmo3xH=kunKr4nubuMLbb_0A@mail.gmail.com/ Cc: All applicable <stable@kernel.org> Fixes: fe820db35275 ("ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Tab 3 Pro (YT3-X90F)") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patch.msgid.link/20250325210450.358506-1-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10usbnet:fix NPE during rx_completeYing Lu1-3/+3
commit 51de3600093429e3b712e5f091d767babc5dd6df upstream. Missing usbnet_going_away Check in Critical Path. The usb_submit_urb function lacks a usbnet_going_away validation, whereas __usbnet_queue_skb includes this check. This inconsistency creates a race condition where: A URB request may succeed, but the corresponding SKB data fails to be queued. Subsequent processes: (e.g., rx_complete → defer_bh → __skb_unlink(skb, list)) attempt to access skb->next, triggering a NULL pointer dereference (Kernel Panic). Fixes: 04e906839a05 ("usbnet: fix cyclical race on disconnect with work queue") Cc: stable@vger.kernel.org Signed-off-by: Ying Lu <luying1@xiaomi.com> Link: https://patch.msgid.link/4c9ef2efaa07eb7f9a5042b74348a67e5a3a7aea.1743584159.git.luying1@xiaomi.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10tty: serial: lpuart: only disable CTS instead of overwriting the whole ↵Sherry Sun1-5/+7
UARTMODIR register [ Upstream commit e98ab45ec5182605d2e00114cba3bbf46b0ea27f ] No need to overwrite the whole UARTMODIR register before waiting the transmit engine complete, actually our target here is only to disable CTS flow control to avoid the dirty data in TX FIFO may block the transmit engine complete. Also delete the following duplicate CTS disable configuration. Fixes: d5a2e0834364 ("tty: serial: lpuart: disable flow control while waiting for the transmit engine to complete") Cc: stable <stable@kernel.org> Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Link: https://lore.kernel.org/r/20250307065446.1122482-1-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10tty: serial: fsl_lpuart: Fix unused variable 'sport' build warningSherry Sun1-3/+0
commit 9f8fe348ac9544f6855f82565e754bf085d81f88 upstream. Remove the unused variable 'sport' to avoid the kernel build warning. Fixes: 3cc16ae096f1 ("tty: serial: fsl_lpuart: use port struct directly to simply code") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503210614.2qGlnbIq-lkp@intel.com/ Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Link: https://lore.kernel.org/r/20250324021051.162676-1-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10tty: serial: fsl_lpuart: use port struct directly to simply codeSherry Sun1-108/+102
commit 3cc16ae096f164ae0c6b98416c25a01db5f3a529 upstream. Most lpuart functions have the parameter struct uart_port *port, but still use the &sport->port to get the uart_port instead of use it directly, let's simply the code logic, directly use this struct instead of covert it from struct sport. Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Link: https://lore.kernel.org/r/20250312023904.1343351-3-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10tty: serial: fsl_lpuart: Use u32 and u8 for register variablesSherry Sun1-47/+46
[ Upstream commit b6a8f6ab2c53e5ea3c7f2a3978db378a89bb7595 ] Use u32 and u8 rather than unsigned long or unsigned char for register variables for clarity and consistency. Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Link: https://lore.kernel.org/r/20250312023904.1343351-2-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: e98ab45ec518 ("tty: serial: lpuart: only disable CTS instead of overwriting the whole UARTMODIR register") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: Fix Oops after disconnect in agilent usbDave Penkler1-10/+55
[ Upstream commit 8491e73a5223acb0a4b4d78c3f8b96aa9c5e774d ] If the agilent usb dongle is disconnected subsequent calls to the driver cause a NULL dereference Oops as the bus_interface is set to NULL on disconnect. This problem was introduced by setting usb_dev from the bus_interface for dev_xxx messages. Previously bus_interface was checked for NULL only in the functions directly calling usb_fill_bulk_urb or usb_control_msg. Check for valid bus_interface on all interface entry points and return -ENODEV if it is NULL. Fixes: fbae7090f30c ("staging: gpib: Update messaging and usb_device refs in agilent_usb") Cc: stable <stable@kernel.org> Signed-off-by: Dave Penkler <dpenkler@gmail.com> Link: https://lore.kernel.org/r/20250222204515.5104-1-dpenkler@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: agilent usb console messaging cleanupDave Penkler1-216/+143
[ Upstream commit 50a6ed0494bc082227c38f427b40731bca9bdf15 ] Enable module name to be printed in pr_xxx and dev_xxx Use DRV_NAME defined as KBUILD_MODNAME instead of hard coded string in usb_driver struct. Remove __func__ parameter in dev_dbg messages as this can be enabled with dynamic debug. Remove __func__ parameter in dev_err messages as they are not needed. The module name is sufficient. Change pr_info to dev_dbg where needed and remove the rest where possible. Remove test and error messages for buffer over run in write and read_registers as these are just trying to catch bugs in the driver. Remove agilent_82357a string prefix in error messages. Return -EIO for too many reads in read_registers instead of continuing after printing an error message. Change pr_warn to dev_warn. Remove warning on calls for unsupported functionality. Remove test and message for buffer overflow in agilent_82357_init and agilent_82357a_go_idle which are just checking for a driver bug. Use actual indeces in the array instead of i and then incrementing i so that the code is clear and there is no need to check for overflow or print a message. Signed-off-by: Dave Penkler <dpenkler@gmail.com> Link: https://lore.kernel.org/r/20250214114708.28947-3-dpenkler@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8491e73a5223 ("staging: gpib: Fix Oops after disconnect in agilent usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: Agilent usb code cleanupDave Penkler1-48/+36
[ Upstream commit 579b6f18c5ca162af040f44684cc55f7da182236 ] Remove useless #ifdef RESET_USB_CONFIG code. Change kalloc / memset to kzalloc The attach function was not freeing the private data on error returns. Separate the releasing of urbs and private data and add a common error exit for attach failure. Set the board private data pointer to NULL after freeing the private data. Reduce console spam by emitting only one attach message. Change last pr_err in attach to dev_err Signed-off-by: Dave Penkler <dpenkler@gmail.com> Link: https://lore.kernel.org/r/20250118145046.12181-3-dpenkler@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8491e73a5223 ("staging: gpib: Fix Oops after disconnect in agilent usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: Fix NULL pointer dereference in detachDave Penkler1-3/+1
[ Upstream commit 6a6c153537f093c3bc79ea9633f3954d3450d0ba ] When the detach function is called after a failed attach the usb_dev initialization can cause a NULL pointer dereference. This happens when the usb device is not found in the attach procedure. Remove the usb_dev variable and initialization and change the dev in the dev_info message from the usb_dev to the gpib_dev. Fixes: fbae7090f30c ("staging: gpib: Update messaging and usb_device refs in agilent_usb") Signed-off-by: Dave Penkler <dpenkler@gmail.com> Link: https://lore.kernel.org/r/20250118145046.12181-2-dpenkler@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8491e73a5223 ("staging: gpib: Fix Oops after disconnect in agilent usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: Add missing mutex unlock in agilent usb driverDave Penkler1-0/+1
[ Upstream commit 55eb3c3a6388420afda1374b353717de32ae9573 ] When no matching product id was found in the attach function the driver returned without unlocking the agilent_82357a_hotplug_lock mutex. Add the unlock call. This was detected by smatch: smatch warnings: drivers/staging/gpib/agilent_82357a/agilent_82357a.c:1381 agilent_82357a_attach() warn: inconsistent returns 'global &agilent_82357a_hotplug_lock'. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202412210143.WJhYzXfD-lkp@intel.com/ Fixes: 4c41fe886a56 ("staging: gpib: Add Agilent/Keysight 82357x USB GPIB driver") Signed-off-by: Dave Penkler <dpenkler@gmail.com> Link: https://lore.kernel.org/r/20250111161457.27556-1-dpenkler@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8491e73a5223 ("staging: gpib: Fix Oops after disconnect in agilent usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: agilent_82357a: Handle gpib_register_driver() errorsNihar Chaithanya1-2/+14
[ Upstream commit 9e43ebc613e2adb9b9e900e11a4099e104b61f2d ] The usb_register() function can fail and returns an error value which is not returned. The function gpib_register_driver() can also fail which can result in semi-registered module. In case gpib_register_driver() fails unregister the previous usb driver registering function. Return the error value if gpib_register_driver() or usb_register() functions fail. Add pr_err when registering driver fails also indicating the error value. Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com> Link: https://lore.kernel.org/r/20241230185633.175690-4-niharchaithanya@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8491e73a5223 ("staging: gpib: Fix Oops after disconnect in agilent usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: Fix Oops after disconnect in ni_usbDave Penkler1-20/+73
[ Upstream commit a239c6e91b665f1837cf57b97fe638ef1baf2e78 ] If the usb dongle is disconnected subsequent calls to the driver cause a NULL dereference Oops as the bus_interface is set to NULL on disconnect. This problem was introduced by setting usb_dev from the bus_interface for dev_xxx messages. Previously bus_interface was checked for NULL only in the the functions directly calling usb_fill_bulk_urb or usb_control_msg. Check for valid bus_interface on all interface entry points and return -ENODEV if it is NULL. Fixes: 4934b98bb243 ("staging: gpib: Update messaging and usb_device refs in ni_usb") Cc: stable <stable@kernel.org> Signed-off-by: Dave Penkler <dpenkler@gmail.com> Link: https://lore.kernel.org/r/20250222165817.12856-1-dpenkler@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: ni_usb console messaging cleanupDave Penkler1-236/+189
[ Upstream commit 18ce5b5d9167bffce02550a85c7c3316a05ea598 ] Enable module name to be printed in pr_xxx and dev_xxx Use DRV_NAME defined as KBUILD_MODNAME instead of hard coded string "ni_usb_gpib" in usb_driver struct. Remove __func__ parameter from pr_err and dev_err. Remove __func__ parameter from dev_dbg as this can be enabled by dynamic debug. Remove commented printk's and dev_err's. Remove kmalloc failed messages. Remove buffer over run bug dev_err message as this just checks for a bug in the driver which does not exist. Remove read/write length too long messages and return -EINVAL Change dev_info to dev_dbg where possible. Move attach message to the end of attach function. Remove buffer overrun message. Use actual array indeces instead of i and i++ to make code clear and check redundnant. Remove module init and exit pr_info's. Signed-off-by: Dave Penkler <dpenkler@gmail.com> Link: https://lore.kernel.org/r/20250214114708.28947-15-dpenkler@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: a239c6e91b66 ("staging: gpib: Fix Oops after disconnect in ni_usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: ni_usb: Handle gpib_register_driver() errorsNihar Chaithanya1-2/+13
[ Upstream commit 635ddb8ccdbde0d917b0a7448b0fd9d6cc27a2a9 ] The usb_register() function can fail and returns an error value which is not returned. The function gpib_register_driver() can also fail which can result in semi-registered module. In case gpib_register_driver() fails unregister the previous usb driver registering function. Return the error value if gpib_register_driver() or usb_register() functions fail. Add pr_err() statements indicating the fail and error value. Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com> Link: https://lore.kernel.org/r/20241230185633.175690-14-niharchaithanya@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: a239c6e91b66 ("staging: gpib: Fix Oops after disconnect in ni_usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: Modify gpib_register_driver() to return error if it failsNihar Chaithanya2-4/+5
[ Upstream commit e999bd2a897e7d70fa1fca6b80873529532322fe ] The function gpib_register_driver() can fail if kmalloc() fails, but it doesn't return any error if that happens. Modify the function to return error i.e int. Return the appropriate error code if it fails. Remove the pr_info() statement. Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com> Link: https://lore.kernel.org/r/20241230185633.175690-2-niharchaithanya@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: a239c6e91b66 ("staging: gpib: Fix Oops after disconnect in ni_usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10staging: gpib: Replace semaphore with completion for one-time signalingSantosh Mahto2-9/+9
[ Upstream commit 5d4db9cf4135d82634c7f31aac73081fba3a356e ] Replaced 'down_interruptible()' and 'up()' calls with 'wait_for_completion_interruptible()' and 'complete()' respectively. The completion API simplifies the code and adheres to kernel best practices for synchronization primitive Signed-off-by: Santosh Mahto <eisantosh95@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/20241212162112.13083-1-eisantosh95@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: a239c6e91b66 ("staging: gpib: Fix Oops after disconnect in ni_usb") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/amdgpu/gfx12: fix num_mecAlex Deucher1-1/+1
[ Upstream commit dce8bd9137b88735dd0efc4e2693213d98c15913 ] GC12 only has 1 mec. Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/amdgpu/gfx11: fix num_mecAlex Deucher1-1/+1
[ Upstream commit 4161050d47e1b083a7e1b0b875c9907e1a6f1f1f ] GC11 only has 1 mec. Fixes: 3d879e81f0f9 ("drm/amdgpu: add init support for GFX11 (v2)") Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10drm/xe: Fix unmet direct dependencies warningYue Haibing1-1/+1
[ Upstream commit 5e66cf6edddb5f6237e3afb07475ace57ecb56bc ] WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n] Selected by [m]: - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM [=m] && DRM_XE [=m] && DRM_XE [=m]=m [=m] && HAS_IOPORT [=y] DRM_XE_DISPLAY requires FB_IOMEM_HELPERS, but the dependency FB_CORE is missing, selecting FB_IOMEM_HELPERS if DRM_FBDEV_EMULATION is set as other drm drivers. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250323114103.1960511-1-yuehaibing@huawei.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 689582882802cd64986c1eb584c9f5184d67f0cf) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10net: ibmveth: make veth_pool_store stop hangingDave Marquardt1-12/+27
[ Upstream commit 053f3ff67d7feefc75797863f3d84b47ad47086f ] v2: - Created a single error handling unlock and exit in veth_pool_store - Greatly expanded commit message with previous explanatory-only text Summary: Use rtnl_mutex to synchronize veth_pool_store with itself, ibmveth_close and ibmveth_open, preventing multiple calls in a row to napi_disable. Background: Two (or more) threads could call veth_pool_store through writing to /sys/devices/vio/30000002/pool*/*. You can do this easily with a little shell script. This causes a hang. I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new kernel. I ran this test again and saw: Setting pool0/active to 0 Setting pool1/active to 1 [ 73.911067][ T4365] ibmveth 30000002 eth0: close starting Setting pool1/active to 1 Setting pool1/active to 0 [ 73.911367][ T4366] ibmveth 30000002 eth0: close starting [ 73.916056][ T4365] ibmveth 30000002 eth0: close complete [ 73.916064][ T4365] ibmveth 30000002 eth0: open starting [ 110.808564][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification. [ 230.808495][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification. [ 243.683786][ T123] INFO: task stress.sh:4365 blocked for more than 122 seconds. [ 243.683827][ T123] Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8 [ 243.683833][ T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 243.683838][ T123] task:stress.sh state:D stack:28096 pid:4365 tgid:4365 ppid:4364 task_flags:0x400040 flags:0x00042000 [ 243.683852][ T123] Call Trace: [ 243.683857][ T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable) [ 243.683868][ T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0 [ 243.683878][ T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0 [ 243.683888][ T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210 [ 243.683896][ T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50 [ 243.683904][ T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0 [ 243.683913][ T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60 [ 243.683921][ T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc [ 243.683928][ T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270 [ 243.683936][ T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0 [ 243.683944][ T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0 [ 243.683951][ T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650 [ 243.683958][ T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150 [ 243.683966][ T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340 [ 243.683973][ T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec ... [ 243.684087][ T123] Showing all locks held in the system: [ 243.684095][ T123] 1 lock held by khungtaskd/123: [ 243.684099][ T123] #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248 [ 243.684114][ T123] 4 locks held by stress.sh/4365: [ 243.684119][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150 [ 243.684132][ T123] #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0 [ 243.684143][ T123] #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0 [ 243.684155][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60 [ 243.684166][ T123] 5 locks held by stress.sh/4366: [ 243.684170][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150 [ 243.684183][ T123] #1: c00000000aee2288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0 [ 243.684194][ T123] #2: c0000000366f4ba8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0 [ 243.684205][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_disable+0x30/0x60 [ 243.684216][ T123] #4: c0000003ff9bbf18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x138/0x12a0 From the ibmveth debug, two threads are calling veth_pool_store, which calls ibmveth_close and ibmveth_open. Here's the sequence: T4365 T4366 ----------------- ----------------- --------- veth_pool_store veth_pool_store ibmveth_close ibmveth_close napi_disable napi_disable ibmveth_open napi_enable <- HANG ibmveth_close calls napi_disable at the top and ibmveth_open calls napi_enable at the top. https://docs.kernel.org/networking/napi.html]] says The control APIs are not idempotent. Control API calls are safe against concurrent use of datapath APIs but an incorrect sequence of control API calls may result in crashes, deadlocks, or race conditions. For example,