| Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit eac32ff42393efa6657efc821231b8d802c1d485 ]
Fix the following corner case:-
Consider a 2M huge page SVM allocation, followed by prefetch call for
the first 4K page. The whole range is initially mapped with single PTE.
After the prefetch, this range gets split to first page + rest of the
pages. Currently, the first page mapping is not updated on MI300A (APU)
since page hasn't migrated. However, after range split PTE mapping it not
valid.
Fix this by forcing page table update for the whole range when prefetch
is called. Calling prefetch on APU doesn't improve performance. If all
it deteriotes. However, functionality has to be supported.
v2: Use apu_prefer_gtt as this issue doesn't apply to APUs with carveout
VRAM
v3: Simplify by setting the flag for all ASICs as it doesn't affect dGPU
v4: Remove v2 and v3 changes. Force update_mapping when range is split
at a size that is not aligned to prange granularity
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 076470b9f6f8d9c7c8ca73a9f054942a686f9ba7)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c1962742ffff7e245f935903a4658eb6f94f6058 ]
Rather than using an ad hoc reference count use kref which is atomic
and has underflow warnings.
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patch.msgid.link/20251030193640.153697-1-ian.forbes@broadcom.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 5861258c4e6a829a10200b41ba3fb4d7d1a4054f upstream.
If the source OUI DPCD register value matches the expected Intel OUI
value, the write timestamp doesn't get updated leaving it at the 0
initial value if the OUI wasn't written before. This can lead to an
incorrect wait duration in intel_dp_wait_source_oui(), since jiffies is
not inited to 0 in general (on a 32 bit system INITIAL_JIFFIES is set to
5 minutes ahead of wrap-around). Fix this by intializing the write
timestamp in the above case as well.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025160259.3088727-4-imre.deak@intel.com
Cc: "H. Nikolaus Schaller" <hns@goldelico.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit eb76d0f5553575599561010f24c277cc5b31d003 ]
Protect vga_switcheroo_client_fb_set() with console lock. Avoids OOB
access in fbcon_remap_all(). Without holding the console lock the call
races with switching outputs.
VGA switcheroo calls fbcon_remap_all() when switching clients. The fbcon
function uses struct fb_info.node, which is set by register_framebuffer().
As the fb-helper code currently sets up VGA switcheroo before registering
the framebuffer, the value of node is -1 and therefore not a legal value.
For example, fbcon uses the value within set_con2fb_map() [1] as an index
into an array.
Moving vga_switcheroo_client_fb_set() after register_framebuffer() can
result in VGA switching that does not switch fbcon correctly.
Therefore move vga_switcheroo_client_fb_set() under fbcon_fb_registered(),
which already holds the console lock. Fbdev calls fbcon_fb_registered()
from within register_framebuffer(). Serializes the helper with VGA
switcheroo's call to fbcon_remap_all().
Although vga_switcheroo_client_fb_set() takes an instance of struct fb_info
as parameter, it really only needs the contained fbcon state. Moving the
call to fbcon initialization is therefore cleaner than before. Only amdgpu,
i915, nouveau and radeon support vga_switcheroo. For all other drivers,
this change does nothing.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://elixir.bootlin.com/linux/v6.17/source/drivers/video/fbdev/core/fbcon.c#L2942 # [1]
Fixes: 6a9ee8af344e ("vga_switcheroo: initial implementation (v15)")
Acked-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-fbdev@vger.kernel.org
Cc: <stable@vger.kernel.org> # v2.6.34+
Link: https://patch.msgid.link/20251105161549.98836-1-tzimmermann@suse.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 81f4d4ba509522596143fd5d7dc2fc3495296b0a upstream.
[WHY]
When a laptop lid is closed the connector is disabled but userspace
can still try to change brightness. This doesn't work because the
panel is turned off. It will eventually time out, but there is a lot
of stutter along the way.
[How]
Iterate all connectors to check whether the matching one for the backlight
index is enabled.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4675
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f6eeab30323d1174a4cc022e769d248fe8241304)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3ce62c189693e8ed7b3abe551802bbc67f3ace54 upstream.
[WHAT]
IGT kms_cursor_legacy's long-nonblocking-modeset-vs-cursor-atomic
fails with NULL pointer dereference. This can be reproduced with
both an eDP panel and a DP monitors connected.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 13 UID: 0 PID: 2960 Comm: kms_cursor_lega Not tainted
6.16.0-99-custom #8 PREEMPT(voluntary)
Hardware name: AMD ........
RIP: 0010:dc_stream_get_scanoutpos+0x34/0x130 [amdgpu]
Code: 57 4d 89 c7 41 56 49 89 ce 41 55 49 89 d5 41 54 49
89 fc 53 48 83 ec 18 48 8b 87 a0 64 00 00 48 89 75 d0 48 c7 c6 e0 41 30
c2 <48> 8b 38 48 8b 9f 68 06 00 00 e8 8d d7 fd ff 31 c0 48 81 c3 e0 02
RSP: 0018:ffffd0f3c2bd7608 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffd0f3c2bd7668
RDX: ffffd0f3c2bd7664 RSI: ffffffffc23041e0 RDI: ffff8b32494b8000
RBP: ffffd0f3c2bd7648 R08: ffffd0f3c2bd766c R09: ffffd0f3c2bd7760
R10: ffffd0f3c2bd7820 R11: 0000000000000000 R12: ffff8b32494b8000
R13: ffffd0f3c2bd7664 R14: ffffd0f3c2bd7668 R15: ffffd0f3c2bd766c
FS: 000071f631b68700(0000) GS:ffff8b399f114000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001b8105000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
dm_crtc_get_scanoutpos+0xd7/0x180 [amdgpu]
amdgpu_display_get_crtc_scanoutpos+0x86/0x1c0 [amdgpu]
? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10[amdgpu]
amdgpu_crtc_get_scanout_position+0x27/0x50 [amdgpu]
drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xf7/0x400
drm_crtc_vblank_helper_get_vblank_timestamp+0x1c/0x30
drm_crtc_get_last_vbltimestamp+0x55/0x90
drm_crtc_next_vblank_start+0x45/0xa0
drm_atomic_helper_wait_for_fences+0x81/0x1f0
...
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 621e55f1919640acab25383362b96e65f2baea3c)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 971fb57429df5aa4e6efc796f7841e0d10b1e83c upstream.
Reserve vm invalidation engine 6 when uni_mes enabled. It
is used in processing tlb flush request from host.
Signed-off-by: Michael Chen <michael.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shaoyun liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 873373739b9b150720ea2c5390b4e904a4d21505)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 620a8f131154250f6a64a07d049a4f235d6451a5 upstream.
Make sure to drop the references taken to the vtg devices by
of_find_device_by_node() when looking up their driver data during
component probe.
Note that holding a reference to a platform device does not prevent its
driver data from going away so there is no point in keeping the
reference after the lookup helper returns.
Fixes: cc6b741c6f63 ("drm: sti: remove useless fields from vtg structure")
Cc: stable@vger.kernel.org # 4.16
Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20250922122012.27407-1-johan@kernel.org
Signed-off-by: Raphael Gallais-Pou <raphael.gallais-pou@foss.st.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3126c9ccb4373d8758733c6699ba5ab93dbe5c9d upstream.
This reverts commit 2681bf4ae8d24df950138b8c9ea9c271cd62e414.
This results in a blank screen on the HDMI port on some systems.
Revert for now so as not to regress 6.18, can be addressed
in 6.19 once the issue is root caused.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4652
Cc: Sunpeng.Li@amd.com
Cc: ivan.lipski@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d0e9de7a81503cdde37fb2d37f1d102f9e0f38fb)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7fa666ab07ba9e08f52f357cb8e1aad753e83ac6 ]
If the board supports IP discovery, we don't need to
parse the gpu info firmware.
Backport to 6.18.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4721
Fixes: fa819e3a7c1e ("drm/amdgpu: add support for cyan skillfish gpu_info")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5427e32fa3a0ba9a016db83877851ed277b065fb)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7276878b069c57d9a9cca5db01d2f7a427b73456 ]
When tick counts are large and multiplication by MSEC_PER_SEC is larger
than 64 bits, the conversion from clock ticks to milliseconds can go bad.
Use mul_u64_u32_div() instead.
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Fixes: 49cc215aad7f ("drm/xe: Add xe_gt_clock_interval_to_ms helper")
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/1562f1b62d5be3fbaee100f09107f3cc49e40dd1.1763408584.git.harish.chegondi@intel.com
(cherry picked from commit 96b93ac214f9dd66294d975d86c5dee256faef91)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cfa0904a35fd0231f4d05da0190f0a22ed881cce ]
[why]
1. With allow_0_dtb_clk enabled, the time required to latch DTBCLK to 600 MHz
depends on the SMU. If DTBCLK is not latched to 600 MHz before set_mode completes,
gating DTBCLK causes the DP2 sink to lose its clock source.
2. The existing DTBCLK gating sequence ungates DTBCLK based on both pix_clk and ref_dtbclk,
but gates DTBCLK when either pix_clk or ref_dtbclk is zero.
pix_clk can be zero outside the set_mode sequence before DTBCLK is properly latched,
which can lead to DTBCLK being gated by mistake.
[how]
Consider both pixel_clk and ref_dtbclk when determining when it is safe to gate DTBCLK;
this is more accurate.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4701
Fixes: 5949e7c4890c ("drm/amd/display: Enable Dynamic DTBCLK Switch")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d04eb0c402780ca037b62a6aecf23b863545ebca)
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 35bcc9168f3ce6416cbf3f776758be0937f84cb3 ]
[why]
Log for sequence tracking
Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Reviewed-by: Yihan Zhu <yihan.zhu@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: cfa0904a35fd ("drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1bcd679209420305a86833bc357d50021909edaf ]
[why]
DPP CLK enable needs to disable DPPCLK RCG first.
The DPPCLK_en in dccg should always be enabled when the corresponding
pipe is enabled.
Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: cfa0904a35fd ("drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0ae47e971b9add8f7b8f8d55ac5f407f6f346758 ]
[why & how]
this is to init to HW real DTBCLK.
and use real HW DTBCLK status to update internal logic state
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Martin Leung <martin.leung@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ausef Yousof <Ausef.Yousof@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: cfa0904a35fd ("drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f2687d3cc9f905505d7b510c50970176115066a2 ]
Disable Panel Replay on MST links until it's properly implemented. For
instance the required VSC SDP is not programmed on MST and FEC is not
enabled if Panel Replay is enabled.
Fixes: 3257e55d3ea7 ("drm/i915/panelreplay: enable/disable panel replay")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15174
Cc: Jouni Högander <jouni.hogander@intel.com>
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20251107124141.911895-1-imre.deak@intel.com
(cherry picked from commit e109f644b871df8440c886a69cdce971ed533088)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[ placed MST check at function start since DPCD read was moved to caller ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit eb6e7f520d6efa4d4ebf1671455abe4a681f7a05 ]
On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.
Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.
The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5d1b32cfe4a676fe552416cb5ae847b215463a1a)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d52dea485cd3c98cfeeb474cf66cf95df2ab142f ]
If user provides a large value (such as 0x80) for parameter
prefetch_mem_region_instance in vm_bind ioctl, it will cause
BIT(prefetch_region) overflow as below:
"
------------[ cut here ]------------
UBSAN: shift-out-of-bounds in drivers/gpu/drm/xe/xe_vm.c:3414:7
shift exponent 128 is too large for 64-bit type 'long unsigned int'
CPU: 8 UID: 0 PID: 53120 Comm: xe_exec_system_ Tainted: G W 6.18.0-rc1-lgci-xe-kernel+ #200 PREEMPT(voluntary)
Tainted: [W]=WARN
Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
Call Trace:
<TASK>
dump_stack_lvl+0xa0/0xc0
dump_stack+0x10/0x20
ubsan_epilogue+0x9/0x40
__ubsan_handle_shift_out_of_bounds+0x10e/0x170
? mutex_unlock+0x12/0x20
xe_vm_bind_ioctl.cold+0x20/0x3c [xe]
...
"
Fix it by validating prefetch_region before the BIT() usage.
v2: Add Closes and Cc stable kernels. (Matt)
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6478
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251112181005.2120521-2-shuicheng.lin@intel.com
(cherry picked from commit 8f565bdd14eec5611cc041dba4650e42ccdf71d9)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit d52dea485cd3c98cfeeb474cf66cf95df2ab142f)
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6cbab9f0da72b4dc3c3f9161197aa3b9daa1fa3a ]
Add a call to put_pid() corresponding to get_task_pid().
host1x_memory_context_alloc() does not take ownership of the PID so we
need to free it here to avoid leaking.
Signed-off-by: Prateek Agarwal <praagarwal@nvidia.com>
Fixes: e09db97889ec ("drm/tegra: Support context isolation")
[mperttunen@nvidia.com: reword commit message]
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20250919-host1x-put-pid-v1-1-19c2163dfa87@nvidia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 5bab4c89390f32b2f491f49a151948cd226dd909 upstream.
[Why]
On DCN20 & DCN30, the 6th DPP's & HUBP's are powered on permanently and
cannot be power gated. Thus, when dpp_reset() is invoked for the DPP5,
while it's still powered on, the cached cursor_state
(dpp_base->pos.cur0_ctl.bits.cur0_enable)
and the actual state (CUR0_ENABLE) bit are unsycned. This can cause a
double cursor in full screen with non-native scaling.
[How]
Force disable cursor on DPP5 on plane powerdown for ASICs w/ 6 DPPs/HUBPs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4673
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 79b3c037f972dcb13e325a8eabfb8da835764e15)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1788ef30725da53face7e311cdf62ad65fababcd upstream.
[Why]
Existing routine has two conversion sequence,
pbn_to_kbps and kbps_to_pbn with margin.
Non of those has without-margin calculation.
kbps_to_pbn with margin conversion includes
fec overhead which has already been included in
pbn_div calculation with 0.994 factor considered.
It is a double counted fec overhead factor that causes
potential bw loss.
[How]
Add without-margin calculation.
Fix fec overhead double counted issue.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3735
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e0dec00f3d05e8c0eceaaebfdca217f8d10d380c)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 71ad9054c1f241be63f9d11df8cbd0aa0352fe16 upstream.
[Why]
When a monitor is booting it's possible that it isn't ready to retrieve
link caps and this can lead to an EDID read failure:
```
[drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read receiver caps dpcd data failed.
amdgpu 0000:c5:00.0: [drm] *ERROR* No EDID read.
```
[How]
Rather than msleep once and try a few times, msleep each time. Should
be no changes for existing working monitors, but should correct reading
caps on a monitor that is slow to boot.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 669dca37b3348a447db04bbdcbb3def94d5997cc)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8612badc331bcab2068baefa69e1458085ed89e3 upstream.
[Why]
Empirical measurement of some monitors that fail to read EDID while
booting shows that the number of retries with a 30ms delay between
tries is as high as 16.
[How]
Increase number of retries to 20.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ad1c59ad7cf74ec06e32fe2c330ac1e957222288)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 80d8a9ad1587b64c545d515ab6cb7ecb9908e1b3 upstream.
[Why]
Accoreding to CP updated to RS64 on gfx11,
WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW.
That packet is used for MCBP on F32 based system.
So it would lead to incorrect GRBM write and FW is not handling that
extra case correctly.
[How]
With gfx11 rs64 enabled, skip emit de meta data.
Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8366cd442d226463e673bed5d199df916f4ecbcf)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 31ab31433c9bd2f255c48dc6cb9a99845c58b1e4 upstream.
During the suspend sequence VPE is already going to be power gated
as part of vpe_suspend(). It's unnecessary to call during calls to
amdgpu_device_set_pg_state().
It actually can expose a race condition with the firmware if s0i3
sequence starts as well. Drop these calls.
Cc: Peyton.Lee@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2a6c826cfeedd7714611ac115371a959ead55bda)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9eb00b5f5697bd56baa3222c7a1426fa15bacfb5 upstream.
Delete the attempt to progress the queue when checking if fence is
signaled. This avoids deadlock.
dma-fence_ops::signaled can be called with the fence lock in unknown
state. For radeon, the fence lock is also the wait queue lock. This can
cause a self deadlock when signaled() tries to make forward progress on
the wait queue. But advancing the queue is unneeded because incorrectly
returning false from signaled() is perfectly acceptable.
Link: https://github.com/brave/brave-browser/issues/49182
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4641
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Robert McClinton <rbmccav@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 527ba26e50ec2ca2be9c7c82f3ad42998a75d0db)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4c5376b4b143c4834ebd392aef2215847752b16a upstream.
driver_find_device() calls get_device() to increment the reference
count once a matching device is found, but there is no put_device() to
balance the reference count. To avoid reference count leakage, add
put_device() to decrease the reference count.
Found by code review.
Cc: stable@vger.kernel.org
Fixes: a31500fe7055 ("drm/tegra: dc: Restore coupling of display controllers")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Acked-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20251022114720.24937-1-make24@iscas.ac.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 949f1fd2225baefbea2995afa807dba5cbdb6bd3 upstream.
nvkm_falcon_fw::boot is allocated, but no one frees it. This causes a
kmemleak warning.
Make sure this data is deallocated.
Fixes: 2541626cfb79 ("drm/nouveau/acr: use common falcon HS FW code for ACR FWs")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251117084231.2910561-1-namcao@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 660b299bed2a2a55a1f9102d029549d0235f881c upstream.
Commit b6bcbce33596 ("soc/tegra: pmc: Ensure power-domains are in a
known state") was introduced so that all power domains get initialized
to a known working state when booting and it does this by shutting them
down (including asserting resets and disabling clocks) before registering
each power domain with the genpd framework, leaving it to each driver to
later on power its needed domains.
This caused the Google Pixel C to hang when booting due to a workaround
in the DSI driver introduced in commit b22fd0b9639e ("drm/tegra: dsi:
Clear enable register if powered by bootloader") meant to handle the case
where the bootloader enabled the DSI hardware module. The workaround relies
on reading a hardware register to determine the current status and after
b6bcbce33596 that now happens in a powered down state thus leading to
the boot hang.
Fix this by reverting b22fd0b9639e since currently we are guaranteed
that the hardware will be fully reset by the time we start enabling the
DSI module.
Fixes: b6bcbce33596 ("soc/tegra: pmc: Ensure power-domains are in a known state")
Cc: stable@vger.kernel.org
Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20251103-diogo-smaug_ec_typec-v1-1-be656ccda391@tecnico.ulisboa.pt
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 22a36e660d014925114feb09a2680bb3c2d1e279 upstream.
Certain multi-GPU configurations (especially GFX12) may hit
data corruption when a DCC-compressed VRAM surface is shared across GPUs
using peer-to-peer (P2P) DMA transfers.
Such surfaces rely on device-local metadata and cannot be safely accessed
through a remote GPU’s page tables. Attempting to import a DCC-enabled
surface through P2P leads to incorrect rendering or GPU faults.
This change disables P2P for DCC-enabled VRAM buffers that are contiguous
and allocated on GFX12+ hardware. In these cases, the importer falls back
to the standard system-memory path, avoiding invalid access to compressed
surfaces.
Future work could consider optional migration (VRAM→System→VRAM) if a
performance regression is observed when `attach->peer2peer = false`.
Tested on:
- Dual RX 9700 XT (Navi4x) setup
- GNOME and Wayland compositor scenarios
- Confirmed no corruption after disabling P2P under these conditions
v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS.
v3: simplify for upsteam and fix ip version check (Alex)
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9dff2bb709e6fbd97e263fd12bf12802d2b5a0cf)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d15deafab5d722afb9e2f83c5edcdef9d9d98bd1 upstream.
Over allocation of save area is not fatal, only under allocation is.
ROCm has various components that independently claim authority over save
area size.
Unless KFD decides to claim single authority, relax size checks.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 15bd4958fe38e763bc17b607ba55155254a01f55)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 32b415a9dc2c212e809b7ebc2b14bc3fbda2b9af ]
This data originates from userspace and is used in buffer offset
calculations which could potentially overflow causing an out-of-bounds
access.
Fixes: 8ce75f8ab904 ("drm/vmwgfx: Update device includes for DX device functionality")
Reported-by: Rohit Keshri <rkeshri@redhat.com>
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patch.msgid.link/20251021190128.13014-1-ian.forbes@broadcom.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 883f309add55060233bf11c1ea6947140372920f ]
Previously, APU platforms (and other scenarios with uninitialized VRAM managers)
triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root
cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL,
but that `man->bdev` (the backing device pointer within the manager) remains
uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully
set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to
acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to
a kernel OOPS.
1. **amdgpu_cs.c**: Extend the existing bandwidth control check in
`amdgpu_cs_get_threshold_for_moves()` to include a check for
`ttm_resource_manager_used()`. If the manager is not used (uninitialized
`bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific
logic that would trigger the NULL dereference.
2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info
reporting to use a conditional: if the manager is used, return the real VRAM
usage; otherwise, return 0. This avoids accessing `man->bdev` when it is
NULL.
3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function)
data write path. Use `ttm_resource_manager_used()` to check validity: if the
manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set
`fb_usage` to 0 (APUs have no discrete framebuffer to report).
This approach is more robust than APU-specific checks because it:
- Works for all scenarios where the VRAM manager is uninitialized (not just APUs),
- Aligns with TTM's design by using its native helper function,
- Preserves correct behavior for discrete GPUs (which have fully initialized
`man->bdev` and pass the `ttm_resource_manager_used()` check).
v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 33cc891b56b93cad1a83263eaf2e417436f70c82 ]
Otherwise accessing them can cause a crash.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5c05bcf6ae7732da1bd4dc1958d527b5f07f216a ]
On various SI GPUs, a flickering can be observed near the bottom
edge of the screen when using a single 4K 60Hz monitor over DP.
Disabling MCLK switching works around this problem.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5d55ed19d4190d2c210ac05ac7a53f800a8c6fe5 ]
Those can be triggered trivially by userspace.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b11a020d914c3b7628f56a9ea476a5b03679489b ]
Currently Xe driver is triggering flr without any clean-up on
shutdown. This is causing random warnings from pending related works as the
underlying hardware is reset in the middle of their execution.
Fix this by performing clean shutdown also when using flr.
Fixes: 501d799a47e2 ("drm/xe: Wire up device shutdown handler")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://patch.msgid.link/20251031122312.1836534-1-jouni.hogander@intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
(cherry picked from commit a4ff26b7c8ef38e4dd34f77cbcd73576fdde6dd4)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9cd27eec872f0b95dcdd811edc39d2d32e4158c8 ]
The xe_device_shutdown() function was needing a few declarations
that were only required under a specific condition. This change
moves those declarations to be within that conditional branch
to avoid unnecessary declarations.
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20251007100208.1407021-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit 15b3036045188f4da4ca62b2ed01b0f160252e9b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Stable-dep-of: b11a020d914c ("drm/xe: Do clean shutdown also when using flr")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 95af8f4fdce8349a5fe75264007f1af2aa1082ea ]
Cancel and wait for any Dead CT worker to complete before continuing
with device unbinding. Else the worker will end up using resources freed
by the undind operation.
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Fixes: d2c5a5a926f4 ("drm/xe/guc: Dead CT helper")
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251103123144.3231829-6-balasubramani.vivekanandan@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 492671339114e376aaa38626d637a2751cdef263)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b09cb2996cdf50cd1ab4020e002c95d742c81313 ]
commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded
successfully") attempted to fix extra messages, but failed to port the
cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable
secure display TA multiple times") to prevent multiple tries.
Add that to the failure handling path even on a quick failure.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679
Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4104c0a454f6a4d1e0d14895d03c0e7bdd0c8240)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7d44ad6b43d0be43d080180413a1b6c24cfbd266 ]
When tick values are large, the multiplication by NSEC_PER_SEC is larger
than 64 bits and results in bad conversions.
The issue is seen in PMU busyness counters that look like they have
wrapped around due to bad conversion. i915 PMU implementation returns
monotonically increasing counters. If a count is lesser than previous
one, it will only return the larger value until the smaller value
catches up. The user will see this as zero delta between two
measurements even though the engines are busy.
Fix it by using mul_u64_u32_div()
Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14955
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/20251016000350.1152382-2-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 2ada9cb1df3f5405a01d013b708b1b0914efccfe)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Added the Fixes tag while cherry-picking to fixes]
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 84bbe327a5cbb060f3321c9d9d4d53936fc1ef9b ]
On completion of i915_vma_pin_ww(), a synchronous variant of
dma_fence_work_commit() is called. When pinning a VMA to GGTT address
space on a Cherry View family processor, or on a Broxton generation SoC
with VTD enabled, i.e., when stop_machine() is then called from
intel_ggtt_bind_vma(), that can potentially lead to lock inversion among
reservation_ww and cpu_hotplug locks.
[86.861179] ======================================================
[86.861193] WARNING: possible circular locking dependency detected
[86.861209] 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ #1 Tainted: G U
[86.861226] ------------------------------------------------------
[86.861238] i915_module_loa/1432 is trying to acquire lock:
[86.861252] ffffffff83489090 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x1c/0x50
[86.861290]
but task is already holding lock:
[86.861303] ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.862233]
which lock already depends on the new lock.
[86.862251]
the existing dependency chain (in reverse order) is:
[86.862265]
-> #5 (reservation_ww_class_mutex){+.+.}-{3:3}:
[86.862292] dma_resv_lockdep+0x19a/0x390
[86.862315] do_one_initcall+0x60/0x3f0
[86.862334] kernel_init_freeable+0x3cd/0x680
[86.862353] kernel_init+0x1b/0x200
[86.862369] ret_from_fork+0x47/0x70
[86.862383] ret_from_fork_asm+0x1a/0x30
[86.862399]
-> #4 (reservation_ww_class_acquire){+.+.}-{0:0}:
[86.862425] dma_resv_lockdep+0x178/0x390
[86.862440] do_one_initcall+0x60/0x3f0
[86.862454] kernel_init_freeable+0x3cd/0x680
[86.862470] kernel_init+0x1b/0x200
[86.862482] ret_from_fork+0x47/0x70
[86.862495] ret_from_fork_asm+0x1a/0x30
[86.862509]
-> #3 (&mm->mmap_lock){++++}-{3:3}:
[86.862531] down_read_killable+0x46/0x1e0
[86.862546] lock_mm_and_find_vma+0xa2/0x280
[86.862561] do_user_addr_fault+0x266/0x8e0
[86.862578] exc_page_fault+0x8a/0x2f0
[86.862593] asm_exc_page_fault+0x27/0x30
[86.862607] filldir64+0xeb/0x180
[86.862620] kernfs_fop_readdir+0x118/0x480
[86.862635] iterate_dir+0xcf/0x2b0
[86.862648] __x64_sys_getdents64+0x84/0x140
[86.862661] x64_sys_call+0x1058/0x2660
[86.862675] do_syscall_64+0x91/0xe90
[86.862689] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.862703]
-> #2 (&root->kernfs_rwsem){++++}-{3:3}:
[86.862725] down_write+0x3e/0xf0
[86.862738] kernfs_add_one+0x30/0x3c0
[86.862751] kernfs_create_dir_ns+0x53/0xb0
[86.862765] internal_create_group+0x134/0x4c0
[86.862779] sysfs_create_group+0x13/0x20
[86.862792] topology_add_dev+0x1d/0x30
[86.862806] cpuhp_invoke_callback+0x4b5/0x850
[86.862822] cpuhp_issue_call+0xbf/0x1f0
[86.862836] __cpuhp_setup_state_cpuslocked+0x111/0x320
[86.862852] __cpuhp_setup_state+0xb0/0x220
[86.862866] topology_sysfs_init+0x30/0x50
[86.862879] do_one_initcall+0x60/0x3f0
[86.862893] kernel_init_freeable+0x3cd/0x680
[86.862908] kernel_init+0x1b/0x200
[86.862921] ret_from_fork+0x47/0x70
[86.862934] ret_from_fork_asm+0x1a/0x30
[86.862947]
-> #1 (cpuhp_state_mutex){+.+.}-{3:3}:
[86.862969] __mutex_lock+0xaa/0xed0
[86.862982] mutex_lock_nested+0x1b/0x30
[86.862995] __cpuhp_setup_state_cpuslocked+0x67/0x320
[86.863012] __cpuhp_setup_state+0xb0/0x220
[86.863026] page_alloc_init_cpuhp+0x2d/0x60
[86.863041] mm_core_init+0x22/0x2d0
[86.863054] start_kernel+0x576/0xbd0
[86.863068] x86_64_start_reservations+0x18/0x30
[86.863084] x86_64_start_kernel+0xbf/0x110
[86.863098] common_startup_64+0x13e/0x141
[86.863114]
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
[86.863135] __lock_acquire+0x1635/0x2810
[86.863152] lock_acquire+0xc4/0x2f0
[86.863166] cpus_read_lock+0x41/0x100
[86.863180] stop_machine+0x1c/0x50
[86.863194] bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915]
[86.863987] intel_ggtt_bind_vma+0x43/0x70 [i915]
[86.864735] __vma_bind+0x55/0x70 [i915]
[86.865510] fence_work+0x26/0xa0 [i915]
[86.866248] fence_notify+0xa1/0x140 [i915]
[86.866983] __i915_sw_fence_complete+0x8f/0x270 [i915]
[86.867719] i915_sw_fence_commit+0x39/0x60 [i915]
[86.868453] i915_vma_pin_ww+0x462/0x1360 [i915]
[86.869228] i915_vma_pin.constprop.0+0x133/0x1d0 [i915]
[86.870001] initial_plane_vma+0x307/0x840 [i915]
[86.870774] intel_initial_plane_config+0x33f/0x670 [i915]
[86.871546] intel_display_driver_probe_nogem+0x1c6/0x260 [i915]
[86.872330] i915_driver_probe+0x7fa/0xe80 [i915]
[86.873057] i915_pci_probe+0xe6/0x220 [i915]
[86.873782] local_pci_probe+0x47/0xb0
[86.873802] pci_device_probe+0xf3/0x260
[86.873817] really_probe+0xf1/0x3c0
[86.873833] __driver_probe_device+0x8c/0x180
[86.873848] driver_probe_device+0x24/0xd0
[86.873862] __driver_attach+0x10f/0x220
[86.873876] bus_for_each_dev+0x7f/0xe0
[86.873892] driver_attach+0x1e/0x30
[86.873904] bus_add_driver+0x151/0x290
[86.873917] driver_register+0x5e/0x130
[86.873931] __pci_register_driver+0x7d/0x90
[86.873945] i915_pci_register_driver+0x23/0x30 [i915]
[86.874678] i915_init+0x37/0x120 [i915]
[86.875347] do_one_initcall+0x60/0x3f0
[86.875369] do_init_module+0x97/0x2a0
[86.875385] load_module+0x2c54/0x2d80
[86.875398] init_module_from_file+0x96/0xe0
[86.875413] idempotent_init_module+0x117/0x330
[86.875426] __x64_sys_finit_module+0x77/0x100
[86.875440] x64_sys_call+0x24de/0x2660
[86.875454] do_syscall_64+0x91/0xe90
[86.875470] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.875486]
other info that might help us debug this:
[86.875502] Chain exists of:
cpu_hotplug_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[86.875539] Possible unsafe locking scenario:
[86.875552] CPU0 CPU1
[86.875563] ---- ----
[86.875573] lock(reservation_ww_class_mutex);
[86.875588] lock(reservation_ww_class_acquire);
[86.875606] lock(reservation_ww_class_mutex);
[86.875624] rlock(cpu_hotplug_lock);
[86.875637]
*** DEADLOCK ***
[86.875650] 3 locks held by i915_module_loa/1432:
[86.875663] #0: ffff888101f5c1b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x104/0x220
[86.875699] #1: ffffc90002e0b4a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.876512] #2: ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915]
[86.877305]
stack backtrace:
[86.877326] CPU: 0 UID: 0 PID: 1432 Comm: i915_module_loa Tainted: G U 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ #1 PREEMPT(voluntary)
[86.877334] Tainted: [U]=USER
[86.877336] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0079.2020.0420.1316 04/20/2020
[86.877339] Call Trace:
[86.877344] <TASK>
[86.877353] dump_stack_lvl+0x91/0xf0
[86.877364] dump_stack+0x10/0x20
[86.877369] print_circular_bug+0x285/0x360
[86.877379] check_noncircular+0x135/0x150
[86.877390] __lock_acquire+0x1635/0x2810
[86.877403] lock_acquire+0xc4/0x2f0
[86.877408] ? stop_machine+0x1c/0x50
[86.877422] ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915]
[86.878173] cpus_read_lock+0x41/0x100
[86.878182] ? stop_machine+0x1c/0x50
[86.878191] ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915]
[86.878916] stop_machine+0x1c/0x50
[86.878927] bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915]
[86.879652] intel_ggtt_bind_vma+0x43/0x70 [i915]
[86.880375] __vma_bind+0x55/0x70 [i915]
[86.881133] fence_work+0x26/0xa0 [i915]
[86.881851] fence_notify+0xa1/0x140 [i915]
[86.882566] __i915_sw_fence_complete+0x8f/0x270 [i915]
[86.883286] i915_sw_fence_commit+0x39/0x60 [i915]
[86.884003] i915_vma_pin_ww+0x462/0x1360 [i915]
[86.884756] ? i915_vma_pin.constprop.0+0x6c/0x1d0 [i915]
[86.885513] i915_vma_pin.constprop.0+0x133/0x1d0 [i915]
[86.886281] initial_plane_vma+0x307/0x840 [i915]
[86.887049] intel_initial_plane_config+0x33f/0x670 [i915]
[86.887819] intel_display_driver_probe_nogem+0x1c6/0x260 [i915]
[86.888587] i915_driver_probe+0x7fa/0xe80 [i915]
[86.889293] ? mutex_unlock+0x12/0x20
[86.889301] ? drm_privacy_screen_get+0x171/0x190
[86.889308] ? acpi_dev_found+0x66/0x80
[86.889321] i915_pci_probe+0xe6/0x220 [i915]
[86.890038] local_pci_probe+0x47/0xb0
[86.890049] pci_device_probe+0xf3/0x260
[86.890058] really_probe+0xf1/0x3c0
[86.890067] __driver_probe_device+0x8c/0x180
[86.890072] driver_probe_device+0x24/0xd0
[86.890078] __driver_attach+0x10f/0x220
[86.890083] ? __pfx___driver_attach+0x10/0x10
[86.890088] bus_for_each_dev+0x7f/0xe0
[86.890097] driver_attach+0x1e/0x30
[86.890101] bus_add_driver+0x151/0x290
[86.890107] driver_register+0x5e/0x130
[86.890113] __pci_register_driver+0x7d/0x90
[86.890119] i915_pci_register_driver+0x23/0x30 [i915]
[86.890833] i915_init+0x37/0x120 [i915]
[86.891482] ? __pfx_i915_init+0x10/0x10 [i915]
[86.892135] do_one_initcall+0x60/0x3f0
[86.892145] ? __kmalloc_cache_noprof+0x33f/0x470
[86.892157] do_init_module+0x97/0x2a0
[86.892164] load_module+0x2c54/0x2d80
[86.892168] ? __kernel_read+0x15c/0x300
[86.892185] ? kernel_read_file+0x2b1/0x320
[86.892195] init_module_from_file+0x96/0xe0
[86.892199] ? init_module_from_file+0x96/0xe0
[86.892211] idempotent_init_module+0x117/0x330
[86.892224] __x64_sys_finit_module+0x77/0x100
[86.892230] x64_sys_call+0x24de/0x2660
[86.892236] do_syscall_64+0x91/0xe90
[86.892243] ? irqentry_exit+0x77/0xb0
[86.892249] ? sysvec_apic_timer_interrupt+0x57/0xc0
[86.892256] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[86.892261] RIP: 0033:0x7303e1b2725d
[86.892271] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[86.892276] RSP: 002b:00007ffddd1fdb38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[86.892281] RAX: ffffffffffffffda RBX: 00005d771d88fd90 RCX: 00007303e1b2725d
[86.892285] RDX: 0000000000000000 RSI: 00005d771d893aa0 RDI: 000000000000000c
[86.892287] RBP: 00007ffddd1fdbf0 R08: 0000000000000040 R09: 00007ffddd1fdb80
[86.892289] R10: 00007303e1c03b20 R11: 0000000000000246 R12: 00005d771d893aa0
[86.892292] R13: 0000000000000000 R14: 00005d771d88f0d0 R15: 00005d771d895710
[86.892304] </TASK>
Call asynchronous variant of dma_fence_work_commit() in that case.
v3: Provide more verbose in-line comment (Andi),
- mention target environments in commit message.
Fixes: 7d1c2618eac59 ("drm/i915: Take reservation lock around i915_vma_pin.")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14985
Cc: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Acked-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20251023082925.351307-6-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 648ef1324add1c2e2b6041cdf0b28d31fbca5f13)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit afcfb6c8474d9e750880aaa77952cc588f859613 ]
Call pm_runtime_resume_and_get() before accessing GCE hardware in
mbox_send_message(), and invoke pm_runtime_put_autosuspend() in the
cmdq callback to release the PM reference and start autosuspend for
GCE. This ensures correct power management for the GCE device.
Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime PM with autosuspend")
Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250829091727.3745415-3-jason-jh.lin@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit fdc93beeadc2439e5e85d056a8fe681dcced09da upstream.
[Why & How]
This fixes the black screen issue on certain APUs with HDMI,
accompanied by the following messages:
amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info
frame on connector DP-1: -22
amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm]
Cannot find any crtc or sizes
Fixes: 489f0f600ce2 ("drm/amd/display: Fix DVI-D/HDMI adapters")
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 678c901443a6d2e909e3b51331a20f9d8f84ce82)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 38ab33dbea594700c8d6cc81eec0a54e95d3eb2f upstream.
Align the function headers for `amdgpu_max_hdmi_pixel_clock` and
`amdgpu_connector_dvi_mode_valid` with the function implementations so
they match the expected kdoc style.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1199: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Returns the maximum supported HDMI (TMDS) pixel clock in KHz.
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1212: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Validates the given display mode on DVI and HDMI connectors.
Fixes: 585b2f685c56 ("drm/amdgpu: Respect max pixel clock for HDMI and DVI-D (v2)")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 46b0e6b9d749cfa891e6969d6565be1131c53aa2 upstream.
The error log is supposed to be gaurded under if failure condition.
Fixes: faab5ea08367 ("drm/amdgpu: Check vcn sram load return value")
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2f9c63883730a0bfecb086e6e59246933f936ca1 upstream.
Use `atomic_commit_setup` to change the DC stream state. It's a
preparation to remove from `atomic_check` changes in CRTC color
components of DC stream state and prevent DC to commit TEST_ONLY
changes.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4444
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6dd97ceb645c08aca9fc871a3006e47fe699f0ac upstream.
When a connector is connected but inactive (e.g., disabled by desktop
environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading
odm_combine_segments causes kernel NULL pointer dereference.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy) e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6
Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025
RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
seq_read_iter+0x125/0x490
? __alloc_frozen_pages_noprof+0x18f/0x350
seq_read+0x12c/0x170
full_proxy_read+0x51/0x80
vfs_read+0xbc/0x390
? __handle_mm_fault+0xa46/0xef0
? do_syscall_64+0x71/0x900
ksys_read+0x73/0xf0
do_syscall_64+0x71/0x900
? count_memcg_events+0xc2/0x190
? handle_mm_fault+0x1d7/0x2d0
? do_user_addr_fault+0x21a/0x690
? exc_page_fault+0x7e/0x1a0
entry_SYSCALL_64_after_hwframe+0x6c/0x74
RIP: 0033:0x7f44d4031687
Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00>
RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687
RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003
RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
</TASK>
Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x>
snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn>
platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp>
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
PKRU: 55555554
Fix this by checking pipe_ctx->stream_res.tg before dereferencing.
Fixes: 07926ba8a44f ("drm/amd/display: Add debugfs interface for ODM combine info")
Signed-off-by: Rong Zhang <i@rong.moe>
Reviewed-by: Mario Limoncello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f19bbecd34e3c15eed7e5e593db2ac0fc7a0e6d8)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7c5609b72bfe57d8c601d9561e0d2551b605c017 upstream.
Fix the flows for S0ix. There is no need to stop
rlc or reintialize PMFW in S0ix.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fd39b5a5830d8f2553e0c09d4d50bdff28b10080)
Cc: <stable@vger.kernel.org> # c81f5cebe849: drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 487df8b698345dd5a91346335f05170ed5f29d4e upstream.
The Mesa issue referenced below pointed out a possible deadlock:
[ 1231.611031] Possible interrupt unsafe locking scenario:
[ 1231.611033] CPU0 CPU1
[ 1231.611034] ---- ----
[ 1231.611035] lock(&xa->xa_lock#17);
[ 1231.611038] local_irq_disable();
[ 1231.611039] lock(&fence->lock);
[ 1231.611041] lock(&xa->xa_lock#17);
[ 1231.611044] <Interrupt>
[ 1231.611045] lock(&fence->lock);
[ 1231.611047]
*** DEADLOCK ***
In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that don't disable interrupts (eg:
drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()).
CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling
callback so in an interrupt context. It will deadlock when trying to
grab the xa_lock which is already held by CPU0.
Replacing all xa_* usage by their xa_*_irq counterparts would fix
this issue, but Christian pointed out another issue: dma_fence_signal
takes fence.lock and so does dma_fence_add_callback.
dma_fence_signal() // locks f1.lock
-> drm_sched_entity_kill_jobs_cb()
-> foreach dependencies
-> dma_fence_add_callback() // locks f2.lock
This will deadlock if f1 and f2 share the same spinlock.
To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work().
Cc: stable@vger.kernel.org # v6.2+
Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
[phasta: commit message nits]
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|