<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/thermal, branch v6.6.23</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>thermal/drivers/qoriq: Fix getting tmu range</title>
<updated>2024-03-26T22:20:07+00:00</updated>
<author>
<name>Peng Fan</name>
<email>peng.fan@nxp.com</email>
</author>
<published>2024-02-26T00:36:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2584d6bbc2bca1819059907990fcbec8228b2090'/>
<id>2584d6bbc2bca1819059907990fcbec8228b2090</id>
<content type='text'>
[ Upstream commit 4d0642074c67ed9928e9d68734ace439aa06e403 ]

TMU Version 1 has 4 TTRCRs, while TMU Version &gt;=2 has 16 TTRCRs.
So limit the len to 4 will report "invalid range data" for i.MX93.

This patch drop the local array with allocated ttrcr array and
able to support larger tmu ranges.

Fixes: f12d60c81fce ("thermal/drivers/qoriq: Support version 2.1")
Tested-by: Sascha Hauer &lt;s.hauer@pengutronix.de&gt;
Signed-off-by: Peng Fan &lt;peng.fan@nxp.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20240226003657.3012880-1-peng.fan@oss.nxp.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 4d0642074c67ed9928e9d68734ace439aa06e403 ]

TMU Version 1 has 4 TTRCRs, while TMU Version &gt;=2 has 16 TTRCRs.
So limit the len to 4 will report "invalid range data" for i.MX93.

This patch drop the local array with allocated ttrcr array and
able to support larger tmu ranges.

Fixes: f12d60c81fce ("thermal/drivers/qoriq: Support version 2.1")
Tested-by: Sascha Hauer &lt;s.hauer@pengutronix.de&gt;
Signed-off-by: Peng Fan &lt;peng.fan@nxp.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20240226003657.3012880-1-peng.fan@oss.nxp.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal/drivers/mediatek/lvts_thermal: Fix a memory leak in an error handling path</title>
<updated>2024-03-26T22:20:07+00:00</updated>
<author>
<name>Christophe JAILLET</name>
<email>christophe.jaillet@wanadoo.fr</email>
</author>
<published>2024-01-28T08:38:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2db869da91afd48e5b9ec76814709be49662b07d'/>
<id>2db869da91afd48e5b9ec76814709be49662b07d</id>
<content type='text'>
[ Upstream commit ca93bf607a44c1f009283dac4af7df0d9ae5e357 ]

If devm_krealloc() fails, then 'efuse' is leaking.
So free it to avoid a leak.

Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Christophe JAILLET &lt;christophe.jaillet@wanadoo.fr&gt;
Reviewed-by: Matthias Brugger &lt;matthias.bgg@gmail.com&gt;
Reviewed-by: AngeloGioacchino Del Regno &lt;angelogioacchino.delregno@collabora.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/481d345233862d58c3c305855a93d0dbc2bbae7e.1706431063.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ca93bf607a44c1f009283dac4af7df0d9ae5e357 ]

If devm_krealloc() fails, then 'efuse' is leaking.
So free it to avoid a leak.

Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Christophe JAILLET &lt;christophe.jaillet@wanadoo.fr&gt;
Reviewed-by: Matthias Brugger &lt;matthias.bgg@gmail.com&gt;
Reviewed-by: AngeloGioacchino Del Regno &lt;angelogioacchino.delregno@collabora.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/481d345233862d58c3c305855a93d0dbc2bbae7e.1706431063.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: core: Fix thermal zone suspend-resume synchronization</title>
<updated>2024-02-05T20:14:15+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-12-18T19:25:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=fcecef9a84f689631074240d63f4e00c4c94a614'/>
<id>fcecef9a84f689631074240d63f4e00c4c94a614</id>
<content type='text'>
[ Upstream commit 4e814173a8c4f432fd068b1c796f0416328c9d99 ]

There are 3 synchronization issues with thermal zone suspend-resume
during system-wide transitions:

 1. The resume code runs in a PM notifier which is invoked after user
    space has been thawed, so it can run concurrently with user space
    which can trigger a thermal zone device removal.  If that happens,
    the thermal zone resume code may use a stale pointer to the next
    list element and crash, because it does not hold thermal_list_lock
    while walking thermal_tz_list.

 2. The thermal zone resume code calls thermal_zone_device_init()
    outside the zone lock, so user space or an update triggered by
    the platform firmware may see an inconsistent state of a
    thermal zone leading to unexpected behavior.

 3. Clearing the in_suspend global variable in thermal_pm_notify()
    allows __thermal_zone_device_update() to continue for all thermal
    zones and it may as well run before the thermal_tz_list walk (or
    at any point during the list walk for that matter) and attempt to
    operate on a thermal zone that has not been resumed yet.  It may
    also race destructively with thermal_zone_device_init().

To address these issues, add thermal_list_lock locking to
thermal_pm_notify(), especially arount the thermal_tz_list,
make it call thermal_zone_device_init() back-to-back with
__thermal_zone_device_update() under the zone lock and replace
in_suspend with per-zone bool "suspend" indicators set and unset
under the given zone's lock.

Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/
Reported-by: Bo Ye &lt;bo.ye@mediatek.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 4e814173a8c4f432fd068b1c796f0416328c9d99 ]

There are 3 synchronization issues with thermal zone suspend-resume
during system-wide transitions:

 1. The resume code runs in a PM notifier which is invoked after user
    space has been thawed, so it can run concurrently with user space
    which can trigger a thermal zone device removal.  If that happens,
    the thermal zone resume code may use a stale pointer to the next
    list element and crash, because it does not hold thermal_list_lock
    while walking thermal_tz_list.

 2. The thermal zone resume code calls thermal_zone_device_init()
    outside the zone lock, so user space or an update triggered by
    the platform firmware may see an inconsistent state of a
    thermal zone leading to unexpected behavior.

 3. Clearing the in_suspend global variable in thermal_pm_notify()
    allows __thermal_zone_device_update() to continue for all thermal
    zones and it may as well run before the thermal_tz_list walk (or
    at any point during the list walk for that matter) and attempt to
    operate on a thermal zone that has not been resumed yet.  It may
    also race destructively with thermal_zone_device_init().

To address these issues, add thermal_list_lock locking to
thermal_pm_notify(), especially arount the thermal_tz_list,
make it call thermal_zone_device_init() back-to-back with
__thermal_zone_device_update() under the zone lock and replace
in_suspend with per-zone bool "suspend" indicators set and unset
under the given zone's lock.

Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/
Reported-by: Bo Ye &lt;bo.ye@mediatek.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: trip: Drop lockdep assertion from thermal_zone_trip_id()</title>
<updated>2024-02-01T00:19:14+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-10-11T15:45:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ee82479f5d740968828a1fde41598f1d1e62eee6'/>
<id>ee82479f5d740968828a1fde41598f1d1e62eee6</id>
<content type='text'>
commit 108ffd12be24ba1d74b3314df8db32a0a6d55ba5 upstream.

The lockdep assertion in thermal_zone_trip_id() triggers when the
trip point sysfs attribute of a thermal instance is read, because
there is no thermal zone locking in that code path.

This is not verly useful, though, because there is no mechanism by which
the location of the trips[] table in a thermal zone or its size can
change after binding cooling devices to the trips in that thermal
zone and before those cooling devices are unbound from them.  Thus
it is not in fact necessary to hold the thermal zone lock when
thermal_zone_trip_id() is called from trip_point_show() and so the
lockdep asserion in the former is invalid.

Accordingly, drop that lockdep assertion.

Fixes: 2c7b4bfadef0 ("thermal: core: Store trip pointer in struct thermal_instance")
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 108ffd12be24ba1d74b3314df8db32a0a6d55ba5 upstream.

The lockdep assertion in thermal_zone_trip_id() triggers when the
trip point sysfs attribute of a thermal instance is read, because
there is no thermal zone locking in that code path.

This is not verly useful, though, because there is no mechanism by which
the location of the trips[] table in a thermal zone or its size can
change after binding cooling devices to the trips in that thermal
zone and before those cooling devices are unbound from them.  Thus
it is not in fact necessary to hold the thermal zone lock when
thermal_zone_trip_id() is called from trip_point_show() and so the
lockdep asserion in the former is invalid.

Accordingly, drop that lockdep assertion.

Fixes: 2c7b4bfadef0 ("thermal: core: Store trip pointer in struct thermal_instance")
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: gov_power_allocator: avoid inability to reset a cdev</title>
<updated>2024-02-01T00:19:10+00:00</updated>
<author>
<name>Di Shen</name>
<email>di.shen@unisoc.com</email>
</author>
<published>2024-01-10T11:55:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d10ff0b3eaf6cec35f6928f1cca7a56b1a089b07'/>
<id>d10ff0b3eaf6cec35f6928f1cca7a56b1a089b07</id>
<content type='text'>
[ Upstream commit e95fa7404716f6e25021e66067271a4ad8eb1486 ]

Commit 0952177f2a1f ("thermal/core/power_allocator: Update once
cooling devices when temp is low") adds an update flag to avoid
triggering a thermal event when there is no need, and the thermal
cdev is updated once when the temperature is low.

But when the trips are writable, and switch_on_temp is set to be a
higher value, the cooling device state may not be reset to 0,
because last_temperature is smaller than switch_on_temp.

For example:
First:
switch_on_temp=70 control_temp=85;
Then userspace change the trip_temp:
switch_on_temp=45 control_temp=55 cur_temp=54

Then userspace reset the trip_temp:
switch_on_temp=70 control_temp=85 cur_temp=57 last_temp=54

At this time, the cooling device state should be reset to 0.
However, because cur_temp(57) &lt; switch_on_temp(70)
last_temp(54) &lt; switch_on_temp(70)  ----&gt;  update = false,
update is false, the cooling device state can not be reset.

Using the observation that tz-&gt;passive can also be regarded as the
temperature status, set the update flag to the tz-&gt;passive value.

When the temperature drops below switch_on for the first time, the
states of cooling devices can be reset once, and tz-&gt;passive is updated
to 0. In the next round, because tz-&gt;passive is 0, cdev-&gt;state will not
be updated.

By using the tz-&gt;passive value as the "update" flag, the issue above
can be solved, and the cooling devices can be updated only once when the
temperature is low.

Fixes: 0952177f2a1f ("thermal/core/power_allocator: Update once cooling devices when temp is low")
Cc: 5.13+ &lt;stable@vger.kernel.org&gt; # 5.13+
Suggested-by: Wei Wang &lt;wvw@google.com&gt;
Signed-off-by: Di Shen &lt;di.shen@unisoc.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit e95fa7404716f6e25021e66067271a4ad8eb1486 ]

Commit 0952177f2a1f ("thermal/core/power_allocator: Update once
cooling devices when temp is low") adds an update flag to avoid
triggering a thermal event when there is no need, and the thermal
cdev is updated once when the temperature is low.

But when the trips are writable, and switch_on_temp is set to be a
higher value, the cooling device state may not be reset to 0,
because last_temperature is smaller than switch_on_temp.

For example:
First:
switch_on_temp=70 control_temp=85;
Then userspace change the trip_temp:
switch_on_temp=45 control_temp=55 cur_temp=54

Then userspace reset the trip_temp:
switch_on_temp=70 control_temp=85 cur_temp=57 last_temp=54

At this time, the cooling device state should be reset to 0.
However, because cur_temp(57) &lt; switch_on_temp(70)
last_temp(54) &lt; switch_on_temp(70)  ----&gt;  update = false,
update is false, the cooling device state can not be reset.

Using the observation that tz-&gt;passive can also be regarded as the
temperature status, set the update flag to the tz-&gt;passive value.

When the temperature drops below switch_on for the first time, the
states of cooling devices can be reset once, and tz-&gt;passive is updated
to 0. In the next round, because tz-&gt;passive is 0, cdev-&gt;state will not
be updated.

By using the tz-&gt;passive value as the "update" flag, the issue above
can be solved, and the cooling devices can be updated only once when the
temperature is low.

Fixes: 0952177f2a1f ("thermal/core/power_allocator: Update once cooling devices when temp is low")
Cc: 5.13+ &lt;stable@vger.kernel.org&gt; # 5.13+
Suggested-by: Wei Wang &lt;wvw@google.com&gt;
Signed-off-by: Di Shen &lt;di.shen@unisoc.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: core: Store trip pointer in struct thermal_instance</title>
<updated>2024-02-01T00:19:10+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-09-21T17:52:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=77451ef587aa8aeb562e4163fd014bde940cca99'/>
<id>77451ef587aa8aeb562e4163fd014bde940cca99</id>
<content type='text'>
[ Upstream commit 2c7b4bfadef08cc0995c24a7b9eb120fe897165f ]

Replace the integer trip number stored in struct thermal_instance with
a pointer to the relevant trip and adjust the code using the structure
in question accordingly.

The main reason for making this change is to allow the trip point to
cooling device binding code more straightforward, as illustrated by
subsequent modifications of the ACPI thermal driver, but it also helps
to clarify the overall design and allows the governor code overhead to
be reduced (through subsequent modifications).

The only case in which it adds complexity is trip_point_show() that
needs to walk the trips[] table to find the index of the given trip
point, but this is not a critical path and the interface that
trip_point_show() belongs to is problematic anyway (for instance, it
doesn't cover the case when the same cooling devices is associated
with multiple trip points).

This is a preliminary change and the affected code will be refined by
a series of subsequent modifications of thermal governors, the core and
the ACPI thermal driver.

The general functionality is not expected to be affected by this change.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Stable-dep-of: e95fa7404716 ("thermal: gov_power_allocator: avoid inability to reset a cdev")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 2c7b4bfadef08cc0995c24a7b9eb120fe897165f ]

Replace the integer trip number stored in struct thermal_instance with
a pointer to the relevant trip and adjust the code using the structure
in question accordingly.

The main reason for making this change is to allow the trip point to
cooling device binding code more straightforward, as illustrated by
subsequent modifications of the ACPI thermal driver, but it also helps
to clarify the overall design and allows the governor code overhead to
be reduced (through subsequent modifications).

The only case in which it adds complexity is trip_point_show() that
needs to walk the trips[] table to find the index of the given trip
point, but this is not a critical path and the interface that
trip_point_show() belongs to is problematic anyway (for instance, it
doesn't cover the case when the same cooling devices is associated
with multiple trip points).

This is a preliminary change and the affected code will be refined by
a series of subsequent modifications of thermal governors, the core and
the ACPI thermal driver.

The general functionality is not expected to be affected by this change.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Stable-dep-of: e95fa7404716 ("thermal: gov_power_allocator: avoid inability to reset a cdev")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: trip: Drop redundant trips check from for_each_thermal_trip()</title>
<updated>2024-02-01T00:19:10+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-09-19T18:59:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=3a3bbc6911f57e1c3b4eabf1d098cde7bf7baeb0'/>
<id>3a3bbc6911f57e1c3b4eabf1d098cde7bf7baeb0</id>
<content type='text'>
[ Upstream commit a15ffa783ea4210877886c59566a0d20f6b2bc09 ]

It is invalid to call for_each_thermal_trip() on an unregistered thermal
zone anyway, and as per thermal_zone_device_register_with_trips(), the
trips[] table must be present if num_trips is greater than zero for the
given thermal zone.

Hence, the trips check in for_each_thermal_trip() is redundant and so it
can be dropped.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Stable-dep-of: e95fa7404716 ("thermal: gov_power_allocator: avoid inability to reset a cdev")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit a15ffa783ea4210877886c59566a0d20f6b2bc09 ]

It is invalid to call for_each_thermal_trip() on an unregistered thermal
zone anyway, and as per thermal_zone_device_register_with_trips(), the
trips[] table must be present if num_trips is greater than zero for the
given thermal zone.

Hence, the trips check in for_each_thermal_trip() is redundant and so it
can be dropped.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Stable-dep-of: e95fa7404716 ("thermal: gov_power_allocator: avoid inability to reset a cdev")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Add syscore callbacks for system-wide PM</title>
<updated>2024-02-01T00:19:09+00:00</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2024-01-10T03:07:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=019ccc66d56a696a4dfee3bfa2f04d0a7c3d89ee'/>
<id>019ccc66d56a696a4dfee3bfa2f04d0a7c3d89ee</id>
<content type='text'>
[ Upstream commit 97566d09fd02d2ab329774bb89a2cdf2267e86d9 ]

The kernel allocates a memory buffer and provides its location to the
hardware, which uses it to update the HFI table. This allocation occurs
during boot and remains constant throughout runtime.

When resuming from hibernation, the restore kernel allocates a second
memory buffer and reprograms the HFI hardware with the new location as
part of a normal boot. The location of the second memory buffer may
differ from the one allocated by the image kernel.

When the restore kernel transfers control to the image kernel, its HFI
buffer becomes invalid, potentially leading to memory corruption if the
hardware writes to it (the hardware continues to use the buffer from the
restore kernel).

It is also possible that the hardware "forgets" the address of the memory
buffer when resuming from "deep" suspend. Memory corruption may also occur
in such a scenario.

To prevent the described memory corruption, disable HFI when preparing to
suspend or hibernate. Enable it when resuming.

Add syscore callbacks to handle the package of the boot CPU (packages of
non-boot CPUs are handled via CPU offline). Syscore ops always run on the
boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend
and hibernation. Syscore ops only run in these cases.

Cc: 6.1+ &lt;stable@vger.kernel.org&gt; # 6.1+
Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
[ rjw: Comment adjustment, subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 97566d09fd02d2ab329774bb89a2cdf2267e86d9 ]

The kernel allocates a memory buffer and provides its location to the
hardware, which uses it to update the HFI table. This allocation occurs
during boot and remains constant throughout runtime.

When resuming from hibernation, the restore kernel allocates a second
memory buffer and reprograms the HFI hardware with the new location as
part of a normal boot. The location of the second memory buffer may
differ from the one allocated by the image kernel.

When the restore kernel transfers control to the image kernel, its HFI
buffer becomes invalid, potentially leading to memory corruption if the
hardware writes to it (the hardware continues to use the buffer from the
restore kernel).

It is also possible that the hardware "forgets" the address of the memory
buffer when resuming from "deep" suspend. Memory corruption may also occur
in such a scenario.

To prevent the described memory corruption, disable HFI when preparing to
suspend or hibernate. Enable it when resuming.

Add syscore callbacks to handle the package of the boot CPU (packages of
non-boot CPUs are handled via CPU offline). Syscore ops always run on the
boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend
and hibernation. Syscore ops only run in these cases.

Cc: 6.1+ &lt;stable@vger.kernel.org&gt; # 6.1+
Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
[ rjw: Comment adjustment, subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Disable an HFI instance when all its CPUs go offline</title>
<updated>2024-02-01T00:19:09+00:00</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2024-01-03T04:14:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=0caf5dd01adfd4114d531699cc5c20bb4f9a348d'/>
<id>0caf5dd01adfd4114d531699cc5c20bb4f9a348d</id>
<content type='text'>
[ Upstream commit 1c53081d773c2cb4461636559b0d55b46559ceec ]

In preparation to support hibernation, add functionality to disable an HFI
instance during CPU offline. The last CPU of an instance that goes offline
will disable such instance.

The Intel Software Development Manual states that the operating system must
wait for the hardware to set MSR_IA32_PACKAGE_THERM_STATUS[26] after
disabling an HFI instance to ensure that it will no longer write on the HFI
memory. Some processors, however, do not ever set such bit. Wait a minimum
of 2ms to give time hardware to complete any pending memory writes.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 1c53081d773c2cb4461636559b0d55b46559ceec ]

In preparation to support hibernation, add functionality to disable an HFI
instance during CPU offline. The last CPU of an instance that goes offline
will disable such instance.

The Intel Software Development Manual states that the operating system must
wait for the hardware to set MSR_IA32_PACKAGE_THERM_STATUS[26] after
disabling an HFI instance to ensure that it will no longer write on the HFI
memory. Some processors, however, do not ever set such bit. Wait a minimum
of 2ms to give time hardware to complete any pending memory writes.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Refactor enabling code into helper functions</title>
<updated>2024-02-01T00:19:09+00:00</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2024-01-03T04:14:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=de791353675fcab27d476482014b9c899e8a36ab'/>
<id>de791353675fcab27d476482014b9c899e8a36ab</id>
<content type='text'>
[ Upstream commit 8a8b6bb93c704776c4b05cb517c3fa8baffb72f5 ]

In preparation for the addition of a suspend notifier, wrap the logic to
enable HFI and program its memory buffer into helper functions. Both the
CPU hotplug callback and the suspend notifier will use them.

This refactoring does not introduce functional changes.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 8a8b6bb93c704776c4b05cb517c3fa8baffb72f5 ]

In preparation for the addition of a suspend notifier, wrap the logic to
enable HFI and program its memory buffer into helper functions. Both the
CPU hotplug callback and the suspend notifier will use them.

This refactoring does not introduce functional changes.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
