<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/powerpc/kernel/mce.c, branch v5.10.258</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>powerpc: Avoid nmi_enter/nmi_exit in real mode interrupt.</title>
<updated>2024-08-19T03:41:22+00:00</updated>
<author>
<name>Mahesh Salgaonkar</name>
<email>mahesh@linux.ibm.com</email>
</author>
<published>2024-04-10T04:30:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=fb6675db04c4b79883373edc578d5df7bbc84848'/>
<id>fb6675db04c4b79883373edc578d5df7bbc84848</id>
<content type='text'>
commit 0db880fc865ffb522141ced4bfa66c12ab1fbb70 upstream.

nmi_enter()/nmi_exit() touches per cpu variables which can lead to kernel
crash when invoked during real mode interrupt handling (e.g. early HMI/MCE
interrupt handler) if percpu allocation comes from vmalloc area.

Early HMI/MCE handlers are called through DEFINE_INTERRUPT_HANDLER_NMI()
wrapper which invokes nmi_enter/nmi_exit calls. We don't see any issue when
percpu allocation is from the embedded first chunk. However with
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there are chances where percpu
allocation can come from the vmalloc area.

With kernel command line "percpu_alloc=page" we can force percpu allocation
to come from vmalloc area and can see kernel crash in machine_check_early:

[    1.215714] NIP [c000000000e49eb4] rcu_nmi_enter+0x24/0x110
[    1.215717] LR [c0000000000461a0] machine_check_early+0xf0/0x2c0
[    1.215719] --- interrupt: 200
[    1.215720] [c000000fffd73180] [0000000000000000] 0x0 (unreliable)
[    1.215722] [c000000fffd731b0] [0000000000000000] 0x0
[    1.215724] [c000000fffd73210] [c000000000008364] machine_check_early_common+0x134/0x1f8

Fix this by avoiding use of nmi_enter()/nmi_exit() in real mode if percpu
first chunk is not embedded.

Reviewed-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Tested-by: Shirisha Ganta &lt;shirisha@linux.ibm.com&gt;
Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20240410043006.81577-1-mahesh@linux.ibm.com
[ Conflicts in arch/powerpc/include/asm/interrupt.h
  because machine_check_early() and machine_check_exception()
  has been refactored. ]
Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0db880fc865ffb522141ced4bfa66c12ab1fbb70 upstream.

nmi_enter()/nmi_exit() touches per cpu variables which can lead to kernel
crash when invoked during real mode interrupt handling (e.g. early HMI/MCE
interrupt handler) if percpu allocation comes from vmalloc area.

Early HMI/MCE handlers are called through DEFINE_INTERRUPT_HANDLER_NMI()
wrapper which invokes nmi_enter/nmi_exit calls. We don't see any issue when
percpu allocation is from the embedded first chunk. However with
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there are chances where percpu
allocation can come from the vmalloc area.

With kernel command line "percpu_alloc=page" we can force percpu allocation
to come from vmalloc area and can see kernel crash in machine_check_early:

[    1.215714] NIP [c000000000e49eb4] rcu_nmi_enter+0x24/0x110
[    1.215717] LR [c0000000000461a0] machine_check_early+0xf0/0x2c0
[    1.215719] --- interrupt: 200
[    1.215720] [c000000fffd73180] [0000000000000000] 0x0 (unreliable)
[    1.215722] [c000000fffd731b0] [0000000000000000] 0x0
[    1.215724] [c000000fffd73210] [c000000000008364] machine_check_early_common+0x134/0x1f8

Fix this by avoiding use of nmi_enter()/nmi_exit() in real mode if percpu
first chunk is not embedded.

Reviewed-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Tested-by: Shirisha Ganta &lt;shirisha@linux.ibm.com&gt;
Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20240410043006.81577-1-mahesh@linux.ibm.com
[ Conflicts in arch/powerpc/include/asm/interrupt.h
  because machine_check_early() and machine_check_exception()
  has been refactored. ]
Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash</title>
<updated>2020-10-16T09:13:55+00:00</updated>
<author>
<name>Ganesh Goudar</name>
<email>ganeshgr@linux.ibm.com</email>
</author>
<published>2020-10-09T06:40:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8d0e2101274358d9b6b1f27232b40253ca48bab5'/>
<id>8d0e2101274358d9b6b1f27232b40253ca48bab5</id>
<content type='text'>
Use of nmi_enter/exit in real mode handler causes the kernel to panic
and reboot on injecting SLB mutihit on pseries machine running in hash
MMU mode, because these calls try to accesses memory outside RMO
region in real mode handler where translation is disabled.

Add check to not to use these calls on pseries machine running in hash
MMU mode.

Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Ganesh Goudar &lt;ganeshgr@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20201009064005.19777-2-ganeshgr@linux.ibm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use of nmi_enter/exit in real mode handler causes the kernel to panic
and reboot on injecting SLB mutihit on pseries machine running in hash
MMU mode, because these calls try to accesses memory outside RMO
region in real mode handler where translation is disabled.

Add check to not to use these calls on pseries machine running in hash
MMU mode.

Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Ganesh Goudar &lt;ganeshgr@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20201009064005.19777-2-ganeshgr@linux.ibm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s: Move HMI IRQ stat from percpu variable to paca.</title>
<updated>2020-07-29T13:47:53+00:00</updated>
<author>
<name>Mahesh Salgaonkar</name>
<email>mahesh@linux.ibm.com</email>
</author>
<published>2020-06-23T10:27:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ada68a66b72687e6b74e35c42efd1783e84b01fd'/>
<id>ada68a66b72687e6b74e35c42efd1783e84b01fd</id>
<content type='text'>
With the proposed change in percpu bootmem allocator to use page
mapping [1], the percpu first chunk memory area can come from vmalloc
ranges. This makes the HMI (Hypervisor Maintenance Interrupt) handler
crash the kernel whenever percpu variable is accessed in real mode.
This patch fixes this issue by moving the HMI IRQ stat inside paca for
safe access in realmode.

[1] https://lore.kernel.org/linuxppc-dev/20200608070904.387440-1-aneesh.kumar@linux.ibm.com/

Suggested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.ibm.com&gt;
Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/159290806973.3642154.5244613424529764050.stgit@jupiter
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the proposed change in percpu bootmem allocator to use page
mapping [1], the percpu first chunk memory area can come from vmalloc
ranges. This makes the HMI (Hypervisor Maintenance Interrupt) handler
crash the kernel whenever percpu variable is accessed in real mode.
This patch fixes this issue by moving the HMI IRQ stat inside paca for
safe access in realmode.

[1] https://lore.kernel.org/linuxppc-dev/20200608070904.387440-1-aneesh.kumar@linux.ibm.com/

Suggested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.ibm.com&gt;
Signed-off-by: Mahesh Salgaonkar &lt;mahesh@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/159290806973.3642154.5244613424529764050.stgit@jupiter
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/powernv: Machine check handler for POWER10</title>
<updated>2020-07-23T07:43:30+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2020-07-02T23:33:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=201220bb0e8cbc163ec7f550b3b7b3da46eb5877'/>
<id>201220bb0e8cbc163ec7f550b3b7b3da46eb5877</id>
<content type='text'>
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200702233343.1128026-1-npiggin@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200702233343.1128026-1-npiggin@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/mce: Add MCE notification chain</title>
<updated>2020-07-20T12:57:56+00:00</updated>
<author>
<name>Santosh Sivaraj</name>
<email>santosh@fossix.org</email>
</author>
<published>2020-07-09T13:51:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=c37a63afc429ce959402168f67e4f094ab639ace'/>
<id>c37a63afc429ce959402168f67e4f094ab639ace</id>
<content type='text'>
Introduce notification chain which lets us know about uncorrected memory
errors(UE). This would help prospective users in pmem or nvdimm subsystem
to track bad blocks for better handling of persistent memory allocations.

Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Ganesh Goudar &lt;ganeshgr@linux.ibm.com&gt;
Reviewed-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200709135142.721504-1-santosh@fossix.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce notification chain which lets us know about uncorrected memory
errors(UE). This would help prospective users in pmem or nvdimm subsystem
to track bad blocks for better handling of persistent memory allocations.

Signed-off-by: Santosh Sivaraj &lt;santosh@fossix.org&gt;
Signed-off-by: Ganesh Goudar &lt;ganeshgr@linux.ibm.com&gt;
Reviewed-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200709135142.721504-1-santosh@fossix.org
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s: machine check do not trace real-mode handler</title>
<updated>2020-05-18T14:10:34+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2020-05-08T04:34:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=abd106fb437ad1cd8c8df8ccabd0fa941ef6342a'/>
<id>abd106fb437ad1cd8c8df8ccabd0fa941ef6342a</id>
<content type='text'>
Rather than notrace annotations throughout a significant part of the
machine check code across kernel/ pseries/ and powernv/ which can
easily be broken and is infrequently tested, use paca-&gt;ftrace_enabled
to blanket-disable tracing of the real-mode non-maskable handler.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Reviewed-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Acked-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Link: https://lore.kernel.org/r/20200508043408.886394-14-npiggin@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rather than notrace annotations throughout a significant part of the
machine check code across kernel/ pseries/ and powernv/ which can
easily be broken and is infrequently tested, use paca-&gt;ftrace_enabled
to blanket-disable tracing of the real-mode non-maskable handler.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Reviewed-by: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Acked-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Link: https://lore.kernel.org/r/20200508043408.886394-14-npiggin@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s: machine check interrupt update NMI accounting</title>
<updated>2020-05-18T14:10:34+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2020-05-08T04:34:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=116ac378bb3ff844df333e7609e7604651a0db9d'/>
<id>116ac378bb3ff844df333e7609e7604651a0db9d</id>
<content type='text'>
machine_check_early() is taken as an NMI, so nmi_enter() is used
there. machine_check_exception() is no longer taken as an NMI (it's
invoked via irq_work in the case a machine check hits in kernel mode),
so remove the nmi_enter() from that case.

In NMI context, hash faults don't try to refill the hash table, which
can lead to crashes accessing non-pinned kernel pages. System reset
still has this potential problem.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Drop change in show_regs() which breaks Book3E]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200508043408.886394-12-npiggin@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
machine_check_early() is taken as an NMI, so nmi_enter() is used
there. machine_check_exception() is no longer taken as an NMI (it's
invoked via irq_work in the case a machine check hits in kernel mode),
so remove the nmi_enter() from that case.

In NMI context, hash faults don't try to refill the hash table, which
can lead to crashes accessing non-pinned kernel pages. System reset
still has this potential problem.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Drop change in show_regs() which breaks Book3E]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200508043408.886394-12-npiggin@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/pseries: Handle UE event for memcpy_mcsafe</title>
<updated>2020-03-27T03:59:35+00:00</updated>
<author>
<name>Ganesh Goudar</name>
<email>ganeshgr@linux.ibm.com</email>
</author>
<published>2020-03-26T18:49:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=efbc4303b255bb80ab1283794b36dd5fe1fb0ec3'/>
<id>efbc4303b255bb80ab1283794b36dd5fe1fb0ec3</id>
<content type='text'>
memcpy_mcsafe has been implemented for power machines which is used
by pmem infrastructure, so that an UE encountered during memcpy from
pmem devices would not result in panic instead a right error code
is returned. The implementation expects machine check handler to ignore
the event and set nip to continue the execution from fixup code.

Appropriate changes are already made to powernv machine check handler,
make similar changes to pseries machine check handler to ignore the
the event and set nip to continue execution at the fixup entry if we
hit UE at an instruction with a fixup entry.

while we are at it, have a common function which searches the exception
table entry and updates nip with fixup address, and any future common
changes can be made in this function that are valid for both architectures.

powernv changes are made by
commit 895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")

Reviewed-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Reviewed-by: Santosh S &lt;santosh@fossix.org&gt;
Signed-off-by: Ganesh Goudar &lt;ganeshgr@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200326184916.31172-1-ganeshgr@linux.ibm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
memcpy_mcsafe has been implemented for power machines which is used
by pmem infrastructure, so that an UE encountered during memcpy from
pmem devices would not result in panic instead a right error code
is returned. The implementation expects machine check handler to ignore
the event and set nip to continue the execution from fixup code.

Appropriate changes are already made to powernv machine check handler,
make similar changes to pseries machine check handler to ignore the
the event and set nip to continue execution at the fixup entry if we
hit UE at an instruction with a fixup entry.

while we are at it, have a common function which searches the exception
table entry and updates nip with fixup address, and any future common
changes can be made in this function that are valid for both architectures.

powernv changes are made by
commit 895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")

Reviewed-by: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
Reviewed-by: Santosh S &lt;santosh@fossix.org&gt;
Signed-off-by: Ganesh Goudar &lt;ganeshgr@linux.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20200326184916.31172-1-ganeshgr@linux.ibm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s/pseries: machine check convert to use common event code</title>
<updated>2019-08-30T00:32:35+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2019-08-02T10:56:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=9ca766f9891d23743b4e1a7b1cafdc63723cd6a7'/>
<id>9ca766f9891d23743b4e1a7b1cafdc63723cd6a7</id>
<content type='text'>
The common machine_check_event data structures and queues are mostly
platform independent, with powernv decoding SRR1/DSISR/etc., into
machine_check_event objects.

This patch converts pseries to use this infrastructure by decoding
fwnmi/rtas data into machine_check_event objects.

This allows queueing to be used by a subsequent change to delay the
virtual mode handling of machine checks that occur in kernel space
where it is unsafe to switch immediately to virtual mode, similarly
to powernv.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Fix implicit fallthrough warnings in mce_handle_error()]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190802105709.27696-10-npiggin@gmail.com

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The common machine_check_event data structures and queues are mostly
platform independent, with powernv decoding SRR1/DSISR/etc., into
machine_check_event objects.

This patch converts pseries to use this infrastructure by decoding
fwnmi/rtas data into machine_check_event objects.

This allows queueing to be used by a subsequent change to delay the
virtual mode handling of machine checks that occur in kernel space
where it is unsafe to switch immediately to virtual mode, similarly
to powernv.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Fix implicit fallthrough warnings in mce_handle_error()]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190802105709.27696-10-npiggin@gmail.com

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64s/powernv: machine check dump SLB contents</title>
<updated>2019-08-30T00:32:35+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2019-08-02T10:56:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7290f3b3d3e66b54720f23079ffc60e0b7bbb0cc'/>
<id>7290f3b3d3e66b54720f23079ffc60e0b7bbb0cc</id>
<content type='text'>
Re-use the code introduced in pseries to save and dump the contents
of the SLB in the case of an SLB involved machine check exception.

This patch also avoids allocating the SLB save array on pseries radix.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190802105709.27696-9-npiggin@gmail.com

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Re-use the code introduced in pseries to save and dump the contents
of the SLB in the case of an SLB involved machine check exception.

This patch also avoids allocating the SLB save array on pseries radix.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20190802105709.27696-9-npiggin@gmail.com

</pre>
</div>
</content>
</entry>
</feed>
