linux.git/arch/powerpc/kernel/mce.c, branch v5.10.258

powerpc: Avoid nmi_enter/nmi_exit in real mode interrupt.

2024-08-19T03:41:22+00:00

commit 0db880fc865ffb522141ced4bfa66c12ab1fbb70 upstream.

nmi_enter()/nmi_exit() touch percpu variables, which can lead to a kernel
crash when invoked during real mode interrupt handling (e.g. early HMI/MCE
interrupt handler) if the percpu allocation comes from the vmalloc area.

Early HMI/MCE handlers are called through the
DEFINE_INTERRUPT_HANDLER_NMI() wrapper, which calls nmi_enter()/nmi_exit().
We don't see any issue when the percpu allocation is from the embedded
first chunk. However, with CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled,
the percpu allocation may come from the vmalloc area.

With the kernel command line "percpu_alloc=page" we can force the percpu
allocation to come from the vmalloc area and see a kernel crash in
machine_check_early:

[    1.215714] NIP [c000000000e49eb4] rcu_nmi_enter+0x24/0x110
[    1.215717] LR [c0000000000461a0] machine_check_early+0xf0/0x2c0
[    1.215719] --- interrupt: 200
[    1.215720] [c000000fffd73180] [0000000000000000] 0x0 (unreliable)
[    1.215722] [c000000fffd731b0] [0000000000000000] 0x0
[    1.215724] [c000000fffd73210] [c000000000008364] machine_check_early_common+0x134/0x1f8

Fix this by avoiding the use of nmi_enter()/nmi_exit() in real mode if the
percpu first chunk is not embedded.
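
The guard described above can be pictured with a small user-space model.
This is only a sketch under stated assumptions: struct model_cpu,
first_chunk_is_paged and the model_*() helpers are hypothetical names made
up for illustration, not the kernel's identifiers, and the counter merely
stands in for the percpu state that nmi_enter()/nmi_exit() touch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state, for illustration only. */
struct model_cpu {
	bool first_chunk_is_paged; /* percpu data sits in the vmalloc area */
	bool in_real_mode;         /* MMU translation is off */
	int  nmi_depth;            /* models the percpu NMI nesting counter */
};

/*
 * NMI accounting touches percpu data, so it is only safe when that data
 * is reachable without translation.
 */
static bool model_can_account_nmi(const struct model_cpu *cpu)
{
	return !(cpu->in_real_mode && cpu->first_chunk_is_paged);
}

/* Models an early MCE/HMI handler; returns true if accounting was used. */
static bool model_early_handler(struct model_cpu *cpu)
{
	bool accounted = model_can_account_nmi(cpu);

	if (accounted)
		cpu->nmi_depth++;	/* stands in for nmi_enter() */

	/* ... real-mode error handling would run here ... */
	assert(!accounted || cpu->nmi_depth > 0);

	if (accounted)
		cpu->nmi_depth--;	/* stands in for nmi_exit() */

	return accounted;
}
```

In this model a CPU with a paged first chunk skips the accounting
entirely, while a CPU with an embedded first chunk keeps it.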

Reviewed-by: Christophe Leroy 
Tested-by: Shirisha Ganta 
Signed-off-by: Mahesh Salgaonkar 
Signed-off-by: Michael Ellerman 
Link: https://msgid.link/20240410043006.81577-1-mahesh@linux.ibm.com
[ Conflicts in arch/powerpc/include/asm/interrupt.h
  because machine_check_early() and machine_check_exception()
  have been refactored. ]
Signed-off-by: Jinjie Ruan 
Signed-off-by: Greg Kroah-Hartman

powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash

2020-10-16T09:13:55+00:00

Use of nmi_enter/exit in the real mode handler causes the kernel to panic
and reboot when an SLB multihit is injected on a pseries machine running
in hash MMU mode, because these calls try to access memory outside the
RMO region in the real mode handler, where translation is disabled.

Add a check so that these calls are not used on a pseries machine running
in hash MMU mode.
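
As a rough illustration of why the check is needed, the sketch below
models the real-mode addressing limit and the resulting decision. The
function names and the notion of a single rmo_size value are assumptions
made for this example, not the kernel's actual interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [addr, addr + len) lies entirely below the real-mode limit.
 * Written to avoid overflow in addr + len.
 */
static bool model_rm_accessible(uint64_t addr, uint64_t len, uint64_t rmo_size)
{
	assert(len > 0);
	if (len > rmo_size)
		return false;
	return addr <= rmo_size - len;
}

/*
 * Decide whether NMI accounting may be used in the real-mode handler:
 * skip it for a pseries guest running with the hash MMU.
 */
static bool model_use_nmi_accounting(bool pseries_guest, bool hash_mmu)
{
	return !(pseries_guest && hash_mmu);
}
```

For example, with an assumed limit of 0x10000000, an 8-byte access at
0x1000 is allowed while one starting at 0x10000000 is not.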

Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Ganesh Goudar 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20201009064005.19777-2-ganeshgr@linux.ibm.com

powerpc/64s: Move HMI IRQ stat from percpu variable to paca.

2020-07-29T13:47:53+00:00

With the proposed change in the percpu bootmem allocator to use page
mapping [1], the percpu first chunk memory area can come from vmalloc
ranges. This makes the HMI (Hypervisor Maintenance Interrupt) handler
crash the kernel whenever a percpu variable is accessed in real mode.
This patch fixes the issue by moving the HMI IRQ stat into the paca,
which is safe to access in real mode.
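
A minimal user-space sketch of the idea follows. The struct layout, the
array of pacas and the helper names are hypothetical; the point is only
that the real-mode side writes a field in a structure that is always
addressable, and the reporting side sums those fields later.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/*
 * Hypothetical miniature of the per-CPU paca: always addressable in
 * real mode, unlike percpu data that may live in vmalloc space.
 */
struct model_paca {
	unsigned int hmi_irqs;	/* HMI count kept in the paca */
};

static struct model_paca model_pacas[MODEL_NR_CPUS];

/* Real-mode side: bump the counter through the paca only. */
static void model_hmi_realmode(size_t cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_pacas[cpu].hmi_irqs++;
}

/* Virtual-mode side: sum the per-CPU counters for reporting. */
static unsigned int model_hmi_total(void)
{
	unsigned int sum = 0;

	for (size_t cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += model_pacas[cpu].hmi_irqs;
	return sum;
}
```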

[1] https://lore.kernel.org/linuxppc-dev/20200608070904.387440-1-aneesh.kumar@linux.ibm.com/

Suggested-by: Aneesh Kumar K.V 
Signed-off-by: Mahesh Salgaonkar 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/159290806973.3642154.5244613424529764050.stgit@jupiter

powerpc/powernv: Machine check handler for POWER10

2020-07-23T07:43:30+00:00

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200702233343.1128026-1-npiggin@gmail.com

powerpc/mce: Add MCE notification chain

2020-07-20T12:57:56+00:00

Introduce a notification chain which lets us know about uncorrected memory
errors (UE). This would help prospective users in the pmem or nvdimm
subsystems track bad blocks for better handling of persistent memory
allocations.
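
The shape of such a chain can be sketched in user space as below. This is
not the kernel notifier API; every type and function here (model_notifier,
model_register(), model_notify_ue(), model_badblock_cb()) is a
hypothetical stand-in, and the subscriber assumes 4 KiB pages when it
derives a page frame number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event: the physical address of an uncorrected error. */
struct model_ue_event {
	uint64_t phys_addr;
};

struct model_notifier {
	int (*call)(struct model_notifier *self,
		    const struct model_ue_event *evt);
	int priority;			/* higher priority runs first */
	struct model_notifier *next;
};

static struct model_notifier *model_chain;

/* Insert a subscriber, keeping the chain in descending priority order. */
static void model_register(struct model_notifier *nb)
{
	struct model_notifier **pp = &model_chain;

	assert(nb && nb->call);
	while (*pp && (*pp)->priority >= nb->priority)
		pp = &(*pp)->next;
	nb->next = *pp;
	*pp = nb;
}

/* Call every subscriber; returns how many were notified. */
static int model_notify_ue(const struct model_ue_event *evt)
{
	int called = 0;

	for (struct model_notifier *nb = model_chain; nb; nb = nb->next) {
		nb->call(nb, evt);
		called++;
	}
	return called;
}

/* Example subscriber: remembers the last bad page frame (4 KiB pages). */
static uint64_t model_last_bad_pfn;

static int model_badblock_cb(struct model_notifier *self,
			     const struct model_ue_event *evt)
{
	(void)self;
	model_last_bad_pfn = evt->phys_addr >> 12;
	return 0;
}
```

A pmem-like user would register model_badblock_cb once and then learn
about each UE without the machine check code knowing about it.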

Signed-off-by: Santosh Sivaraj 
Signed-off-by: Ganesh Goudar 
Reviewed-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200709135142.721504-1-santosh@fossix.org

powerpc/64s: machine check do not trace real-mode handler

2020-05-18T14:10:34+00:00

Rather than notrace annotations throughout a significant part of the
machine check code across kernel/, pseries/ and powernv/, which can
easily be broken and is infrequently tested, use paca->ftrace_enabled
to blanket-disable tracing of the real-mode non-maskable handler.
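
The save/disable/restore pattern can be sketched as follows. Only the
ftrace_enabled field name comes from the text above; the struct, the
trace counter and the helper functions are hypothetical and exist purely
to make the model observable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature paca holding only the tracing flag. */
struct model_paca_ft {
	uint8_t ftrace_enabled;	/* tracer checks this before recording */
};

static int model_trace_hits;

/* Stand-in for any traced function: records only when enabled. */
static void model_traced_fn(struct model_paca_ft *paca)
{
	if (paca->ftrace_enabled)
		model_trace_hits++;
}

/* Models the real-mode handler: save flag, disable, run, restore. */
static void model_mce_early(struct model_paca_ft *paca)
{
	uint8_t saved = paca->ftrace_enabled;

	paca->ftrace_enabled = 0;
	model_traced_fn(paca);	/* anything called here goes untraced */
	assert(paca->ftrace_enabled == 0);
	paca->ftrace_enabled = saved;
}
```

The benefit of this design is that no function called from the handler
needs its own annotation; the single flag covers the whole call tree.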

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
Reviewed-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
Link: https://lore.kernel.org/r/20200508043408.886394-14-npiggin@gmail.com

powerpc/64s: machine check interrupt update NMI accounting

2020-05-18T14:10:34+00:00

machine_check_early() is taken as an NMI, so nmi_enter() is used
there. machine_check_exception() is no longer taken as an NMI (it's
invoked via irq_work in the case a machine check hits in kernel mode),
so remove the nmi_enter() from that case.

In NMI context, hash faults don't try to refill the hash table, which
can lead to crashes accessing non-pinned kernel pages. System reset
still has this potential problem.

Signed-off-by: Nicholas Piggin 
[mpe: Drop change in show_regs() which breaks Book3E]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200508043408.886394-12-npiggin@gmail.com

powerpc/pseries: Handle UE event for memcpy_mcsafe

2020-03-27T03:59:35+00:00

memcpy_mcsafe has been implemented for power machines and is used by
the pmem infrastructure, so that a UE encountered during memcpy from
pmem devices does not result in a panic; instead, an appropriate error
code is returned. The implementation expects the machine check handler
to ignore the event and set nip to continue execution from the fixup
code.

Appropriate changes have already been made to the powernv machine check
handler. Make similar changes to the pseries machine check handler to
ignore the event and set nip to continue execution at the fixup entry
if we hit a UE at an instruction with a fixup entry.

While we are at it, add a common function which searches for the
exception table entry and updates nip with the fixup address, so that
any future changes valid for both platforms can be made in one place.
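
The common step can be pictured with the user-space sketch below. The
table layout, struct model_regs and both function names are assumptions
for illustration; the kernel's own exception table format and lookup
helpers differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exception table entry, sorted by insn in the table. */
struct model_extable_entry {
	uint64_t insn;	/* address of an instruction that may fault */
	uint64_t fixup;	/* address to resume at */
};

struct model_regs {
	uint64_t nip;	/* next instruction pointer */
};

/* Binary search over a table sorted by insn; NULL if not found. */
static const struct model_extable_entry *
model_search_extable(const struct model_extable_entry *tbl, size_t n,
		     uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].insn == addr)
			return &tbl[mid];
		if (tbl[mid].insn < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Common UE step: if the faulting instruction has a fixup, point nip at
 * it and return true (the event can be ignored); otherwise leave nip
 * alone and return false.
 */
static bool model_process_ue(struct model_regs *regs,
			     const struct model_extable_entry *tbl, size_t n)
{
	const struct model_extable_entry *e;

	assert(regs != NULL);
	e = model_search_extable(tbl, n, regs->nip);
	if (!e)
		return false;
	regs->nip = e->fixup;
	return true;
}
```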

powernv changes are made by
commit 895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")

Reviewed-by: Mahesh Salgaonkar 
Reviewed-by: Santosh S 
Signed-off-by: Ganesh Goudar 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200326184916.31172-1-ganeshgr@linux.ibm.com

powerpc/64s/pseries: machine check convert to use common event code

2019-08-30T00:32:35+00:00

The common machine_check_event data structures and queues are mostly
platform independent, with powernv decoding SRR1/DSISR/etc., into
machine_check_event objects.

This patch converts pseries to use this infrastructure by decoding
fwnmi/rtas data into machine_check_event objects.

This allows queueing to be used by a subsequent change to delay the
virtual mode handling of machine checks that occur in kernel space
where it is unsafe to switch immediately to virtual mode, similarly
to powernv.
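
The queueing idea can be sketched as a fixed-size buffer that the
real-mode side fills and the virtual-mode side drains later. The queue
depth, event fields and function names below are hypothetical and much
simpler than the kernel's machine_check_event handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_EVT 4	/* hypothetical queue depth */

/* Hypothetical, heavily reduced event record. */
struct model_mce_event {
	uint64_t nip;
	int severity;
};

struct model_evt_queue {
	struct model_mce_event evt[MODEL_MAX_EVT];
	int count;
};

/* Real-mode side: no allocation, no locking; drop the event when full. */
static bool model_queue_event(struct model_evt_queue *q,
			      const struct model_mce_event *e)
{
	assert(q->count >= 0 && q->count <= MODEL_MAX_EVT);
	if (q->count == MODEL_MAX_EVT)
		return false;
	q->evt[q->count++] = *e;
	return true;
}

/* Virtual-mode side: handle events in arrival order, then empty the queue. */
static int model_drain_events(struct model_evt_queue *q,
			      void (*handle)(const struct model_mce_event *))
{
	int n = q->count;

	for (int i = 0; i < n; i++)
		handle(&q->evt[i]);
	q->count = 0;
	return n;
}

/* Example handler: just counts what it was given. */
static int model_handled;

static void model_count_handler(const struct model_mce_event *e)
{
	(void)e;
	model_handled++;
}
```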

Signed-off-by: Nicholas Piggin 
[mpe: Fix implicit fallthrough warnings in mce_handle_error()]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190802105709.27696-10-npiggin@gmail.com

powerpc/64s/powernv: machine check dump SLB contents

2019-08-30T00:32:35+00:00

Re-use the code introduced in pseries to save and dump the contents
of the SLB in the case of an SLB involved machine check exception.

This patch also avoids allocating the SLB save array on pseries radix.

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190802105709.27696-9-npiggin@gmail.com