From 062022315e8ad9e0628515dfc756ab54b5fdb26b Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 1 May 2020 17:45:41 +0100 Subject: mm/memory-failure: Add memory_failure_queue_kick() The GHES code calls memory_failure_queue() from IRQ context to schedule work on the current CPU so that memory_failure() can sleep. For synchronous memory errors the arch code needs to know any signals that memory_failure() will trigger are pending before it returns to user-space, possibly when exiting from the IRQ. Add a helper to kick the memory failure queue, to ensure the scheduled work has happened. This has to be called from process context, so may have been migrated from the original cpu. Pass the cpu the work was queued on. Change memory_failure_work_func() to permit being called on the 'wrong' cpu. Signed-off-by: James Morse Tested-by: Tyler Baicar Acked-by: Naoya Horiguchi Signed-off-by: Rafael J. Wysocki --- include/linux/mm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/mm.h b/include/linux/mm.h index 5a323422d783..c606dbbfa5e1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3012,6 +3012,7 @@ enum mf_flags { }; extern int memory_failure(unsigned long pfn, int flags); extern void memory_failure_queue(unsigned long pfn, int flags); +extern void memory_failure_queue_kick(int cpu); extern int unpoison_memory(unsigned long pfn); extern int get_hwpoison_page(struct page *page); #define put_hwpoison_page(page) put_page(page) -- cgit v1.2.3 From 7f17b4a121d0d50eca22cb1edebf0a157f3e43bf Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 1 May 2020 17:45:42 +0100 Subject: ACPI: APEI: Kick the memory_failure() queue for synchronous errors memory_failure() offlines or repairs pages of memory that have been discovered to be corrupt. These may be detected by an external component, (e.g. the memory controller), and notified via an IRQ. In this case the work is queued as not all of memory_failure()s work can happen in IRQ context. If the error was detected as a result of user-space accessing a corrupt memory location the CPU may take an abort instead. On arm64 this is a 'synchronous external abort', and on a firmware first system it is replayed using NOTIFY_SEA. This notification has NMI like properties, (it can interrupt IRQ-masked code), so the memory_failure() work is queued. If we return to user-space before the queued memory_failure() work is processed, we will take the fault again. This loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error. For NMIlike notifications keep track of whether memory_failure() work was queued, and make task_work pending to flush out the queue. To save memory allocations, the task_work is allocated as part of the ghes_estatus_node, and free()ing it back to the pool is deferred. Signed-off-by: James Morse Tested-by: Tyler Baicar Signed-off-by: Rafael J. Wysocki --- include/acpi/ghes.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index e3f1cddb4ac8..517a5231cc1b 100644 --- a/include/acpi/ghes.h +++ b/include/acpi/ghes.h @@ -33,6 +33,9 @@ struct ghes_estatus_node { struct llist_node llnode; struct acpi_hest_generic *generic; struct ghes *ghes; + + int task_work_cpu; + struct callback_head task_work; }; struct ghes_estatus_cache { -- cgit v1.2.3