summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-06-13 10:05:47 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-06-13 10:05:47 -0700
commit076f14be7fc942e112c94c841baec44124275cd0 (patch)
tree3bc4d01b7732ebc444060f0df84bc10f26da6238 /arch/x86/kernel
parent6c3297841472b4e53e22e53826eea9e483d993e5 (diff)
parent0bf3924bfabd13ba21aa702344fc00b3b3263e5a (diff)
downloadlinux-076f14be7fc942e112c94c841baec44124275cd0.tar.gz
linux-076f14be7fc942e112c94c841baec44124275cd0.tar.bz2
linux-076f14be7fc942e112c94c841baec44124275cd0.zip
Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry updates from Thomas Gleixner: "The x86 entry, exception and interrupt code rework This all started about 6 month ago with the attempt to move the Posix CPU timer heavy lifting out of the timer interrupt code and just have lockless quick checks in that code path. Trivial 5 patches. This unearthed an inconsistency in the KVM handling of task work and the review requested to move all of this into generic code so other architectures can share. Valid request and solved with another 25 patches but those unearthed inconsistencies vs. RCU and instrumentation. Digging into this made it obvious that there are quite some inconsistencies vs. instrumentation in general. The int3 text poke handling in particular was completely unprotected and with the batched update of trace events even more likely to expose to endless int3 recursion. In parallel the RCU implications of instrumenting fragile entry code came up in several discussions. The conclusion of the x86 maintainer team was to go all the way and make the protection against any form of instrumentation of fragile and dangerous code pathes enforcable and verifiable by tooling. A first batch of preparatory work hit mainline with commit d5f744f9a2ac ("Pull x86 entry code updates from Thomas Gleixner") That (almost) full solution introduced a new code section '.noinstr.text' into which all code which needs to be protected from instrumentation of all sorts goes into. Any call into instrumentable code out of this section has to be annotated. objtool has support to validate this. Kprobes now excludes this section fully which also prevents BPF from fiddling with it and all 'noinstr' annotated functions also keep ftrace off. The section, kprobes and objtool changes are already merged. The major changes coming with this are: - Preparatory cleanups - Annotating of relevant functions to move them into the noinstr.text section or enforcing inlining by marking them __always_inline so the compiler cannot misplace or instrument them. - Splitting and simplifying the idtentry macro maze so that it is now clearly separated into simple exception entries and the more interesting ones which use interrupt stacks and have the paranoid handling vs. CR3 and GS. - Move quite some of the low level ASM functionality into C code: - enter_from and exit to user space handling. The ASM code now calls into C after doing the really necessary ASM handling and the return path goes back out without bells and whistels in ASM. - exception entry/exit got the equivivalent treatment - move all IRQ tracepoints from ASM to C so they can be placed as appropriate which is especially important for the int3 recursion issue. - Consolidate the declaration and definition of entry points between 32 and 64 bit. They share a common header and macros now. - Remove the extra device interrupt entry maze and just use the regular exception entry code. - All ASM entry points except NMI are now generated from the shared header file and the corresponding macros in the 32 and 64 bit entry ASM. - The C code entry points are consolidated as well with the help of DEFINE_IDTENTRY*() macros. This allows to ensure at one central point that all corresponding entry points share the same semantics. The actual function body for most entry points is in an instrumentable and sane state. There are special macros for the more sensitive entry points, e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF. They allow to put the whole entry instrumentation and RCU handling into safe places instead of the previous pray that it is correct approach. - The INT3 text poke handling is now completely isolated and the recursion issue banned. Aside of the entry rework this required other isolation work, e.g. the ability to force inline bsearch. - Prevent #DB on fragile entry code, entry relevant memory and disable it on NMI, #MC entry, which allowed to get rid of the nested #DB IST stack shifting hackery. - A few other cleanups and enhancements which have been made possible through this and already merged changes, e.g. consolidating and further restricting the IDT code so the IDT table becomes RO after init which removes yet another popular attack vector - About 680 lines of ASM maze are gone. There are a few open issues: - An escape out of the noinstr section in the MCE handler which needs some more thought but under the aspect that MCE is a complete trainwreck by design and the propability to survive it is low, this was not high on the priority list. - Paravirtualization When PV is enabled then objtool complains about a bunch of indirect calls out of the noinstr section. There are a few straight forward ways to fix this, but the other issues vs. general correctness were more pressing than parawitz. - KVM KVM is inconsistent as well. Patches have been posted, but they have not yet been commented on or picked up by the KVM folks. - IDLE Pretty much the same problems can be found in the low level idle code especially the parts where RCU stopped watching. This was beyond the scope of the more obvious and exposable problems and is on the todo list. The lesson learned from this brain melting exercise to morph the evolved code base into something which can be validated and understood is that once again the violation of the most important engineering principle "correctness first" has caused quite a few people to spend valuable time on problems which could have been avoided in the first place. The "features first" tinkering mindset really has to stop. With that I want to say thanks to everyone involved in contributing to this effort. Special thanks go to the following people (alphabetical order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra, Vitaly Kuznetsov, and Will Deacon" * tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits) x86/entry: Force rcu_irq_enter() when in idle task x86/entry: Make NMI use IDTENTRY_RAW x86/entry: Treat BUG/WARN as NMI-like entries x86/entry: Unbreak __irqentry_text_start/end magic x86/entry: __always_inline CR2 for noinstr lockdep: __always_inline more for noinstr x86/entry: Re-order #DB handler to avoid *SAN instrumentation x86/entry: __always_inline arch_atomic_* for noinstr x86/entry: __always_inline irqflags for noinstr x86/entry: __always_inline debugreg for noinstr x86/idt: Consolidate idt functionality x86/idt: Cleanup trap_init() x86/idt: Use proper constants for table size x86/idt: Add comments about early #PF handling x86/idt: Mark init only functions __init x86/entry: Rename trace_hardirqs_off_prepare() x86/entry: Clarify irq_{enter,exit}_rcu() x86/entry: Remove DBn stacks x86/entry: Remove debug IDT frobbing x86/entry: Optimize local_db_save() for virt ...
Diffstat (limited to 'arch/x86/kernel')
-rw-r--r--arch/x86/kernel/alternative.c25
-rw-r--r--arch/x86/kernel/apic/apic.c39
-rw-r--r--arch/x86/kernel/apic/msi.c3
-rw-r--r--arch/x86/kernel/apic/vector.c5
-rw-r--r--arch/x86/kernel/asm-offsets_64.c3
-rw-r--r--arch/x86/kernel/cpu/acrn.c9
-rw-r--r--arch/x86/kernel/cpu/common.c19
-rw-r--r--arch/x86/kernel/cpu/mce/amd.c5
-rw-r--r--arch/x86/kernel/cpu/mce/core.c106
-rw-r--r--arch/x86/kernel/cpu/mce/inject.c4
-rw-r--r--arch/x86/kernel/cpu/mce/internal.h2
-rw-r--r--arch/x86/kernel/cpu/mce/p5.c8
-rw-r--r--arch/x86/kernel/cpu/mce/therm_throt.c5
-rw-r--r--arch/x86/kernel/cpu/mce/threshold.c5
-rw-r--r--arch/x86/kernel/cpu/mce/winchip.c8
-rw-r--r--arch/x86/kernel/cpu/mshyperv.c22
-rw-r--r--arch/x86/kernel/doublefault_32.c10
-rw-r--r--arch/x86/kernel/dumpstack_64.c7
-rw-r--r--arch/x86/kernel/ftrace_64.S2
-rw-r--r--arch/x86/kernel/head_64.S7
-rw-r--r--arch/x86/kernel/hw_breakpoint.c100
-rw-r--r--arch/x86/kernel/idt.c226
-rw-r--r--arch/x86/kernel/irq.c66
-rw-r--r--arch/x86/kernel/irq_32.c2
-rw-r--r--arch/x86/kernel/irq_64.c6
-rw-r--r--arch/x86/kernel/irq_work.c6
-rw-r--r--arch/x86/kernel/kprobes/core.c7
-rw-r--r--arch/x86/kernel/kprobes/opt.c4
-rw-r--r--arch/x86/kernel/kvm.c15
-rw-r--r--arch/x86/kernel/nmi.c75
-rw-r--r--arch/x86/kernel/smp.c37
-rw-r--r--arch/x86/kernel/tracepoint.c17
-rw-r--r--arch/x86/kernel/traps.c548
-rw-r--r--arch/x86/kernel/unwind_frame.c8
-rw-r--r--arch/x86/kernel/vmlinux.lds.S1
35 files changed, 797 insertions, 615 deletions
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index a9195ce8265d..8fd39ff74a49 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1011,28 +1011,29 @@ struct bp_patching_desc {
static struct bp_patching_desc *bp_desc;
-static inline struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
+static __always_inline
+struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
{
- struct bp_patching_desc *desc = READ_ONCE(*descp); /* rcu_dereference */
+ struct bp_patching_desc *desc = __READ_ONCE(*descp); /* rcu_dereference */
- if (!desc || !atomic_inc_not_zero(&desc->refs))
+ if (!desc || !arch_atomic_inc_not_zero(&desc->refs))
return NULL;
return desc;
}
-static inline void put_desc(struct bp_patching_desc *desc)
+static __always_inline void put_desc(struct bp_patching_desc *desc)
{
smp_mb__before_atomic();
- atomic_dec(&desc->refs);
+ arch_atomic_dec(&desc->refs);
}
-static inline void *text_poke_addr(struct text_poke_loc *tp)
+static __always_inline void *text_poke_addr(struct text_poke_loc *tp)
{
return _stext + tp->rel_addr;
}
-static int notrace patch_cmp(const void *key, const void *elt)
+static __always_inline int patch_cmp(const void *key, const void *elt)
{
struct text_poke_loc *tp = (struct text_poke_loc *) elt;
@@ -1042,9 +1043,8 @@ static int notrace patch_cmp(const void *key, const void *elt)
return 1;
return 0;
}
-NOKPROBE_SYMBOL(patch_cmp);
-int notrace poke_int3_handler(struct pt_regs *regs)
+int noinstr poke_int3_handler(struct pt_regs *regs)
{
struct bp_patching_desc *desc;
struct text_poke_loc *tp;
@@ -1077,9 +1077,9 @@ int notrace poke_int3_handler(struct pt_regs *regs)
* Skip the binary search if there is a single member in the vector.
*/
if (unlikely(desc->nr_entries > 1)) {
- tp = bsearch(ip, desc->vec, desc->nr_entries,
- sizeof(struct text_poke_loc),
- patch_cmp);
+ tp = __inline_bsearch(ip, desc->vec, desc->nr_entries,
+ sizeof(struct text_poke_loc),
+ patch_cmp);
if (!tp)
goto out_put;
} else {
@@ -1118,7 +1118,6 @@ out_put:
put_desc(desc);
return ret;
}
-NOKPROBE_SYMBOL(poke_int3_handler);
#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc))
static struct text_poke_loc tp_vec[TP_VEC_MAX];
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index bf4acb0b5365..e0e2f020ec02 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1088,23 +1088,14 @@ static void local_apic_timer_interrupt(void)
* [ if a single-CPU system runs an SMP kernel then we call the local
* interrupt as well. Thus we cannot inline the local irq ... ]
*/
-__visible void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
+DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt)
{
struct pt_regs *old_regs = set_irq_regs(regs);
- /*
- * NOTE! We'd better ACK the irq immediately,
- * because timer handling can be slow.
- *
- * update_process_times() expects us to have done irq_enter().
- * Besides, if we don't timer interrupts ignore the global
- * interrupt lock, which is the WrongThing (tm) to do.
- */
- entering_ack_irq();
+ ack_APIC_irq();
trace_local_timer_entry(LOCAL_TIMER_VECTOR);
local_apic_timer_interrupt();
trace_local_timer_exit(LOCAL_TIMER_VECTOR);
- exiting_irq();
set_irq_regs(old_regs);
}
@@ -2120,15 +2111,21 @@ void __init register_lapic_address(unsigned long address)
* Local APIC interrupts
*/
-/*
- * This interrupt should _never_ happen with our APIC/SMP architecture
+/**
+ * spurious_interrupt - Catch all for interrupts raised on unused vectors
+ * @regs: Pointer to pt_regs on stack
+ * @vector: The vector number
+ *
+ * This is invoked from ASM entry code to catch all interrupts which
+ * trigger on an entry which is routed to the common_spurious idtentry
+ * point.
+ *
+ * Also called from sysvec_spurious_apic_interrupt().
*/
-__visible void __irq_entry smp_spurious_interrupt(struct pt_regs *regs)
+DEFINE_IDTENTRY_IRQ(spurious_interrupt)
{
- u8 vector = ~regs->orig_ax;
u32 v;
- entering_irq();
trace_spurious_apic_entry(vector);
inc_irq_stat(irq_spurious_count);
@@ -2158,13 +2155,17 @@ __visible void __irq_entry smp_spurious_interrupt(struct pt_regs *regs)
}
out:
trace_spurious_apic_exit(vector);
- exiting_irq();
+}
+
+DEFINE_IDTENTRY_SYSVEC(sysvec_spurious_apic_interrupt)
+{
+ __spurious_interrupt(regs, SPURIOUS_APIC_VECTOR);
}
/*
* This interrupt should never happen with our APIC/SMP architecture
*/
-__visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
+DEFINE_IDTENTRY_SYSVEC(sysvec_error_interrupt)
{
static const char * const error_interrupt_reason[] = {
"Send CS error", /* APIC Error Bit 0 */
@@ -2178,7 +2179,6 @@ __visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
};
u32 v, i = 0;
- entering_irq();
trace_error_apic_entry(ERROR_APIC_VECTOR);
/* First tickle the hardware, only then report what went on. -- REW */
@@ -2202,7 +2202,6 @@ __visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
apic_printk(APIC_DEBUG, KERN_CONT "\n");
trace_error_apic_exit(ERROR_APIC_VECTOR);
- exiting_irq();
}
/**
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 159bd0cb8548..5cbaca58af95 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -115,7 +115,8 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
* denote it as spurious which is no harm as this is a rare event
* and interrupt handlers have to cope with spurious interrupts
* anyway. If the vector is unused, then it is marked so it won't
- * trigger the 'No irq handler for vector' warning in do_IRQ().
+ * trigger the 'No irq handler for vector' warning in
+ * common_interrupt().
*
* This requires to hold vector lock to prevent concurrent updates to
* the affected vector.
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 67768e54438b..c48be6e1f676 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -861,13 +861,13 @@ static void free_moved_vector(struct apic_chip_data *apicd)
apicd->move_in_progress = 0;
}
-asmlinkage __visible void __irq_entry smp_irq_move_cleanup_interrupt(void)
+DEFINE_IDTENTRY_SYSVEC(sysvec_irq_move_cleanup)
{
struct hlist_head *clhead = this_cpu_ptr(&cleanup_list);
struct apic_chip_data *apicd;
struct hlist_node *tmp;
- entering_ack_irq();
+ ack_APIC_irq();
/* Prevent vectors vanishing under us */
raw_spin_lock(&vector_lock);
@@ -892,7 +892,6 @@ asmlinkage __visible void __irq_entry smp_irq_move_cleanup_interrupt(void)
}
raw_spin_unlock(&vector_lock);
- exiting_irq();
}
static void __send_cleanup_vector(struct apic_chip_data *apicd)
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index c2a47016f243..828be792231e 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -57,9 +57,6 @@ int main(void)
BLANK();
#undef ENTRY
- OFFSET(TSS_ist, tss_struct, x86_tss.ist);
- DEFINE(DB_STACK_OFFSET, offsetof(struct cea_exception_stacks, DB_stack) -
- offsetof(struct cea_exception_stacks, DB1_stack));
BLANK();
#ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index 676022e71791..1da9b1c9a2db 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -10,10 +10,10 @@
*/
#include <linux/interrupt.h>
-#include <asm/acrn.h>
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/hypervisor.h>
+#include <asm/idtentry.h>
#include <asm/irq_regs.h>
static uint32_t __init acrn_detect(void)
@@ -24,7 +24,7 @@ static uint32_t __init acrn_detect(void)
static void __init acrn_init_platform(void)
{
/* Setup the IDT for ACRN hypervisor callback */
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, acrn_hv_callback_vector);
+ alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_acrn_hv_callback);
}
static bool acrn_x2apic_available(void)
@@ -39,7 +39,7 @@ static bool acrn_x2apic_available(void)
static void (*acrn_intr_handler)(void);
-__visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
+DEFINE_IDTENTRY_SYSVEC(sysvec_acrn_hv_callback)
{
struct pt_regs *old_regs = set_irq_regs(regs);
@@ -50,13 +50,12 @@ __visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
* will block the interrupt whose vector is lower than
* HYPERVISOR_CALLBACK_VECTOR.
*/
- entering_ack_irq();
+ ack_APIC_irq();
inc_irq_stat(irq_hv_callback_count);
if (acrn_intr_handler)
acrn_intr_handler();
- exiting_irq();
set_irq_regs(old_regs);
}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8be042df12c3..043d93cdcaad 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1706,25 +1706,6 @@ void syscall_init(void)
X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
}
-DEFINE_PER_CPU(int, debug_stack_usage);
-DEFINE_PER_CPU(u32, debug_idt_ctr);
-
-void debug_stack_set_zero(void)
-{
- this_cpu_inc(debug_idt_ctr);
- load_current_idt();
-}
-NOKPROBE_SYMBOL(debug_stack_set_zero);
-
-void debug_stack_reset(void)
-{
- if (WARN_ON(!this_cpu_read(debug_idt_ctr)))
- return;
- if (this_cpu_dec_return(debug_idt_ctr) == 0)
- load_current_idt();
-}
-NOKPROBE_SYMBOL(debug_stack_reset);
-
#else /* CONFIG_X86_64 */
DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 52de616a8065..a906d68a18a2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -907,14 +907,13 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
mce_log(&m);
}
-asmlinkage __visible void __irq_entry smp_deferred_error_interrupt(struct pt_regs *regs)
+DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
{
- entering_irq();
trace_deferred_error_apic_entry(DEFERRED_ERROR_VECTOR);
inc_irq_stat(irq_deferred_error_count);
deferred_error_int_vector();
trace_deferred_error_apic_exit(DEFERRED_ERROR_VECTOR);
- exiting_ack_irq();
+ ack_APIC_irq();
}
/*
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e9265e2f28c9..b9cb381b4019 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -130,7 +130,7 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
/* Do initial initialization of a struct mce */
-void mce_setup(struct mce *m)
+noinstr void mce_setup(struct mce *m)
{
memset(m, 0, sizeof(struct mce));
m->cpu = m->extcpu = smp_processor_id();
@@ -140,12 +140,12 @@ void mce_setup(struct mce *m)
m->cpuid = cpuid_eax(1);
m->socketid = cpu_data(m->extcpu).phys_proc_id;
m->apicid = cpu_data(m->extcpu).initial_apicid;
- rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
+ m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
if (this_cpu_has(X86_FEATURE_INTEL_PPIN))
- rdmsrl(MSR_PPIN, m->ppin);
+ m->ppin = __rdmsr(MSR_PPIN);
else if (this_cpu_has(X86_FEATURE_AMD_PPIN))
- rdmsrl(MSR_AMD_PPIN, m->ppin);
+ m->ppin = __rdmsr(MSR_AMD_PPIN);
m->microcode = boot_cpu_data.microcode;
}
@@ -1100,13 +1100,15 @@ static void mce_clear_state(unsigned long *toclear)
* kdump kernel establishing a new #MC handler where a broadcasted MCE
* might not get handled properly.
*/
-static bool __mc_check_crashing_cpu(int cpu)
+static noinstr bool mce_check_crashing_cpu(void)
{
+ unsigned int cpu = smp_processor_id();
+
if (cpu_is_offline(cpu) ||
(crashing_cpu != -1 && crashing_cpu != cpu)) {
u64 mcgstatus;
- mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
+ mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
if (mcgstatus & MCG_STATUS_LMCES)
@@ -1114,7 +1116,7 @@ static bool __mc_check_crashing_cpu(int cpu)
}
if (mcgstatus & MCG_STATUS_RIPV) {
- mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
+ __wrmsr(MSR_IA32_MCG_STATUS, 0, 0);
return true;
}
}
@@ -1230,12 +1232,11 @@ static void kill_me_maybe(struct callback_head *cb)
* backing the user stack, tracing that reads the user stack will cause
* potentially infinite recursion.
*/
-void noinstr do_machine_check(struct pt_regs *regs, long error_code)
+void noinstr do_machine_check(struct pt_regs *regs)
{
DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
struct mca_config *cfg = &mca_cfg;
- int cpu = smp_processor_id();
struct mce m, *final;
char *msg = NULL;
int worst = 0;
@@ -1264,11 +1265,6 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
*/
int lmce = 1;
- if (__mc_check_crashing_cpu(cpu))
- return;
-
- nmi_enter();
-
this_cpu_inc(mce_exception_count);
mce_gather_info(&m, regs);
@@ -1356,7 +1352,7 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
sync_core();
if (worst != MCE_AR_SEVERITY && !kill_it)
- goto out_ist;
+ return;
/* Fault was in user mode and we need to take some action */
if ((m.cs & 3) == 3) {
@@ -1370,12 +1366,9 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
current->mce_kill_me.func = kill_me_now;
task_work_add(current, &current->mce_kill_me, true);
} else {
- if (!fixup_exception(regs, X86_TRAP_MC, error_code, 0))
+ if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
mce_panic("Failed kernel mode recovery", &m, msg);
}
-
-out_ist:
- nmi_exit();
}
EXPORT_SYMBOL_GPL(do_machine_check);
@@ -1902,21 +1895,84 @@ bool filter_mce(struct mce *m)
}
/* Handle unconfigured int18 (should never happen) */
-static void unexpected_machine_check(struct pt_regs *regs, long error_code)
+static noinstr void unexpected_machine_check(struct pt_regs *regs)
{
+ instrumentation_begin();
pr_err("CPU#%d: Unexpected int18 (Machine Check)\n",
smp_processor_id());
+ instrumentation_end();
}
/* Call the installed machine check handler for this CPU setup. */
-void (*machine_check_vector)(struct pt_regs *, long error_code) =
- unexpected_machine_check;
+void (*machine_check_vector)(struct pt_regs *) = unexpected_machine_check;
-dotraplinkage notrace void do_mce(struct pt_regs *regs, long error_code)
+static __always_inline void exc_machine_check_kernel(struct pt_regs *regs)
{
- machine_check_vector(regs, error_code);
+ /*
+ * Only required when from kernel mode. See
+ * mce_check_crashing_cpu() for details.
+ */
+ if (machine_check_vector == do_machine_check &&
+ mce_check_crashing_cpu())
+ return;
+
+ nmi_enter();
+ /*
+ * The call targets are marked noinstr, but objtool can't figure
+ * that out because it's an indirect call. Annotate it.
+ */
+ instrumentation_begin();
+ trace_hardirqs_off_finish();
+ machine_check_vector(regs);
+ if (regs->flags & X86_EFLAGS_IF)
+ trace_hardirqs_on_prepare();
+ instrumentation_end();
+ nmi_exit();
}
-NOKPROBE_SYMBOL(do_mce);
+
+static __always_inline void exc_machine_check_user(struct pt_regs *regs)
+{
+ idtentry_enter_user(regs);
+ instrumentation_begin();
+ machine_check_vector(regs);
+ instrumentation_end();
+ idtentry_exit_user(regs);
+}
+
+#ifdef CONFIG_X86_64
+/* MCE hit kernel mode */
+DEFINE_IDTENTRY_MCE(exc_machine_check)
+{
+ unsigned long dr7;
+
+ dr7 = local_db_save();
+ exc_machine_check_kernel(regs);
+ local_db_restore(dr7);
+}
+
+/* The user mode variant. */
+DEFINE_IDTENTRY_MCE_USER(exc_machine_check)
+{
+ unsigned long dr7;
+
+ dr7 = local_db_save();
+ exc_machine_check_user(regs);
+ local_db_restore(dr7);
+}
+#else
+/* 32bit unified entry point */
+DEFINE_IDTENTRY_MCE(exc_machine_check)
+{
+ unsigned long dr7;
+
+ dr7 = local_db_save();
+ if (user_mode(regs))
+ exc_machine_check_user(regs);
+ else
+ exc_machine_check_kernel(regs);
+ local_db_restore(dr7);
+}
+#endif
/*
* Called for each booted CPU to set up machine checks.
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 3413b41b8d55..0593b192eb8f 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -146,9 +146,9 @@ static void raise_exception(struct mce *m, struct pt_regs *pregs)
regs.cs = m->cs;
pregs = &regs;
}
- /* in mcheck exeception handler, irq will be disabled */
+ /* do_machine_check() expects interrupts disabled -- at least */
local_irq_save(flags);
- do_machine_check(pregs, 0);
+ do_machine_check(pregs);
local_irq_restore(flags);
m->finished = 0;
}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 3b008172ad73..b74ca4a28c66 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -9,7 +9,7 @@
#include <asm/mce.h>
/* Pointer to the installed machine check handler for this CPU setup. */
-extern void (*machine_check_vector)(struct pt_regs *, long error_code);
+extern void (*machine_check_vector)(struct pt_regs *);
enum severity_level {
MCE_NO_SEVERITY,
diff --git a/arch/x86/kernel/cpu/mce/p5.c b/arch/x86/kernel/cpu/mce/p5.c
index 5ee94aa1b766..19e90cae8e97 100644
--- a/arch/x86/kernel/cpu/mce/p5.c
+++ b/arch/x86/kernel/cpu/mce/p5.c
@@ -21,12 +21,11 @@
int mce_p5_enabled __read_mostly;
/* Machine check handler for Pentium class Intel CPUs: */
-static void pentium_machine_check(struct pt_regs *regs, long error_code)
+static noinstr void pentium_machine_check(struct pt_regs *regs)
{
u32 loaddr, hi, lotype;
- nmi_enter();
-
+ instrumentation_begin();
rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
@@ -39,8 +38,7 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
}
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
-
- nmi_exit();
+ instrumentation_end();
}
/* Set up machine check reporting for processors with Intel style MCE: */
diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index f36dc0742085..a7cd2d203ced 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -614,14 +614,13 @@ static void unexpected_thermal_interrupt(void)
static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
-asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *regs)
+DEFINE_IDTENTRY_SYSVEC(sysvec_thermal)
{
- entering_irq();
trace_thermal_apic_entry(THERMAL_APIC_VECTOR);
inc_irq_stat(irq_thermal_count);
smp_thermal_vector();
trace_thermal_apic_exit(THERMAL_APIC_VECTOR);
- exiting_ack_irq();
+ ack_APIC_irq();
}
/* Thermal monitoring depends on APIC, ACPI and clock modulation */
diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
index 28812cc15300..6a059a035021 100644
--- a/arch/x86/kernel/cpu/mce/threshold.c
+++ b/arch/x86/kernel/cpu/mce/threshold.c
@@ -21,12 +21,11 @@ static void default_threshold_interrupt(void)
void (*mce_threshold_vector)(void) = default_threshold_interrupt;
-asmlinkage __visible void __irq_entry smp_threshold_interrupt(struct pt_regs *regs)
+DEFINE_IDTENTRY_SYSVEC(sysvec_threshold)
{
- entering_irq();
trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR);
inc_irq_stat(irq_threshold_count);
mce_threshold_v