| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit f15838e9cac8f78f0cc506529bb9d3b9fa589c1f upstream.
Since binutils 2.26 BFD is doing suffix merging on STRTAB sections. But
dedotify modifies the symbol names in place, which can also modify
unrelated symbols with a name that matches a suffix of a dotted name. To
remove the leading dot of a symbol name we can just increment the pointer
into the STRTAB section instead.
Backport to all stables to avoid breakage when people update their
binutils - mpe.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 7e56f627768da4e6480986b5145dc3422bc448a5 upstream.
In eeh_pe_loc_get(), the PE location code is retrieved from the
"ibm,loc-code" property of the device node for the bridge of the
PE's primary bus. It's not correct because the property indicates
the parent PE's location code.
This reads the correct PE location code from "ibm,io-base-loc-code"
or "ibm,slot-location-code" property of PE parent bus's device node.
Fixes: 357b2f3dd9b7 ("powerpc/eeh: Dump PE location code")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit a61674bdfc7c2bf909c4010699607b62b69b7bec upstream.
GCC 6 will include changes to generated code with -mcmodel=large,
which is used to build kernel modules on powerpc64le. This was
necessary because the large model is supposed to allow arbitrary
sizes and locations of the code and data sections, but the ELFv2
global entry point prolog still made the unconditional assumption
that the TOC associated with any particular function can be found
within 2 GB of the function entry point:
func:
addis r2,r12,(.TOC.-func)@ha
addi r2,r2,(.TOC.-func)@l
.localentry func, .-func
To remove this assumption, GCC will now generate instead this global
entry point prolog sequence when using -mcmodel=large:
.quad .TOC.-func
func:
.reloc ., R_PPC64_ENTRY
ld r2, -8(r12)
add r2, r2, r12
.localentry func, .-func
The new .reloc triggers an optimization in the linker that will
replace this new prolog with the original code (see above) if the
linker determines that the distance between .TOC. and func is in
range after all.
Since this new relocation is now present in module object files,
the kernel module loader is required to handle them too. This
patch adds support for the new relocation and implements the
same optimization done by the GNU linker.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 7f821fc9c77a9b01fe7b1d6e72717b33d8d64142 upstream.
Currently we can hit a scenario where we'll tm_reclaim() twice. This
results in a TM bad thing exception because the second reclaim occurs
when not in suspend mode.
The scenario in which this can happen is the following. We attempt to
deliver a signal to userspace. To do this we need obtain the stack
pointer to write the signal context. To get this stack pointer we
must tm_reclaim() in case we need to use the checkpointed stack
pointer (see get_tm_stackpointer()). Normally we'd then return
directly to userspace to deliver the signal without going through
__switch_to().
Unfortunatley, if at this point we get an error (such as a bad
userspace stack pointer), we need to exit the process. The exit will
result in a __switch_to(). __switch_to() will attempt to save the
process state which results in another tm_reclaim(). This
tm_reclaim() now causes a TM Bad Thing exception as this state has
already been saved and the processor is no longer in TM suspend mode.
Whee!
This patch checks the state of the MSR to ensure we are TM suspended
before we attempt the tm_reclaim(). If we've already saved the state
away, we should no longer be in TM suspend mode. This has the
additional advantage of checking for a potential TM Bad Thing
exception.
Found using syscall fuzzer.
Fixes: fb09692e71f1 ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit d2b9d2a5ad5ef04ff978c9923d19730cb05efd55 upstream.
Currently we allow both the MSR T and S bits to be set by userspace on
a signal return. Unfortunately this is a reserved configuration and
will cause a TM Bad Thing exception if attempted (via rfid).
This patch checks for this case in both the 32 and 64 bit signals
code. If both T and S are set, we mark the context as invalid.
Found using a syscall fuzzer.
Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 8832317f662c06f5c06e638f57bfe89a71c9b266 upstream.
Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making enter_rtas() call.
Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
REGS: c0000007e1a7b920 TRAP: 0e40 Not tainted (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
MSR: 1000000000081000 <HV,ME> CR: 00000000 XER: 00000000
CFAR: c000000000009c0c SOFTE: 0
NIP [0000000000000000] (null)
LR [0000000000009c14] 0x9c14
Call Trace:
[c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
[c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
[c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98
Fixes: 55190f88789a ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 1c2cb594441d02815d304cccec9742ff5c707495 upstream.
The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
Call Trace:
[c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
[c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
[c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
[c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
[c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
[c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
[c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
[c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
[c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
[c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
[c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
[c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
[c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.
The EPOW sensor is defined to be "fast" in sPAPR - mpe.
Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 3c00cb5e68dc719f2fc73a33b1b230aadfcb1309 upstream.
This function can leak kernel stack data when the user siginfo_t has a
positive si_code value. The top 16 bits of si_code descibe which fields
in the siginfo_t union are active, but they are treated inconsistently
between copy_siginfo_from_user32, copy_siginfo_to_user32 and
copy_siginfo_to_user.
copy_siginfo_from_user32 is called from rt_sigqueueinfo and
rt_tgsigqueueinfo in which the user has full control overthe top 16 bits
of si_code.
This fixes the following information leaks:
x86: 8 bytes leaked when sending a signal from a 32-bit process to
itself. This leak grows to 16 bytes if the process uses x32.
(si_code = __SI_CHLD)
x86: 100 bytes leaked when sending a signal from a 32-bit process to
a 64-bit process. (si_code = -1)
sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
64-bit process. (si_code = any)
parsic and s390 have similar bugs, but they are not vulnerable because
rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
to a different process. These bugs are also fixed for consistency.
Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 5e95235ccd5442d4a4fe11ec4eb99ba1b7959368 upstream.
Recent toolchains force the TOC to be 256 byte aligned. We need
to enforce this alignment in our linker script, otherwise pointers
to our TOC variables (__toc_start, __prom_init_toc_start) could
be incorrect.
If they are bad, we die a few hundred instructions into boot.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit f7e9e358362557c3aa2c1ec47490f29fe880a09e upstream.
This problem appears to have been introduced in 2.6.29 by commit
93197a36a9c1 "Rewrite sysfs processor cache info code".
This caused lscpu to error out on at least e500v2 devices, eg:
error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory
Some embedded powerpc systems use cache-size in DTS for the unified L2
cache size, not d-cache-size, so we need to allow for both DTS names.
Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle
this.
Fixes: 93197a36a9c1 ("powerpc: Rewrite sysfs processor cache info code")
Signed-off-by: Dave Olson <olson@cumulusnetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 7f8998c7aef3ac9c5f3f2943e083dfa6302e90d0 upstream.
The different architectures used their own (and different) declarations:
extern __visible const void __nosave_begin, __nosave_end;
extern const void __nosave_begin, __nosave_end;
extern long __nosave_begin, __nosave_end;
Consolidate them using the first variant in <asm/sections.h>.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 44d5f6f5901e996744858c175baee320ccf1eda3 upstream.
commit id 2ba9f0d has changed CONFIG_KVM_BOOK3S_64_HV to tristate to allow
HV/PR bits to be built as modules. But the MCE code still depends on
CONFIG_KVM_BOOK3S_64_HV which is wrong. When user selects
CONFIG_KVM_BOOK3S_64_HV=m to build HV/PR bits as a separate module the
relevant MCE code gets excluded.
This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This
makes sure that the relevant MCE code is included when HV/PR bits
are built as a separate modules.
Fixes: 2ba9f0d88750 ("kvm: powerpc: book3s: Support building HV and PR KVM as module")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 4ad04e5987115ece5fa8a0cf1dc72fcd4707e33e upstream.
After d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier"), the
refcnt on the kobject backing the IOMMU group for a PCI device is
elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
set_iommu_table_base_and_group). When we go to dlpar a multi-function
PCI device out:
iommu_reconfig_notifier ->
iommu_free_table ->
iommu_group_put
BUG_ON(tbl->it_group)
We trip this BUG_ON, because there are still references on the table, so
it is not freed. Fix this by moving the powernv bus notifier to common
code and calling it for both powernv and pseries.
Fixes: d905c5df9aef ("PPC: POWERNV: move iommu_add_device earlier")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 875ebe940d77a41682c367ad799b4f39f128d3fa upstream.
Anton has a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:
BUG_ON(td->cpu != smp_processor_id());
Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:
CPU: 0
Comm: watchdog/130
The problem is that we aren't ensuring the CPU active bit is set for the
secondary before allowing the master to continue on. The master unparks
the secondary CPU's kthreads and the scheduler looks for a CPU to run
on. It calls select_task_rq() and realises the suggested CPU is not in
the cpus_allowed mask. It then ends up in select_fallback_rq(), and
since the active bit isnt't set we choose some other CPU to run on.
This seems to have been introduced by 6acbfb96976f "sched: Fix hotplug
vs. set_cpus_allowed_ptr()", which changed from setting active before
online to setting active after online. However that was in turn fixing a
bug where other code assumed an active CPU was also online, so we can't
just revert that fix.
The simplest fix is just to spin waiting for both active & online to be
set. We already have a barrier prior to set_cpu_online() (which also
sets active), to ensure all other setup is completed before online &
active are set.
Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 6f20e7f2e930211613a66d0603fa4abaaf3ce662 upstream.
When calling to early_setup(), we pick "boot_paca" up for the master CPU
and initialize that with initialise_paca(). At that point, the SLB
shadow buffer isn't populated yet. Updating the SLB shadow buffer should
corrupt what we had in physical address 0 where the trap instruction is
usually stored.
This hasn't been observed to cause any trouble in practice, but is
obviously fishy.
Fixes: 6f4441ef7009 ("powerpc: Dynamically allocate slb_shadow from memblock")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 63f13448d81c910a284b096149411a719cbed501 upstream.
Since both ppc and ppc64 have LE variants which are now reported by uname, add
that flag (__AUDIT_ARCH_LE) to syscall_get_arch() and add AUDIT_ARCH_PPC64LE
variant.
Without this, perf trace and auditctl fail.
Mainline kernel reports ppc64le (per a058801) but there is no matching
AUDIT_ARCH_PPC64LE.
Since 32-bit PPC LE is not supported by audit, don't advertise it in
AUDIT_ARCH_PPC* variants.
See:
https://www.redhat.com/archives/linux-audit/2014-August/msg00082.html
https://www.redhat.com/archives/linux-audit/2014-December/msg00004.html
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.16: arch is passed in by do_syscall_trace_enter()
rather than queried by calling syscall_get_arch()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit cd32e2dcc9de6c27ecbbfc0e2079fb64b42bad5f upstream.
We have some code in udbg_uart_getc_poll() that tries to protect
against a NULL udbg_uart_in, but gets it all wrong.
Found with the LLVM static analyzer (scan-build).
Fixes: 309257484cc1 ("powerpc: Cleanup udbg_16550 and add support for LPC PIO-only UARTs")
Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Add some newlines for readability while we're here]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 8117ac6a6c2fa0f847ff6a21a1f32c8d2c8501d0 upstream.
Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on. This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context. If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.
This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.
[ shreyas@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 682e77c861c4c60f79ffbeae5e1938ffed24a575 upstream.
The existing MCE code calls flush_tlb hook with IS=0 (single page) resulting
in partial invalidation of TLBs which is not right. This patch fixes
that by passing IS=0xc00 to invalidate whole TLB for successful recovery
from TLB and ERAT errors.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 152d44a853e42952f6c8a504fb1f8eefd21fd5fd upstream.
I used some 64 bit instructions when adding the 32 bit getcpu VDSO
function. Fix it.
Fixes: 18ad51dd342a ("powerpc: Add VDSO version of getcpu")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 808be31426af57af22268ef0fcb42617beb3d15b upstream.
Back in 7230c5644188 ("powerpc: Rework lazy-interrupt handling") we
added a call out to restore_interrupts() (written in c) before calling
do_notify_resume:
bl restore_interrupts
addi r3,r1,STACK_FRAME_OVERHEAD
bl do_notify_resume
Unfortunately do_notify_resume takes two arguments, the second one
being the thread_info flags:
void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
We do populate r4 (the second argument) earlier, but
restore_interrupts() is free to muck it up all it wants. My guess is
the gcc compiler gods shone down on us and its register allocator
never used r4. Sometimes, rarely, luck is on our side.
LLVM on the other hand did trample r4.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 22fca17924094113fe79c1db5135290e1a84ad4b upstream.
The problem was reported by Carol: In the scenario of passing mlx4
adapter to guest, EEH error could be recovered successfully. When
returning the device back to host, the driver (mlx4_core.ko)
couldn't be loaded successfully because of error number -5 (-EIO)
returned from mlx4_get_ownership(), which hits offlined PCI device.
The root cause is that we missed to put the affected devices into
normal state on clearing PE isolated state right after PE reset.
The patch fixes above issue by putting the affected devices to
normal state when clearing PE isolated state in eeh_pe_state_clear().
Reported-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 763fe0addb8fe15ccea67c0aebddc06f4bb25439 upstream.
When we take full hotplug to recover from EEH errors, PCI buses
could be involved. For the case, the child devices of involved
PCI buses can't be attached to IOMMU group properly, which is
caused by commit 3f28c5a ("powerpc/powernv: Reduce multi-hit of
iommu_add_device()").
When adding the PCI devices of the newly created PCI buses to
the system, the IOMMU group is expected to be added in (C).
(A) fails to bind the IOMMU group because bus->is_added is
false. (B) fails because the device doesn't have binding IOMMU
table yet. bus->is_added is set to true at end of (C) and
pdev->is_added is set to true at (D).
pcibios_add_pci_devices()
pci_scan_bridge()
pci_scan_child_bus()
pci_scan_slot()
pci_scan_single_device()
pci_scan_device()
pci_device_add()
pcibios_add_device() A: Ignore
device_add() B: Ignore
pcibios_fixup_bus()
pcibios_setup_bus_devices()
pcibios_setup_device() C: Hit
pcibios_finish_adding_to_bus()
pci_bus_add_devices()
pci_bus_add_device() D: Add device
If the parent PCI bus isn't involved in hotplug, the IOMMU
group is expected to be bound in (B). (A) should fail as the
sysfs entries aren't populated.
The patch fixes the issue by reverting commit 3f28c5a and remove
WARN_ON() in iommu_add_device() to allow calling the function
even the specified device already has associated IOMMU group.
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9e5c6e5a3be0b2e17ff61b9b74adef4a2c9e6934 upstream.
pci_get_slot() is called with hold of PCI bus semaphore and it's not
safe to be called in interrupt context. However, we possibly checks
EEH error and calls the function in interrupt context. To avoid using
pci_get_slot(), we turn into device tree for fetching location code.
Otherwise, we might run into WARN_ON() as following messages indicate:
WARNING: at drivers/pci/search.c:223
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000
NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114
REGS: c000000001446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 48002422 XER: 20000000
CFAR: c00000000003752c SOFTE: 0
:
NIP [c000000000497b70] .pci_get_slot+0x40/0x110
LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190
Call Trace:
.of_get_property+0x30/0x60 (unreliable)
.eeh_pe_loc_get+0x150/0x190
.eeh_dev_check_failure+0x1b4/0x550
.eeh_check_failure+0x90/0xf0
.lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
.lpfc_poll_eratt+0x64/0x100 [lpfc]
.call_timer_fn+0x64/0x190
.run_timer_softirq+0x2cc/0x3e0
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here are 3 more small powerpc fixes that should still go into .16.
One is a recent regression (MMCR2 business), the other is a trivial
endian fix without which FW updates won't work on LE in IBM machines,
and the 3rd one turns a BUG_ON into a WARN_ON which is definitely a
LOT more friendly especially when the whole thing is about retrieving
error logs ..."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix endianness of flash_block_list in rtas_flash
powerpc/powernv: Change BUG_ON to WARN_ON in elog code
powerpc/perf: Fix MMCR2 handling for EBB
|
|
The function rtas_flash_firmware passes the address of a data structure,
flash_block_list, when making the update-flash-64-and-reboot rtas call.
While the endianness of the address is handled correctly, the endianness
of the data is not. This patch ensures that the data in flash_block_list
is big endian when passed to rtas on little endian hosts.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here is a handful of powerpc fixes for 3.16. They are all pretty
simple and self contained and should still make this release"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: use _GLOBAL_TOC for memmove
powerpc/pseries: dynamically added OF nodes need to call of_node_init
powerpc: subpage_protect: Increase the array size to take care of 64TB
powerpc: Fix bugs in emulate_step()
powerpc: Disable doorbells on Power8 DD1.x
|
|
These processors do not currently support doorbell IPIs, so remove them
from the feature list if we are at DD 1.xx for the 0x004d part.
This fixes a regression caused by d4e58e5928f8 (powerpc/powernv: Enable
POWER8 doorbell IPIs). With that patch the kernel would hang at boot
when calling smp_call_function_many, as the doorbell would not be
received by the target CPUs:
.smp_call_function_many+0x2bc/0x3c0 (unreliable)
.on_each_cpu_mask+0x30/0x100
.cpuidle_register_driver+0x158/0x1a0
.cpuidle_register+0x2c/0x110
.powernv_processor_idle_init+0x23c/0x2c0
.do_one_initcall+0xd4/0x260
.kernel_init_freeable+0x25c/0x33c
.kernel_init+0x1c/0x120
.ret_from_kernel_thread+0x58/0x7c
Fixes: d4e58e5928f8 (powerpc/powernv: Enable POWER8 doorbell IPIs)
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"A cpufreq lockup fix and a compiler warning fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix compiler warnings
x86, tsc: Fix cpufreq lockup
|
|
Commit 8d6f7c5a: "powerpc/powernv: Make it possible to skip the IRQHAPPENED
check in power7_nap()" added code that prevents cpus from checking for
pending interrupts just before entering sleep state, which is wrong. These
interrupts are delivered during the soft irq disabled state of the cpu.
A cpu cannot enter any idle state with pending interrupts because they will
never be serviced until the next time the cpu is woken up by some other
interrupt. Its only then that the pending interrupts are replayed. This can result
in device timeouts or warnings about this cpu being stuck.
This patch fixes ths issue by ensuring that cpus check for pending interrupts
just before entering any idle state as long as they are not in the path of split
core operations.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Commit 143e1e28cb (sched: Rework sched_domain topology definition)
introduced a number of functions with a return value of 'const int'.
gcc doesn't know what to do with that and, if the kernel is compiled
with W=1, complains with the following warnings whenever sched.h
is included.
include/linux/sched.h:875:25: warning: type qualifiers ignored on function return type
include/linux/sched.h:882:25: warning: type qualifiers ignored on function return type
include/linux/sched.h:889:25: warning: type qualifiers ignored on function return type
include/linux/sched.h:1002:21: warning: type qualifiers ignored on function return type
Commits fb2aa855 (sched, ARM: Create a dedicated scheduler topology table)
and 607b45e9a (sched, powerpc: Create a dedicated topology table) introduce
the same warning in the arm and powerpc code.
Drop 'const' from the function declarations to fix the problem.
The fix for all three patches has to be applied together to avoid
compilation failures for the affected architectures.
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1403658329-13196-1-git-send-email-linux@roeck-us.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 59a53afe70fd530040bdc69581f03d880157f15a "powerpc: Don't setup
CPUs with bad status" broke ePAPR SMP booting. ePAPR says that CPUs
that aren't presently running shall have status of disabled, with
enable-method being used to determine whether the CPU can be enabled.
Fix by checking for spin-table, which is currently the only supported
enable-method.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Emil Medve <Emilian.Medve@Freescale.com>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The commit 71ec7c55ed91 introduced the magic symbol ".TOC." for ELFv2 ABI.
This symbol is built manually and has no CRC value computed. A zero value
is put in the CRC section to avoid modpost complaining about a missing CRC.
Unfortunately, this breaks the kernel module loading when the kernel is
relocated (kdump case for instance) because of the relocation applied to
the kcrctab values.
This patch compute a CRC value for the TOC symbol which will match the one
compute by the kernel when it is relocated - aka '0 - relocate_start' done in
maybe_relocated called by check_version (module.c).
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
In commit 27f4488872d9 "Add OPAL takeover from PowerVM" we added support
for "takeover" on OPAL v1 machines.
This was a mode of operation where we would boot under pHyp, and query
for the presence of OPAL. If detected we would then do a special
sequence to take over the machine, and the kernel would end up running
in hypervisor mode.
OPAL v1 was never a supported product, and was never shipped outside
IBM. As far as we know no one is still using it.
Newer versions of OPAL do not use the takeover mechanism. Although the
query for OPAL should be harmless on machines with newer OPAL, we have
seen a machine where it causes a crash in Open Firmware.
The code in early_init_devtree() to copy boot_command_line into cmd_line
was added in commit 817c21ad9a1f "Get kernel command line accross OPAL
takeover", and AFAIK is only used by takeover, so should also be
removed.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
In commit 721aeaa9 "Build little endian ppc64 kernel with ABIv2", we
missed some updates required in the kprobes code to make jprobes work
when the kernel is built with ABI v2.
Firstly update arch_deref_entry_point() to do the right thing. Now that
we have added ppc_global_function_entry() we can just always use that, it
will do the right thing for 32 & 64 bit and ABI v1 & v2.
Secondly we need to update the code that sets up the register state before
calling the jprobe handler. On ABI v1 we setup r2 to hold the TOC, on ABI
v2 we need to populate r12 with the function entry point address.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The printks() in our ftrace code have no prefix, so they appear on the
console with very little context, eg:
Branch out of range
Use pr_fmt() & pr_err() to add a prefix. While we're at it, collapse a
few split lines that don't need to be, and add a missing newline to one
message.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
There is a bug in the handling of the function entry when we are nopping
out a branch from a module in ftrace.
We compare the result of module_trampoline_target() with the value of
ppc_function_entry(), and expect them to be true. But they never will
be.
module_trampoline_target() will always return the global entry point of
the function, whereas ppc_function_entry() will always return the local.
Fix it by using the newly added ppc_global_function_entry().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
In commit 24a1bdc35, "Fix ABIv2 issues with __ftrace_make_call", Anton
changed the logic that creates and patches the branch, and added a
thinko in the check of create_branch(). create_branch() returns the
instruction that was generated, so if we get zero then it succeeded.
The result is we can't ftrace modules:
Branch out of range
WARNING: at ../kernel/trace/ftrace.c:1638
ftrace failed to modify [<d000000004ba001c>] fuse_req_init_context+0x1c/0x90 [fuse]
We should probably fix patch_instruction() to do that check and make the
API saner, but that's a separate patch. For now just invert the test.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
In commit 24a1bdc35, "Fix ABIv2 issues with __ftrace_make_call", Anton
changed the logic that checks for the expected code sequence when
patching a module.
We missed the typo in the mask, 0xffff00000 should be 0xffff0000, which
has the effect of making the test always true.
That makes it impossible to ftrace against modules, eg:
Unexpected call sequence: 48000008 e8410018
WARNING: at ../kernel/trace/ftrace.c:1638
ftrace failed to modify [<d000000007cf001c>] rng_dev_open+0x1c/0x70 [rng_core]
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
We have some compile-time disabled debug code in signal_xx.c. It's from
some ancient time BG, almost certainly part of the original port, given
the very similar code on other arches.
The show_unhandled_signal logic, added in d0c3d534a438 (2.6.24) is
cleaner and prints more useful information, so drop the debug code.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
In arch/powerpc/kernel/iomap.c, lots of IO reading accessors missed
to check EEH error as Ben pointed. The patch fixes it.
For the writing accessors, we change the called functions only for
making them look similar to the reading counterparts.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull more powerpc updates from Ben Herrenschmidt:
"Here are the remaining bits I was mentioning earlier. Mostly bug
fixes and new selftests from Michael (yay !). He also removed the WSP
platform and A2 core support which were dead before release, so less
clutter.
One little "feature" I snuck in is the doorbell IPI support for
non-virtualized P8 which speeds up IPIs significantly between threads
of a core"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits)
powerpc/book3s: Fix some ABIv2 issues in machine check code
powerpc/book3s: Fix guest MC delivery mechanism to avoid soft lockups in guest.
powerpc/book3s: Increment the mce counter during machine_check_early call.
powerpc/book3s: Add stack overflow check in machine check handler.
powerpc/book3s: Fix machine check handling for unhandled errors
powerpc/eeh: Dump PE location code
powerpc/powernv: Enable POWER8 doorbell IPIs
powerpc/cpuidle: Only clear LPCR decrementer wakeup bit on fast sleep entry
powerpc/powernv: Fix killed EEH event
powerpc: fix typo 'CONFIG_PMAC'
powerpc: fix typo 'CONFIG_PPC_CPU'
powerpc/powernv: Don't escalate non-existing frozen PE
powerpc/eeh: Report frozen parent PE prior to child PE
powerpc/eeh: Clear frozen state for child PE
powerpc/powernv: Reduce panic timeout from 180s to 10s
powerpc/xmon: avoid format string leaking to printk
selftests/powerpc: Add tests of PMU EBBs
selftests/powerpc: Add support for skipping tests
selftests/powerpc: Put the test in a separate process group
selftests/powerpc: Fix instruction loop for ABIv2 (LE)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more scheduler updates from Ingo Molnar:
"Second round of scheduler changes:
- try-to-wakeup and IPI reduction speedups, from Andy Lutomirski
- continued power scheduling cleanups and refactorings, from Nicolas
Pitre
- misc fixes and enhancements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Delete extraneous extern for to_ratio()
sched/idle: Optimize try-to-wake-up IPI
sched/idle: Simplify wake_up_idle_cpu()
sched/ |