linux.git/arch/sparc/include, branch v4.4.177

sparc64 mm: Fix more TSB sizing issues

2018-11-10T15:41:39+00:00

[ Upstream commit 1e953d846ac015fbfcf09c857e8f893924cb629c ]

Commit af1b1a9b36b8 ("sparc64 mm: Fix base TSB sizing when hugetlb
pages are used") addressed the difference between hugetlb and THP
pages when computing TSB sizes.  The following additional issues
were also discovered while working with the code.

In order to save memory, THP makes use of a huge zero page.  This huge
zero page does not count against a task's RSS, but it does consume TSB
entries.  This is similar to hugetlb pages.  Therefore, count huge
zero page entries in hugetlb_pte_count.

Accounting of THP pages is done in the routine set_pmd_at().
Unfortunately, this does not catch the case where a THP page is split.
To handle this case, decrement the count in pmdp_invalidate().
pmdp_invalidate is only called when splitting a THP.  However, 'sanity
checks' are added in case it is ever called for other purposes.

A more general issue exists with HPAGE_SIZE accounting.
hugetlb_pte_count tracks the number of HPAGE_SIZE (8M) pages.  This
value is used to size the TSB for HPAGE_SIZE pages.  However,
each HPAGE_SIZE page consists of two REAL_HPAGE_SIZE (4M) pages.
The TSB contains an entry for each REAL_HPAGE_SIZE page.  Therefore,
the number of REAL_HPAGE_SIZE pages should be used to size the huge
page TSB.  A new compile time constant REAL_HPAGE_PER_HPAGE is used
to multiply hugetlb_pte_count before sizing the TSB.
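
As a rough illustration of the scaling described above (a sketch, not
the patch itself): the shift values mirror the 8M/4M sizes quoted in
this message, and huge_tsb_entries_needed() is a hypothetical helper
name used only for this example.

```c
/* Sizes quoted in the commit message: an 8M HPAGE is backed by 4M real pages. */
#define HPAGE_SHIFT          23
#define REAL_HPAGE_SHIFT     22
#define REAL_HPAGE_PER_HPAGE (1UL << (HPAGE_SHIFT - REAL_HPAGE_SHIFT))

/*
 * Hypothetical helper: number of huge page TSB entries needed for a
 * given count of HPAGE_SIZE pages. Each HPAGE_SIZE page consumes one
 * entry per REAL_HPAGE_SIZE page it is made of.
 */
static unsigned long huge_tsb_entries_needed(unsigned long hugetlb_pte_count)
{
	return hugetlb_pte_count * REAL_HPAGE_PER_HPAGE;
}
```

With these sizes, three HPAGE_SIZE pages need six TSB entries, not
three.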

Changes from V1
- Fixed build issue if hugetlb or THP not configured

Signed-off-by: Mike Kravetz 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin

sparc64: Make atomic_xchg() an inline function rather than a macro.

2018-05-30T05:49:08+00:00

[ Upstream commit d13864b68e41c11e4231de90cf358658f6ecea45 ]

This avoids a lot of -Wunused warnings such as:

====================
kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
 #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))

./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
                              ^~~~
kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
    atomic_xchg(&kgdb_active, cpu);
    ^~~~~~~~~~~
====================
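
A minimal userspace sketch of the idea, assuming a stand-in atomic_t
and the GCC/Clang builtin __atomic_exchange_n in place of the kernel's
xchg(): with an inline function, a caller that ignores the return
value no longer leaves an unused cast expression behind.

```c
/* Userspace stand-in for the kernel's atomic_t; illustrative only. */
typedef struct { int counter; } atomic_t;

/*
 * Inline function instead of a macro. The result of a function call may
 * be discarded without triggering -Wunused-value, unlike the cast that
 * the xchg() macro expands to.
 * __atomic_exchange_n is a compiler builtin standing in for xchg() here.
 */
static inline int atomic_xchg(atomic_t *v, int new_val)
{
	return __atomic_exchange_n(&v->counter, new_val, __ATOMIC_SEQ_CST);
}
```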

Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

futex: Remove duplicated code and fix undefined behaviour

2018-05-26T06:48:50+00:00

commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.

Code is duplicated across all architectures' headers for
futex_atomic_op_inuser(): namely op decoding, the access_ok() check
for uaddr, and comparison of the result.

Remove this duplication and leave to the arches only the needed
assembly, which is now in arch_futex_atomic_op_inuser().

This effectively distributes Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

And as suggested by Thomas, check for negative oparg too, because it
was also reported to trigger an undefined behaviour report.

Note that s390 removed the access_ok() check in d12a29703
("s390/uaccess: remove pointless access_ok() checks") as access_ok()
there returns true. We reintroduce it in the helper for the sake of
simplicity (it gets optimized away anyway).
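
The shared decoding step can be sketched roughly as follows. This is a
userspace illustration, not the kernel helper: the field layout
follows the FUTEX_OP() encoding from the futex UAPI header, while
struct futex_decoded, sign_extend12() and futex_decode_op() are
hypothetical names introduced for this example.

```c
#include <errno.h>
#include <stdint.h>

#define FUTEX_OP_OPARG_SHIFT 8	/* flag: use (1 << oparg) as the operand */

/* Hypothetical container for the decoded fields. */
struct futex_decoded {
	unsigned int op, cmp;
	int oparg, cmparg;
};

/* Sign-extend the low 12 bits of a field without shifting a signed value. */
static int sign_extend12(uint32_t field)
{
	field &= 0xfff;
	return (field & 0x800) ? (int)field - 0x1000 : (int)field;
}

/* Decode an encoded futex op; returns 0 or -EINVAL for a bad shift count. */
static int futex_decode_op(uint32_t encoded_op, struct futex_decoded *d)
{
	d->op = (encoded_op >> 28) & 0x7;
	d->cmp = (encoded_op >> 24) & 0xf;
	d->oparg = sign_extend12(encoded_op >> 12);
	d->cmparg = sign_extend12(encoded_op);

	if (encoded_op & ((uint32_t)FUTEX_OP_OPARG_SHIFT << 28)) {
		/* A negative or too-large shift count is undefined behaviour. */
		if (d->oparg < 0 || d->oparg > 31)
			return -EINVAL;
		d->oparg = (int)(1U << d->oparg);
	}
	return 0;
}
```

Rejecting the bad shift count before shifting is what keeps UBSAN
quiet; the arch-specific assembly only ever sees a validated oparg.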

Signed-off-by: Jiri Slaby 
Signed-off-by: Thomas Gleixner 
Acked-by: Russell King 
Acked-by: Michael Ellerman  (powerpc)
Acked-by: Heiko Carstens  [s390]
Acked-by: Chris Metcalf  [for tile]
Reviewed-by: Darren Hart (VMware) 
Reviewed-by: Will Deacon  [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker 
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt 
Cc: Max Filippov 
Cc: Paul Mackerras 
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn 
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato 
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller 
Cc: "James E.J. Bottomley" 
Cc: Catalin Marinas 
Cc: Matt Turner 
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu 
Cc: Arnd Bergmann 
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson 
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky 
Cc: Stafford Horne 
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson 
Cc: Chris Zankel 
Cc: Michal Simek 
Cc: Tony Luck 
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta 
Cc: Ralf Baechle 
Cc: Richard Kuo 
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" 
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
Cc: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman

sparc64: Migrate hvcons irq to panicked cpu

2017-10-21T15:09:05+00:00

[ Upstream commit 7dd4fcf5b70694dc961eb6b954673e4fc9730dbd ]

On panic, all CPUs are stopped except the one that hit the panic.
To keep the console alive, we need to migrate the hvcons irq to the
panicked CPU.

Signed-off-by: Vijay Kumar 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

sparc64: Prevent perf from running during super critical sections

2017-08-13T02:29:09+00:00

commit fc290a114fc6034b0f6a5a46e2fb7d54976cf87a upstream.

This fixes another cause of random segfaults and bus errors that may
occur while running perf with the callgraph option.

Critical sections beginning with spin_lock_irqsave() raise the interrupt
level to PIL_NORMAL_MAX (14) and intentionally do not block performance
counter interrupts, which arrive at PIL_NMI (15).

But some sections of code are "super critical" with respect to perf
because the perf_callchain_user() path accesses user space and may cause
TLB activity as well as faults as it unwinds the user stack.

One particular critical section occurs in switch_mm:

        spin_lock_irqsave(&mm->context.lock, flags);
        ...
        load_secondary_context(mm);
        tsb_context_switch(mm);
        ...
        spin_unlock_irqrestore(&mm->context.lock, flags);

If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active TSB for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.

This super critical section needs more protection than is provided
by spin_lock_irqsave() since perf interrupts must not be allowed in.

Since __tsb_context_switch already goes through the trouble of
disabling interrupts completely, we fix this by moving the secondary
context load down into this better protected region.

Orabug: 25577560

Signed-off-by: Dave Aldridge 
Signed-off-by: Rob Gardner 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sparc64: Measure receiver forward progress to avoid send mondo timeout

2017-08-11T16:08:56+00:00

[ Upstream commit 9d53caec84c7c5700e7c1ed744ea584fff55f9ac ]

A large sun4v SPARC system may have moments of intensive xcall activities,
usually caused by unmapping many pages on many CPUs concurrently. This can
flood receivers with CPU mondo interrupts for an extended period, causing
some unlucky senders to hit send-mondo timeout. This problem gets worse
as cpu count increases because sometimes mappings must be invalidated on
all CPUs, and sometimes all CPUs may gang up on a single CPU.

But a busy system is not a broken system. In the above scenario, as long
as the receiver is making forward progress processing mondo interrupts,
the sender should continue to retry.

This patch implements the receiver's forward progress meter by introducing
a per cpu counter 'cpu_mondo_counter[cpu]' where 'cpu' is in the range
of 0..NR_CPUS-1. The receiver increments its counter as soon as it receives
a mondo and the sender tracks the receiver's counter. If the receiver has
stopped making forward progress when the retry limit is reached, the sender
declares send-mondo-timeout and panics; otherwise, the sender keeps
retrying while the receiver continues to make forward progress.
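
The sender-side decision can be sketched as below. This is a
simplified model under stated assumptions, not the patch:
MONDO_RETRY_LIMIT, struct mondo_sender and mondo_should_retry() are
hypothetical names, and the limit value is arbitrary.

```c
#include <stdbool.h>

/* Hypothetical retry budget; the real limit is not stated in this message. */
#define MONDO_RETRY_LIMIT 3

/* Hypothetical per-send state kept by the sender. */
struct mondo_sender {
	unsigned long last_seen;	/* receiver counter at the last check */
	unsigned int retries;		/* attempts since the last check */
};

/*
 * Called after each failed send attempt with the receiver's current
 * counter value. Returns true to keep retrying, false to declare a
 * send-mondo timeout.
 */
static bool mondo_should_retry(struct mondo_sender *s,
			       unsigned long receiver_counter)
{
	if (++s->retries < MONDO_RETRY_LIMIT)
		return true;

	/* Retry limit reached: has the receiver advanced since the last check? */
	if (receiver_counter == s->last_seen)
		return false;

	/* Busy but not broken: record the progress and start a fresh budget. */
	s->last_seen = receiver_counter;
	s->retries = 0;
	return true;
}
```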

In addition, it's been observed that PCIe hotplug events generate Correctable
Errors that are handled by the hypervisor and then the OS. The hypervisor
'borrows' a guest cpu strand briefly to provide the service. If the cpu strand
is simultaneously the only cpu targeted by a mondo, it may not be available
for the mondo in 20msec, causing SUN4V mondo timeout. It appears that 1 second
is the agreed wait time between hypervisor and guest OS, so this patch makes
the adjustment.

Orabug: 25476541
Orabug: 26417466

Signed-off-by: Jane Chu 
Reviewed-by: Steve Sistare 
Reviewed-by: Anthony Yznaga 
Reviewed-by: Rob Gardner 
Reviewed-by: Thomas Tai 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sparc64: delete old wrap code

2017-06-14T11:16:20+00:00

[ Upstream commit 0197e41ce70511dc3b71f7fefa1a676e2b5cd60b ]

The old method, which used xcall and softint to get a new context id, is
deleted. It is replaced by a method that uses per_cpu_secondary_mm
without xcall to perform the context wrap.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Bob Picco 
Reviewed-by: Steven Sistare 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sparc64: add per-cpu mm of secondary contexts

2017-06-14T11:16:20+00:00

[ Upstream commit 7a5b4bbf49fe86ce77488a70c5dccfe2d50d7a2d ]

The new wrap is going to use information from this array to figure out
which mm's currently have valid secondary contexts set up.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Bob Picco 
Reviewed-by: Steven Sistare 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sparc64: redefine first version

2017-06-14T11:16:20+00:00

[ Upstream commit c4415235b2be0cc791572e8e7f7466ab8f73a2bf ]

CTX_FIRST_VERSION defines the first context version, but it also defines
the first context. This patch redefines it to only include the first
context version.
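
A sketch of the distinction, under stated assumptions: the 13-bit
context number width and the exact old definition are not given in
this message and are assumed here for illustration, and the _OLD/_NEW
suffixes and CTX_NR_MASK are hypothetical names.

```c
/* Assumed layout: low bits hold the context number, high bits the version. */
#define CTX_NR_BITS		13	/* assumed field width */
#define CTX_VERSION_SHIFT	CTX_NR_BITS
#define CTX_NR_MASK		((1UL << CTX_NR_BITS) - 1)
#define CTX_VERSION_MASK	(~0UL << CTX_VERSION_SHIFT)

/* Assumed old form: first version plus the first context number. */
#define CTX_FIRST_VERSION_OLD	((1UL << CTX_VERSION_SHIFT) + 1UL)

/* New form: first version only, with the context number bits clear. */
#define CTX_FIRST_VERSION_NEW	(1UL << CTX_VERSION_SHIFT)
```

Both forms carry the same version bits; only the old one also sets a
bit in the context number field.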

Signed-off-by: Pavel Tatashin 
Reviewed-by: Bob Picco 
Reviewed-by: Steven Sistare 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sparc64: combine activate_mm and switch_mm

2017-06-14T11:16:20+00:00

[ Upstream commit 14d0334c6748ff2aedb3f2f7fdc51ee90a9b54e7 ]

The only difference between these two functions is that in activate_mm we
unconditionally flush the context. However, there is no need to keep this
difference after fixing a bug where the cpumask was not reset on a wrap.
So, in this patch we combine these.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Bob Picco 
Reviewed-by: Steven Sistare 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman