linux.git/arch/s390/include, branch v4.4.12

s390/mm: fix asce_bits handling with dynamic pagetable levels

2016-05-19T00:06:44+00:00

commit 723cacbd9dc79582e562c123a0bacf8bfc69e72a upstream.

There is a race with multi-threaded applications between context switch and
pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
mm->context.asce_bits, without holding any locks. A concurrent mmap with a
pagetable upgrade on another thread in crst_table_upgrade() could already
have set new asce_bits, but not yet the new mm->pgd. This would result in a
corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
translation exception.

Fix this by storing the complete asce instead of just the asce_bits, which
can then be read atomically from switch_mm(), so that it either sees the
old value or the new value, but no mixture. Both cases are OK. Having the
old value would result in a page fault on access to the higher level memory,
but the fault handler would see the new mm->pgd, if it was a valid access
after the mmap on the other thread has completed. So as worst-case scenario
we would have a page fault loop for the racing thread until the next time
slice.
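
The "old value or new value, but no mixture" property can be illustrated
in user-space C11. This is only a sketch and not the kernel code: the names
struct mm_sketch, upgrade_sketch() and switch_mm_sketch() are hypothetical,
and the split into a 12-bit control field and a table origin is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout for this sketch: low 12 bits are control bits,
 * the remaining bits are the table origin. */
#define ASCE_BITS_MASK UINT64_C(0xfff)

struct mm_sketch {
    _Atomic uint64_t asce;   /* origin | bits, published as one value */
};

/* Writer side (pagetable upgrade): build the complete value first,
 * then publish it with a single store. */
static void upgrade_sketch(struct mm_sketch *mm, uint64_t new_origin,
                           uint64_t new_bits)
{
    assert((new_origin & ASCE_BITS_MASK) == 0);
    assert((new_bits & ~ASCE_BITS_MASK) == 0);
    atomic_store_explicit(&mm->asce, new_origin | new_bits,
                          memory_order_release);
}

/* Reader side (context switch): a single load returns either the old
 * or the new value, never origin and bits from different updates. */
static uint64_t switch_mm_sketch(struct mm_sketch *mm)
{
    return atomic_load_explicit(&mm->asce, memory_order_acquire);
}
```

With two separate fields, a reader could pair the new bits with the old
origin; with one word there is no window in which that pairing exists.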

Also remove dead code and simplify the upgrade/downgrade path: there are no
upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
There are also no concurrent upgrades, because the mmap_sem is held with
down_write() in do_mmap, so the flush and table checks during upgrade can
be removed.

Reported-by: Michael Munday 
Reviewed-by: Martin Schwidefsky 
Signed-off-by: Gerald Schaefer 
Signed-off-by: Martin Schwidefsky 
Signed-off-by: Greg Kroah-Hartman

s390/pci: add extra padding to function measurement block

2016-05-04T21:48:44+00:00

commit 9d89d9e61d361f3adb75e1aebe4bb367faf16cfa upstream.

Newer machines might use a different (larger) format for function
measurement blocks. To ensure that we comply with the alignment
requirement on these machines and prevent memory corruption (when
firmware writes more data than we expect), add 16 padding bytes
at the end of the fmb.

Signed-off-by: Sebastian Ott 
Signed-off-by: Martin Schwidefsky 
Signed-off-by: Greg Kroah-Hartman

s390/pci: enforce fmb page boundary rule

2016-04-12T16:08:37+00:00

commit 80c544ded25ac14d7cc3e555abb8ed2c2da99b84 upstream.

The function measurement block must not cross a page boundary. Ensure
that by raising the alignment requirement to the smallest power of 2
larger than the size of the fmb.
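
Why a power-of-two alignment is sufficient can be checked with a small
user-space sketch. The helper names align_for_size() and crosses_page()
and the example size used below are hypothetical; the 4 KB page size is
the s390 page size. The helper returns the smallest power of two that is
not smaller than the size, which is the same value as "larger than" for
any size that is not itself a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u   /* 4 KB pages */

/* Smallest power of two that is >= size (size must be non-zero). */
static size_t align_for_size(size_t size)
{
    size_t align = 1;

    assert(size > 0);
    while (align < size)
        align <<= 1;
    return align;
}

/* Non-zero if the range [addr, addr + size) touches more than one page. */
static int crosses_page(uintptr_t addr, size_t size)
{
    return (addr / PAGE_SIZE_SKETCH) !=
           ((addr + size - 1) / PAGE_SIZE_SKETCH);
}
```

As long as the alignment does not exceed the page size, it divides the
page size evenly, so every aligned slot of that many bytes lies inside a
single page and an object no larger than the slot cannot cross a boundary.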

Fixes: d0b088531 ("s390/pci: performance statistics and debug infrastructure")
Signed-off-by: Sebastian Ott 
Signed-off-by: Martin Schwidefsky 
Signed-off-by: Greg Kroah-Hartman

s390/mm: four page table levels vs. fork

2016-03-16T15:42:58+00:00

commit 3446c13b268af86391d06611327006b059b8bab1 upstream.

The fork of a process with four page table levels is broken since
git commit 6252d702c5311ce9 "[S390] dynamic page tables."

All new mm contexts are created with three page table levels and
an asce limit of 4TB. If the parent has four levels dup_mmap will
add vmas to the new context which are outside of the asce limit.
The subsequent call to copy_page_range will walk the three level
page table structure of the new process with non-zero pgd and pud
indexes. This leads to memory clobbers as the pgd_index *and* the
pud_index are added to the mm->pgd pointer without a pgd_deref
in between.

The init_new_context() function is selecting the number of page
table levels for a new context. The function is used by mm_init()
which in turn is called by dup_mm() and mm_alloc(). These two are
used by fork() and exec(). The init_new_context() function can
distinguish the two cases by looking at mm->context.asce_limit:
for fork() the mm struct has been copied and the number of page
table levels may not change. For exec() the mm_alloc() function
sets the new mm structure to zero; in this case a three-level page
table is created as the temporary stack space is located at
STACK_TOP_MAX = 4TB.
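
The decision described above can be sketched as a switch on the asce
limit. This is a simplified user-space illustration, not the code from the
patch: struct context_sketch and init_new_context_sketch() are hypothetical
names, and the 2 GB and 8 PB constants are the address ranges covered by
two-level and four-level s390 page tables, which the text itself does not
spell out.

```c
#include <assert.h>
#include <stdint.h>

#define LIMIT_2_LEVELS (UINT64_C(1) << 31)  /* 2 GB, compat tasks */
#define LIMIT_3_LEVELS (UINT64_C(1) << 42)  /* 4 TB, STACK_TOP_MAX above */
#define LIMIT_4_LEVELS (UINT64_C(1) << 53)  /* 8 PB */

struct context_sketch {
    uint64_t asce_limit;
};

/* Returns the number of page table levels for the new context,
 * or -1 if the limit is not one this sketch knows about. */
static int init_new_context_sketch(struct context_sketch *ctx)
{
    switch (ctx->asce_limit) {
    case 0:
        /* exec: the mm was zeroed, start with three levels */
        ctx->asce_limit = LIMIT_3_LEVELS;
        return 3;
    case LIMIT_2_LEVELS:
        /* fork: the copied limit decides, the level count is kept */
        return 2;
    case LIMIT_3_LEVELS:
        return 3;
    case LIMIT_4_LEVELS:
        return 4;
    default:
        return -1;
    }
}
```

The point of the sketch is that a forked context with a four-level parent
keeps four levels, so the vmas copied by dup_mmap stay inside the limit.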

This fixes CVE-2016-2143.

Reported-by: Marcin Kościelnicki 
Reviewed-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky 
Signed-off-by: Greg Kroah-Hartman

s390/fpu: signals vs. floating point control register

2016-03-03T23:07:12+00:00

commit 1b17cb796f5d40ffa239c6926385abd83a77a49b upstream.

git commit 904818e2f229f3d94ec95f6932a6358c81e73d78
"s390/kernel: introduce fpu-internal.h with fpu helper functions"
introduced the fpregs_store / fp_regs_load helpers. These functions
fail to save and restore the floating point control register.

The effect is that the FPC is not correctly handled on signal
delivery and signal return.

Signed-off-by: Martin Schwidefsky 
Signed-off-by: Greg Kroah-Hartman

KVM: s390: fix memory overwrites when vx is disabled

2016-03-03T23:07:11+00:00

commit 9abc2a08a7d665b02bdde974fd6c44aae86e923e upstream.

The kernel now always uses vector registers when available. KVM, however,
has special logic that depends on whether vector support is actually
enabled for a guest. If support is disabled, guest_fpregs.fregs will only
contain memory for the fpu.
The kernel, however, will store vector registers into that area,
resulting in crazy memory overwrites.

Simply extending that area is not enough, because the format of the
registers also changes. We would have to do additional conversions, making
the code even more complex. Therefore let's directly use one place for
the vector/fpu registers + fpc (in kvm_run). We just have to convert the
data properly when accessing it. This makes current code much easier.
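
The conversion mentioned above relies on the architectural overlay of the
16 floating point registers on the leftmost 64 bits of vector registers
0 to 15. A minimal user-space sketch follows; struct vx_reg_sketch,
vx_to_fp() and fp_to_vx() are hypothetical names and not the helpers used
by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FPRS 16
#define NUM_VXRS 32

/* A 128-bit vector register as two 64-bit halves. */
struct vx_reg_sketch {
    uint64_t high;   /* leftmost 64 bits, aliases the FPR */
    uint64_t low;
};

/* Extract the FPR view: FPR i is the high half of vector register i. */
static void vx_to_fp(uint64_t *fprs, const struct vx_reg_sketch *vxrs)
{
    for (int i = 0; i < NUM_FPRS; i++)
        fprs[i] = vxrs[i].high;
}

/* Write the FPR view back; the low halves and vector registers
 * 16..31 are left untouched. */
static void fp_to_vx(struct vx_reg_sketch *vxrs, const uint64_t *fprs)
{
    for (int i = 0; i < NUM_FPRS; i++)
        vxrs[i].high = fprs[i];
}
```

Keeping a single vector-format save area and converting at the access
points avoids having two buffers of different size and layout.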

Please note that vector/fpu registers are now always stored to
vcpu->run->s.regs.vrs. Although this data is visible to QEMU and
used for migration, we only guarantee valid values to user space when
KVM_SYNC_VRS is set. As that is only the case when we have vector
register support, we are on the safe side.

Fixes: b5510d9b68c3 ("s390/fpu: always enable the vector facility if it is available")
Cc: stable@vger.kernel.org # v4.4 d9a3a09af54d s390/kvm: remove dependency on struct save_area definition
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
[adapt to d9a3a09af54d]
Signed-off-by: Greg Kroah-Hartman

s390: wire up mlock2 system call

2015-11-16T11:51:07+00:00

Passes mlock2-tests test case in 64 bit and compat mode.

Signed-off-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky

s390: avoid cache aliasing under z/VM and KVM

2015-11-16T11:04:18+00:00

commit 1f6b83e5e4d3 ("s390: avoid z13 cache aliasing") checks for the
machine type to optimize address space randomization and zero page
allocation to avoid cache aliases.

This check might fail under a hypervisor with migration support.
z/VM's "Single System Image and Live Guest Relocation" facility will
"fake" the machine type of the oldest system in the group. For example,
in a group of zEC12 and z13 machines the guest appears to run on a zEC12
(architecture fencing within the relocation domain).

Remove the machine type detection and always use cache aliasing
rules that are known to work for all machines. These are the z13
aliasing rules.

Suggested-by: Christian Borntraeger 
Reviewed-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky

s390: add support for ipl devices in subchannel sets > 0

2015-11-11T12:56:27+00:00

Allow IPL from CCW-based devices residing in any subchannel set.

Reviewed-by: Michael Holzheu 
Signed-off-by: Sebastian Ott 
Signed-off-by: Martin Schwidefsky

s390/pci_dma: handle dma table failures

2015-11-09T08:10:49+00:00

We use lazy allocation for translation table entries but don't handle
allocation (and other) failures during translation table updates.

Handle these failures and undo translation table updates when it's
meaningful.
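
The general pattern of lazy allocation with rollback can be sketched in
user-space C. Everything here is hypothetical and much simpler than the
real translation tables: struct table_sketch, get_entry(), update_range()
and the fail_at test hook exist only for illustration, and the error is
reported as -1 rather than a kernel error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_SLOTS 8

struct table_sketch {
    unsigned long *slot[NR_SLOTS];  /* entries, allocated on first use */
    int fail_at;                    /* test hook: index whose allocation
                                       fails, -1 for never */
};

/* Return the entry for idx, allocating it lazily; NULL on failure. */
static unsigned long *get_entry(struct table_sketch *t, size_t idx)
{
    if (!t->slot[idx]) {
        if ((int)idx == t->fail_at)
            return NULL;
        t->slot[idx] = calloc(1, sizeof(unsigned long));
    }
    return t->slot[idx];
}

/* Write val into n entries starting at start. If an entry cannot be
 * obtained, invalidate the entries written so far and report failure. */
static int update_range(struct table_sketch *t, size_t start, size_t n,
                        unsigned long val)
{
    size_t i;

    assert(start + n <= NR_SLOTS);
    for (i = 0; i < n; i++) {
        unsigned long *entry = get_entry(t, start + i);

        if (!entry)
            goto undo;
        *entry = val;
    }
    return 0;
undo:
    while (i-- > 0)
        *t->slot[start + i] = 0;   /* 0 stands for "invalid" here */
    return -1;
}
```

The caller either sees the whole range mapped or none of it, so a failed
update does not leave a partially valid mapping behind.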

Signed-off-by: Sebastian Ott 
Reviewed-by: Gerald Schaefer 
Signed-off-by: Martin Schwidefsky