linux.git/arch, branch v4.9.69

sparc64/mm: set fields in deferred pages

2017-12-14T08:28:23+00:00

[ Upstream commit 2a20aa171071a334d80c4e5d5af719d8374702fc ]

Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"es are never changed prior to
first initializing struct pages by going through __init_single_page().

With deferred struct page feature enabled there is a case where we set
some fields prior to initializing:

mem_init() {
     register_page_bootmem_info();
     free_all_bootmem();
     ...
}

When register_page_bootmem_info() is called only non-deferred struct
pages are initialized.  But, this function goes through some reserved
pages which might be part of the deferred, and thus are not yet
initialized.

mem_init
register_page_bootmem_info
register_page_bootmem_info_node
 get_page_bootmem
  .. setting fields here ..
  such as: page->freelist = (void *)type;

free_all_bootmem()
free_low_memory_core_early()
 for_each_reserved_mem_region()
  reserve_bootmem_region()
   init_reserved_page() <- Only if this is deferred reserved page
    __init_single_pfn()
     __init_single_page()
      memset(0) <-- Loose the set fields here

We end up with similar issue as in the previous patch, where currently
we do not observe problem as memory is zeroed.  But, if flag asserts are
changed we can start hitting issues.

Also, because in this patch series we will stop zeroing struct page
memory during allocation, we must make sure that struct pages are
properly initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Link: http://lkml.kernel.org/r/20171013173214.27300-4-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
Acked-by: David S. Miller 
Acked-by: Michal Hocko 
Cc: Alexander Potapenko 
Cc: Andrey Ryabinin 
Cc: Ard Biesheuvel 
Cc: Catalin Marinas 
Cc: Christian Borntraeger 
Cc: Dmitry Vyukov 
Cc: Heiko Carstens 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Mark Rutland 
Cc: Matthew Wilcox 
Cc: Mel Gorman 
Cc: Michal Hocko 
Cc: Sam Ravnborg 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested

2017-12-14T08:28:21+00:00

[ Upstream commit 7aafac11e308d37ed3c509829bb43d80c1811ac3 ]

The IODA2 specification says that a 64 DMA address cannot use top 4 bits
(3 are reserved and one is a "TVE select"); bottom page_shift bits
cannot be used for multilevel table addressing either.

The existing IODA2 table allocation code aligns the minimum TCE table
size to PAGE_SIZE so in the case of 64K system pages and 4K IOMMU pages,
we have 64-4-12=48 bits. Since 64K page stores 8192 TCEs, i.e. needs
13 bits, the maximum number of levels is 48/13 = 3 so we physically
cannot address more and EEH happens on DMA accesses.

This adds a check that too many levels were requested.

It is still possible to have 5 levels in the case of 4K system page size.

Signed-off-by: Alexey Kardashevskiy 
Acked-by: Gavin Shan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

axonram: Fix gendisk handling

2017-12-14T08:28:21+00:00

[ Upstream commit 672a2c87c83649fb0167202342ce85af9a3b4f1c ]

It is invalid to call del_gendisk() when disk->queue is NULL. Fix error
handling in axon_ram_probe() to avoid doing that.

Also del_gendisk() does not drop a reference to gendisk allocated by
alloc_disk(). That has to be done by put_disk(). Add that call where
needed.

Reported-by: Dan Carpenter 
Signed-off-by: Jan Kara 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

arm64: KVM: Survive unknown traps from guests

2017-12-14T08:28:19+00:00

[ Upstream commit ba4dd156eabdca93501d92a980ba27fa5f4bbd27 ]

Currently we BUG() if we see an ESR_EL2.EC value we don't recognise. As
configurable disables/enables are added to the architecture (controlled
by RES1/RES0 bits respectively), with associated synchronous exceptions,
it may be possible for a guest to trigger exceptions with classes that
we don't recognise.

While we can't service these exceptions in a manner useful to the guest,
we can avoid bringing down the host. Per ARM DDI 0487A.k_iss10775, page
D7-1937, EC values within the range 0x00 - 0x2c are reserved for future
use with synchronous exceptions, and EC values within the range 0x2d -
0x3f may be used for either synchronous or asynchronous exceptions.

The patch makes KVM handle any unknown EC by injecting an UNDEFINED
exception into the guest, with a corresponding (ratelimited) warning in
the host dmesg. We could later improve on this with with a new (opt-in)
exit to the host userspace.

Cc: Dave Martin 
Cc: Suzuki K Poulose 
Reviewed-by: Christoffer Dall 
Signed-off-by: Mark Rutland 
Signed-off-by: Marc Zyngier 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

arm: KVM: Survive unknown traps from guests

2017-12-14T08:28:19+00:00

[ Upstream commit f050fe7a9164945dd1c28be05bf00e8cfb082ccf ]

Currently we BUG() if we see a HSR.EC value we don't recognise. As
configurable disables/enables are added to the architecture (controlled
by RES1/RES0 bits respectively), with associated synchronous exceptions,
it may be possible for a guest to trigger exceptions with classes that
we don't recognise.

While we can't service these exceptions in a manner useful to the guest,
we can avoid bringing down the host. Per ARM DDI 0406C.c, all currently
unallocated HSR EC encodings are reserved, and per ARM DDI
0487A.k_iss10775, page G6-4395, EC values within the range 0x00 - 0x2c
are reserved for future use with synchronous exceptions, and EC values
within the range 0x2d - 0x3f may be used for either synchronous or
asynchronous exceptions.

The patch makes KVM handle any unknown EC by injecting an UNDEFINED
exception into the guest, with a corresponding (ratelimited) warning in
the host dmesg. We could later improve on this with with a new (opt-in)
exit to the host userspace.

Cc: Dave Martin 
Cc: Suzuki K Poulose 
Reviewed-by: Christoffer Dall 
Signed-off-by: Mark Rutland 
Signed-off-by: Marc Zyngier 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset

2017-12-14T08:28:19+00:00

[ Upstream commit 2f707d97982286b307ef2a9b034e19aabc1abb56 ]

Reported by syzkaller:

    WARNING: CPU: 1 PID: 27742 at arch/x86/kvm/vmx.c:11029
    nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029
    CPU: 1 PID: 27742 Comm: a.out Not tainted 4.10.0+ #229
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:15 [inline]
     dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
     panic+0x1fb/0x412 kernel/panic.c:179
     __warn+0x1c4/0x1e0 kernel/panic.c:540
     warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
     nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029
     vmx_leave_nested arch/x86/kvm/vmx.c:11136 [inline]
     vmx_set_msr+0x1565/0x1910 arch/x86/kvm/vmx.c:3324
     kvm_set_msr+0xd4/0x170 arch/x86/kvm/x86.c:1099
     do_set_msr+0x11e/0x190 arch/x86/kvm/x86.c:1128
     __msr_io arch/x86/kvm/x86.c:2577 [inline]
     msr_io+0x24b/0x450 arch/x86/kvm/x86.c:2614
     kvm_arch_vcpu_ioctl+0x35b/0x46a0 arch/x86/kvm/x86.c:3497
     kvm_vcpu_ioctl+0x232/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2721
     vfs_ioctl fs/ioctl.c:43 [inline]
     do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
     SYSC_ioctl fs/ioctl.c:698 [inline]
     SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
     entry_SYSCALL_64_fastpath+0x1f/0xc2

The syzkaller folks reported a nested_run_pending warning during userspace
clear VMX capability which is exposed to L1 before.

The warning gets thrown while doing

(*(uint32_t*)0x20aecfe8 = (uint32_t)0x1);
(*(uint32_t*)0x20aecfec = (uint32_t)0x0);
(*(uint32_t*)0x20aecff0 = (uint32_t)0x3a);
(*(uint32_t*)0x20aecff4 = (uint32_t)0x0);
(*(uint64_t*)0x20aecff8 = (uint64_t)0x0);
r[29] = syscall(__NR_ioctl, r[4], 0x4008ae89ul,
		0x20aecfe8ul, 0, 0, 0, 0, 0, 0);

i.e. KVM_SET_MSR ioctl with

struct kvm_msrs {
	.nmsrs = 1,
		.pad = 0,
		.entries = {
			{.index = MSR_IA32_FEATURE_CONTROL,
			 .reserved = 0,
			 .data = 0}
		}
}

The VMLANCH/VMRESUME emulation should be stopped since the CPU is going to
reset here. This patch resets the nested_run_pending since the CPU is going
to be reset hence there should be nothing pending.

Reported-by: Dmitry Vyukov 
Suggested-by: Radim Krčmář 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Dmitry Vyukov 
Cc: David Hildenbrand 
Signed-off-by: Wanpeng Li 
Reviewed-by: David Hildenbrand 
Reviewed-by: Jim Mattson 
Signed-off-by: Radim Krčmář 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

kvm: nVMX: VMCLEAR should not cause the vCPU to shut down

2017-12-14T08:28:19+00:00

[ Upstream commit 587d7e72aedca91cee80c0a56811649c3efab765 ]

VMCLEAR should silently ignore a failure to clear the launch state of
the VMCS referenced by the operand.

Signed-off-by: Jim Mattson 
[Changed "kvm_write_guest(vcpu->kvm" to "kvm_vcpu_write_guest(vcpu".]
Signed-off-by: Radim Krčmář 

Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

ARM: OMAP2+: Release device node after it is no longer needed.

2017-12-14T08:28:18+00:00

[ Upstream commit b92675d998a9fa37fe9e0e35053a95b4a23c158b ]

The device node returned by of_find_node_by_name() needs to be released
after it is no longer needed to avoid a device node leak.

Signed-off-by: Guenter Roeck 
Signed-off-by: Tony Lindgren 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

ARM: OMAP2+: Fix device node reference counts

2017-12-14T08:28:18+00:00

[ Upstream commit 10e5778f54765c96fe0c8f104b7a030e5b35bc72 ]

After commit 0549bde0fcb1 ("of: fix of_node leak caused in
of_find_node_opts_by_path"), the following error may be
reported when running omap images.

OF: ERROR: Bad of_node_put() on /ocp@68000000
CPU: 0 PID: 0 Comm: swapper Not tainted 4.10.0-rc7-next-20170210 #1
Hardware name: Generic OMAP3-GP (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x98/0xac)
[] (dump_stack) from [] (kobject_release+0x48/0x7c)
[] (kobject_release)
	from [] (of_find_node_by_name+0x74/0x94)
[] (of_find_node_by_name)
	from [] (omap3xxx_hwmod_is_hs_ip_block_usable+0x24/0x2c)
[] (omap3xxx_hwmod_is_hs_ip_block_usable) from
[] (omap3xxx_hwmod_init+0x180/0x274)
[] (omap3xxx_hwmod_init)
	from [] (omap3_init_early+0xa0/0x11c)
[] (omap3_init_early)
	from [] (omap3430_init_early+0x8/0x30)
[] (omap3430_init_early)
	from [] (setup_arch+0xc04/0xc34)
[] (setup_arch) from [] (start_kernel+0x68/0x38c)
[] (start_kernel) from [<8020807c>] (0x8020807c)

of_find_node_by_name() drops the reference to the passed device node.
The commit referenced above exposes this problem.

To fix the problem, use of_get_child_by_name() instead of
of_find_node_by_name(); of_get_child_by_name() does not drop
the reference count of passed device nodes. While semantically
different, we only look for immediate children of the passed
device node, so of_get_child_by_name() is a more appropriate
function to use anyway.

Release the reference to the device node obtained with
of_get_child_by_name() after it is no longer needed to avoid
another device node leak.

While at it, clean up the code and change the return type of
omap3xxx_hwmod_is_hs_ip_block_usable() to bool to match its use
and the return type of of_device_is_available().

Cc: Qi Hou 
Cc: Peter Rosin 
Cc: Rob Herring 
Signed-off-by: Guenter Roeck 
Signed-off-by: Tony Lindgren 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

powerpc/64: Fix checksum folding in csum_add()

2017-12-14T08:28:17+00:00

[ Upstream commit 6ad966d7303b70165228dba1ee8da1a05c10eefe ]

Paul's patch to fix checksum folding, commit b492f7e4e07a ("powerpc/64:
Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold")
missed a case in csum_add(). Fix it.

Signed-off-by: Shile Zhang 
Acked-by: Paul Mackerras 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman