linux.git/include/linux, branch v4.8.10

ACPI/PCI: pci_link: penalize SCI correctly

2016-11-18T09:51:51+00:00

commit f1caa61df2a3dc4c58316295c5dc5edba4c68d85 upstream.

Ondrej reported that IRQs stopped working in v4.7 on several
platforms.  A typical scenario, from Ondrej's VT82C694X/694X, is:

ACPI: Using PIC for interrupt routing
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
ACPI: No IRQ available for PCI Interrupt Link [LNKA]
8139too 0000:00:0f.0: PCI INT A: no GSI

We're using PIC routing, so acpi_irq_balance == 0, and LNKA is already
active at IRQ 11. In that case, acpi_pci_link_allocate() only tries
to use the active IRQ (IRQ 11) which also happens to be the SCI.

We should penalize the SCI by PIRQ_PENALTY_PCI_USING, but
irq_get_trigger_type(11) returns something other than
IRQ_TYPE_LEVEL_LOW, so we penalize it by PIRQ_PENALTY_ISA_ALWAYS
instead, which makes acpi_pci_link_allocate() assume the IRQ isn't
available and give up.

Add acpi_penalize_sci_irq() so platforms can tell us the SCI IRQ,
trigger, and polarity directly and we don't have to depend on
irq_get_trigger_type().
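
The penalty choice described above can be sketched in user-space C. This is only an illustration under stated assumptions: the enum names, the penalty values, and the helper sci_penalty_for() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative stand-ins; names follow the commit text, values are assumptions. */
enum sci_trigger  { TRIGGER_EDGE, TRIGGER_LEVEL };
enum sci_polarity { POLARITY_ACTIVE_HIGH, POLARITY_ACTIVE_LOW };

#define PIRQ_PENALTY_PCI_USING  (16 * 16 * 16)
#define PIRQ_PENALTY_ISA_ALWAYS (16 * 16 * 16 * 16)

static int sci_irq = -1;
static int sci_penalty;

/* Record the SCI and pick its penalty from what the platform reported. */
void penalize_sci_irq(int irq, enum sci_trigger trigger,
		      enum sci_polarity polarity)
{
	assert(irq >= 0);
	sci_irq = irq;
	if (trigger == TRIGGER_LEVEL && polarity == POLARITY_ACTIVE_LOW)
		sci_penalty = PIRQ_PENALTY_PCI_USING;  /* can be shared with PCI */
	else
		sci_penalty = PIRQ_PENALTY_ISA_ALWAYS; /* treated as unavailable */
}

/* Penalty contributed by the SCI for a given IRQ (0 if it is not the SCI). */
int sci_penalty_for(int irq)
{
	return irq == sci_irq ? sci_penalty : 0;
}
```

With the trigger and polarity passed in directly, a level-triggered, active-low SCI on IRQ 11 gets the lighter penalty, so a link already active on IRQ 11 is no longer rejected.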

Fixes: 103544d86976 (ACPI,PCI,IRQ: reduce resource requirements)
Link: http://lkml.kernel.org/r/201609251512.05657.linux@rainbow-software.org
Reported-by: Ondrej Zary 
Acked-by: Bjorn Helgaas 
Signed-off-by: Sinan Kaya 
Tested-by: Jonathan Liu 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

svcrdma: Tail iovec leaves an orphaned DMA mapping

2016-11-18T09:51:50+00:00

commit cace564f8b6260e806f5e28d7f192fd0e0c603ed upstream.

The ctxt's count field is overloaded to mean the number of pages in
the ctxt->page array and the number of SGEs in the ctxt->sge array.
Typically these two numbers are the same.

However, when an inline RPC reply is constructed from an xdr_buf
with a tail iovec, the head and tail often occupy the same page,
but each is DMA mapped independently. In that case, ->count equals
the number of pages, but it does not equal the number of SGEs.
There's one more SGE, for the tail iovec. Hence there is one more
DMA mapping than there are pages in the ctxt->page array.
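
A minimal model of the mismatch, assuming a context that tracks pages and SGEs with separate counters. The struct layout and the helper names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>

#define MAX_SGES 8

/* Hypothetical context: pages and SGEs are counted separately. */
struct send_ctxt {
	int count;                /* entries used in the page array */
	int mapped_sges;          /* SGEs that hold a live DMA mapping */
	int sge_mapped[MAX_SGES]; /* 1 while the mapping is live (model only) */
};

/* Record one DMA mapping; returns the SGE index that was used. */
int ctxt_map_sge(struct send_ctxt *c)
{
	assert(c->mapped_sges < MAX_SGES);
	c->sge_mapped[c->mapped_sges] = 1;
	return c->mapped_sges++;
}

/* Release every mapping by walking the SGE count, not the page count. */
int ctxt_unmap_all(struct send_ctxt *c)
{
	int released = 0;

	for (int i = 0; i < c->mapped_sges; i++) {
		if (c->sge_mapped[i]) {
			c->sge_mapped[i] = 0;
			released++;
		}
	}
	c->mapped_sges = 0;
	return released;
}
```

When head and tail share one page, count is 1 but two mappings exist. A loop bounded by count would release only one of them, whereas the loop above releases both.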

This isn't a real problem until the server's iommu is enabled. Then
each RPC reply that has content in that iovec orphans a DMA mapping
that consists of real resources.

krb5i and krb5p always populate that tail iovec. After a couple
million sent krb5i/p RPC replies, the NFS server starts behaving
erratically. Reboot is needed to clear the problem.

Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up")
Signed-off-by: Chuck Lever 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

mm, frontswap: make sure allocated frontswap map is assigned

2016-11-18T09:51:44+00:00

commit 5e322beefc8699b5747cfb35539a9496034e4296 upstream.

Christian Borntraeger reports:

With commit 8ea1d2a1985a ("mm, frontswap: convert frontswap_enabled to
static key") kmemleak complains about a memory leak in swapon

    unreferenced object 0x3e09ba56000 (size 32112640):
      comm "swapon", pid 7852, jiffies 4294968787 (age 1490.770s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
         __vmalloc_node_range+0x194/0x2d8
         vzalloc+0x58/0x68
         SyS_swapon+0xd60/0x12f8
         system_call+0xd6/0x270

Turns out kmemleak is right.  We now allocate the frontswap map
depending on the kernel config (and no longer on whether frontswap is enabled):

  swapfile.c:
  [...]
      if (IS_ENABLED(CONFIG_FRONTSWAP))
                frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));

but later on this is passed along
  --> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);

and ignored if frontswap is disabled
  --> frontswap_init(p->type, frontswap_map);

  static inline void frontswap_init(unsigned type, unsigned long *map)
  {
        if (frontswap_enabled())
                __frontswap_init(type, map);
  }

The problem is that this frontswap map is never freed.

The leak itself is relatively minor, because swapon is an infrequent
and privileged operation.  However, if the first frontswap backend is
registered after a swap type has been already enabled, it will WARN_ON
in frontswap_register_ops() and frontswap will not be available for the
swap type.

Fix this by making sure the map is assigned by frontswap_init() as long
as CONFIG_FRONTSWAP is enabled.
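
The intended behaviour can be modelled outside the kernel roughly as follows. The struct, the backend_registered flag (standing in for frontswap_enabled()), and the function name are assumptions made for this sketch, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-swap-type state. */
struct swap_type_model {
	unsigned long *frontswap_map;
	bool backend_initialised;
};

bool backend_registered; /* models frontswap_enabled() */

/* Always record the map; involve the backend only when one is registered. */
void frontswap_init_model(struct swap_type_model *sis, unsigned long *map)
{
	assert(sis != NULL);
	sis->frontswap_map = map; /* assigned regardless of enablement */
	if (backend_registered)
		sis->backend_initialised = true;
}
```

Because the map is recorded even with no backend registered, the swapoff path can find and free it later, and a backend registered afterwards sees a swap type that already has a map.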

Fixes: 8ea1d2a1985a ("mm, frontswap: convert frontswap_enabled to static key")
Link: http://lkml.kernel.org/r/20161026134220.2566-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Reported-by: Christian Borntraeger 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: Juergen Gross 
Cc: "Kirill A. Shutemov" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

net: add recursion limit to GRO

2016-11-15T06:48:52+00:00

[ Upstream commit fcd91dd449867c6bfe56a81cabba76b829fd05cd ]

Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.  This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.
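
As a rough illustration of the counter, here is a user-space model in which each nested header triggers one more level of recursion. The limit value of 15, the struct, and the function name are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define GRO_RECURSION_LIMIT_MODEL 15 /* assumed value, illustration only */

struct gro_cb_model {
	int recursion_counter; /* per-packet depth, kept in the control block */
	int flush;             /* set when GRO is abandoned for this packet */
};

/* Walk `nested` stacked headers; returns how many were handled. */
int gro_receive_model(struct gro_cb_model *cb, int nested)
{
	assert(cb != NULL && nested >= 0);
	if (nested == 0)
		return 0;
	if (cb->recursion_counter++ == GRO_RECURSION_LIMIT_MODEL) {
		cb->flush = 1; /* give up on GRO, process the packet normally */
		return 0;
	}
	return 1 + gro_receive_model(cb, nested - 1);
}
```

A packet with a handful of nested headers is handled in full, while one made of a thousand headers stops at the limit with the flush flag set, so stack depth stays bounded.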

Thanks to Vladimír Beneš for the initial bug report.

Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca 
Reviewed-by: Jiri Benc 
Acked-by: Hannes Frederic Sowa 
Acked-by: Tom Herbert 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: core: Correctly iterate over lower adjacency list

2016-11-15T06:48:52+00:00

[ Upstream commit e4961b0768852d9eb7383e1a5df178eacb714656 ]

Tamir reported the following trace when processing ARP requests received
via a vlan device on top of a VLAN-aware bridge:

 NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[...]
 CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
 Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
 task: ffff88017edfea40 task.stack: ffff88017ee10000
 RIP: 0010:[]  [] netdev_all_lower_get_next_rcu+0x33/0x60
[...]
 Call Trace:
  
  [] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
  [] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
  [] notifier_call_chain+0x4a/0x70
  [] atomic_notifier_call_chain+0x1a/0x20
  [] call_netevent_notifiers+0x1b/0x20
  [] neigh_update+0x306/0x740
  [] neigh_event_ns+0x4e/0xb0
  [] arp_process+0x66f/0x700
  [] ? common_interrupt+0x8c/0x8c
  [] arp_rcv+0x139/0x1d0
  [] ? vlan_do_receive+0xda/0x320
  [] __netif_receive_skb_core+0x524/0xab0
  [] ? dev_queue_xmit+0x10/0x20
  [] ? br_forward_finish+0x3d/0xc0 [bridge]
  [] ? br_handle_vlan+0xf6/0x1b0 [bridge]
  [] __netif_receive_skb+0x18/0x60
  [] netif_receive_skb_internal+0x40/0xb0
  [] netif_receive_skb+0x1c/0x70
  [] br_pass_frame_up+0xc6/0x160 [bridge]
  [] ? deliver_clone+0x37/0x50 [bridge]
  [] ? br_flood+0xcc/0x160 [bridge]
  [] br_handle_frame_finish+0x224/0x4f0 [bridge]
  [] br_handle_frame+0x174/0x300 [bridge]
  [] __netif_receive_skb_core+0x329/0xab0
  [] ? find_next_bit+0x15/0x20
  [] ? cpumask_next_and+0x32/0x50
  [] ? load_balance+0x178/0x9b0
  [] __netif_receive_skb+0x18/0x60
  [] netif_receive_skb_internal+0x40/0xb0
  [] netif_receive_skb+0x1c/0x70
  [] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
  [] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
  [] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
  [] tasklet_action+0xf6/0x110
  [] __do_softirq+0xf6/0x280
  [] irq_exit+0xdf/0xf0
  [] do_IRQ+0x54/0xd0
  [] common_interrupt+0x8c/0x8c

The problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.

Fix this by advancing the iterator, which avoids the infinite loop.
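
The iteration pattern can be sketched on a plain circular list with a sentinel head. The types and the function name are hypothetical and much simpler than the kernel's RCU list helpers.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next;
	int dev_id;
};

/*
 * Return the next device id on a circular list with sentinel `head`,
 * or -1 at the end. The caller seeds *iter with head.
 */
int lower_get_next(struct list_node *head, struct list_node **iter)
{
	assert(head != NULL && iter != NULL && *iter != NULL);

	struct list_node *lower = (*iter)->next;

	if (lower == head)
		return -1;
	*iter = lower; /* without this step the caller sees the same entry forever */
	return lower->dev_id;
}
```

A caller loops until -1 is returned. If *iter were never updated, every call would return the first entry and the loop would not terminate, which matches the soft lockup above.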

Fixes: 7ce856aaaf13 ("mlxsw: spectrum: Add couple of lower device helper functions")
Signed-off-by: Ido Schimmel 
Reported-by: Tamir Winetroub 
Reviewed-by: Jiri Pirko 
Acked-by: David Ahern 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

pwm: Unexport children before chip removal

2016-11-10T15:38:56+00:00

commit 0733424c9ba9f42242409d1ece780777272f7ea1 upstream.

Exported pwm channels aren't removed before the pwmchip and are
leaked. This results in invalid sysfs files. This fix removes
all exported pwm channels before chip removal.
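
In outline, the cleanup amounts to a pass over the chip's channels before the chip goes away. The sketch below uses hypothetical model types, and a boolean flag stands in for the presence of the sysfs child device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pwm_model  { bool exported; };
struct chip_model { struct pwm_model *pwms; size_t npwm; };

/* Unexport every still-exported channel; returns how many were cleaned up. */
size_t chip_unexport_children(struct chip_model *chip)
{
	assert(chip != NULL);

	size_t n = 0;

	for (size_t i = 0; i < chip->npwm; i++) {
		if (chip->pwms[i].exported) {
			chip->pwms[i].exported = false; /* stands in for removing the child */
			n++;
		}
	}
	return n;
}
```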

Signed-off-by: David Hsu 
Fixes: 76abbdde2d95 ("pwm: Add sysfs interface")
Signed-off-by: Thierry Reding 
Signed-off-by: Greg Kroah-Hartman

libnvdimm: clear the internal poison_list when clearing badblocks

2016-10-31T11:02:16+00:00

commit e046114af5fcafe8d6d3f0b6ccb99804bad34bfb upstream.

nvdimm_clear_poison cleared the user-visible badblocks, and sent
commands to the NVDIMM to clear the areas marked as 'poison', but it
neglected to clear the same areas from the internal poison_list which is
used to marshal ARS results before sorting them by namespace. As a
result, once on-demand ARS functionality was added:

37b137f nfit, libnvdimm: allow an ARS scrub to be triggered on demand

A scrub triggered from either sysfs or an MCE was found to be adding
stale entries that had been cleared from gendisk->badblocks, but were
still present in nvdimm_bus->poison_list. Additionally, the stale entries
could be triggered into producing stale disk->badblocks by simply disabling
and re-enabling the namespace or region.

This adds the missing step of clearing poison_list entries when clearing
poison, so that it is always in sync with badblocks.
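
One way to picture the missing step is as range subtraction on a list of poisoned extents. The array-based representation and the function below are assumptions made for this sketch. The kernel keeps a linked list and its code differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct poison_range { uint64_t start; uint64_t length; };

/*
 * Copy `in` to `out`, dropping the part of each entry that overlaps the
 * cleared range [clr_start, clr_start + clr_len). `out` needs room for
 * 2 * n entries. Returns the number of entries written.
 */
size_t poison_list_forget(const struct poison_range *in, size_t n,
			  uint64_t clr_start, uint64_t clr_len,
			  struct poison_range *out)
{
	assert(in != NULL && out != NULL);

	uint64_t clr_end = clr_start + clr_len;
	size_t m = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t s = in[i].start, e = s + in[i].length;

		if (e <= clr_start || s >= clr_end) {
			out[m++] = in[i]; /* no overlap, keep as is */
			continue;
		}
		if (s < clr_start) /* keep the part before the cleared range */
			out[m++] = (struct poison_range){ s, clr_start - s };
		if (e > clr_end)   /* keep the part after the cleared range */
			out[m++] = (struct poison_range){ clr_end, e - clr_end };
	}
	return m;
}
```

Clearing a range in the middle of an entry splits it in two, and clearing a range that covers an entry removes it, so a later scrub has no stale extents to re-add.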

Fixes: 37b137f ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand")
Signed-off-by: Vishal Verma 
Signed-off-by: Dan Williams 
Signed-off-by: Greg Kroah-Hartman

mm/hugetlb: check for reserved hugepages during memory offline

2016-10-31T11:02:11+00:00

commit 082d5b6b60e9f25e1511557fcfcb21eedd267446 upstream.

In dissolve_free_huge_pages(), free hugepages will be dissolved without
making sure that there are enough of them left to satisfy hugepage
reservations.

Fix this by adding a return value to dissolve_free_huge_pages() and
checking h->free_huge_pages vs.  h->resv_huge_pages.  Note that this may
lead to the situation where dissolve_free_huge_page() returns an error
and all free hugepages that were dissolved before that error are lost,
while the memory block still cannot be set offline.
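
The check and its side effect can be modelled as below. The struct and the function names are hypothetical, and locking and page bookkeeping are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct hstate_model {
	unsigned long free_huge_pages;
	unsigned long resv_huge_pages;
};

/* Dissolve one free huge page unless every free page is already reserved. */
int dissolve_one_model(struct hstate_model *h)
{
	assert(h != NULL && h->free_huge_pages >= h->resv_huge_pages);
	if (h->free_huge_pages - h->resv_huge_pages == 0)
		return -EBUSY;
	h->free_huge_pages--;
	return 0;
}

/* Try to dissolve `nr` pages; stops at the first error. */
int dissolve_many_model(struct hstate_model *h, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++) {
		int rc = dissolve_one_model(h);

		if (rc)
			return rc; /* pages dissolved earlier are not restored */
	}
	return 0;
}
```

With three free pages and two reserved, a request to dissolve two pages dissolves one and then fails with -EBUSY. That is the situation the note above describes: the first page is gone even though the offline attempt fails.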

Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Link: http://lkml.kernel.org/r/20160926172811.94033-3-gerald.schaefer@de.ibm.com
Signed-off-by: Gerald Schaefer 
Acked-by: Michal Hocko 
Acked-by: Naoya Horiguchi 
Cc: "Kirill A . Shutemov" 
Cc: Vlastimil Babka 
Cc: Mike Kravetz 
Cc: "Aneesh Kumar K . V" 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Rui Teng 
Cc: Dave Hansen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

posix_acl: Clear SGID bit when setting file permissions

2016-10-31T11:02:08+00:00

commit 073931017b49d9458aa351605b43a7e34598caef upstream.

When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok().  Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows the check in chmod(2) to be bypassed.  Fix that.
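
The rule being applied to the ACL path can be sketched as a pure function. The function name and the two boolean parameters (standing in for the group-membership and CAP_FSETID checks) are assumptions for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Mode to store when an ACL update also rewrites the permission bits. */
mode_t acl_update_mode_model(mode_t new_mode, bool in_owning_group,
			     bool has_fsetid)
{
	assert((new_mode & ~(mode_t)07777) == 0);
	if (!in_owning_group && !has_fsetid)
		new_mode &= ~(mode_t)S_ISGID; /* same outcome as the chmod(2) path */
	return new_mode;
}
```

A caller outside the owning group and without CAP_FSETID who asks for mode 02770 ends up with 0770. Either condition being true leaves the setgid bit intact.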

References: CVE-2016-7097
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jeff Layton 
Signed-off-by: Jan Kara 
Signed-off-by: Andreas Gruenbacher 
Signed-off-by: Juerg Haefliger 
Signed-off-by: Greg Kroah-Hartman

irqchip/gic-v3-its: Fix entry size mask for GITS_BASER

2016-10-28T07:45:28+00:00

commit 9224eb77e63f70f16c0b6b7a20ca7d395f3bc077 upstream.

Entry Size in GITS_BASER occupies 5 bits [52:48], but we mask out 8
bits.
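
The effect of the mask width can be shown with two small helpers. The names are hypothetical, and the +1 assumes the field encodes the entry size minus one.

```c
#include <assert.h>
#include <stdint.h>

#define BASER_ENTRY_SIZE_SHIFT 48

/* Entry size from bits [52:48]; the +1 assumes a "size minus one" encoding. */
unsigned int baser_entry_size(uint64_t baser)
{
	return (unsigned int)((baser >> BASER_ENTRY_SIZE_SHIFT) & 0x1f) + 1;
}

/* The over-wide 8-bit mask, kept only for comparison. */
unsigned int baser_entry_size_wide(uint64_t baser)
{
	return (unsigned int)((baser >> BASER_ENTRY_SIZE_SHIFT) & 0xff) + 1;
}
```

For a register value with 7 in bits [52:48] and bit 53 set, the 5-bit mask yields 8 while the 8-bit mask yields 40, because bits above the field leak into the result.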

Fixes: cc2d3216f53c ("irqchip: GICv3: ITS command queue")
Signed-off-by: Vladimir Murzin 
Signed-off-by: Marc Zyngier 
Signed-off-by: Greg Kroah-Hartman