linux.git/include, branch v3.18.4

genirq: Prevent proc race against freeing of irq descriptors

2015-01-27T16:29:37+00:00

commit c291ee622165cb2c8d4e7af63fffd499354a23be upstream.

Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.

CPU0				CPU1
				show_interrupts()
				  desc = irq_to_desc(X);
free_desc(desc)
  remove_from_radix_tree();
  kfree(desc);
				  raw_spinlock_irq(&desc->lock);

/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.

The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.

For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.

Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.

Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.

Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.

Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

uapi/linux/target_core_user.h: fix headers_install.sh badness

2015-01-27T16:29:37+00:00

commit 3875f15207f9ecb3f24a8e91e7ad196899139595 upstream.

scripts/headers_install.sh will transform __packed to
__attribute__((packed)), so the #ifndef is not necessary.
(and, in fact, it's problematic, because we'll end up with the header
 containing:
#ifndef __attribute__((packed))
#define __attribu...
and so forth.)

Signed-off-by: Kyle McMartin 
Signed-off-by: Nicholas Bellinger 
Signed-off-by: Greg Kroah-Hartman

net: Generalize ndo_gso_check to ndo_features_check

2015-01-27T16:29:33+00:00

[ Upstream commit 5f35227ea34bb616c436d9da47fc325866c428f3 ]

GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.

This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.

By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.

CC: Tom Herbert 
CC: Joe Stringer 
CC: Eric Dumazet 
CC: Hayes Wang 
Signed-off-by: Jesse Gross 
Acked-by:  Tom Herbert 
Fixes: 04ffcb255f22 ("net: Add ndo_gso_check")
Tested-by: Hayes Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

in6: fix conflict with glibc

2015-01-27T16:29:33+00:00

[ Upstream commit 6d08acd2d32e3e877579315dc3202d7a5f336d98 ]

Resolve conflicts between glibc definition of IPV6 socket options
and those defined in Linux headers. Looks like earlier efforts to
solve this did not cover all the definitions.

It resolves warnings during iproute2 build.
Please consider for stable as well.

Signed-off-by: Stephen Hemminger 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

mm: propagate error from stack expansion even for guard page

2015-01-16T14:59:57+00:00

commit fee7e49d45149fba60156f5b59014f764d3e3728 upstream.

Jay Foad reports that the address sanitizer test (asan) sometimes gets
confused by a stack pointer that ends up being outside the stack vma
that is reported by /proc/maps.

This happens due to an interaction between RLIMIT_STACK and the guard
page: when we do the guard page check, we ignore the potential error
from the stack expansion, which effectively results in a missing guard
page, since the expected stack expansion won't have been done.

And since /proc/maps explicitly ignores the guard page (commit
d7824370e263: "mm: fix up some user-visible effects of the stack guard
page"), the stack pointer ends up being outside the reported stack area.

This is the minimal patch: it just propagates the error.  It also
effectively makes the guard page part of the stack limit, which in turn
measn that the actual real stack is one page less than the stack limit.

Let's see if anybody notices.  We could teach acct_stack_growth() to
allow an extra page for a grow-up/grow-down stack in the rlimit test,
but I don't want to add more complexity if it isn't needed.

Reported-and-tested-by: Jay Foad 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: protect set_page_dirty() from ongoing truncation

2015-01-16T14:59:57+00:00

commit 2d6d7f98284648c5ed113fe22a132148950b140f upstream.

Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:

__set_page_dirty_nobuffers()       __delete_from_page_cache()
  if (TestSetPageDirty(page))
                                     page->mapping = NULL
				     if (PageDirty())
				       dec_zone_page_state(page, NR_FILE_DIRTY);
				       dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
    if (page->mapping)
      account_page_dirtied(page)
        __inc_zone_page_state(page, NR_FILE_DIRTY);
	__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);

which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.

Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held.  The notable exception to this rule, though,
is do_wp_page(), for which this race exists.  However, do_wp_page()
already waits for a locked page to unlock before setting the dirty bit,
in order to prevent a race where clear_page_dirty() misses the page bit
in the presence of dirty ptes.  Upgrade that wait to a fully locked
set_page_dirty() to also cover the situation explained above.

Afterwards, the code in set_page_dirty() dealing with a truncation race
is no longer needed.  Remove it.

Reported-by: Tejun Heo 
Signed-off-by: Johannes Weiner 
Acked-by: Kirill A. Shutemov 
Reviewed-by: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

Revert "mac80211: Fix accounting of the tailroom-needed counter"

2015-01-16T14:59:56+00:00

commit 1e359a5de861a57aa04d92bb620f52a5c1d7f8b1 upstream.

This reverts commit ca34e3b5c808385b175650605faa29e71e91991b.

It turns out that the p54 and cw2100 drivers assume that there's
tailroom even when they don't say they really need it. However,
there's currently no way for them to explicitly say they do need
it, so for now revert this.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331.

Fixes: ca34e3b5c808 ("mac80211: Fix accounting of the tailroom-needed counter")
Reported-by: Christopher Chavez 
Bisected-by: Larry Finger 
Debugged-by: Christian Lamparter 
Signed-off-by: Johannes Berg 
Signed-off-by: Greg Kroah-Hartman

Drivers: hv: util: make struct hv_do_fcopy match Hyper-V host messages

2015-01-16T14:59:53+00:00

commit 31d4ea1a093fcf668d5f95af44b8d41488bdb7ec upstream.

An attempt to fix fcopy on i586 (bc5a5b0 Drivers: hv: util: Properly pack the data
for file copy functionality) led to a regression on x86_64 (and actually didn't fix
i586 breakage). Fcopy messages from Hyper-V host come in the following format:

struct do_fcopy_hdr   |   36 bytes
0000                  |    4 bytes
offset                |    8 bytes
size                  |    4 bytes
data                  | 6144 bytes

On x86_64 struct hv_do_fcopy matched this format without ' __attribute__((packed))'
and on i586 adding ' __attribute__((packed))' to it doesn't change anything. Keep
the structure packed and add padding to match re reality. Tested both i586 and x86_64
on Hyper-V Server 2012 R2.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
Signed-off-by: Greg Kroah-Hartman

tracing/sched: Check preempt_count() for current when reading task->state

2015-01-16T14:59:52+00:00

commit aee4e5f3d3abb7a2239dd02f6d8fb173413fd02f upstream.

When recording the state of a task for the sched_switch tracepoint a check of
task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is
because, technically, a task being preempted is really in the TASK_RUNNING
state, and that is what should be recorded when tracing a sched_switch,
even if the task put itself into another state (it hasn't scheduled out
in that state yet).

But with the change to use per_cpu preempt counts, the
task_thread_info(p)->preempt_count is no longer used, and instead
task_preempt_count(p) is used.

The problem is that this does not use the current preempt count but a stale
one from a previous sched_switch. The task_preempt_count(p) uses
saved_preempt_count and not preempt_count(). But for tracing sched_switch,
if p is current, we really want preempt_count().

I hit this bug when I was tracing sleep and the call from do_nanosleep()
scheduled out in the "RUNNING" state.

           sleep-4290  [000] 537272.259992: sched_switch:         sleep:4290 [120] R ==> swapper/0:0 [120]
           sleep-4290  [000] 537272.260015: kernel_stack:         
=> __schedule (ffffffff8150864a)
=> schedule (ffffffff815089f8)
=> do_nanosleep (ffffffff8150b76c)
=> hrtimer_nanosleep (ffffffff8108d66b)
=> SyS_nanosleep (ffffffff8108d750)
=> return_to_handler (ffffffff8150e8e5)
=> tracesys_phase2 (ffffffff8150c844)

After a bit of hair pulling, I found that the state was really
TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE
set and caused the sched_switch tracepoint to show it as RUNNING.

Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.home

Acked-by: Ingo Molnar 
Cc: Peter Zijlstra 
Fixes: 01028747559a "sched: Create more preempt_count accessors"
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

pstore-ram: Allow optional mapping with pgprot_noncached

2015-01-16T14:59:47+00:00

commit 027bc8b08242c59e19356b4b2c189f2d849ab660 upstream.

On some ARMs the memory can be mapped pgprot_noncached() and still
be working for atomic operations. As pointed out by Colin Cross
, in some cases you do want to use
pgprot_noncached() if the SoC supports it to see a debug printk
just before a write hanging the system.

On ARMs, the atomic operations on strongly ordered memory are
implementation defined. So let's provide an optional kernel parameter
for configuring pgprot_noncached(), and use pgprot_writecombine() by
default.

Cc: Arnd Bergmann 
Cc: Rob Herring 
Cc: Randy Dunlap 
Cc: Anton Vorontsov 
Cc: Colin Cross 
Cc: Olof Johansson 
Cc: Russell King 
Acked-by: Kees Cook 
Signed-off-by: Tony Lindgren 
Signed-off-by: Tony Luck 
Signed-off-by: Greg Kroah-Hartman