| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 6d6ea1e967a246f12cfe2f5fb743b70b2e608d4a upstream.
Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
v6.
This is a followup to the discussion in [1], [2].
IOMMUs using ARMv7 short-descriptor format require page tables (level 1
and 2) to be allocated within the first 4GB of RAM, even on 64-bit
systems.
For L1 tables that are bigger than a page, we can just use
__get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
use GFP_DMA).
For L2 tables that only take 1KB, it would be a waste to allocate a full
page, so we considered 3 approaches:
1. This series, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2 page
tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable to reuse
freed fragments until the whole page is freed. [3]
This series is the most memory-efficient approach.
stable@ note:
We confirmed that this is a regression, and IOMMU errors happen on 4.19
and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA
with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
platforms (and maybe others?).
[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html
[3] https://patchwork.codeaurora.org/patch/671639/
This patch (of 3):
IOMMUs using ARMv7 short-descriptor format require page tables to be
allocated within the first 4GB of RAM, even on 64-bit systems. On arm64,
this is done by passing GFP_DMA32 flag to memory allocation functions.
For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
a full page using get_free_pages, so we considered 3 approaches:
1. This patch, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2
page tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable
to reuse freed fragments until the whole page is freed.
This change makes it possible to create a custom cache in DMA32 zone using
kmem_cache_create, then allocate memory using kmem_cache_alloc.
We do not create a DMA32 kmalloc cache array, as there are currently no
users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a
warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.
This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
unnecessary).
Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Sasha Levin <Alexander.Levin@microsoft.com>
Cc: Huaisheng Ye <yehs1@lenovo.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Yong Wu <yong.wu@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Tomasz Figa <tfiga@google.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9b7ea46a82b31c74a37e6ff1c2a1df7d53e392ab upstream.
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
memory to zones until online") introduced move_pfn_range_to_zone() which
calls memmap_init_zone() during onlining a memory block.
memmap_init_zone() will reset pagetype flags and makes migrate type to
be MOVABLE.
However, in __offline_pages(), it also call undo_isolate_page_range()
after offline_isolated_pages() to do the same thing. Due to commit
2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
__first_valid_page() to skip offline pages, undo_isolate_page_range()
here just waste CPU cycles looping around the offlining PFN range while
doing nothing, because __first_valid_page() will return NULL as
offline_isolated_pages() has already marked all memory sections within
the pfn range as offline via offline_mem_sections().
Also, after calling the "useless" undo_isolate_page_range() here, it
reaches the point of no returning by notifying MEM_OFFLINE. Those pages
will be marked as MIGRATE_MOVABLE again once onlining. The only thing
left to do is to decrease the number of isolated pageblocks zone counter
which would make some paths of the page allocation slower that the above
commit introduced.
Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
on ppc64, an "int" should still be enough to represent the number of
pageblocks there. Fix an incorrect comment along the way.
[cai@lca.pw: v4]
Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw
Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw
Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7f07e5f1f778605e98cf2156d4db1ff3a3a1a74a ]
With a recent link mode advertisement code update this helper
providing local pause capability translation used for flow
control link mode negotiation got broken.
For eth drivers using this helper, the issue is apparent only
if either PAUSE or ASYM_PAUSE is being advertised.
Fixes: 3c1bcc8614db ("net: ethernet: Convert phydev advertize and supported from u32 to link mode")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 84c4e1f89fefe70554da0ab33be72c9be7994379 upstream.
Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:
"In more details - normally IOCB_CMD_POLL handling looks so:
1) io_submit(2) allocates aio_kiocb instance and passes it to
aio_poll()
2) aio_poll() resolves the descriptor to struct file by req->file =
fget(iocb->aio_fildes)
3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
aio_kiocb to 2 (bumps by 1, that is).
4) aio_poll() calls vfs_poll(). After sanity checks (basically,
"poll_wait() had been called and only once") it locks the queue.
That's what the extra reference to iocb had been for - we know we
can safely access it.
5) With queue locked, we check if ->woken has already been set to
true (by aio_poll_wake()) and, if it had been, we unlock the
queue, drop a reference to aio_kiocb and bugger off - at that
point it's a responsibility to aio_poll_wake() and the stuff
called/scheduled by it. That code will drop the reference to file
in req->file, along with the other reference to our aio_kiocb.
6) otherwise, we see whether we need to wait. If we do, we unlock the
queue, drop one reference to aio_kiocb and go away - eventual
wakeup (or cancel) will deal with the reference to file and with
the other reference to aio_kiocb
7) otherwise we remove ourselves from waitqueue (still under the
queue lock), so that wakeup won't get us. No async activity will
be happening, so we can safely drop req->file and iocb ourselves.
If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
won't get freed under us, so we can do all the checks and locking
safely. And we don't touch ->file if we detect that case.
However, vfs_poll() most certainly *does* touch the file it had been
given. So wakeup coming while we are still in ->poll() might end up
doing fput() on that file. That case is not too rare, and usually we
are saved by the still present reference from descriptor table - that
fput() is not the final one.
But if another thread closes that descriptor right after our fget()
and wakeup does happen before ->poll() returns, we are in trouble -
final fput() done while we are in the middle of a method:
Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bb229bbb3bf63d23128e851a1f3b85c083178fa1 upstream.
Because map updates are distributed lazily, an OSD may not know about
the new blacklist for quite some time after "osd blacklist add" command
is completed. This makes it possible for a blacklisted but still alive
client to overwrite a post-blacklist update, resulting in data
corruption.
Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
the post-blacklist epoch for all post-blacklist requests ensures that
all such requests "wait" for the blacklist to come into force on their
respective OSDs.
Cc: stable@vger.kernel.org
Fixes: 6305a3b41515 ("libceph: support for blacklisting clients")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.
kvm_arch_memslots_updated() is at this point in time an x86-specific
hook for handling MMIO generation wraparound. x86 stashes 19 bits of
the memslots generation number in its MMIO sptes in order to avoid
full page fault walks for repeat faults on emulated MMIO addresses.
Because only 19 bits are used, wrapping the MMIO generation number is
possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that
the generation has changed so that it can invalidate all MMIO sptes in
case the effective MMIO generation has wrapped so as to avoid using a
stale spte, e.g. a (very) old spte that was created with generation==0.
Given that the purpose of kvm_arch_memslots_updated() is to prevent
consuming stale entries, it needs to be called before the new generation
is propagated to memslots. Invalidating the MMIO sptes after updating
memslots means that there is a window where a vCPU could dereference
the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
spte that was created with (pre-wrap) generation==0.
Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0bdb50c531f7377a9da80d3ce2d61f389c84cb30 upstream.
A dm-raid array with devices larger than 4GB won't assemble on
a 32 bit host since _check_data_dev_sectors() was added in 4.16.
This is because to_sector() treats its argument as an "unsigned long"
which is 32bits (4GB) on a 32bit host. Using "unsigned long long"
is more correct.
Kernels as early as 4.2 can have other problems due to to_sector()
being used on the size of a device.
Fixes: 0cf4503174c1 ("dm raid: add support for the MD RAID0 personality")
cc: stable@vger.kernel.org (v4.2+)
Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5870970b9a828d8693aa6d15742573289d7dbcd0 upstream.
When using VHE, the host needs to clear HCR_EL2.TGE bit in order
to interact with guest TLBs, switching from EL2&0 translation regime
to EL1&0.
However, some non-maskable asynchronous event could happen while TGE is
cleared like SDEI. Because of this address translation operations
relying on EL2&0 translation regime could fail (tlb invalidation,
userspace access, ...).
Fix this by properly setting HCR_EL2.TGE when entering NMI context and
clear it if necessary when returning to the interrupted context.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 492366f7b4237257ef50ca9c431a6a0d50225aca upstream.
This function will be used from dma_direct code to determine
the maximum segment size of a dma mapping.
Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit abe420bfae528c92bd8cc5ecb62dc95672b1fd6f upstream.
The function returns the maximum size that can be remapped
by the SWIOTLB implementation. This function will be later
exposed to users through the DMA-API.
Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 133d624b1cee16906134e92d5befb843b58bcf31 upstream.
The function returns the maximum size that can be mapped
using DMA-API functions. The patch also adds the
implementation for direct DMA and a new dma_map_ops pointer
so that other implementations can expose their limit.
Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2b6e492467c78183bb629bb0a100ea3509b615a5 upstream.
With string type property entries we need to use
sizeof(const char *) instead of the number of characters as
the length of the entry.
If the string was shorter then sizeof(const char *),
attempts to read it would have failed with -EOVERFLOW. The
problem has been hidden because all build-in string
properties have had a string longer then 8 characters until
now.
Fixes: a85f42047533 ("device property: helper macros for property entry creation")
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a0ce2f0aa6ad97c3d4927bf2ca54bcebdf062d55 upstream.
Before this patch, it was possible for two pipes to affect each other after
data had been transferred between them with tee():
============
$ cat tee_test.c
int main(void) {
int pipe_a[2];
if (pipe(pipe_a)) err(1, "pipe");
int pipe_b[2];
if (pipe(pipe_b)) err(1, "pipe");
if (write(pipe_a[1], "abcd", 4) != 4) err(1, "write");
if (tee(pipe_a[0], pipe_b[1], 2, 0) != 2) err(1, "tee");
if (write(pipe_b[1], "xx", 2) != 2) err(1, "write");
char buf[5];
if (read(pipe_a[0], buf, 4) != 4) err(1, "read");
buf[4] = 0;
printf("got back: '%s'\n", buf);
}
$ gcc -o tee_test tee_test.c
$ ./tee_test
got back: 'abxx'
$
============
As suggested by Al Viro, fix it by creating a separate type for
non-mergeable pipe buffers, then changing the types of buffers in
splice_pipe_to_pipe() and link_pipe().
Cc: <stable@vger.kernel.org>
Fixes: 7c77f0b3f920 ("splice: implement pipe to pipe splicing")
Fixes: 70524490ee2e ("[PATCH] splice: add support for sys_tee()")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 625c85a62cb7d3c79f6e16de3cfa972033658250 upstream.
The cpufreq_global_kobject is created using kobject_create_and_add()
helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
routines are set to kobj_attr_show() and kobj_attr_store().
These routines pass struct kobj_attribute as an argument to the
show/store callbacks. But all the cpufreq files created using the
cpufreq_global_kobject expect the argument to be of type struct
attribute. Things work fine currently as no one accesses the "attr"
argument. We may not see issues even if the argument is used, as struct
kobj_attribute has struct attribute as its first element and so they
will both get same address.
But this is logically incorrect and we should rather use struct
kobj_attribute instead of struct global_attr in the cpufreq core and
drivers and the show/store callbacks should take struct kobj_attribute
as argument instead.
This bug is caught using CFI CLANG builds in android kernel which
catches mismatch in function prototypes for such callbacks.
Reported-by: Donghee Han <dh.han@samsung.com>
Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull networking fixes from David Miller:
1) Fix refcount leak in act_ipt during replace, from Davide Caratti.
2) Set task state properly in tun during blocking reads, from Timur
Celik.
3) Leaked reference in DSA, from Wen Yang.
4) NULL deref in act_tunnel_key, from Vlad Buslov.
5) cipso_v4_erro can reference the skb IPCB in inappropriate contexts
thus referencing garbage, from Nazarov Sergey.
6) Don't accept RTA_VIA and RTA_GATEWAY in contexts where those
attributes make no sense.
7) Fix hung sendto in tipc, from Tung Nguyen.
8) Out-of-bounds access in netlabel, from Paul Moore.
9) Grant reference leak in xen-netback, from Igor Druzhinin.
10) Fix tx stalls with lan743x, from Bryan Whitehead.
11) Fix interrupt storm with mv88e6xxx, from Hein Kallweit.
12) Memory leak in sit on device registry failure, from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: sit: fix memory leak in sit_init_net()
net: dsa: mv88e6xxx: Fix statistics on mv88e6161
geneve: correctly handle ipv6.disable module parameter
net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
bpf: fix sanitation rewrite in case of non-pointers
ipv4: Add ICMPv6 support when parse route ipproto
MIPS: eBPF: Fix icache flush end address
lan743x: Fix TX Stall Issue
net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
net: aquantia: regression on cpus with high cores: set mode with 8 queues
selftests: fixes for UDP GRO
bpf: drop refcount if bpf_map_new_fd() fails in map_create()
net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X
net: dsa: mv88e6xxx: Fix u64 statistics
xen-netback: don't populate the hash cache on XenBus disconnect
xen-netback: fix occasional leak of grant ref mappings under memory pressure
sctp: chunk.c: correct format string for size_t in printk
net: netem: fix skb length BUG_ON in __skb_to_sgvec
netlabel: fix out-of-bounds memory accesses
ipv4: Pass original device to ip_rcv_finish_core
...
|
|
1 << 31 is Undefined Behaviour according to the C standard.
Use U type modifier to avoid theoretical overflow.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 9da3f2b74054406f87dff7101a569217ffceb29b.
It was well-intentioned, but wrong. Overriding the exception tables for
instructions for random reasons is just wrong, and that is what the new
code did.
It caused problems for tracing, and it caused problems for strncpy_from_user(),
because the new checks made perfectly valid use cases break, rather than
catch things that did bad things.
Unchecked user space accesses are a problem, but that's not a reason to
add invalid checks that then people have to work around with silly flags
(in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
odd way to say "this commit was wrong" and was sprinked into random
places to hide the wrongness).
The real fix to unchecked user space accesses is to get rid of the
special "let's not check __get_user() and __put_user() at all" logic.
Make __{get|put}_user() be just aliases to the regular {get|put}_user()
functions, and make it impossible to access user space without having
the proper checks in places.
The raison d'être of the special double-underscore versions used to be
that the range check was expensive, and if you did multiple user
accesses, you'd do the range check up front (like the signal frame
handling code, for example). But SMAP (on x86) and PAN (on ARM) have
made that optimization pointless, because the _real_ expense is the "set
CPU flag to allow user space access".
Do let's not break the valid cases to catch invalid cases that shouldn't
even exist.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tobin C. Harding <tobin@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
"Hopefully the last pull request for this release. Fingers crossed:
1) Only refcount ESP stats on full sockets, from Martin Willi.
2) Missing barriers in AF_UNIX, from Al Viro.
3) RCU protection fixes in ipv6 route code, from Paolo Abeni.
4) Avoid false positives in untrusted GSO validation, from Willem de
Bruijn.
5) Forwarded mesh packets in mac80211 need more tailroom allocated,
from Felix Fietkau.
6) Use operstate consistently for linkup in team driver, from George
Wilkie.
7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF
and PF code paths.
8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni.
9) nfp eBPF code gen fixes from Jiong Wang.
10) bnxt_en firmware timeout fix from Michael Chan.
11) Use after free in udp/udpv6 error handlers, from Paolo Abeni.
12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
net: phy: realtek: Dummy IRQ calls for RTL8366RB
tcp: repaired skbs must init their tso_segs
net/x25: fix a race in x25_bind()
net: dsa: Remove documentation for port_fdb_prepare
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
selftests: fib_tests: sleep after changing carrier. again.
net: set static variable an initial value in atl2_probe()
net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
bpf, doc: add bpf list as secondary entry to maintainers file
udp: fix possible user after free in error handler
udpv6: fix possible user after free in error handler
fou6: fix proto error handler argument type
udpv6: add the required annotation to mib type
mdio_bus: Fix use-after-free on device_register fails
net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
bnxt_en: Wait longer for the firmware message response to complete.
bnxt_en: Fix typo in firmware message timeout logic.
nfp: bpf: fix ALU32 high bits clearance bug
nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
Documentation: networking: switchdev: Update port parent ID section
...
|
|
This fixes a regression introduced by
commit 0d2e778e38e0ddffab4bb2b0e9ed2ad5165c4bf7
"net: phy: replace PHY_HAS_INTERRUPT with a check for
config_intr and ack_interrupt".
This assumes that a PHY cannot trigger interrupt unless
it has .config_intr() or .ack_interrupt() implemented.
A later patch makes the code assume both need to be
implemented for interrupts to be present.
But this PHY (which is inside a DSA) will happily
fire interrupts without either callback.
Implement dummy callbacks for .config_intr() and
.ack_interrupt() in the phy header to fix this.
Tested on the RTL8366RB on D-Link DIR-685.
Fixes: 0d2e778e38e0 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt")
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
GSO packets with vnet_hdr must conform to a small set of gso_types.
The below commit uses flow dissection to drop packets that do not.
But it has false positives when the skb is not fully initialized.
Dissection needs skb->protocol and skb->network_header.
Infer skb->protocol from gso_type as the two must agree.
SKB_GSO_UDP can use both ipv4 and ipv6, so try both.
Exclude callers for which network header offset is not known.
Fixes: d5be7f632bad ("net: validate untrusted gso packets without csum offload")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull keys fixes from James Morris:
- Handle quotas better, allowing full quota to be reached.
- Fix the creation of shortcuts in the assoc_array internal
representation when the index key needs to be an exact multiple of
the machine word size.
- Fix a dependency loop between the request_key contruction record and
the request_key authentication key. The construction record isn't
really necessary and can be dispensed with.
- Set the timestamp on a new key rather than leaving it as 0. This
would ordinarily be fine - provided the system clock is never set to
a time before 1970
* 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
keys: Timestamp new keys
keys: Fix dependency loop between construction record and auth key
assoc_array: Fix shortcut creation
KEYS: allow reaching the keys quotas exactly
|
|
Pull networking fixes from David Miller:
1) Fix suspend and resume in mt76x0u USB driver, from Stanislaw
Gruszka.
2) Missing memory barriers in xsk, from Magnus Karlsson.
3) rhashtable fixes in mac80211 from Herbert Xu.
4) 32-bit MIPS eBPF JIT fixes from Paul Burton.
5) Fix for_each_netdev_feature() on big endian, from Hauke Mehrtens.
6) GSO validation fixes from Willem de Bruijn.
7) Endianness fix for dwmac4 timestamp handling, from Alexandre Torgue.
8) More strict checks in tcp_v4_err(), from Eric Dumazet.
9) af_alg_release should NULL out the sk after the sock_put(), from Mao
Wenan.
10) Missing unlock in mac80211 mesh error path, from Wei Yongjun.
11) Missing device put in hns driver, from Salil Mehta.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
sky2: Increase D3 delay again
vhost: correctly check the return value of translate_desc() in log_used()
net: netcp: Fix ethss driver probe issue
net: hns: Fixes the missing put_device in positive leg for roce reset
net: stmmac: Fix a race in EEE enable callback
qed: Fix iWARP syn packet mac address validation.
qed: Fix iWARP buffer size provided for syn packet processing.
r8152: Add support for MAC address pass through on RTL8153-BD
mac80211: mesh: fix missing unlock on error in table_path_del()
net/mlx4_en: fix spelling mistake: "quiting" -> "quitting"
net: crypto set sk to NULL when af_alg_release.
net: Do not allocate page fragments that are not skb aligned
mm: Use fixed constant in page_frag_alloc instead of size + 1
tcp: tcp_v4_err() should be more careful
tcp: clear icsk_backoff in tcp_write_queue_purge()
net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
qmi_wwan: apply SET_DTR quirk to Sierra WP7607
net: stmmac: handle endianness in dwmac4_get_timestamp
doc: Mention MSG_ZEROCOPY implementation for UDP
mlxsw: __mlxsw_sp_port_headroom_set(): Fix a use of local variable
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"This tree reverts a GICv3 commit (which was broken) and fixes it in
another way, by adding a memblock build-time entries quirk for ARM64"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/arm: Revert "Defer persistent reservations until after paging_init()"
arm64, mm, efi: Account for GICv3 LPI tables in static memblock reserve table
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two fixes on the kernel side: fix an over-eager condition that failed
larger perf ring-buffer sizes, plus fix crashes in the Intel BTS code
for a corner case, found by fuzzing"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix impossible ring-buffer sizes warning
perf/x86: Add check_period PMU callback
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-02-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix lockdep false positive in bpf_get_stackid(), from Alexei.
2) several AF_XDP fixes, from Bjorn, Magnus, Davidlohr.
3) fix narrow load from struct bpf_sock, from Martin.
4) mips JIT fixes, from Paul.
5) gso handling fix in bpf helpers, from Willem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: 3b89ea9c5902 ("net: Fix for_each_netdev_feature on Big endian")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://github.com/ojeda/linux
Pull compiler attributes fixes from Miguel Ojeda:
"Clean the new GCC 9 -Wmissing-attributes warnings
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target, e.g.:
void __cold f(void) {}
void __alias("f") g(void);
diagnoses:
warning: 'g' specifies less restrictive attribute than
its target 'f': 'cold' [-Wmissing-attributes]
These patch series clean these new warnings. Most of them are caused
by the module_init/exit macros"
Link: https://lore.kernel.org/lkml/20190125104353.2791-1-labbott@redhat.com/
* tag 'compiler-attributes-for-linus-v5.0-rc7' of git://github.com/ojeda/linux:
include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
Compiler Attributes: add support for __copy (gcc >= 9)
lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure
|
|
This reverts commit eff896288872d687d9662000ec9ae11b6d61766f, which
deferred the processing of persistent memory reservations to a point
where the memory may have already been allocated and overwritten,
defeating the purpose.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190215123333.21209-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In the irqchip and EFI code, we have what basically amounts to a quirk
to work around a peculiarity in the GICv3 architecture, which permits
the system memory address of LPI tables to be programmable only once
after a CPU reset. This means kexec kernels must use the same memory
as the first kernel, and thus ensure that this memory has not been
given out for other purposes by the time the ITS init code runs, which
is not very early for secondary CPUs.
On systems with many CPUs, these reservations could overflow the
memblock reservation table, and this was addressed in commit:
eff896288872 ("efi/arm: Defer persistent reservations until after paging_init()")
However, this turns out to have made things worse, since the allocation
of page tables and heap space for the resized memblock reservation table
itself may overwrite the regions we are attempting to reserve, which may
cause all kinds of corruption, also considering that the ITS will still
be poking bits into that memory in response to incoming MSIs.
So instead, let's grow the static memblock reservation table on such
systems so it can accommodate these reservations at an earlier time.
This will permit us to revert the above commit in a subsequent patch.
[ mingo: Minor cleanups. ]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190215123333.21209-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Syzkaller again found a path to a kernel crash through bad gso input.
By building an excessively large packet to cause an skb field to wrap.
If VIRTIO_NET_HDR_F_NEEDS_CSUM was set this would have been dropped in
skb_partial_csum_set.
GSO packets that do not set checksum offload are suspicious and rare.
Most callers of virtio_net_hdr_to_skb already pass them to
skb_probe_transport_header.
Move that test forward, change it to detect parse failure and drop
packets on failure as those cleary are not one of the legitimate
VIRTIO_NET_HDR_GSO types.
Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.")
Fixes: f43798c27684 ("tun: Allow GSO using virtio_net_hdr")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The features attribute is of type u64 and stored in the native endianes on
the system. The for_each_set_bit() macro takes a pointer to a 32 bit array
and goes over the bits in this area. On little Endian systems this also
works with an u64 as the most significant bit is on the highest address,
but on big endian the words are swapped. When we expect bit 15 here we get
bit 47 (15 + 32).
This patch converts it more or less to its own for_each_set_bit()
implementation which works on 64 bit integers directly. This is then
completely in host endianness and should work like expected.
Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Hauke Mehrtens <hauke.mehrtens@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the request_key() upcall mechanism there's a dependency loop by which if
a key type driver overrides the ->request_key hook and the userspace side
manages to lose the authorisation key, the auth key and the internal
construction record (struct key_construction) can keep each other pinned.
Fix this by the following changes:
(1) Killing off the construction record and using the auth key instead.
(2) Including the operation name in the auth key payload and making the
payload available outside of security/keys/.
(3) The ->request_key hook is given the authkey instead of the cons
record and operation name.
Changes (2) and (3) allow the auth key to naturally be cleaned up if the
keyring it is in is destroyed or cleared or the auth key is unlinked.
Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers for all the init/cleanup_module
aliases in the kernel (defined by the module_init/exit macros),
ending up being very noisy.
These aliases point to the __init/__exit functions of a module,
which are defined as __cold (among other attributes). However,
the aliases themselves do not have the __cold attribute.
Since the compiler behaves differently when compiling a __cold
function as well as when compiling paths leading to calls
to __cold functions, the warning is trying to point out
the possibly-forgotten attribute in the alias.
In order to keep the warning enabled, we decided to silence
this case. Ideally, we would mark the aliases directly
as __init/__exit. However, there are currently around 132 modules
in the kernel which are missing __init/__exit in their init/cleanup
functions (either because they are missing, or for other reasons,
e.g. the functions being called from somewhere else); and
a section mismatch is a hard error.
A conservative alternative was to mark the aliases as __cold only.
However, since we would like to eventually enforce __init/__exit
to be always marked, we chose to use the new __copy function
attribute (introduced by GCC 9 as well to deal with this).
With it, we copy the attributes used by the target functions
into the aliases. This way, functions that were not marked
as __init/__exit won't have their aliases marked either,
and therefore there won't be a section mismatch.
Note that the warning would go away marking either the extern
declaration, the definition, or both. However, we only mark
the definition of the alias, since we do not want callers
(which only see the declaration) to be compiled as if the function
was __cold (and therefore the paths leading to those calls
would be assumed to be unlikely).
Link: https://lore.kernel.org/lkml/20190123173707.GA16603@gmail.com/
Link: https://lore.kernel.org/lkml/20190206175627.GA20399@gmail.com/
Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
|
|
From the GCC manual:
copy
copy(function)
The copy attribute applies the set of attributes with which function
has been declared to the declaration of the function to which
the attribute is applied. The attribute is designed for libraries
that define aliases or function resolvers that are expected
to specify the same set of attributes as their targets. The copy
attribute can be used with functions, variables, or types. However,
the kind of symbol to which the attribute is applied (either
function or variable) must match the kind of symbol to which
the argument refers. The copy attribute copies only syntactic and
semantic attributes but not attributes that affect a symbol’s
linkage or visibility such as alias, visibility, or weak.
The deprecated attribute is also not copied.
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target, e.g.:
void __cold f(void) {}
void __alias("f") g(void);
diagnoses:
warning: 'g' specifies less restrictive attribute than
its target 'f': 'cold' [-Wmissing-attributes]
Using __copy(f) we can copy the __cold attribute from f to g:
void __cold f(void) {}
void __copy(f) __alias("f") g(void);
This attribute is most useful to deal with situations where an alias
is declared but we don't know the exact attributes the target has.
For instance, in the kernel, the widely used module_init/exit macros
define the init/cleanup_module aliases, but those cannot be marked
always as __init/__exit since some modules do not have their
functions marked as such.
Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes intended for v5.0-rc7.
MMC core:
- Fix deadlock bug for block I/O requests
MMC host:
- sunxi: Disable broken HS-DDR mode for H5 by default
- sunxi: Avoid unsupported speed modes declared via DT
- meson-gx: Restore interrupt name"
* tag 'mmc-v5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: meson-gx: fix interrupt name
mmc: block: handle complete_work on separate workqueue
mmc: sunxi: Filter out unsupported modes declared in the device tree
mmc: sunxi: Disable HS-DDR mode for H5 eMMC controller by default
|
|
Pull networking fixes from David Miller:
1) Fix MAC address setting in mac80211 pmsr code, from Johannes Berg.
2) Probe SFP modules after being attached, from Russell King.
3) Byte ordering bug in SMC rx_curs_confirmed code, from Ursula Braun.
4) Revert some r8169 changes that are causing regressions, from Heiner
Kallweit.
5) Fix spurious connection timeouts in netfilter nat code, from Florian
Westphal.
6) SKB leak in tipc, from Hoang Le.
7) Short packet checkum issue in mlx4, similar to a previous mlx5
change, from Saeed Mahameed. The issue is that whilst padding bytes
are usually zero, it is not guarateed and the hardware doesn't take
the padding bytes into consideration when generating the checksum.
8) Fix various races in cls_tcindex, from Cong Wang.
9) Need to set stream ext to NULL before freeing in SCTP code, from Xin
Long.
10) Fix locking in phy_is_started, from Heiner Kallweit.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
net: ethernet: freescale: set FEC ethtool regs version
net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
mm: page_alloc: fix ref bias in page_frag_allo |