| Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace ]
Currently, SO_REUSEPORT does not work well if connected sockets are in a
UDP reuseport group.
Then reuseport_has_conns() returns true and the result of
reuseport_select_sock() is discarded. Also, unconnected sockets have the
same score, hence only does the first unconnected socket in udp_hslot
always receive all packets sent to unconnected sockets.
So, the result of reuseport_select_sock() should be used for load
balancing.
The noteworthy point is that the unconnected sockets placed after
connected sockets in sock_reuseport.socks will receive more packets than
others because of the algorithm in reuseport_select_sock().
index | connected | reciprocal_scale | result
---------------------------------------------
0 | no | 20% | 40%
1 | no | 20% | 20%
2 | yes | 20% | 0%
3 | no | 20% | 40%
4 | yes | 20% | 0%
If most of the sockets are connected, this can be a problem, but it still
works better than now.
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f2b2c55e512879a05456eaf5de4d1ed2f7757509 ]
If an unconnected socket in a UDP reuseport group connect()s, has_conns is
set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
sockets in udp_hslot looking for the connected socket with the highest
score.
However, when the number of sockets bound to the port exceeds max_socks,
reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2()
to return without scanning all sockets, resulting in that packets sent to
connected sockets may be distributed to unconnected sockets.
Therefore, reuseport_grow() should copy has_conns.
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3ecdda3e9ad837cf9cb41b6faa11b1af3a5abc0c ]
When adding a stream with stream reconf, the new stream firstly is in
CLOSED state but new out chunks can still be enqueued. Then once gets
the confirmation from the peer, the state will change to OPEN.
However, if the peer denies, it needs to roll back the stream. But when
doing that, it only sets the stream outcnt back, and the chunks already
in the new stream don't get purged. It caused these chunks can still be
dequeued in sctp_outq_dequeue_data().
As its stream is still in CLOSE, the chunk will be enqueued to the head
again by sctp_outq_head_data(). This chunk will never be sent out, and
the chunks after it can never be dequeued. The assoc will be 'hung' in
a dead loop of sending this chunk.
To fix it, this patch is to purge these chunks already in the new
stream by calling sctp_stream_shrink_out() when failing to do the
addstream reconf.
Fixes: 11ae76e67a17 ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8f13399db22f909a35735bf8ae2f932e0c8f0e30 ]
It's not necessary to go list_for_each for outq->out_chunk_list
when new outcnt >= old outcnt, as no chunk with higher sid than
new (outcnt - 1) exists in the outqueue.
While at it, also move the list_for_each code in a new function
sctp_stream_shrink_out(), which will be used in the next patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 17ad73e941b71f3bec7523ea4e9cbc3752461c2d ]
We recently added some bounds checking in ax25_connect() and
ax25_sendmsg() and we so we removed the AX25_MAX_DIGIS checks because
they were no longer required.
Unfortunately, I believe they are required to prevent integer overflows
so I have added them back.
Fixes: 8885bb0621f0 ("AX.25: Prevent out-of-bounds read in ax25_sendmsg()")
Fixes: 2f2a7ffad5c6 ("AX.25: Fix out-of-bounds read in ax25_connect()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 76be93fc0702322179bb0ea87295d820ee46ad14 ]
Previously TLP may send multiple probes of new data in one
flight. This happens when the sender is cwnd limited. After the
initial TLP containing new data is sent, the sender receives another
ACK that acks partial inflight. It may re-arm another TLP timer
to send more, if no further ACK returns before the next TLP timeout
(PTO) expires. The sender may send in theory a large amount of TLP
until send queue is depleted. This only happens if the sender sees
such irregular uncommon ACK pattern. But it is generally undesirable
behavior during congestion especially.
The original TLP design restrict only one TLP probe per inflight as
published in "Reducing Web Latency: the Virtue of Gentle Aggression",
SIGCOMM 2013. This patch changes TLP to send at most one probe
per inflight.
Note that if the sender is app-limited, TLP retransmits old data
and did not have this issue.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 639f181f0ee20d3249dbc55f740f0167267180f0 ]
rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as if
rxrpc_recvmsg() indicating ENODATA if there's nothing for it to read.
Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read
as this particular error doesn't get stored in ->sk_err by the networking
core.
Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive
errors (there's no way for it to report which call, if any, the error was
caused by).
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cebb69754f37d68e1355a5e726fdac317bcda302 ]
When vlan_newlink call register_vlan_dev fails, it might return error
with dev->reg_state = NETREG_UNREGISTERED. The rtnl_newlink should
free the memory. But currently rtnl_newlink only free the memory which
state is NETREG_UNINITIALIZED.
BUG: memory leak
unreferenced object 0xffff8881051de000 (size 4096):
comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
hex dump (first 32 bytes):
76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00 vlan2...........
00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00 .E(.............
backtrace:
[<0000000047527e31>] kmalloc_node include/linux/slab.h:578 [inline]
[<0000000047527e31>] kvmalloc_node+0x33/0xd0 mm/util.c:574
[<000000002b59e3bc>] kvmalloc include/linux/mm.h:753 [inline]
[<000000002b59e3bc>] kvzalloc include/linux/mm.h:761 [inline]
[<000000002b59e3bc>] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
[<000000006076752a>] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
[<00000000572b3be5>] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
[<00000000e84ea553>] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
[<0000000052c7c0a9>] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
[<000000004b5cb379>] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
[<00000000c71c20d3>] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
[<00000000c71c20d3>] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
[<00000000cca72fa9>] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
[<000000009221ebf7>] sock_sendmsg_nosec net/socket.c:652 [inline]
[<000000009221ebf7>] sock_sendmsg+0x109/0x140 net/socket.c:672
[<000000001c30ffe4>] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
[<00000000b71ca6f3>] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
[<0000000007297384>] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
[<000000000eb29b11>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<000000006839b4d0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit af9f691f0f5bdd1ade65a7b84927639882d7c3e5 ]
We have to detach sock from socket in qrtr_release(),
otherwise skb->sk may still reference to this socket
when the skb is released in tun->queue, particularly
sk->sk_wq still points to &sock->wq, which leads to
a UAF.
Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com
Fixes: 28fb4e59a47d ("net: qrtr: Expose tunneling endpoint to user space")
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b0a422772fec29811e293c7c0e6f991c0fd9241d ]
We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is
checked.
Fixes: b2bf1e2659b1 ("[UDP]: Clean up for IS_UDPLITE macro")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]
When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.
root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 46ef5b89ec0ecf290d74c4aee844f063933c4da4 ]
KASAN report null-ptr-deref error when register_netdev() failed:
KASAN: null-ptr-deref in range [0x00000000000003c0-0x00000000000003c7]
CPU: 2 PID: 422 Comm: ip Not tainted 5.8.0-rc4+ #12
Call Trace:
ip6gre_init_net+0x4ab/0x580
? ip6gre_tunnel_uninit+0x3f0/0x3f0
ops_init+0xa8/0x3c0
setup_net+0x2de/0x7e0
? rcu_read_lock_bh_held+0xb0/0xb0
? ops_init+0x3c0/0x3c0
? kasan_unpoison_shadow+0x33/0x40
? __kasan_kmalloc.constprop.0+0xc2/0xd0
copy_net_ns+0x27d/0x530
create_new_namespaces+0x382/0xa30
unshare_nsproxy_namespaces+0xa1/0x1d0
ksys_unshare+0x39c/0x780
? walk_process_tree+0x2a0/0x2a0
? trace_hardirqs_on+0x4a/0x1b0
? _raw_spin_unlock_irq+0x1f/0x30
? syscall_trace_enter+0x1a7/0x330
? do_syscall_64+0x1c/0xa0
__x64_sys_unshare+0x2d/0x40
do_syscall_64+0x56/0xa0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
ip6gre_tunnel_uninit() has set 'ign->fb_tunnel_dev' to NULL, later
access to ign->fb_tunnel_dev cause null-ptr-deref. Fix it by saving
'ign->fb_tunnel_dev' to local variable ndev.
Fixes: dafabb6590cb ("ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]
IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.
Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8885bb0621f01a6c82be60a91e5fc0f6e2f71186 ]
Checks on `addr_len` and `usax->sax25_ndigis` are insufficient.
ax25_sendmsg() can go out of bounds when `usax->sax25_ndigis` equals to 7
or 8. Fix it.
It is safe to remove `usax->sax25_ndigis > AX25_MAX_DIGIS`, since
`addr_len` is guaranteed to be less than or equal to
`sizeof(struct full_sockaddr_ax25)`
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2f2a7ffad5c6cbf3d438e813cfdc88230e185ba6 ]
Checks on `addr_len` and `fsa->fsa_ax25.sax25_ndigis` are insufficient.
ax25_connect() can go out of bounds when `fsa->fsa_ax25.sax25_ndigis`
equals to 7 or 8. Fix it.
This issue has been reported as a KMSAN uninit-value bug, because in such
a case, ax25_connect() reaches into the uninitialized portion of the
`struct sockaddr_storage` statically allocated in __sys_connect().
It is safe to remove `fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS` because
`addr_len` is guaranteed to be less than or equal to
`sizeof(struct full_sockaddr_ax25)`.
Reported-by: syzbot+c82752228ed975b0a623@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=55ef9d629f3b3d7d70b69558015b63b48d01af66
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8210e344ccb798c672ab237b1a4f241bda08909b ]
The sync_thread_backup only checks sk_receive_queue is empty or not,
there is a situation which cannot sync the connection entries when
sk_receive_queue is empty and sk_rmem_alloc is larger than sk_rcvbuf,
the sync packets are dropped in __udp_enqueue_schedule_skb, this is
because the packets in reader_queue is not read, so the rmem is
not reclaimed.
Here I add the check of whether the reader_queue of the udp sock is
empty or not to solve this problem.
Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
Reported-by: zhouxudong <zhouxudong8@huawei.com>
Signed-off-by: guodeqing <geffrey.guo@huawei.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f961134a612c793d5901a93d85a29337c74af978 ]
Commit 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free
on the_virtio_vsock") starts to use RCU to protect 'the_virtio_vsock'
pointer, but we forgot to annotate it.
This patch adds the annotation to fix the following sparse errors:
net/vmw_vsock/virtio_transport.c:73:17: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:171:17: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:207:17: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:561:13: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:612:9: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:631:9: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock *
Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0b467b63870d9c05c81456aa9bfee894ab2db3b6 ]
Without this patch, eapol frames cannot be received in mesh
mode, when 802.1X should be used. Initially only a MGTK is
defined, which is found and set as rx->key, when there are
no other keys set. ieee80211_drop_unencrypted would then
drop these eapol frames, as they are data frames without
encryption and there exists some rx->key.
Fix this by differentiating between mesh eapol frames and
other data frames with existing rx->key. Allow mesh mesh
eapol frames only if they are for our vif address.
With this patch in-place, ieee80211_rx_h_mesh_fwding continues
after the ieee80211_drop_unencrypted check and notices, that
these eapol frames have to be delivered locally, as they should.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200625104214.50319-1-markus.theil@tu-ilmenau.de
[small code cleanups]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2f3fead62144002557f322c2a7c15e1255df0653 upstream.
Currently target_copy() is used only for sending linger pings, so
this doesn't come up, but generally omitting recovery_deletes can
result in unneeded resends (force_resend in calc_target()).
Fixes: ae78dd8139ce ("libceph: make RECOVERY_DELETES feature create a new interval")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 912288442cb2f431bf3c8cb097a5de83bc6dbac1 ]
Currently the header size calculations are using an assignment
operator instead of a += operator when accumulating the header
size leading to incorrect sizes. Fix this by using the correct
operator.
Addresses-Coverity: ("Unused value")
Fixes: 302d3deb2068 ("xprtrdma: Prevent inline overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0da7536fb47f51df89ccfcb1fa09f249d9accec5 ]
When no full socket is available, skbs are sent over a per-netns
control socket. Its sk_mark is temporarily adjusted to match that
of the real (request or timewait) socket or to reflect an incoming
skb, so that the outgoing skb inherits this in __ip_make_skb.
Introduction of the socket cookie mark field broke this. Now the
skb is set through the cookie and cork:
<caller> # init sockc.mark from sk_mark or cmsg
ip_append_data
ip_setup_cork # convert sockc.mark to cork mark
ip_push_pending_frames
ip_finish_skb
__ip_make_skb # set skb->mark to cork mark
But I missed these special control sockets. Update all callers of
__ip(6)_make_skb that were originally missed.
For IPv6, the same two icmp(v6) paths are affected. The third
case is not, as commit 92e55f412cff ("tcp: don't annotate
mark on control socket from tcp_v6_send_response()") replaced
the ctl_sk->sk_mark with passing the mark field directly as a
function argument. That commit predates the commit that
introduced the bug.
Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.
sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.
The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.
This bug seems to be introduced since the beginning, commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1ca0fafd73c5268e8fc4b997094b8bb2bfe8deea ]
This essentially reverts commit 721230326891 ("tcp: md5: reject TCP_MD5SIG
or TCP_MD5SIG_EXT on established sockets")
Mathieu reported that many vendors BGP implementations can
actually switch TCP MD5 on established flows.
Quoting Mathieu :
Here is a list of a few network vendors along with their behavior
with respect to TCP MD5:
- Cisco: Allows for password to be changed, but within the hold-down
timer (~180 seconds).
- Juniper: When password is initially set on active connection it will
reset, but after that any subsequent password changes no network
resets.
- Nokia: No notes on if they flap the tcp connection or not.
- Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until
both sides are ok with new passwords.
- Meta-Switch: Expects the password to be set before a connection is
attempted, but no further info on whether they reset the TCP
connection on a change.
- Avaya: Disable the neighbor, then set password, then re-enable.
- Zebos: Would normally allow the change when socket connected.
We can revert my prior change because commit 9424e2e7ad93 ("tcp: md5: fix potential
overestimation of TCP option space") removed the leak of 4 kernel bytes to
the wire that was the main reason for my patch.
While doing my investigations, I found a bug when a MD5 key is changed, leading
to these commits that stable teams want to consider before backporting this revert :
Commit 6a2febec338d ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
Commit e6ced831ef11 ("tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers")
Fixes: 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e6ced831ef11a2a06e8d00aad9d4fc05b610bf38 ]
My prior fix went a bit too far, according to Herbert and Mathieu.
Since we accept that concurrent TCP MD5 lookups might see inconsistent
keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb()
Clearing all key->key[] is needed to avoid possible KMSAN reports,
if key->keylen is increased. Since tcp_md5_do_add() is not fast path,
using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler.
data_race() was added in linux-5.8 and will prevent KCSAN reports,
this can safely be removed in stable backports, if data_race() is
not yet backported.
v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add()
Fixes: 6a2febec338d ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Marco Elver <elver@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e114e1e8ac9d31f25b9dd873bab5d80c1fc482ca ]
Whenever cookie_init_timestamp() has been used to encode
ECN,SACK,WSCALE options, we can not remove the TS option in the SYNACK.
Otherwise, tcp_synack_options() will still advertize options like WSCALE
that we can not deduce later when receiving the packet from the client
to complete 3WHS.
Note that modern linux TCP stacks wont use MD5+TS+SACK in a SYN packet,
but we can not know for sure that all TCP stacks have the same logic.
Before the fix a tcpdump would exhibit this wrong exchange :
10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0
10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0
10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0
10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12
10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0
After this patch the exchange looks saner :
11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0
11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0
11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0
11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12
11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0
11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12
11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0
Of course, no SACK block will ever be added later, but nothing should break.
Technically, we could remove the 4 nops included in MD5+TS options,
but again some stacks could break seeing not conventional alignment.
Fixes: 4957faade11b ("TCPCT part 1g: Responder Cookie => Initiator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6a2febec338df7e7699a52d00b2e1207dcf65b28 ]
MD5 keys are read with RCU protection, and tcp_md5_do_add()
might update in-place a prior key.
Normally, typical RCU updates would allocate a new piece
of memory. In this case only key->key and key->keylen might
be updated, and we do not care if an incoming packet could
see the old key, the new one, or some intermediate value,
since changing the key on a live flow is known to be problematic
anyway.
We only want to make sure that in the case key->keylen
is changed, cpus in tcp_md5_hash_key() wont try to use
uninitialized data, or crash because key->keylen was
read twice to feed sg_init_one() and ahash_request_set_crypt()
Fixes: 9ea88a153001 ("tcp: md5: check md5 signature without socket lock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ce69e563b325f620863830c246a8698ccea52048 ]
syzkaller found its way into setsockopt with TCP_CONGESTION "cdg".
tcp_cdg_init() does a kcalloc to store the gradients. As sk_clone_lock
just copies all the memory, the allocated pointer will be copied as
well, if the app called setsockopt(..., TCP_CONGESTION) on the listener.
If now the socket will be destroyed before the congestion-control
has properly been initialized (through a call to tcp_init_transfer), we
will end up freeing memory that does not belong to that particular
socket, opening the door to a double-free:
[ 11.413102] ==================================================================
[ 11.414181] BUG: KASAN: double-free or invalid-free in tcp_cleanup_congestion_control+0x58/0xd0
[ 11.415329]
[ 11.415560] CPU: 3 PID: 4884 Comm: syz-executor.5 Not tainted 5.8.0-rc2 #80
[ 11.416544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 11.418148] Call Trace:
[ 11.418534] <IRQ>
[ 11.418834] dump_stack+0x7d/0xb0
[ 11.419297] print_address_description.constprop.0+0x1a/0x210
[ 11.422079] kasan_report_invalid_free+0x51/0x80
[ 11.423433] __kasan_slab_free+0x15e/0x170
[ 11.424761] kfree+0x8c/0x230
[ 11.425157] tcp_cleanup_congestion_control+0x58/0xd0
[ 11.425872] tcp_v4_destroy_sock+0x57/0x5a0
[ 11.426493] inet_csk_destroy_sock+0x153/0x2c0
[ 11.427093] tcp_v4_syn_recv_sock+0xb29/0x1100
[ 11.427731] tcp_get_cookie_sock+0xc3/0x4a0
[ 11.429457] cookie_v4_check+0x13d0/0x2500
[ 11.433189] tcp_v4_do_rcv+0x60e/0x780
[ 11.433727] tcp_v4_rcv+0x2869/0x2e10
[ 11.437143] ip_protocol_deliver_rcu+0x23/0x190
[ 11.437810] ip_local_deliver+0x294/0x350
[ 11.439566] __netif_receive_skb_one_core+0x15d/0x1a0
[ 11.441995] process_backlog+0x1b1/0x6b0
[ 11.443148] net_rx_action+0x37e/0xc40
[ 11.445361] __do_softirq+0x18c/0x61a
[ 11.445881] asm_call_on_stack+0x12/0x20
[ 11.446409] </IRQ>
[ 11.446716] do_softirq_own_stack+0x34/0x40
[ 11.447259] do_softirq.part.0+0x26/0x30
[ 11.447827] __local_bh_enable_ip+0x46/0x50
[ 11.448406] ip_finish_output2+0x60f/0x1bc0
[ 11.450109] __ip_queue_xmit+0x71c/0x1b60
[ 11.451861] __tcp_transmit_skb+0x1727/0x3bb0
[ 11.453789] tcp_rcv_state_process+0x3070/0x4d3a
[ 11.456810] tcp_v4_do_rcv+0x2ad/0x780
[ 11.457995] __release_sock+0x14b/0x2c0
[ 11.458529] release_sock+0x4a/0x170
[ 11.459005] __inet_stream_connect+0x467/0xc80
[ 11.461435] inet_stream_connect+0x4e/0xa0
[ 11.462043] __sys_connect+0x204/0x270
[ 11.465515] __x64_sys_connect+0x6a/0xb0
[ 11.466088] do_syscall_64+0x3e/0x70
[ 11.466617] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 11.467341] RIP: 0033:0x7f56046dc469
[ 11.467844] Code: Bad RIP value.
[ 11.468282] RSP: 002b:00007f5604dccdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
[ 11.469326] RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007f56046dc469
[ 11.470379] RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000004
[ 11.471311] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[ 11.472286] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 11.473341] R13: 000000000041427c R14: 00007f5604dcd5c0 R15: 0000000000000003
[ 11.474321]
[ 11.474527] Allocated by task 4884:
[ 11.475031] save_stack+0x1b/0x40
[ 11.475548] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 11.476182] tcp_cdg_init+0xf0/0x150
[ 11.476744] tcp_init_congestion_control+0x9b/0x3a0
[ 11.477435] tcp_set_congestion_control+0x270/0x32f
[ 11.478088] do_tcp_setsockopt.isra.0+0x521/0x1a00
[ 11.478744] __sys_setsockopt+0xff/0x1e0
[ 11.479259] __x64_sys_setsockopt+0xb5/0x150
[ 11.479895] do_syscall_64+0x3e/0x70
[ 11.480395] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 11.481097]
[ 11.481321] Freed by task 4872:
[ 11.481783] save_stack+0x1b/0x40
[ 11.482230] __kasan_slab_free+0x12c/0x170
[ 11.482839] kfree+0x8c/0x230
[ 11.483240] tcp_cleanup_congestion_control+0x58/0xd0
[ 11.483948] tcp_v4_destroy_sock+0x57/0x5a0
[ 11.484502] inet_csk_destroy_sock+0x153/0x2c0
[ 11.485144] tcp_close+0x932/0xfe0
[ 11.485642] inet_release+0xc1/0x1c0
[ 11.486131] __sock_release+0xc0/0x270
[ 11.486697] sock_close+0xc/0x10
[ 11.487145] __fput+0x277/0x780
[ 11.487632] task_work_run+0xeb/0x180
[ 11.488118] __prepare_exit_to_usermode+0x15a/0x160
[ 11.488834] do_syscall_64+0x4a/0x70
[ 11.489326] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Wei Wang fixed a part of these CDG-malloc issues with commit c12014440750
("tcp: memset ca_priv data to 0 properly").
This patch here fixes the listener-scenario: We make sure that listeners
setting the congestion-control through setsockopt won't initialize it
(thus CDG never allocates on listeners). For those who use AF_UNSPEC to
reuse a socket, tcp_disconnect() is changed to cleanup afterwards.
(The issue can be reproduced at least down to v4.4.x.)
Cc: Wei Wang <weiwan@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ba3bb0e76ccd464bb66665a1941fabe55dadb3ba ]
Whenever tcp_try_rmem_schedule() returns an error, we are under
trouble and should make sure to wakeup readers so that they
can drain socket queues and eventually make room.
Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.
However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).
To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.
To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.
v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
bpf_skb_ecn_set_ce()
v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
calling the helper twice
Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 306381aec7c2b5a658eebca008c8a1b666536cba ]
When tcf_block_get() fails inside atm_tc_init(),
atm_tc_put() is called to release the qdisc p->link.q.
But the flow->ref prevents it to do so, as the flow->ref
is still zero.
Fix this by moving the p->link.ref initialization before
tcf_block_get().
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Reported-and-tested-by: syzbot+d411cff6ab29cc2c311b@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a9b1110162357689a34992d5c925852948e5b9fd ]
syzbot was to trigger a bug by tricking AF_LLC with
non sensible addr->sllc_arphrd
It seems clear LLC requires an Ethernet device.
Back in commit abf9d537fea2 ("llc: add support for SO_BINDTODEVICE")
Octavian Purdila added possibility for application to use a zero
value for sllc_arphrd, convert it to ARPHRD_ETHER to not cause
regressions on existing applications.
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:199 [inline]
BUG: KASAN: use-after-free in list_empty include/linux/list.h:268 [inline]
BUG: KASAN: use-after-free in waitqueue_active include/linux/wait.h:126 [inline]
BUG: KASAN: use-after-free in wq_has_sleeper include/linux/wait.h:160 [inline]
BUG: KASAN: use-after-free in skwq_has_sleeper include/net/sock.h:2092 [inline]
BUG: KASAN: use-after-free in sock_def_write_space+0x642/0x670 net/core/sock.c:2813
Read of size 8 at addr ffff88801e0b4078 by task ksoftirqd/3/27
CPU: 3 PID: 27 Comm: ksoftirqd/3 Not tainted 5.5.0-rc1-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:639
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
__read_once_size include/linux/compiler.h:199 [inline]
list_empty include/linux/list.h:268 [inline]
waitqueue_active include/linux/wait.h:126 [inline]
wq_has_sleeper include/linux/wait.h:160 [inline]
skwq_has_sleeper include/net/sock.h:2092 [inline]
sock_def_write_space+0x642/0x670 net/core/sock.c:2813
sock_wfree+0x1e1/0x260 net/core/sock.c:1958
skb_release_head_state+0xeb/0x260 net/core/skbuff.c:652
skb_release_all+0x16/0x60 net/core/skbuff.c:663
__kfree_skb net/core/skbuff.c:679 [inline]
consume_skb net/core/skbuff.c:838 [inline]
consume_skb+0xfb/0x410 net/core/skbuff.c:832
__dev_kfree_skb_any+0xa4/0xd0 net/core/dev.c:2967
dev_kfree_skb_any include/linux/netdevice.h:3650 [inline]
e1000_unmap_and_free_tx_resource.isra.0+0x21b/0x3a0 drivers/net/ethernet/intel/e1000/e1000_main.c:1963
e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3854 [inline]
e1000_clean+0x4cc/0x1d10 drivers/net/ethernet/intel/e1000/e1000_main.c:3796
napi_poll net/core/dev.c:6532 [inline]
net_rx_action+0x508/0x1120 net/core/dev.c:6600
__do_softirq+0x262/0x98c kernel/softirq.c:292
run_ksoftirqd kernel/softirq.c:603 [inline]
run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Allocated by task 8247:
save_stack+0x23/0x90 mm/kasan/common.c:72
set_track mm/kasan/common.c:80 [inline]
__kasan_kmalloc mm/kasan/common.c:513 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521
slab_post_alloc_hook mm/slab.h:584 [inline]
slab_alloc mm/slab.c:3320 [inline]
kmem_cache_alloc+0x121/0x710 mm/slab.c:3484
sock_alloc_inode+0x1c/0x1d0 net/socket.c:240
alloc_inode+0x68/0x1e0 fs/inode.c:230
new_inode_pseudo+0x19/0xf0 fs/inode.c:919
sock_alloc+0x41/0x270 net/socket.c:560
__sock_create+0xc2/0x730 net/socket.c:1384
sock_create net/socket.c:1471 [inline]
__sys_socket+0x103/0x220 net/socket.c:1513
__do_sys_socket net/socket.c:1522 [inline]
__se_sys_socket net/socket.c:1520 [inline]
__ia32_sys_socket+0x73/0xb0 net/socket.c:1520
do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline]
do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
Freed by task 17:
save_stack+0x23/0x90 mm/kasan/common.c:72
set_track mm/kasan/common.c:80 [inline]
kasan_set_free_info mm/kasan/common.c:335 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
__cache_free mm/slab.c:3426 [inline]
kmem_cache_free+0x86/0x320 mm/slab.c:3694
sock_free_inode+0x20/0x30 net/socket.c:261
i_callback+0x44/0x80 fs/inode.c:219
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch kernel/rcu/tree.c:2183 [inline]
rcu_core+0x570/0x1540 kernel/rcu/tree.c:2408
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2417
__do_softirq+0x262/0x98c kernel/softirq.c:292
The buggy address belongs to the object at ffff88801e0b4000
which belongs to the cache sock_inode_cache of size 1152
The buggy address is located 120 bytes inside of
1152-byte region [ffff88801e0b4000, ffff88801e0b4480)
The buggy address belongs to the page:
page:ffffea0000782d00 refcount:1 mapcount:0 mapping:ffff88807aa59c40 index:0xffff88801e0b4ffd
raw: 00fffe0000000200 ffffea00008e6c88 ffffea0000782d48 ffff88807aa59c40
raw: ffff88801e0b4ffd ffff88801e0b4000 0000000100000003 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88801e0b3f00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
ffff88801e0b3f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88801e0b4000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88801e0b4080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88801e0b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: abf9d537fea2 ("llc: add support for SO_BINDTODEVICE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 27d53323664c549b5bb2dfaaf6f7ad6e0376a64e ]
In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
skb's dst. However, it will eventually call inet6_csk_xmit() or
ip_queue_xmit() where skb's dst will be overwritten by:
skb_dst_set_noref(skb, dst);
without releasing the old dst in skb. Then it causes dst/dev refcnt leak:
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
This can be reproduced by simply running:
# modprobe l2tp_eth && modprobe l2tp_ip
# sh ./tools/testing/selftests/net/l2tp.sh
So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
should be dropped. This patch is to fix it by removing skb_dst_set()
from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().
Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: James Chapman <jchapman@katalix.com>
Tested-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|