summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2021-07-25tcp: consistently disable header prediction for mptcpPaolo Abeni1-0/+4
commit 71158bb1f2d2da61385c58fc1114e1a1c19984ba upstream. The MPTCP receive path is hooked only into the TCP slow-path. The DSS presence allows plain MPTCP traffic to hit that consistently. Since commit e1ff9e82e2ea ("net: mptcp: improve fallback to TCP"), when an MPTCP socket falls back to TCP, it can hit the TCP receive fast-path, and delay or stop triggering the event notification. Address the issue explicitly disabling the header prediction for MPTCP sockets. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/200 Fixes: e1ff9e82e2ea ("net: mptcp: improve fallback to TCP") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-25net: validate lwtstate->data before returning from skb_tunnel_info()Taehee Yoo1-1/+3
commit 67a9c94317402b826fc3db32afc8f39336803d97 upstream. skb_tunnel_info() returns pointer of lwtstate->data as ip_tunnel_info type without validation. lwtstate->data can have various types such as mpls_iptunnel_encap, etc and these are not compatible. So skb_tunnel_info() should validate before returning that pointer. Splat looks like: BUG: KASAN: slab-out-of-bounds in vxlan_get_route+0x418/0x4b0 [vxlan] Read of size 2 at addr ffff888106ec2698 by task ping/811 CPU: 1 PID: 811 Comm: ping Not tainted 5.13.0+ #1195 Call Trace: dump_stack_lvl+0x56/0x7b print_address_description.constprop.8.cold.13+0x13/0x2ee ? vxlan_get_route+0x418/0x4b0 [vxlan] ? vxlan_get_route+0x418/0x4b0 [vxlan] kasan_report.cold.14+0x83/0xdf ? vxlan_get_route+0x418/0x4b0 [vxlan] vxlan_get_route+0x418/0x4b0 [vxlan] [ ... ] vxlan_xmit_one+0x148b/0x32b0 [vxlan] [ ... ] vxlan_xmit+0x25c5/0x4780 [vxlan] [ ... ] dev_hard_start_xmit+0x1ae/0x6e0 __dev_queue_xmit+0x1f39/0x31a0 [ ... ] neigh_xmit+0x2f9/0x940 mpls_xmit+0x911/0x1600 [mpls_iptunnel] lwtunnel_xmit+0x18f/0x450 ip_finish_output2+0x867/0x2040 [ ... ] Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-25net: ipv6: fix return value of ip6_skb_dst_mtuVadim Fedorenko1-1/+1
commit 40fc3054b45820c28ea3c65e2c86d041dc244a8a upstream. Commit 628a5c561890 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") introduced ip6_skb_dst_mtu with return value of signed int which is inconsistent with actually returned values. Also 2 users of this function actually assign its value to unsigned int variable and only __xfrm6_output assigns result of this function to signed variable but actually uses as unsigned in further comparisons and calls. Change this function to return unsigned int value. Fixes: 628a5c561890 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-19sctp: validate from_addr_param returnMarcelo Ricardo Leitner1-1/+1
[ Upstream commit 0c5dc070ff3d6246d22ddd931f23a6266249e3db ] Ilja reported that, simply putting it, nothing was validating that from_addr_param functions were operating on initialized memory. That is, the parameter itself was being validated by sctp_walk_params, but it doesn't check for types and their specific sizes and it could be a 0-length one, causing from_addr_param to potentially work over the next parameter or even uninitialized memory. The fix here is to, in all calls to from_addr_param, check if enough space is there for the wanted IP address type. Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-19flow_offload: action should not be NULL when it is referencedgushengxian1-5/+7
[ Upstream commit 9ea3e52c5bc8bb4a084938dc1e3160643438927a ] "action" should not be NULL when it is referenced. Signed-off-by: gushengxian <13145886936@163.com> Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net: lwtunnel: handle MTU calculation in forwadingVadim Fedorenko2-8/+20
[ Upstream commit fade56410c22cacafb1be9f911a0afd3701d8366 ] Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved fragmentation logic away from lwtunnel by carry encap headroom and use it in output MTU calculation. But the forwarding part was not covered and created difference in MTU for output and forwarding and further to silent drops on ipv4 forwarding path. Fix it by taking into account lwtunnel encap headroom. The same commit also introduced difference in how to treat RTAX_MTU in IPv4 and IPv6 where latter explicitly removes lwtunnel encap headroom from route MTU. Make IPv4 version do the same. Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation") Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14Bluetooth: Fix Set Extended (Scan Response) DataLuiz Augusto von Dentz2-6/+8
[ Upstream commit c9ed0a7077306f9d41d74fb006ab5dbada8349c5 ] These command do have variable length and the length can go up to 251, so this changes the struct to not use a fixed size and then when creating the PDU only the actual length of the data send to the controller. Fixes: a0fb3726ba551 ("Bluetooth: Use Set ext adv/scan rsp data if controller supports") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net: macsec: fix the length used to copy the key for offloadingAntoine Tenart1-1/+1
[ Upstream commit 1f7fe5121127e037b86592ba42ce36515ea0e3f7 ] The key length used when offloading macsec to Ethernet or PHY drivers was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length accepted in uAPI) and secy->key_len to copy it. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net: sched: add barrier to ensure correct ordering for lockless qdiscYunsheng Lin1-0/+12
[ Upstream commit 89837eb4b2463c556a123437f242d6c2bc62ce81 ] The spin_trylock() was assumed to contain the implicit barrier needed to ensure the correct ordering between STATE_MISSED setting/clearing and STATE_MISSED checking in commit a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc"). But it turns out that spin_trylock() only has load-acquire semantic, for strongly-ordered system(like x86), the compiler barrier implicitly contained in spin_trylock() seems enough to ensure the correct ordering. But for weakly-orderly system (like arm64), the store-release semantic is needed to ensure the correct ordering as clear_bit() and test_bit() is store operation, see queued_spin_lock(). So add the explicit barrier to ensure the correct ordering for the above case. Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14xsk: Fix missing validation for skb and unaligned modeMagnus Karlsson1-2/+7
[ Upstream commit 2f99619820c2269534eb2c0cde44870313c6d353 ] Fix a missing validation of a Tx descriptor when executing in skb mode and the umem is in unaligned mode. A descriptor could point to a buffer straddling the end of the umem, thus effectively tricking the kernel to read outside the allowed umem region. This could lead to a kernel crash if that part of memory is not mapped. In zero-copy mode, the descriptor validation code rejects such descriptors by checking a bit in the DMA address that tells us if the next page is physically contiguous or not. For the last page in the umem, this bit is not set, therefore any descriptor pointing to a packet straddling this last page boundary will be rejected. However, the skb path does not use this bit since it copies out data and can do so to two different pages. (It also does not have the array of DMA address, so it cannot even store this bit.) The code just returned that the packet is always physically contiguous. But this is unfortunately also returned for the last page in the umem, which means that packets that cross the end of the umem are being allowed, which they should not be. Fix this by introducing a check for this in the SKB path only, not penalizing the zero-copy path. Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net/sched: act_vlan: Fix modify to allow 0Boris Sukholitko1-0/+1
[ Upstream commit 9c5eee0afca09cbde6bd00f77876754aaa552970 ] Currently vlan modification action checks existence of vlan priority by comparing it to 0. Therefore it is impossible to modify existing vlan tag to have priority 0. For example, the following tc command will change the vlan id but will not affect vlan priority: tc filter add dev eth1 ingress matchall action vlan modify id 300 \ priority 0 pipe mirred egress redirect dev eth2 The incoming packet on eth1: ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4 will be changed to: ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4 although the user has intended to have p == 0. The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params and rely on it when deciding to set the priority. Fixes: 45a497f2d149a4a8061c (net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action) Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14xfrm: xfrm_state_mtu should return at least 1280 for ipv6Sabrina Dubroca1-0/+1
[ Upstream commit b515d2637276a3810d6595e10ab02c13bfd0b63a ] Jianwen reported that IPv6 Interoperability tests are failing in an IPsec case where one of the links between the IPsec peers has an MTU of 1280. The peer generates a packet larger than this MTU, the router replies with a "Packet too big" message indicating an MTU of 1280. When the peer tries to send another large packet, xfrm_state_mtu returns 1280 - ipsec_overhead, which causes ip6_setup_cork to fail with EINVAL. We can fix this by forcing xfrm_state_mtu to return IPV6_MIN_MTU when IPv6 is used. After going through IPsec, the packet will then be fragmented to obey the actual network's PMTU, just before leaving the host. Currently, TFC padding is capped to PMTU - overhead to avoid fragementation: after padding and encapsulation, we still fit within the PMTU. That behavior is preserved in this patch. Fixes: 91657eafb64b ("xfrm: take net hdr len into account for esp payload size calculation") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30inet: annotate date races around sk->sk_txhashEric Dumazet1-3/+7
[ Upstream commit b71eaed8c04f72a919a9c44e83e4ee254e69e7f3 ] UDP sendmsg() path can be lockless, it is possible for another thread to re-connect an change sk->sk_txhash under us. There is no serious impact, but we can use READ_ONCE()/WRITE_ONCE() pair to document the race. BUG: KCSAN: data-race in __ip4_datagram_connect / skb_set_owner_w write to 0xffff88813397920c of 4 bytes by task 30997 on cpu 1: sk_set_txhash include/net/sock.h:1937 [inline] __ip4_datagram_connect+0x69e/0x710 net/ipv4/datagram.c:75 __ip6_datagram_connect+0x551/0x840 net/ipv6/datagram.c:189 ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272 inet_dgram_connect+0xfd/0x180 net/ipv4/af_inet.c:580 __sys_connect_file net/socket.c:1837 [inline] __sys_connect+0x245/0x280 net/socket.c:1854 __do_sys_connect net/socket.c:1864 [inline] __se_sys_connect net/socket.c:1861 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:1861 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88813397920c of 4 bytes by task 31039 on cpu 0: skb_set_hash_from_sk include/net/sock.h:2211 [inline] skb_set_owner_w+0x118/0x220 net/core/sock.c:2101 sock_alloc_send_pskb+0x452/0x4e0 net/core/sock.c:2359 sock_alloc_send_skb+0x2d/0x40 net/core/sock.c:2373 __ip6_append_data+0x1743/0x21a0 net/ipv6/ip6_output.c:1621 ip6_make_skb+0x258/0x420 net/ipv6/ip6_output.c:1983 udpv6_sendmsg+0x160a/0x16b0 net/ipv6/udp.c:1527 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xbca3c43d -> 0xfdb309e0 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 31039 Comm: syz-executor.2 Not tainted 5.13.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30net: annotate data race in sock_error()Eric Dumazet1-1/+6
[ Upstream commit f13ef10059ccf5f4ed201cd050176df62ec25bb8 ] sock_error() is known to be racy. The code avoids an atomic operation is sk_err is zero, and this field could be changed under us, this is fine. Sysbot reported: BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock write to 0xffff888131855630 of 4 bytes by task 9365 on cpu 1: unix_release_sock+0x2e9/0x6e0 net/unix/af_unix.c:550 unix_release+0x2f/0x50 net/unix/af_unix.c:859 __sock_release net/socket.c:599 [inline] sock_close+0x6c/0x150 net/socket.c:1258 __fput+0x25b/0x4e0 fs/file_table.c:280 ____fput+0x11/0x20 fs/file_table.c:313 task_work_run+0xae/0x130 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301 do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888131855630 of 4 bytes by task 9385 on cpu 0: sock_error include/net/sock.h:2269 [inline] sock_alloc_send_pskb+0xe4/0x4e0 net/core/sock.c:2336 unix_dgram_sendmsg+0x478/0x1610 net/unix/af_unix.c:1671 unix_seqpacket_sendmsg+0xc2/0x100 net/unix/af_unix.c:2055 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 __sys_sendmsg_sock+0x25/0x30 net/socket.c:2416 io_sendmsg fs/io_uring.c:4367 [inline] io_issue_sqe+0x231a/0x6750 fs/io_uring.c:6135 __io_queue_sqe+0xe9/0x360 fs/io_uring.c:6414 __io_req_task_submit fs/io_uring.c:2039 [inline] io_async_task_func+0x312/0x590 fs/io_uring.c:5074 __tctx_task_work fs/io_uring.c:1910 [inline] tctx_task_work+0x1d4/0x3d0 fs/io_uring.c:1924 task_work_run+0xae/0x130 kernel/task_work.c:164 tracehook_notify_signal include/linux/tracehook.h:212 [inline] handle_signal_work kernel/entry/common.c:145 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0xf8/0x190 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301 do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000000 -> 0x00000068 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 9385 Comm: syz-executor.3 Not tainted 5.13.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23mac80211: Fix NULL ptr deref for injected rate infoMathy Vanhoef1-1/+6
commit bddc0c411a45d3718ac535a070f349be8eca8d48 upstream. The commit cb17ed29a7a5 ("mac80211: parse radiotap header when selecting Tx queue") moved the code to validate the radiotap header from ieee80211_monitor_start_xmit to ieee80211_parse_tx_radiotap. This made is possible to share more code with the new Tx queue selection code for injected frames. But at the same time, it now required the call of ieee80211_parse_tx_radiotap at the beginning of functions which wanted to handle the radiotap header. And this broke the rate parser for radiotap header parser. The radiotap parser for rates is operating most of the time only on the data in the actual radiotap header. But for the 802.11a/b/g rates, it must also know the selected band from the chandef information. But this information is only written to the ieee80211_tx_info at the end of the ieee80211_monitor_start_xmit - long after ieee80211_parse_tx_radiotap was already called. The info->band information was therefore always 0 (NL80211_BAND_2GHZ) when the parser code tried to access it. For a 5GHz only device, injecting a frame with 802.11a rates would cause a NULL pointer dereference because local->hw.wiphy->bands[NL80211_BAND_2GHZ] would most likely have been NULL when the radiotap parser searched for the correct rate index of the driver. Cc: stable@vger.kernel.org Reported-by: Ben Greear <greearb@candelatech.com> Fixes: cb17ed29a7a5 ("mac80211: parse radiotap header when selecting Tx queue") Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> [sven@narfation.org: added commit message] Signed-off-by: Sven Eckelmann <sven@narfation.org> Link: https://lore.kernel.org/r/20210530133226.40587-1-sven@narfation.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23net: make get_net_ns return error if NET_NS is disabledChangbin Du1-0/+7
[ Upstream commit ea6932d70e223e02fea3ae20a4feff05d7c1ea9a ] There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled. The reason is that nsfs tries to access ns->ops but the proc_ns_operations is not implemented in this case. [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [7.670268] pgd = 32b54000 [7.670544] [00000010] *pgd=00000000 [7.671861] Internal error: Oops: 5 [#1] SMP ARM [7.672315] Modules linked in: [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16 [7.673309] Hardware name: Generic DT based system [7.673642] PC is at nsfs_evict+0x24/0x30 [7.674486] LR is at clear_inode+0x20/0x9c The same to tun SIOCGSKNS command. To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is disabled. Meanwhile move it to right place net/core/net_namespace.c. Signed-off-by: Changbin Du <changbin.du@gmail.com> Fixes: c62cce2caee5 ("net: add an ioctl to get a socket network namespace") Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10net: caif: add proper error handlingPavel Skripkin2-2/+2
commit a2805dca5107d5603f4bbc027e81e20d93476e96 upstream. caif_enroll_dev() can fail in some cases. Ingnoring these cases can lead to memory leak due to not assigning link_support pointer to anywhere. Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10net: caif: added cfserl_release functionPavel Skripkin1-0/+1
commit bce130e7f392ddde8cfcb09927808ebd5f9c8669 upstream. Added cfserl_release() function. Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10net/tls: Fix use-after-free after the TLS device goes down and upMaxim Mikityanskiy1-0/+9
[ Upstream commit c55dcdd435aa6c6ad6ccac0a4c636d010ee367a4 ] When a netdev with active TLS offload goes down, tls_device_down is called to stop the offload and tear down the TLS context. However, the socket stays alive, and it still points to the TLS context, which is now deallocated. If a netdev goes up, while the connection is still active, and the data flow resumes after a number of TCP retransmissions, it will lead to a use-after-free of the TLS context. This commit addresses this bug by keeping the context alive until its normal destruction, and implements the necessary fallbacks, so that the connection can resume in software (non-offloaded) kTLS mode. On the TX side tls_sw_fallback is used to encrypt all packets. The RX side already has all the necessary fallbacks, because receiving non-decrypted packets is supported. The thing needed on the RX side is to block resync requests, which are normally produced after receiving non-decrypted packets. The necessary synchronization is implemented for a graceful teardown: first the fallbacks are deployed, then the driver resources are released (it used to be possible to have a tls_dev_resync after tls_dev_del). A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback mode. It's used to skip the RX resync logic completely, as it becomes useless, and some objects may be released (for example, resync_async, which is allocated and freed by the driver). Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10net/tls: Replace TLS_RX_SYNC_RUNNING with RCUMaxim Mikityanskiy1-1/+0
[ Upstream commit 05fc8b6cbd4f979a6f25759c4a17dd5f657f7ecd ] RCU synchronization is guaranteed to finish in finite time, unlike a busy loop that polls a flag. This patch is a preparation for the bugfix in the next patch, where the same synchronize_net() call will also be used to sync with the TX datapath. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03net: zero-initialize tc skb extension on allocationVlad Buslov1-0/+11
[ Upstream commit 9453d45ecb6c2199d72e73c993e9d98677a2801b ] Function skb_ext_add() doesn't initialize created skb extension with any value and leaves it up to the user. However, since extension of type TC_SKB_EXT originally contained only single value tc_skb_ext->chain its users used to just assign the chain value without setting whole extension memory to zero first. This assumption changed when TC_SKB_EXT extension was extended with additional fields but not all users were updated to initialize the new fields which leads to use of uninitialized memory afterwards. UBSAN log: [ 778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28 [ 778.301495] load of value 107 is not a valid value for type '_Bool' [ 778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2 [ 778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 778.307901] Call Trace: [ 778.308680] <IRQ> [ 778.309358] dump_stack+0xbb/0x107 [ 778.310307] ubsan_epilogue+0x5/0x40 [ 778.311167] __ubsan_handle_load_invalid_value.cold+0x43/0x48 [ 778.312454] ? memset+0x20/0x40 [ 778.313230] ovs_flow_key_extract.cold+0xf/0x14 [openvswitch] [ 778.314532] ovs_vport_receive+0x19e/0x2e0 [openvswitch] [ 778.315749] ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch] [ 778.317188] ? create_prof_cpu_mask+0x20/0x20 [ 778.318220] ? arch_stack_walk+0x82/0xf0 [ 778.319153] ? secondary_startup_64_no_verify+0xb0/0xbb [ 778.320399] ? stack_trace_save+0x91/0xc0 [ 778.321362] ? stack_trace_consume_entry+0x160/0x160 [ 778.322517] ? lock_release+0x52e/0x760 [ 778.323444] netdev_frame_hook+0x323/0x610 [openvswitch] [ 778.324668] ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch] [ 778.325950] __netif_receive_skb_core+0x771/0x2db0 [ 778.327067] ? lock_downgrade+0x6e0/0x6f0 [ 778.328021] ? lock_acquire+0x565/0x720 [ 778.328940] ? generic_xdp_tx+0x4f0/0x4f0 [ 778.329902] ? inet_gro_receive+0x2a7/0x10a0 [ 778.330914] ? lock_downgrade+0x6f0/0x6f0 [ 778.331867] ? udp4_gro_receive+0x4c4/0x13e0 [ 778.332876] ? lock_release+0x52e/0x760 [ 778.333808] ? dev_gro_receive+0xcc8/0x2380 [ 778.334810] ? lock_downgrade+0x6f0/0x6f0 [ 778.335769] __netif_receive_skb_list_core+0x295/0x820 [ 778.336955] ? process_backlog+0x780/0x780 [ 778.337941] ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core] [ 778.339613] ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0 [ 778.341033] ? kvm_clock_get_cycles+0x14/0x20 [ 778.342072] netif_receive_skb_list_internal+0x5f5/0xcb0 [ 778.343288] ? __kasan_kmalloc+0x7a/0x90 [ 778.344234] ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core] [ 778.345676] ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core] [ 778.347140] ? __netif_receive_skb_list_core+0x820/0x820 [ 778.348351] ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core] [ 778.349688] ? napi_gro_flush+0x26c/0x3c0 [ 778.350641] napi_complete_done+0x188/0x6b0 [ 778.351627] mlx5e_napi_poll+0x373/0x1b80 [mlx5_core] [ 778.352853] __napi_poll+0x9f/0x510 [ 778.353704] ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core] [ 778.355158] net_rx_action+0x34c/0xa40 [ 778.356060] ? napi_threaded_poll+0x3d0/0x3d0 [ 778.357083] ? sched_clock_cpu+0x18/0x190 [ 778.358041] ? __common_interrupt+0x8e/0x1a0 [ 778.359045] __do_softirq+0x1ce/0x984 [ 778.359938] __irq_exit_rcu+0x137/0x1d0 [ 778.360865] irq_exit_rcu+0xa/0x20 [ 778.361708] common_interrupt+0x80/0xa0 [ 778.362640] </IRQ> [ 778.363212] asm_common_interrupt+0x1e/0x40 [ 778.364204] RIP: 0010:native_safe_halt+0xe/0x10 [ 778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00 [ 778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246 [ 778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468 [ 778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e [ 778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb [ 778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000 [ 778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000 [ 778.378491] ? rcu_eqs_enter.constprop.0+0xb8/0xe0 [ 778.379606] ? default_idle_call+0x5e/0xe0 [ 778.380578] default_idle+0xa/0x10 [ 778.381406] default_idle_call+0x96/0xe0 [ 778.382350] do_idle+0x3d4/0x550 [ 778.383153] ? arch_cpu_idle_exit+0x40/0x40 [ 778.384143] cpu_startup_entry+0x19/0x20 [ 778.385078] start_kernel+0x3c7/0x3e5 [ 778.385978] secondary_startup_64_no_verify+0xb0/0xbb Fix the issue by providing new function tc_skb_ext_alloc() that allocates tc skb extension and initializes its memory to 0 before returning it to the caller. Change all existing users to use new API instead of calling skb_ext_add() directly. Fixes: 038ebb1a713d ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct") Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03net: sched: fix tx action rescheduling issue during deactivationYunsheng Lin1-6/+1
[ Upstream commit 102b55ee92f9fda4dde7a45d2b20538e6e3e3d1e ] Currently qdisc_run() checks the STATE_DEACTIVATED of lockless qdisc before calling __qdisc_run(), which ultimately clear the STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED is set before clearing STATE_MISSED, there may be rescheduling of net_tx_action() at the end of qdisc_run_end(), see below: CPU0(net_tx_atcion) CPU1(__dev_xmit_skb) CPU2(dev_deactivate) . . . . set STATE_MISSED . . __netif_schedule() . . . set STATE_DEACTIVATED . . qdisc_reset() . . . .<--------------- . synchronize_net() clear __QDISC_STATE_SCHED | . . . | . . . | . some_qdisc_is_busy() . | . return *false* . | . . test STATE_DEACTIVATED | . . __qdisc_run() *not* called | . . . | . . test STATE_MISS | . . __netif_schedule()--------| . . . . . . . . __qdisc_run() is not called by net_tx_atcion() in CPU0 because CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule is called at the end of qdisc_run_end(), causing tx action rescheduling problem. qdisc_run() called by net_tx_action() runs in the softirq context, which should has the same semantic as the qdisc_run() called by __dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a synchronize_net() between STATE_DEACTIVATED flag being set and qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely bail out for the deactived lockless qdisc in net_tx_action(), and qdisc_reset() will reset all skb not dequeued yet. So add the rcu_read_lock() explicitly to protect the qdisc_run() and do the STATE_DEACTIVATED checking in net_tx_action() before calling qdisc_run_begin(). Another option is to do the checking in the qdisc_run_end(), but it will add unnecessary overhead for non-tx_action case, because __dev_queue_xmit() will not see qdisc with STATE_DEACTIVATED after synchronize_net(), the qdisc with STATE_DEACTIVATED can only be seen by net_tx_action() because of __netif_schedule(). The STATE_DEACTIVATED checking in qdisc_run() is to avoid race between net_tx_action() and qdisc_reset(), see: commit d518d2ed8640 ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc"). As the bailout added above for deactived lockless qdisc in net_tx_action() provides better protection for the race without calling qdisc_run() at all, so remove the STATE_DEACTIVATED checking in qdisc_run(). After qdisc_reset(), there is no skb in qdisc to be dequeued, so clear the STATE_MISSED in dev_reset_queue() too. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> V8: Clearing STATE_MISSED before calling __netif_schedule() has avoid the endless rescheduling problem, but there may still be a unnecessary rescheduling, so adjust the commit log. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03net: sched: fix packet stuck problem for lockless qdiscYunsheng Lin1-1/+34
[ Upstream commit a90c57f2cedd52a511f739fb55e6244e22e1a2fb ] Lockless qdisc has below concurrent problem: cpu0 cpu1 . . q->enqueue . . . qdisc_run_begin() . . . dequeue_skb() . . . sch_direct_xmit() . . . . q->enqueue . qdisc_run_begin() . return and do nothing . . qdisc_run_end() . cpu1 enqueue a skb without calling __qdisc_run() because cpu0 has not released the lock yet and spin_trylock() return false for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb enqueued by cpu1 when calling dequeue_skb() because cpu1 may enqueue the skb after cpu0 calling dequeue_skb() and before cpu0 calling qdisc_run_end(). Lockless qdisc has below another concurrent problem when tx_action is involved: cpu0(serving tx_action) cpu1 cpu2 . . . . q->enqueue . . qdisc_run_begin() . . dequeue_skb() . . . q->enqueue . . . . sch_direct_xmit() . . . qdisc_run_begin() . . return and do nothing . . . clear __QDISC_STATE_SCHED . . qdisc_run_begin() . . return and do nothing . . . . . . qdisc_run_end() . This patch fixes the above data race by: 1. If the first spin_trylock() return false and STATE_MISSED is not set, set STATE_MISSED and retry another spin_trylock() in case other CPU may not see STATE_MISSED after it releases the lock. 2. reschedule if STATE_MISSED is set after the lock is released at the end of qdisc_run_end(). For tx_action case, STATE_MISSED is also set when cpu1 is at the end if qdisc_run_end(), so tx_action will be rescheduled again to dequeue the skb enqueued by cpu2. Clear STATE_MISSED before retrying a dequeuing when dequeuing returns NULL in order to reduce the overhead of the second spin_trylock() and __netif_schedule() calling. Also clear the STATE_MISSED before calling __netif_schedule() at the end of qdisc_run_end() to avoid doing another round of dequeuing in the pfifo_fast_dequeue(). The performance impact of this patch, tested using pktgen and dummy netdev with pfifo_fast qdisc attached: threads without+this_patch with+this_patch delta 1 2.61Mpps 2.60Mpps -0.3% 2 3.97Mpps 3.82Mpps -3.7% 4 5.62Mpps 5.59Mpps -0.5% 8 2.78Mpps 2.77Mpps -0.3% 16 2.22Mpps 2.22Mpps -0.0% Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03net: really orphan skbs tied to closing skPaolo Abeni1-1/+3
[ Upstream commit 098116e7e640ba677d9e345cbee83d253c13d556 ] If the owing socket is shutting down - e.g. the sock reference count already dropped to 0 and only sk_wmem_alloc is keeping the sock alive, skb_orphan_partial() becomes a no-op. When forwarding packets over veth with GRO enabled, the above causes refcount errors. This change addresses the issue with a plain skb_orphan() call in the critical scenario. Fixes: 9adc89af724f ("net: let skb_orphan_partial wake-up waiters.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03netfilter: flowtable: Remove redundant hw refresh bitRoi Dayan1-1/+0
commit c07531c01d8284aedaf95708ea90e76d11af0e21 upstream. Offloading conns could fail for multiple reasons and a hw refresh bit is set to try to reoffload it in next sw packet. But it could be in some cases and future points that the hw refresh bit is not set but a refresh could succeed. Remove the hw refresh bit and do offload refresh if requested. There won't be a new work entry if a work is already pending anyway as there is the hw pending bit. Fixes: 8b3646d6e0c4 ("net/sched: act_ct: Support refreshing the flow table entries") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03mac80211: properly handle A-MSDUs that start with an RFC 1042 headerMathy Vanhoef1-2/+2
commit a1d5ff5651ea592c67054233b14b30bf4452999c upstream. Properly parse A-MSDUs whose first 6 bytes happen to equal a rfc1042 header. This can occur in practice when the destination MAC address equals AA:AA:03:00:00:00. More importantly, this simplifies the next patch to mitigate A-MSDU injection attacks. Cc: stable@vger.kernel.org Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20210511200110.0b2b886492f0.I23dd5d685fe16d3b0ec8106e8f01b59f499dffed@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28NFC: nci: fix memory leak in nci_allocate_deviceDongliang Mu1-0/+1
commit e0652f8bb44d6294eeeac06d703185357f25d50b upstream. nfcmrvl_disconnect fails to free the hci_dev field in struct nci_dev. Fix this by freeing hci_dev in nci_free_device. BUG: memory leak unreferenced object 0xffff888111ea6800 (size 1024): comm "kworker/1:0", pid 19, jiffies 4294942308 (age 13.580s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 60 fd 0c 81 88 ff ff .........`...... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000004bc25d43>] kmalloc include/linux/slab.h:552 [inline] [<000000004bc25d43>] kzalloc include/linux/slab.h:682 [inline] [<000000004bc25d43>] nci_hci_allocate+0x21/0xd0 net/nfc/nci/hci.c:784 [<00000000c59cff92>] nci_allocate_device net/nfc/nci/core.c:1170 [inline] [<00000000c59cff92>] nci_allocate_device+0x10b/0x160 net/nfc/nci/core.c:1132 [<00000000006e0a8e>] nfcmrvl_nci_register_dev+0x10a/0x1c0 drivers/nfc/nfcmrvl/main.c:153 [<000000004da1b57e>] nfcmrvl_probe+0x223/0x290 drivers/nfc/nfcmrvl/usb.c:345 [<00000000d506aed9>] usb_probe_interface+0x177/0x370 drivers/usb/core/driver.c:396 [<00000000bc632c92>] really_probe+0x159/0x4a0 drivers/base/dd.c:554 [<00000000f5009125>] driver_probe_device+0x84/0x100 drivers/base/dd.c:740 [<000000000ce658ca>] __device_attach_driver+0xee/0x110 drivers/base/dd.c:846 [<000000007067d05f>] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:431 [<00000000f8e13372>] __device_attach+0x122/0x250 drivers/base/dd.c:914 [<000000009cf68860>] bus_probe_device+0xc6/0xe0 drivers/base/bus.c:491 [<00000000359c965a>] device_add+0x5be/0xc30 drivers/base/core.c:3109 [<00000000086e4bd3>] usb_set_configuration+0x9d9/0xb90 drivers/usb/core/message.c:2164 [<00000000ca036872>] usb_generic_driver_probe+0x8c/0xc0 drivers/usb/core/generic.c:238 [<00000000d40d36f6>] usb_probe_device+0x5c/0x140 drivers/usb/core/driver.c:293 [<00000000bc632c92>] really_probe+0x159/0x4a0 drivers/base/dd.c:554 Reported-by: syzbot+19bcfc64a8df1318d1c3@syzkaller.appspotmail.com Fixes: 11f54f228643 ("NFC: nci: Add HCI over NCI protocol support") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19mm: fix struct page layout on 32-bit systemsMatthew Wilcox (Oracle)1-1/+11
commit 9ddb3c14afba8bc5950ed297f02d4ae05ff35cd1 upstream. 32-bit architectures which expect 8-byte alignment for 8-byte integers and need 64-bit DMA addresses (arm, mips, ppc) had their struct page inadvertently expanded in 2019. When the dma_addr_t was added, it forced the alignment of the union to 8 bytes, which inserted a 4 byte gap between 'flags' and the union. Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. This restores the alignment to that of an unsigned long. We always store the low bits in the first word to prevent the PageTail bit from being inadvertently set on a big endian platform. If that happened, get_user_pages_fast() racing against a page which was freed and reallocated to the page_pool could dereference a bogus compound_head(), which would be hard to trace back to this cause. Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Matteo Croce <mcroce@linux.microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14net: bridge: mcast: fix broken length + header check for MRDv6 Adv.Linus Lüssing1-1/+0
[ Upstream commit 99014088156cd78867d19514a0bc771c4b86b93b ] The IPv6 Multicast Router Advertisements parsing has the following two issues: For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes, assuming MLDv2 Reports with at least one multicast address entry). When ipv6_mc_check_mld_msg() tries to parse an Multicast Router Advertisement its MLD length check will fail - and it will wrongly return -EINVAL, even if we have a valid MRD Advertisement. With the returned -EINVAL the bridge code will assume a broken packet and will wrongly discard it, potentially leading to multicast packet loss towards multicast routers. The second issue is the MRD header parsing in br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header immediately after the IPv6 header (IPv6 next header type). However according to RFC4286, section 2 all MRD messages contain a Router Alert option (just like MLD). So instead there is an IPv6 Hop-by-Hop option for the Router Alert between the IPv6 and ICMPv6 header, again leading to the bridge wrongly discarding Multicast Router Advertisements. To fix these two issues, introduce a new return value -ENODATA to ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop option which is not an MLD but potentially an MRD packet. This also simplifies further parsing in the bridge code, as ipv6_mc_check_mld() already fully checks the ICMPv6 header and hop-by-hop option. These issues were found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc). Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14netfilter: nftables_offload: VLAN id needs host byteorder in flow dissectorPablo Neira Ayuso1-1/+10
[ Upstream commit ff4d90a89d3d4d9814e0a2696509a7d495be4163 ] The flow dissector representation expects the VLAN id in host byteorder. Add the NFT_OFFLOAD_F_NETWORK2HOST flag to swap the bytes from nft_cmp. Fixes: a82055af5959 ("netfilter: nft_payload: add VLAN offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14netfilter: nft_payload: fix C-VLAN offload supportPablo Neira Ayuso1-0/+1
[ Upstream commit 14c20643ef9457679cc6934d77adc24296505214 ] - add another struct flow_dissector_key_vlan for C-VLAN - update layer 3 dependency to allow to match on IPv4/IPv6 Fixes: 89d8fd44abfb ("netfilter: nft_payload: add C-VLAN offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14Bluetooth: verify AMP hci_chan before amp_destroyArchie Pusaka1-0/+1
commit 5c4c8c9544099bb9043a10a5318130a943e32fc3 upstream. hci_chan can be created in 2 places: hci_loglink_complete_evt() if it is an AMP hci_chan, or l2cap_conn_add() otherwise. In theory, Only AMP hci_chan should be removed by a call to hci_disconn_loglink_complete_evt(). However, the controller might mess up, call that function, and destroy an hci_chan which is not initiated by hci_loglink_complete_evt(). This patch adds a verification that the destroyed hci_chan must have been init'd by hci_loglink_complete_evt(). Example crash call trace: Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe3/0x144 lib/dump_stack.c:118 print_address_description+0x67/0x22a mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report mm/kasan/report.c:412 [inline] kasan_report+0x251/0x28f mm/kasan/report.c:396 hci_send_acl+0x3b/0x56e net/bluetooth/hci_core.c:4072 l2cap_send_cmd+0x5af/0x5c2 net/bluetooth/l2cap_core.c:877 l2cap_send_move_chan_cfm_icid+0x8e/0xb1 net/bluetooth/l2cap_core.c:4661 l2cap_move_fail net/bluetooth/l2cap_core.c:5146 [inline] l2cap_move_channel_rsp net/bluetooth/l2cap_core.c:5185 [inline] l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:5464 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:5799 [inline] l2cap_recv_frame+0x1d12/0x51aa net/bluetooth/l2cap_core.c:7023 l2cap_recv_acldata+0x2ea/0x693 net/bluetooth/l2cap_core.c:7596 hci_acldata_packet net/bluetooth/hci_core.c:4606 [inline] hci_rx_work+0x2bd/0x45e net/bluetooth/hci_core.c:4796 process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175 worker_thread+0x4fc/0x670 kernel/workqueue.c:2321 kthread+0x2f0/0x304 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 Allocated by task 38: set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0x8d/0x9a mm/kasan/kasan.c:553 kmem_cache_alloc_trace+0x102/0x129 mm/slub.c:2787 kmalloc include/linux/slab.h:515 [inline] kzalloc include/linux/slab.h:709 [inline] hci_chan_create+0x86/0x26d net/bluetooth/hci_conn.c:1674 l2cap_conn_add.part.0+0x1c/0x814 net/bluetooth/l2cap_core.c:7062 l2cap_conn_add net/bluetooth/l2cap_core.c:7059 [inline] l2cap_connect_cfm+0x134/0x852 net/bluetooth/l2cap_core.c:7381 hci_connect_cfm+0x9d/0x122 include/net/bluetooth/hci_core.h:1404 hci_remote_ext_features_evt net/bluetooth/hci_event.c:4161 [inline] hci_event_packet+0x463f/0x72fa net/bluetooth/hci_event.c:5981 hci_rx_work+0x197/0x45e net/bluetooth/hci_core.c:4791 process_one_work+0x6f8/0xb50 k