Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 7a1592bcb15d71400a98632727791d1e68ea0ee8 ]
When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.
v5: No Change.
v4: Update commit description
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Tested-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 675d76ad0ad5bf41c9a129772ef0aba8f57ea9a7 ]
When we do ipv6 gre pmtu update, we will also do neigh confirm currently.
This will cause the neigh cache be refreshed and set to REACHABLE before
xmit.
But if the remote mac address changed, e.g. device is deleted and recreated,
we will not able to notice this and still use the old mac address as the neigh
cache is REACHABLE.
Fix this by disable neigh confirm when do pmtu update
v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bd085ef678b2cc8c38c105673dfe8ff8f5ec0c57 ]
The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.
But for tunnel code, it will call pmtu before xmit, like:
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()
If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.
So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.
On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.
To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.
v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
Suggested-by: David Miller <davem@davemloft.net>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 896585d48e8e9ba44cd1754fbce8537feffcc1a5 ]
When we add a new IPv6 address, we should also join corresponding solicited-node
multicast address, unless the interface has IFF_NOARP flag, as function
addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do
not do dad and add the mcast address. So we will drop corresponding neighbour
discovery message that came from other nodes.
A typical example is after creating a ipvlan with mode l3, setting up an ipv6
address and changing the mode to l2. Then we will not be able to ping this
address as the interface doesn't join related solicited-node mcast address.
Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add
corresponding mcast group and check if there is a duplicate address on the
network.
Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 103d0244d29fcaf38f1339d4538919bbbc051490 ]
Report erspan version field to userspace in ip6gre_fill_info just for
erspan_v6 tunnels. Moreover report IFLA_GRE_ERSPAN_INDEX only for
erspan version 1.
The issue can be triggered with the following reproducer:
$ip link add name gre6 type ip6gre local 2001::1 remote 2002::2
$ip link set gre6 up
$ip -d link sh gre6
14: grep6@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1448 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/gre6 2001::1 peer 2002::2 promiscuity 0 minmtu 0 maxmtu 0
ip6gre remote 2002::2 local 2001::1 hoplimit 64 encaplimit 4 tclass 0x00 flowlabel 0x00000 erspan_index 0 erspan_ver 0 addrgenmode eui64
Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c2027d1e17582903e368abf5d4838b22a98f2b7b ]
A recent commit allows sockets bound to a VRF to receive ipv6 link local
packets. However, it only works for UDP and worse TCP connection attempts
to the LLA with the only listener bound to the VRF just hang where as
before the client gets a reset and connection refused. Fix by adjusting
ir_iif for LL addresses and packets received through a device enslaved
to a VRF.
Fixes: 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cc: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
deleted on device
[ Upstream commit 097f95d319f817e651bd51f8846aced92a55a6a1 ]
We configured iptables as below, which only allowed incoming data on
established connections:
iptables -t mangle -A PREROUTING -m state --state ESTABLISHED -j ACCEPT
iptables -t mangle -P PREROUTING DROP
When deleting a secondary address, current masquerade implements would
flush all conntracks on this device. All the established connections on
primary address also be deleted, then subsequent incoming data on the
connections would be dropped wrongly because it was identified as NEW
connection.
So when an address was delete, it should only flush connections related
with the address.
Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1bef4c223b8588cf50433bdc2c6953d82949b3b3 ]
While looking at a syzbot KCSAN report [1], I found multiple
issues in this code :
1) fib6_nh->last_probe has an initial value of 0.
While probably okay on 64bit kernels, this causes an issue
on 32bit kernels since the time_after(jiffies, 0 + interval)
might be false ~24 days after boot (for HZ=1000)
2) The data-race found by KCSAN
I could use READ_ONCE() and WRITE_ONCE(), but we also can
take the opportunity of not piling-up too many rt6_probe_deferred()
works by using instead cmpxchg() so that only one cpu wins the race.
[1]
BUG: KCSAN: data-race in find_match / find_match
write to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 1:
rt6_probe net/ipv6/route.c:663 [inline]
find_match net/ipv6/route.c:757 [inline]
find_match+0x5bd/0x790 net/ipv6/route.c:733
__find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
find_rr_leaf net/ipv6/route.c:852 [inline]
rt6_select net/ipv6/route.c:896 [inline]
fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
__tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
tcp_xmit_probe_skb+0x19b/0x1d0 net/ipv4/tcp_output.c:3735
read to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 0:
rt6_probe net/ipv6/route.c:657 [inline]
find_match net/ipv6/route.c:757 [inline]
find_match+0x521/0x790 net/ipv6/route.c:733
__find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
find_rr_leaf net/ipv6/route.c:852 [inline]
rt6_select net/ipv6/route.c:896 [inline]
fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
__tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 18894 Comm: udevd Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: cc3a86c802f0 ("ipv6: Change rt6_probe to take a fib6_nh")
Fixes: f547fac624be ("ipv6: rate-limit probes for neighbourless routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7170a977743b72cf3eb46ef6ef89885dc7ad3621 ]
This socket field can be read and written by concurrent cpus.
Use READ_ONCE() and WRITE_ONCE() annotations to document this,
and avoid some compiler 'optimizations'.
KCSAN reported :
BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv
write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0:
sk_incoming_cpu_update include/net/sock.h:953 [inline]
tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934
ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
process_backlog+0x1d3/0x420 net/core/dev.c:5955
napi_poll net/core/dev.c:6392 [inline]
net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
__do_softirq+0x115/0x33f kernel/softirq.c:292
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
do_softirq kernel/softirq.c:329 [inline]
__local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1:
sk_incoming_cpu_update include/net/sock.h:952 [inline]
tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934
ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
process_backlog+0x1d3/0x420 net/core/dev.c:5955
napi_poll net/core/dev.c:6392 [inline]
net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
__do_softirq+0x115/0x33f kernel/softirq.c:292
run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2eb8d6d2910cfe3dc67dc056f26f3dd9c63d47cd ]
The check for !md doens't really work for ip_tunnel_info_opts(info) which
only does info + 1. Also to avoid out-of-bounds access on info, it should
ensure options_len is not less than erspan_metadata in both erspan_xmit()
and ip6erspan_tunnel_xmit().
Fixes: 1a66a836da ("gre: add collect_md mode to ERSPAN tunnel")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c7a42eb49212f93a800560662d17d5293960d3c3 ]
We need a similar fix for ipv6 as Commit 0761680d5215 ("net: ipv4: fix
listify ip_rcv_finish in case of forwarding") does for ipv4.
This issue can be reprocuded by syzbot since Commit 323ebb61e32b ("net:
use listified RX for handling GRO_NORMAL skbs") on net-next. The call
trace was:
kernel BUG at include/linux/skbuff.h:2225!
RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
Call Trace:
sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
dst_input include/net/dst.h:442 [inline]
ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
__netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
__netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
__netif_receive_skb_list net/core/dev.c:5149 [inline]
netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
gro_normal_list net/core/dev.c:5755 [inline]
gro_normal_one net/core/dev.c:5769 [inline]
napi_frags_finish net/core/dev.c:5782 [inline]
napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4094871db1d65810acab3d57f6089aa39ef7f648 ]
Prior to this change an application sending <= 1MSS worth of data and
enabling UDP GSO would fail if the system had SW GSO enabled, but the
same send would succeed if HW GSO offload is enabled. In addition to this
inconsistency the error in the SW GSO case does not get back to the
application if sending out of a real device so the user is unaware of this
failure.
With this change we only perform GSO if the # of segments is > 1 even
if the application has enabled segmentation. I've also updated the
relevant udpgso selftests.
Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 44b321e5020d782ad6e8ae8183f09b163be6e6e2 ]
Commit dfec0ee22c0a ("udp: Record gso_segs when supporting UDP segmentation offload")
added gso_segs calculation, but incorrectly got sizeof() the pointer and
not the underlying data type. In addition let's fix the v6 case.
Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Fixes: dfec0ee22c0a ("udp: Record gso_segs when supporting UDP segmentation offload")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2d819d250a1393a3e725715425ab70a0e0772a71 ]
Rajendra reported a kernel panic when a link was taken down:
[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
<snip>
[ 6870.570501] Call Trace:
[ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
[ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
[ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
[ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
[ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
[ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
addrconf_dad_work is kicked to be scheduled when a device is brought
up. There is a race between addrcond_dad_work getting scheduled and
taking the rtnl lock and a process taking the link down (under rtnl).
The latter removes the host route from the inet6_addr as part of
addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
to use the host route in __ipv6_ifa_notify. If the down event removes
the host route due to the race to the rtnl, then the BUG listed above
occurs.
Since the DAD sequence can not be aborted, add a check for the missing
host route in __ipv6_ifa_notify. The only way this should happen is due
to the previously mentioned race. The host route is created when the
address is added to an interface; it is only removed on a down event
where the address is kept. Add a warning if the host route is missing
AND the device is up; this is a situation that should never happen.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6af1799aaf3f1bc8defedddfa00df3192445bbf3 ]
This began with a syzbot report. syzkaller was injecting
IPv6 TCP SYN packets having a v4mapped source address.
After an unsuccessful 4-tuple lookup, TCP creates a request
socket (SYN_RECV) and calls reqsk_queue_hash_req()
reqsk_queue_hash_req() calls sk_ehashfn(sk)
At this point we have AF_INET6 sockets, and the heuristic
used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr)
For the particular spoofed packet, we end up hashing V4 addresses
which were not initialized by the TCP IPv6 stack, so KMSAN fired
a warning.
I first fixed sk_ehashfn() to test both source and destination addresses,
but then faced various problems, including user-space programs
like packetdrill that had similar assumptions.
Instead of trying to fix the whole ecosystem, it is better
to admit that we have a dual stack behavior, and that we
can not build linux kernels without V4 stack anyway.
The dual stack API automatically forces the traffic to be IPv4
if v4mapped addresses are used at bind() or connect(), so it makes
no sense to allow IPv6 traffic to use the same v4mapped class.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit acdcecc61285faed359f1a3568c32089cc3a8329 ]
UDP reuseport groups can hold a mix unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.
Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.
Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.
New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.
Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 28e486037747c2180470b77c290d4090ad42f259 ]
In ip6erspan_tunnel_xmit(), if the skb will not be sent out, it has to
be freed on the tx_err path. Otherwise when deleting a netns, it would
cause dst/dev to leak, and dmesg shows:
unregister_netdevice: waiting for lo to become free. Usage count = 1
Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d23dbc479a8e813db4161a695d67da0e36557846 ]
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.
Fixes: d862e5461423 ("net: ipv6: Implement /proc/net/icmp6.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a84d016479896b5526a2cc54784e6ffc41c9d6f6 ]
Similar to the fix done for IPv4 in commit e5b1c6c6277d
("igmp: fix memory leak in igmpv3_del_delrec()"), we need to
make sure mca_tomb and mca_sources are not blindly overwritten.
Using swap() then a call to ip6_mc_clear_src() will take care
of the missing free.
BUG: memory leak
unreferenced object 0xffff888117d9db00 (size 64):
comm "syz-executor247", pid 6918, jiffies 4294943989 (age 25.350s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 fe 88 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000005b463030>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<000000005b463030>] slab_post_alloc_hook mm/slab.h:522 [inline]
[<000000005b463030>] slab_alloc mm/slab.c:3319 [inline]
[<000000005b463030>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
[<00000000939cbf94>] kmalloc include/linux/slab.h:552 [inline]
[<00000000939cbf94>] kzalloc include/linux/slab.h:748 [inline]
[<00000000939cbf94>] ip6_mc_add1_src net/ipv6/mcast.c:2236 [inline]
[<00000000939cbf94>] ip6_mc_add_src+0x31f/0x420 net/ipv6/mcast.c:2356
[<00000000d8972221>] ip6_mc_source+0x4a8/0x600 net/ipv6/mcast.c:449
[<000000002b203d0d>] do_ipv6_setsockopt.isra.0+0x1b92/0x1dd0 net/ipv6/ipv6_sockglue.c:748
[<000000001f1e2d54>] ipv6_setsockopt+0x89/0xd0 net/ipv6/ipv6_sockglue.c:944
[<00000000c8f7bdf9>] udpv6_setsockopt+0x4e/0x90 net/ipv6/udp.c:1558
[<000000005a9a0c5e>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3139
[<00000000910b37b2>] __sys_setsockopt+0x10f/0x220 net/socket.c:2084
[<00000000e9108023>] __do_sys_setsockopt net/socket.c:2100 [inline]
[<00000000e9108023>] __se_sys_setsockopt net/socket.c:2097 [inline]
[<00000000e9108023>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2097
[<00000000f4818160>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296
[<000000008d367e8f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down")
Fixes: 9c8bb163ae78 ("igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c7036d97acd2527cef145b5ef9ad1a37ed21bbe6 ]
A user reported that routes are getting installed with type 0 (RTN_UNSPEC)
where before the routes were RTN_UNICAST. One example is from accel-ppp
which apparently still uses the ioctl interface and does not set
rtmsg_type. Another is the netlink interface where ipv6 does not require
rtm_type to be set (v4 does). Prior to the commit in the Fixes tag the
ipv6 stack converted type 0 to RTN_UNICAST, so restore that behavior.
Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f17f7648a49aa6728649ddf79bdbcac4f1970ce4 ]
In commit 93a714d6b53d ("multicast: Extend ip address command to enable
multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN
to make user able to add multicast address on ethernet interface.
This works for IPv4, but not for IPv6. See the inet6_addr_add code.
static int inet6_addr_add()
{
...
if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...)
}
ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr
if (!IS_ERR(ifp)) {
...
} else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...)
}
}
But in ipv6_add_addr() it will check the address type and reject multicast
address directly. So this feature is never worked for IPv6.
We should not remove the multicast address check totally in ipv6_add_addr(),
but could accept multicast address only when IFA_F_MCAUTOJOIN flag supplied.
v2: update commit description
Fixes: 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b575b24b8eee37f10484e951b62ce2a31c579775 ]
When firewalld is enabled with ipv4/ipv6 rpfilter, vrf
ipv4/ipv6 packets will be dropped. Vrf device will pass
through netfilter hook twice. One with enslaved device
and another one with l3 master device. So in device may
dismatch witch out device because out device is always
enslaved device.So failed with the check of the rpfilter
and drop the packets by mistake.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 01f5bffad555f8e22a61f4b1261fe09cf1b96994 ]
ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
can cause a possible use-after-free accessing iph/ipv6h pointer
since the packet will be 'uncloned' running pskb_expand_head if
it is a cloned gso skb.
Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3bc817d665ac6d9de89f59df522ad86f5b5dfc03 ]
Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull()
which may change skb->data, so we need to re-load ipv6h at
the right place.
Fixes: 898b29798e36 ("ip6_gre: Refactor ip6gre xmit codes")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 54851aa90cf27041d64b12f65ac72e9f97bd90fd ]
When a route needs to be appended to an existing multipath route,
fib6_add_rt2node() first appends it to the siblings list and increments
the number of sibling routes on each sibling.
Later, the function notifies the route via call_fib6_entry_notifiers().
In case the notification is vetoed, the route is not unlinked from the
siblings list, which can result in a use-after-free.
Fix this by unlinking the route from the siblings list before returning
an error.
Audited the rest of the call sites from which the FIB notification chain
is called and could not find more problems.
Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 49d05fe2c9d1b4a27761c9807fec39b8155bef9e ]
Paul reported that l2tp sessions were broken after the commit referenced
in the Fixes tag. Prior to this commit rt6_check returned NULL if the
rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB
entry. Restore that behavior.
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Paul Donohue <linux-kernel@PaulSD.com>
Tested-by: Paul Donohue <linux-kernel@PaulSD.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8a3dca632538c550930ce8bafa8c906b130d35cf ]
When fixing the skb leak introduced by the conversion to rbtree, I
forgot about the special case of duplicate fragments. The condition
under the 'insert_error' label isn't effective anymore as
nf_ct_frg6_gather() doesn't override the returned value anymore. So
duplicate fragments now get NF_DROP verdict.
To accept duplicate fragments again, handle them specially as soon as
inet_frag_queue_insert() reports them. Return -EINPROGRESS which will
translate to NF_STOLEN verdict, like any accepted fragment. However,
such packets don't carry any new information and aren't queued, so we
just drop them immediately.
Fixes: a0d56cb911ca ("netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a0d56cb911ca301de81735f1d73c2aab424654ba ]
With commit 997dd9647164 ("net: IP6 defrag: use rbtrees in
nf_conntrack_reasm.c"), nf_ct_frag6_reasm() is now called from
nf_ct_frag6_queue(). With this change, nf_ct_frag6_queue() can fail
after the skb has been added to the fragment queue and
nf_ct_frag6_gather() was adapted to handle this case.
But nf_ct_frag6_queue() can still fail before the fragment has been
queued. nf_ct_frag6_gather() can't handle this case anymore, because it
has no way to know if nf_ct_frag6_queue() queued the fragment before
failing. If it didn't, the skb is lost as the error code is overwritten
with -EINPROGRESS.
Fix this by setting -EINPROGRESS directly in nf_ct_frag6_queue(), so
that nf_ct_frag6_gather() can propagate the error as is.
Fixes: 997dd9647164 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 4ac30c4b3659efac031818c418beb51e630d512d upstream.
__udp6_lib_err() may be called when handling icmpv6 message. For example,
the icmpv6 toobig(type=2). __udp6_lib_lookup() is then called
which may call reuseport_select_sock(). reuseport_select_sock() will
call into a bpf_prog (if there is one).
reuseport_select_sock() is expecting the skb->data pointing to the
transport header (udphdr in this case). For example, run_bpf_filter()
is pulling the transport header.
However, in the __udp6_lib_err() path, the skb->data is pointing to the
ipv6hdr instead of the udphdr.
One option is to pull and push the ipv6hdr in __udp6_lib_err().
Instead of doing this, this patch follows how the original
commit 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
was done in IPv4, which has passed a NULL skb pointer to
reuseport_select_sock().
Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Cc: Craig Gallek <kraig@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 257a525fe2e49584842c504a92c27097407f778f upstream.
When the commit a6024562ffd7 ("udp: Add GRO functions to UDP socket")
added udp[46]_lib_lookup_skb to the udp_gro code path, it broke
the reuseport_select_sock() assumption that skb->data is pointing
to the transport header.
This patch follows an earlier __udp6_lib_err() fix by
passing a NULL skb to avoid calling the reuseport's bpf_prog.
Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 983695fa676568fc0fe5ddd995c7267aabc24632 upstream.
Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently
to applications as also stated in original motivation in 7828f20e3779 ("Merge
branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter
two hooks into Cilium to enable host based load-balancing with Kubernetes,
I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes
typically sets up DNS as a service and is thus subject to load-balancing.
Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API
is currently insufficient and thus not usable as-is for standard applications
shipped with most distros. To break down the issue we ran into with a simple
example:
# cat /etc/resolv.conf
nameserver 147.75.207.207
nameserver 147.75.207.208
For the purpose of a simple test, we set up above IPs as service IPs and
transparently redirect traffic to a different DNS backend server for that
node:
# cilium service list
ID Frontend Backend
1 147.75.207.207:53 1 => 8.8.8.8:53
2 147.75.207.208:53 1 => 8.8.8.8:53
The attached BPF program is basically selecting one of the backends if the
service IP/port matches on the cgroup hook. DNS breaks here, because the
hooks are not transparent enough to applications which have built-in msg_name
address checks:
# nslookup 1.1.1.1
;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
[...]
;; connection timed out; no servers could be reached
# dig 1.1.1.1
;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
[...]
; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
;; global options: +cmd
;; connection timed out; no servers could be reached
For comparison, if none of the service IPs is used, and we tell nslookup
to use 8.8.8.8 directly it works just fine, of course:
# nslookup 1.1.1.1 8.8.8.8
1.1.1.1.in-addr.arpa name = one.one.one.one.
In order to fix this and thus act more transparent to the application,
this needs reverse translation on recvmsg() side. A minimal fix for this
API is to add similar recvmsg() hooks behind the BPF cgroups static key
such that the program can track state and replace the current sockaddr_in{,6}
with the original service IP. From BPF side, this basically tracks the
service tuple plus socket cookie in an LRU map where the reverse NAT can
then be retrieved via map value as one example. Side-note: the BPF cgroups
static key should be converted to a per-hook static key in future.
Same example after this fix:
# cilium service list
ID Frontend Backend
1 147.75.207.207:53 1 => 8.8.8.8:53
2 147.75.207.208:53 1 => 8.8.8.8:53
Lookups work fine now:
# nslookup 1.1.1.1
1.1.1.1.in-addr.arpa name = one.one.one.one.
Authoritative answers can be found from:
# dig 1.1.1.1
; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;1.1.1.1. IN A
;; AUTHORITY SECTION:
. 23426 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400
;; Query time: 17 msec
;; SERVER: 147.75.207.207#53(147.75.207.207)
;; WHEN: Tue May 21 12:59:38 UTC 2019
;; MSG SIZE rcvd: 111
And from an actual packet level it shows that we're using the back end
server when talking via 147.75.207.20{7,8} front end:
# tcpdump -i any udp
[...]
12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
[...]
In order to be flexible and to have same semantics as in sendmsg BPF
programs, we only allow return codes in [1,1] range. In the sendmsg case
the program is called if msg->msg_name is present which can be the case
in both, connected and unconnected UDP.
The former only relies on the sockaddr_in{,6} passed via connect(2) if
passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar
way to call into the BPF program whenever a non-NULL msg->msg_name was
passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note
that for TCP case, the msg->msg_name is ignored in the regular recvmsg
path and therefore not relevant.
For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE,
the hook is not called. This is intentional as it aligns with the same
semantics as in case of TCP cgroup BPF hooks right now. This might be
better addressed in future through a different bpf_attach_type such
that this case can be distinguished from the regular recvmsg paths,
for example.
Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 65a3c497c0e965a552008db8bc2653f62bc925a1 ]
Before taking a refcount, make sure the object is not already
scheduled for deletion.
Same fix is needed in ipv6_flowlabel_opt()
Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b9aa52c4cb457e7416cc0c95f475e72ef4a61336 ]
The following code returns EFAULT (Bad address):
s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */
The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
instead of IPPROTO_ICMPV6.
The failure happens because 2 bytes are eaten from the msghdr by
rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6
equivalent of "ipv4: Avoid reading user iov twice after
raw_probe_proto_opt""), but at that time it was not a problem because
IPV6_HDRINCL was not yet introduced.
Only eat these 2 bytes if hdrincl == 0.
Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 59e3e4b52663a9d97efbce7307f62e4bc5c9ce91 ]
As it was done in commit 8f659a03a0ba ("net: ipv4: fix for a race
condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate
READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
value of inet->hdrincl in a local variable, to avoid introducing a race
condition in the next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|