summaryrefslogtreecommitdiff
path: root/net/ipv4/route.c
AgeCommit message (Collapse)AuthorFilesLines
2020-12-08ipv4: Fix tos mask in inet_rtm_getroute()Guillaume Nault1-3/+4
[ Upstream commit 1ebf179037cb46c19da3a9c1e2ca16e7a754b75e ] When inet_rtm_getroute() was converted to use the RCU variants of ip_route_input() and ip_route_output_key(), the TOS parameters stopped being masked with IPTOS_RT_MASK before doing the route lookup. As a result, "ip route get" can return a different route than what would be used when sending real packets. For example: $ ip route add 192.0.2.11/32 dev eth0 $ ip route add unreachable 192.0.2.11/32 tos 2 $ ip route get 192.0.2.11 tos 2 RTNETLINK answers: No route to host But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would actually be routed using the first route: $ ping -c 1 -Q 2 192.0.2.11 PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms --- 192.0.2.11 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to return results consistent with real route lookups. Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/b2d237d08317ca55926add9654a48409ac1b8f5b.1606412894.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29ipv4: Restore flowi4_oif update before call to xfrm_lookup_routeDavid Ahern1-1/+3
[ Upstream commit 874fb9e2ca949b443cc419a4f2227cafd4381d39 ] Tobias reported regressions in IPsec tests following the patch referenced by the Fixes tag below. The root cause is dropping the reset of the flowi4_oif after the fib_lookup. Apparently it is needed for xfrm cases, so restore the oif update to ip_route_output_flow right before the call to xfrm_lookup_route. Fixes: 2fbc6e89b2f1 ("ipv4: Update exception handling for multipath routes via same device") Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-01rt_cpu_seq_next should increase position indexVasily Averin1-0/+1
[ Upstream commit a3ea86739f1bc7e121d921842f0f4a8ab1af94d9 ] if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-26ipv4: Update exception handling for multipath routes via same deviceDavid Ahern1-5/+8
[ Upstream commit 2fbc6e89b2f1403189e624cabaf73e189c5e50c6 ] Kfir reported that pmtu exceptions are not created properly for deployments where multipath routes use the same device. After some digging I see 2 compounding problems: 1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after* the route lookup. This is the second use case where this has been a problem (the first is related to use of vti devices with VRF). I can not find any reason for the oif to be changed after the lookup; the code goes back to the start of git. It does not seem logical so remove it. 2. fib_lookups for exceptions do not call fib_select_path to handle multipath route selection based on the hash. The end result is that the fib_lookup used to add the exception always creates it based using the first leg of the route. An example topology showing the problem: | host1 +------+ | eth0 | .209 +------+ | +------+ switch | br0 | +------+ | +---------+---------+ | host2 | host3 +------+ +------+ | eth0 | .250 | eth0 | 192.168.252.252 +------+ +------+ +-----+ +-----+ | vti | .2 | vti | 192.168.247.3 +-----+ +-----+ \ / ================================= tunnels 192.168.247.1/24 for h in host1 host2 host3; do ip netns add ${h} ip -netns ${h} link set lo up ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1 done ip netns add switch ip -netns switch li set lo up ip -netns switch link add br0 type bridge stp 0 ip -netns switch link set br0 up for n in 1 2 3; do ip -netns switch link add eth-sw type veth peer name eth-h${n} ip -netns switch li set eth-h${n} master br0 up ip -netns switch li set eth-sw netns host${n} name eth0 done ip -netns host1 addr add 192.168.252.209/24 dev eth0 ip -netns host1 link set dev eth0 up ip -netns host1 route add 192.168.247.0/24 \ nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0 ip -netns host2 addr add 192.168.252.250/24 dev eth0 ip -netns host2 link set dev eth0 up ip -netns host2 addr add 192.168.252.252/24 dev eth0 ip -netns host3 link set dev eth0 up ip netns add tunnel ip -netns tunnel li set lo up ip -netns tunnel li add br0 type bridge ip -netns tunnel li set br0 up for n in $(seq 11 20); do ip -netns tunnel addr add dev br0 192.168.247.${n}/24 done for n in 2 3 do ip -netns tunnel link add vti${n} type veth peer name eth${n} ip -netns tunnel link set eth${n} mtu 1360 master br0 up ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24 done ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3 ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11 ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15 ip -netns host1 ro ls cache Before this patch the cache always shows exceptions against the first leg in the multipath route; 192.168.252.250 per this example. Since the hash has an initial random seed, you may need to vary the final octet more than what is listed. In my tests, using addresses between 11 and 19 usually found 1 that used both legs. With this patch, the cache will have exceptions for both legs. Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions") Reported-by: Kfir Itzhak <mastertheknife@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net: revert "net: get rid of an signed integer overflow in ip_idents_reserve()"Yuqi Jin1-8/+6
[ Upstream commit a6211caa634da39d861a47437ffcda8b38ef421b ] Commit adb03115f459 ("net: get rid of an signed integer overflow in ip_idents_reserve()") used atomic_cmpxchg to replace "atomic_add_return" inside the function "ip_idents_reserve". The reason was to avoid UBSAN warning. However, this change has caused performance degrade and in GCC-8, fno-strict-overflow is now mapped to -fwrapv -fwrapv-pointer and signed integer overflow is now undefined by default at all optimization levels[1]. Moreover, it was a bug in UBSAN vs -fwrapv /-fno-strict-overflow, so Let's revert it safely. [1] https://gcc.gnu.org/gcc-8/changes.html Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiong Wang <jiongwang@huawei.com> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net: ipv4: really enforce backoff for redirectsPaolo Abeni1-1/+1
[ Upstream commit 57644431a6c2faac5d754ebd35780cf43a531b1a ] In commit b406472b5ad7 ("net: ipv4: avoid mixed n_redirects and rate_tokens usage") I missed the fact that a 0 'rate_tokens' will bypass the backoff algorithm. Since rate_tokens is cleared after a redirect silence, and never incremented on redirects, if the host keeps receiving packets requiring redirect it will reply ignoring the backoff. Additionally, the 'rate_last' field will be updated with the cadence of the ingress packet requiring redirect. If that rate is high enough, that will prevent the host from generating any other kind of ICMP messages The check for a zero 'rate_tokens' value was likely a shortcut to avoid the more complex backoff algorithm after a redirect silence period. Address the issue checking for 'n_redirects' instead, which is incremented on successful redirect, and does not interfere with other ICMP replies. Fixes: b406472b5ad7 ("net: ipv4: avoid mixed n_redirects and rate_tokens usage") Reported-and-tested-by: Colin Walters <walters@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04net: add bool confirm_neigh parameter for dst_ops.update_pmtuHangbin Liu1-3/+6
[ Upstream commit bd085ef678b2cc8c38c105673dfe8ff8f5ec0c57 ] The MTU update code is supposed to be invoked in response to real networking events that update the PMTU. In IPv6 PMTU update function __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor confirmed time. But for tunnel code, it will call pmtu before xmit, like: - tnl_update_pmtu() - skb_dst_update_pmtu() - ip6_rt_update_pmtu() - __ip6_rt_update_pmtu() - dst_confirm_neigh() If the tunnel remote dst mac address changed and we still do the neigh confirm, we will not be able to update neigh cache and ping6 remote will failed. So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we should not be invoking dst_confirm_neigh() as we have no evidence of successful two-way communication at this point. On the other hand it is also important to keep the neigh reachability fresh for TCP flows, so we cannot remove this dst_confirm_neigh() call. To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu to choose whether we should do neigh update or not. I will add the parameter in this patch and set all the callers to true to comply with the previous way, and fix the tunnel code one by one on later patches. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Suggested-by: David Miller <davem@davemloft.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29ipv4: Return -ENETUNREACH if we can't create route but saddr is validStefano Brivio1-3/+6
[ Upstream commit 595e0651d0296bad2491a4a29a7a43eae6328b02 ] ...instead of -EINVAL. An issue was found with older kernel versions while unplugging a NFS client with pending RPCs, and the wrong error code here prevented it from recovering once link is back up with a configured address. Incidentally, this is not an issue anymore since commit 4f8943f80883 ("SUNRPC: Replace direct task wakeups from softirq context"), included in 5.2-rc7, had the effect of decoupling the forwarding of this error by using SO_ERROR in xs_wake_error(), as pointed out by Benjamin Coddington. To the best of my knowledge, this isn't currently causing any further issue, but the error code doesn't look appropriate anyway, and we might hit this in other paths as well. In detail, as analysed by Gonzalo Siero, once the route is deleted because the interface is down, and can't be resolved and we return -EINVAL here, this ends up, courtesy of inet_sk_rebuild_header(), as the socket error seen by tcp_write_err(), called by tcp_retransmit_timer(). In turn, tcp_write_err() indirectly calls xs_error_report(), which wakes up the RPC pending tasks with a status of -EINVAL. This is then seen by call_status() in the SUN RPC implementation, which aborts the RPC call calling rpc_exit(), instead of handling this as a potentially temporary condition, i.e. as a timeout. Return -EINVAL only if the input parameters passed to ip_route_output_key_hash_rcu() are actually invalid (this is the case if the specified source address is multicast, limited broadcast or all zeroes), but return -ENETUNREACH in all cases where, at the given moment, the given source address doesn't allow resolving the route. While at it, drop the initialisation of err to -ENETUNREACH, which was added to __ip_route_output_key() back then by commit 0315e3827048 ("net: Fix behaviour of unreachable, blackhole and prohibit routes"), but actually had no effect, as it was, and is, overwritten by the fib_lookup() return code assignment, and anyway ignored in all other branches, including the if (fl4->saddr) one: I find this rather confusing, as it would look like -ENETUNREACH is the "default" error, while that statement has no effect. Also note that after commit fc75fc8339e7 ("ipv4: dont create routes on down devices"), we would get -ENETUNREACH if the device is down, but -EINVAL if the source address is specified and we can't resolve the route, and this appears to be rather inconsistent. Reported-by: Stefan Walter <walteste@inf.ethz.ch> Analysed-by: Benjamin Coddington <bcodding@redhat.com> Analysed-by: Gonzalo Siero <gsierohu@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29ipv4: fix race condition between route lookup and invalidationWei Wang1-1/+1
[ Upstream commit 5018c59607a511cdee743b629c76206d9c9e6d7b ] Jesse and Ido reported the following race condition: <CPU A, t0> - Received packet A is forwarded and cached dst entry is taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set() <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables from multiple ISPs"), route is added / deleted and rt_cache_flush() is called <CPU B, t2> - Received packet B tries to use the same cached dst entry from t0, but rt_cache_valid() is no longer true and it is replaced in rt_cache_route() by the newer one. This calls dst_dev_put() on the original dst entry which assigns the blackhole netdev to 'dst->dev' <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due to 'dst->dev' being the blackhole netdev There are 2 issues in the v4 routing code: 1. A per-netns counter is used to do the validation of the route. That means whenever a route is changed in the netns, users of all routes in the netns needs to redo lookup. v6 has an implementation of only updating fn_sernum for routes that are affected. 2. When rt_cache_valid() returns false, rt_cache_route() is called to throw away the current cache, and create a new one. This seems unnecessary because as long as this route does not change, the route cache does not need to be recreated. To fully solve the above 2 issues, it probably needs quite some code changes and requires careful testing, and does not suite for net branch. So this patch only tries to add the deleted cached rt into the uncached list, so user could still be able to use it to receive packets until it's done. Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly") Signed-off-by: Wei Wang <weiwan@google.com> Reported-by: Ido Schimmel <idosch@idosch.org> Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org> Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07net: ipv4: avoid mixed n_redirects and rate_tokens usagePaolo Abeni1-3/+2
[ Upstream commit b406472b5ad79ede8d10077f0c8f05505ace8b6d ] Since commit c09551c6ff7f ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets") we use 'n_redirects' to account for redirect packets, but we still use 'rate_tokens' to compute the redirect packets exponential backoff. If the device sent to the relevant peer any ICMP error packet after sending a redirect, it will also update 'rate_token' according to the leaking bucket schema; typically 'rate_token' will raise above BITS_PER_LONG and the redirect packets backoff algorithm will produce undefined behavior. Fix the issue using 'n_redirects' to compute the exponential backoff in ip_rt_send_redirect(). Note that we still clear rate_tokens after a redirect silence period, to avoid changing an established behaviour. The root cause predates git history; before the mentioned commit in the critical scenario, the kernel stopped sending redirects, after the mentioned commit the behavior more randomic. Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: c09551c6ff7f ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11ipv4: not do cache for local delivery if bc_forwarding is enabledXin Long1-11/+11
[ Upstream commit 0a90478b93a46bdcd56ba33c37566a993e455d54 ] With the topo: h1 ---| rp1 | | route rp3 |--- h3 (192.168.200.1) h2 ---| rp2 | If rp1 bc_forwarding is set while rp2 bc_forwarding is not, after doing "ping 192.168.200.255" on h1, then ping 192.168.200.255 on h2, and the packets can still be forwared. This issue was caused by the input route cache. It should only do the cache for either bc forwarding or local delivery. Otherwise, local delivery can use the route cache for bc forwarding of other interfaces. This patch is to fix it by not doing cache for local delivery if all.bc_forwarding is enabled. Note that we don't fix it by checking route cache local flag after rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the common route code shouldn't be touched for bc_forwarding. Fixes: 5cbf777cfdf6 ("route: add support for directed broadcast forwarding") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-04inet: switch IP ID generator to siphashEric Dumazet1-5/+7
[ Upstream commit df453700e8d81b1bdafdf684365ee2b9431fb702 ] According to Amit Klein and Benny Pinkas, IP ID generation is too weak and might be used by attackers. Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix()) having 64bit key and Jenkins hash is risky. It is time to switch to siphash and its 128bit keys. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ipv4: add sanity checks in ipv4_link_failure()Eric Dumazet1-9/+23
[ Upstream commit 20ff83f10f113c88d0bb74589389b05250994c16 ] Before calling __ip_options_compile(), we need to ensure the network header is a an IPv4 one, and that it is already pulled in skb->head. RAW sockets going through a tunnel can end up calling ipv4_link_failure() with total garbage in the skb, or arbitrary lengthes. syzbot report : BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123 Write of size 69 at addr ffff888096abf068 by task syz-executor.4/9204 CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x123/0x190 mm/kasan/generic.c:191 memcpy+0x38/0x50 mm/kasan/common.c:133 memcpy include/linux/string.h:355 [inline] __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123 __icmp_send+0x725/0x1400 net/ipv4/icmp.c:695 ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204 dst_link_failure include/net/dst.h:427 [inline] vti6_xmit net/ipv6/ip6_vti.c:514 [inline] vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553 __netdev_start_xmit include/linux/netdevice.h:4414 [inline] netdev_start_xmit include/linux/netdevice.h:4423 [inline] xmit_one net/core/dev.c:3292 [inline] dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308 __dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878 dev_queue_xmit+0x18/0x20 net/core/dev.c:3911 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527 neigh_output include/net/neighbour.h:508 [inline] ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229 ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:278 [inline] ip_output+0x21f/0x670 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:289 [inline] raw_send_hdrinc net/ipv4/raw.c:432 [inline] raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663 inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xdd/0x130 net/socket.c:661 sock_write_iter+0x27c/0x3e0 net/socket.c:988 call_write_iter include/linux/fs.h:1866 [inline] new_sync_write+0x4c7/0x760 fs/read_write.c:474 __vfs_write+0xe4/0x110 fs/read_write.c:487 vfs_write+0x20c/0x580 fs/read_write.c:549 ksys_write+0x14f/0x2d0 fs/read_write.c:599 __do_sys_write fs/read_write.c:611 [inline] __se_sys_write fs/read_write.c:608 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:608 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x458c29 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f293b44bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458c29 RDX: 0000000000000014 RSI: 00000000200002c0 RDI: 0000000000000003 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293b44c6d4 R13: 00000000004c8623 R14: 00000000004ded68 R15: 00000000ffffffff The buggy address belongs to the page: page:ffffea00025aafc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x1fffc0000000000() raw: 01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2 ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 >ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 ^ ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00 ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27ipv4: ensure rcu_read_lock() in ipv4_link_failure()Eric Dumazet1-2/+8
[ Upstream commit c543cb4a5f07e09237ec0fc2c60c9f131b2c79ad ] fib_compute_spec_dst() needs to be called under rcu protection. syzbot reported : WARNING: suspicious RCU usage 5.1.0-rc4+ #165 Not tainted include/linux/inetdevice.h:220 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by swapper/0/0: #0: 0000000051b67925 ((&n->timer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:170 [inline] #0: 0000000051b67925 ((&n->timer)){+.-.}, at: call_timer_fn+0xda/0x720 kernel/time/timer.c:1315 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.1.0-rc4+ #165 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5162 __in_dev_get_rcu include/linux/inetdevice.h:220 [inline] fib_compute_spec_dst+0xbbd/0x1030 net/ipv4/fib_frontend.c:294 spec_dst_fill net/ipv4/ip_options.c:245 [inline] __ip_options_compile+0x15a7/0x1a10 net/ipv4/ip_options.c:343 ipv4_link_failure+0x172/0x400 net/ipv4/route.c:1195 dst_link_failure include/net/dst.h:427 [inline] arp_error_report+0xd1/0x1c0 net/ipv4/arp.c:297 neigh_invalidate+0x24b/0x570 net/core/neighbour.c:995 neigh_timer_handler+0xc35/0xf30 net/core/neighbour.c:1081 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325 expire_timers kernel/time/timer.c:1362 [inline] __run_timers kernel/time/timer.c:1681 [inline] __run_timers kernel/time/timer.c:1649 [inline] run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694 __do_softirq+0x266/0x95a kernel/softirq.c:293 invoke_softirq kernel/softirq.c:374 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:414 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807 Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27ipv4: recompile ip options in ipv4_link_failureStephen Suryaputra1-1/+9
[ Upstream commit ed0de45a1008991fdaa27a0152befcb74d126a8b ] Recompile IP options since IPCB may not be valid anymore when ipv4_link_failure is called from arp_error_report. Refer to the commit 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error") and the commit before that (9ef6b42ad6fd) for a similar issue. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-19route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a raceXin Long1-0/+4
[ Upstream commit ee60ad219f5c7c4fb2f047f88037770063ef785f ] The race occurs in __mkroute_output() when 2 threads lookup a dst: CPU A CPU B find_exception() find_exception() [fnhe expires] ip_del_fnhe() [fnhe is deleted] rt_bind_exception() In rt_bind_exception() it will bind a deleted fnhe with the new dst, and this dst will get no chance to be freed. It causes a dev defcnt leak and consecutive dmesg warnings: unregister_netdevice: waiting for ethX to become free. Usage count = 1 Especially thanks Jon to identify the issue. This patch fixes it by setting fnhe_daddr to 0 in ip_del_fnhe() to stop binding the deleted fnhe with a new dst when checking fnhe's fnhe_daddr and daddr in rt_bind_exception(). It works as both ip_del_fnhe() and rt_bind_exception() are protected by fnhe_lock and the fhne is freed by kfree_rcu(). Fixes: deed49df7390 ("route: check and remove route cache when we get route") Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-19ipv4/route: fail early when inet dev is missingPaolo Abeni1-4/+5
[ Upstream commit 22c74764aa2943ecdf9f07c900d8a9c8ba6c9265 ] If a non local multicast packet reaches ip_route_input_rcu() while the ingress device IPv4 private data (in_dev) is NULL, we end up doing a NULL pointer dereference in IN_DEV_MFORWARD(). Since the later call to ip_route_input_mc() is going to fail if !in_dev, we can fail early in such scenario and avoid the dangerous code path. v1 -> v2: - clarified the commit message, no code changes Reported-by: Tianhao Zhao <tizhao@redhat.com> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-10ipv4: Add ICMPv6 support when parse route ipprotoHangbin Liu1-1/+1
[ Upstream commit 5e1a99eae84999a2536f50a0beaf5d5262337f40 ] For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers. But for ip -6 route, currently we only support tcp, udp and icmp. Add ICMPv6 support so we can match ipv6-icmp rules for route lookup. v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-23net: ipv4: use a dedicated counter for icmp_v4 redirect packetsLorenzo Bianconi1-2/+5
[ Upstream commit c09551c6ff7fe16a79a42133bcecba5fc2fc3291 ] According to the algorithm described in the comment block at the beginning of ip_rt_send_redirect, the host should try to send 'ip_rt_redirect_number' ICMP redirect packets with an exponential backoff and then stop sending them at all assuming that the destination ignores redirects. If the device has previously sent some ICMP error packets that are rate-limited (e.g TTL expired) and continues to receive traffic, the redirect packets will never be transmitted. This happens since peer->rate_tokens will be typically greater than 'ip_rt_redirect_number' and so it will never be reset even if the redirect silence timeout (ip_rt_redirect_silence) has elapsed without receiving any packet requiring redirects. Fix it by using a dedicated counter for the number of ICMP redirect packets that has been sent by the host I have not been able to identify a given commit that introduced the issue since ip_rt_send_redirect implements the same rate-limiting algorithm from commit 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-10net: ipv4: don't let PMTU updates increase route MTUSabrina Dubroca1-3/+4
When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is received, we must clamp its value. However, we can receive a PMTU exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an increase in PMTU. To fix this, take the smallest of the old MTU and ip_rt_min_pmtu. Before this patch, in case of an update, the exception's MTU would always change. Now, an exception can have only its lock flag updated, but not the MTU, so we need to add a check on locking to the following "is this exception getting updated, or close to expiring?" test. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29route: add support for directed broadcast forwardingXin Long1-1/+5
This patch implements the feature described in rfc1812#section-5.3.5.2 and rfc2644. It allows the router to forward directed broadcast when sysctl bc_forwarding is enabled. Note that this feature could be done by iptables -j TEE, but it would cause some problems: - target TEE's gateway param has to be set with a specific address, and it's not flexible especially when the route wants forward all directed broadcasts. - this duplicates the directed broadcasts so this may cause side effects to applications. Besides, to keep consistent with other os router like BSD, it's also necessary to implement it in the route rx path. Note that route cache needs to be flushed when bc_forwarding is changed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook1-1/+1
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook1-1/+2
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-39/+138
Pull networking updates from David Miller: 1) Add Maglev hashing scheduler to IPVS, from Inju Song. 2) Lots of new TC subsystem tests from Roman Mashak. 3) Add TCP zero copy receive and fix delayed acks and autotuning with SO_RCVLOWAT, from Eric Dumazet. 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard Brouer. 5) Add ttl inherit support to vxlan, from Hangbin Liu. 6) Properly separate ipv6 routes into their logically independant components. fib6_info for the routing table, and fib6_nh for sets of nexthops, which thus can be shared. From David Ahern. 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP messages from XDP programs. From Nikita V. Shirokov. 8) Lots of long overdue cleanups to the r8169 driver, from Heiner Kallweit. 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau. 10) Add traffic condition monitoring to iwlwifi, from Luca Coelho. 11) Plumb extack down into fib_rules, from Roopa Prabhu. 12) Add Flower classifier offload support to igb, from Vinicius Costa Gomes. 13) Add UDP GSO support, from Willem de Bruijn. 14) Add documentation for eBPF helpers, from Quentin Monnet. 15) Add TLS tx offload to mlx5, from Ilya Lesokhin. 16) Allow applications to be given the number of bytes available to read on a socket via a control message returned from recvmsg(), from Soheil Hassas Yeganeh. 17) Add x86_32 eBPF JIT compiler, from Wang YanQing. 18) Add AF_XDP sockets, with zerocopy support infrastructure as well. From Björn Töpel. 19) Remove indirect load support from all of the BPF JITs and handle these operations in the verifier by translating them into native BPF instead. From Daniel Borkmann. 20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha. 21) Allow XDP programs to do lookups in the main kernel routing tables for forwarding. From David Ahern. 22) Allow drivers to store hardware state into an ELF section of kernel dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy. 23) Various RACK and loss detection improvements in TCP, from Yuchung Cheng. 24) Add TCP SACK compression, from Eric Dumazet. 25) Add User Mode Helper support and basic bpfilter infrastructure, from Alexei Starovoitov. 26) Support ports and protocol values in RTM_GETROUTE, from Roopa Prabhu. 27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard Brouer. 28) Add lots of forwarding selftests, from Petr Machata. 29) Add generic network device failover driver, from Sridhar Samudrala. * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits) strparser: Add __strp_unpause and use it in ktls. rxrpc: Fix terminal retransmission connection ID to include the channel net: hns3: Optimize PF CMDQ interrupt switching process net: hns3: Fix for VF mailbox receiving unknown message net: hns3: Fix for VF mailbox cannot receiving PF response bnx2x: use the right constant Revert "net: sched: cls: Fix offloading when ingress dev is vxlan" net: dsa: b53: Fix for brcm tag issue in Cygnus SoC enic: fix UDP rss bits netdev-FAQ: clarify DaveM's position for stable backports rtnetlink: validate attributes in do_setlink() mlxsw: Add extack messages for port_{un, }split failures netdevsim: Add extack error message for devlink reload devlink: Add extack to reload and port_{un, }split operations net: metrics: add proper netlink validation ipmr: fix error path when ipmr_new_table fails ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds net: hns3: remove unused hclgevf_cfg_func_mta_filter netfilter: provide udp*_lib_lookup for nf_tproxy qed*: Utilize FW 8.37.2.0 ...
2018-06-04Merge branch 'hch.procfs' of ↵Linus Torvalds1-13/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull procfs updates from Al Viro: "Christoph's proc_create_... cleanups series" * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits) xfs, proc: hide unused xfs procfs helpers isdn/gigaset: add back gigaset_procinfo assignment proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields tty: replace ->proc_fops with ->proc_show ide: replace ->proc_fops with ->proc_sho