From d2615bf450694c1302d86b9cc8a8958edfe4c3a4 Mon Sep 17 00:00:00 2001
From: Vlad Yasevich
Date: Tue, 19 Nov 2013 20:47:15 -0500
Subject: net: core: Always propagate flag changes to interfaces

The following commit:
    b6c40d68ff6498b7f63ddf97cf0aa818d748dee7
    net: only invoke dev->change_rx_flags when device is UP

tried to fix a problem with VLAN devices and promiscuous flag setting.
The issue was that a VLAN device was setting a flag on an interface that
was down, thus resulting in a bad promiscuity count.
This commit blocked flag propagation to any device that is currently
down.

A later commit:
    deede2fabe24e00bd7e246eb81cd5767dc6fcfc7
    vlan: Don't propagate flag changes on down interfaces

fixed VLAN code to only propagate flags when the VLAN interface is up,
thus fixing the same issue as above, only localized to VLAN.

The problem we have now is that if we create a complex stack
involving multiple software devices like bridges, bonds, and vlans,
then it is possible that the flags would not propagate properly to
the physical devices. A simple example of the scenario is the
following:

  eth0----> bond0 ----> bridge0 ---> vlan50

If bond0 or eth0 happen to be down at the time bond0 is added to
the bridge, then eth0 will never have promisc mode set which is
currently required for operation as part of the bridge. As a
result, packets with vlan50 will be dropped by the interface.

The only 2 devices that implement the special flag handling are
VLAN and DSA and they both have required code to prevent incorrect
flag propagation. As a result we can remove the generic solution
introduced in b6c40d68ff6498b7f63ddf97cf0aa818d748dee7 and leave
it to the individual devices to decide whether they will block
flag propagation or not.

Reported-by: Stefan Priebe
Suggested-by: Veaceslav Falico
Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net/core')

diff --git a/net/core/dev.c b/net/core/dev.c
index 7e00a7342ee6..ba3b7ea5ebb3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4996,7 +4996,7 @@ static void dev_change_rx_flags(struct net_device *dev, int flags)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
-	if ((dev->flags & IFF_UP) && ops->ndo_change_rx_flags)
+	if (ops->ndo_change_rx_flags)
 		ops->ndo_change_rx_flags(dev, flags);
 }
 
-- 
cgit v1.2.3

From f3d3342602f8bcbf37d7c46641cb9bca7618eb1c Mon Sep 17 00:00:00 2001
From: Hannes Frederic Sowa
Date: Thu, 21 Nov 2013 03:14:22 +0100
Subject: net: rework recvmsg handler msg_name and msg_namelen logic

This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.

This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.

Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.

Also document these changes in include/linux/net.h as suggested by David
Miller.

Changes since RFC:

Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.

With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
	msg->msg_name = NULL
".

This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.
Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.

Cc: David Miller
Suggested-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
---
 net/core/iovec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'net/core')

diff --git a/net/core/iovec.c b/net/core/iovec.c
index 4cdb7c48dad6..b61869429f4c 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -48,7 +48,8 @@ int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr_storage *a
 			if (err < 0)
 				return err;
 		}
-		m->msg_name = address;
+		if (m->msg_name)
+			m->msg_name = address;
 	} else {
 		m->msg_name = NULL;
 	}
-- 
cgit v1.2.3

From 9d8506cc2d7ea1f911c72c100193a3677f6668c3 Mon Sep 17 00:00:00 2001
From: Herbert Xu
Date: Thu, 21 Nov 2013 11:10:04 -0800
Subject: gso: handle new frag_list of frags GRO packets

Recently GRO started generating packets with frag_lists of frags.
This was not handled by GSO, thus leading to a crash.

Thankfully these packets are of a regular form and are easy to
handle. This patch handles them in two ways. For completely
non-linear frag_list entries, we simply continue to iterate over
the frag_list frags once we exhaust the normal frags. For frag_list
entries with linear parts, we call pskb_trim on the first part
of the frag_list skb, and then process the rest of the frags in
the usual way.

This patch also kills a chunk of dead frag_list code that has
obviously never ever been run since it ends up generating a bogus
GSO-segmented packet with a frag_list entry.

Future work is planned to split super big packets into TSO
ones.

Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb")
Reported-by: Christoph Paasch
Reported-by: Jerry Chu
Reported-by: Sander Eikelenboom
Signed-off-by: Herbert Xu
Signed-off-by: Eric Dumazet
Tested-by: Sander Eikelenboom
Tested-by: Eric Dumazet
Signed-off-by: David S. Miller
---
 net/core/skbuff.c | 75 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 25 deletions(-)

(limited to 'net/core')

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8cec1e6b844d..2718fed53d8c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2796,6 +2796,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 	struct sk_buff *segs = NULL;
 	struct sk_buff *tail = NULL;
 	struct sk_buff *fskb = skb_shinfo(skb)->frag_list;
+	skb_frag_t *skb_frag = skb_shinfo(skb)->frags;
 	unsigned int mss = skb_shinfo(skb)->gso_size;
 	unsigned int doffset = skb->data - skb_mac_header(skb);
 	unsigned int offset = doffset;
@@ -2835,16 +2836,38 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 		if (hsize > len || !sg)
 			hsize = len;
 
-		if (!hsize && i >= nfrags) {
-			BUG_ON(fskb->len != len);
+		if (!hsize && i >= nfrags && skb_headlen(fskb) &&
+		    (skb_headlen(fskb) == len || sg)) {
+			BUG_ON(skb_headlen(fskb) > len);
+
+			i = 0;
+			nfrags = skb_shinfo(fskb)->nr_frags;
+			skb_frag = skb_shinfo(fskb)->frags;
+			pos += skb_headlen(fskb);
+
+			while (pos < offset + len) {
+				BUG_ON(i >= nfrags);
+
+				size = skb_frag_size(skb_frag);
+				if (pos + size > offset + len)
+					break;
+
+				i++;
+				pos += size;
+				skb_frag++;
+			}
 
-			pos += len;
 			nskb = skb_clone(fskb, GFP_ATOMIC);
 			fskb = fskb->next;
 
 			if (unlikely(!nskb))
 				goto err;
 
+			if (unlikely(pskb_trim(nskb, len))) {
+				kfree_skb(nskb);
+				goto err;
+			}
+
 			hsize = skb_end_offset(nskb);
 			if (skb_cow_head(nskb, doffset + headroom)) {
 				kfree_skb(nskb);
@@ -2881,7 +2904,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 						 nskb->data - tnl_hlen,
 						 doffset + tnl_hlen);
 
-		if (fskb != skb_shinfo(skb)->frag_list)
+		if (nskb->len == len + doffset)
 			goto perform_csum_check;
 
 		if (!sg) {
@@ -2899,8 +2922,28 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 		skb_shinfo(nskb)->tx_flags = skb_shinfo(skb)->tx_flags &
 			SKBTX_SHARED_FRAG;
 
-		while (pos < offset + len && i < nfrags) {
-			*frag = skb_shinfo(skb)->frags[i];
+		while (pos < offset + len) {
+			if (i >= nfrags) {
+				BUG_ON(skb_headlen(fskb));
+
+				i = 0;
+				nfrags = skb_shinfo(fskb)->nr_frags;
+				skb_frag = skb_shinfo(fskb)->frags;
+
+				BUG_ON(!nfrags);
+
+				fskb = fskb->next;
+			}
+
+			if (unlikely(skb_shinfo(nskb)->nr_frags >=
+				     MAX_SKB_FRAGS)) {
+				net_warn_ratelimited(
+					"skb_segment: too many frags: %u %u\n",
+					pos, mss);
+				goto err;
+			}
+
+			*frag = *skb_frag;
 			__skb_frag_ref(frag);
 			size = skb_frag_size(frag);
 
@@ -2913,6 +2956,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 
 			if (pos + size <= offset + len) {
 				i++;
+				skb_frag++;
 				pos += size;
 			} else {
 				skb_frag_size_sub(frag, pos + size - (offset + len));
@@ -2922,25 +2966,6 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 			frag++;
 		}
 
-		if (pos < offset + len) {
-			struct sk_buff *fskb2 = fskb;
-
-			BUG_ON(pos + fskb->len != offset + len);
-
-			pos += fskb->len;
-			fskb = fskb->next;
-
-			if (fskb2->next) {
-				fskb2 = skb_clone(fskb2, GFP_ATOMIC);
-				if (!fskb2)
-					goto err;
-			} else
-				skb_get(fskb2);
-
-			SKB_FRAG_ASSERT(nskb);
-			skb_shinfo(nskb)->frag_list = fskb2;
-		}
-
 skip_fraglist:
 		nskb->data_len = len - hsize;
 		nskb->len += nskb->data_len;
-- 
cgit v1.2.3

From 3868204d6b89ea373a273e760609cb08020beb1a Mon Sep 17 00:00:00 2001
From: "fan.du"
Date: Sun, 1 Dec 2013 16:28:48 +0800
Subject: {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation

commit a553e4a6317b2cfc7659542c10fe43184ffe53da ("[PKTGEN]: IPSEC support")
tried to support IPsec ESP transport transformation for pktgen, but
actually this doesn't work at all for two reasons (the original
transformed packet has a bad IPv4 checksum value, as well as a wrong
auth value, reported by wireshark):

- After transformation, the IPv4 header total length needs an update,
  because the encrypted payload's length is NOT the same as that of the
  plain text.
- After transformation, the IPv4 checksum needs to be recalculated
  because the payload has been changed.

With this patch, and pktgen armed with the configuration below, Wireshark
is able to decrypt the ESP packet generated by pktgen without any IPv4
checksum error or auth value error.

pgset "flag IPSEC"
pgset "flows 1"

Signed-off-by: Fan Du
Signed-off-by: David S. Miller
---
 net/core/pktgen.c | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'net/core')

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 261357a66300..a797fff7f222 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2527,6 +2527,8 @@ static int process_ipsec(struct pktgen_dev *pkt_dev,
 		if (x) {
 			int ret;
 			__u8 *eth;
+			struct iphdr *iph;
+
 			nhead = x->props.header_len - skb_headroom(skb);
 			if (nhead > 0) {
 				ret = pskb_expand_head(skb, nhead, 0, GFP_ATOMIC);
@@ -2548,6 +2550,11 @@ static int process_ipsec(struct pktgen_dev *pkt_dev,
 			eth = (__u8 *) skb_push(skb, ETH_HLEN);
 			memcpy(eth, pkt_dev->hh, 12);
 			*(u16 *) &eth[12] = protocol;
+
+			/* Update IPv4 header len as well as checksum value */
+			iph = ip_hdr(skb);
+			iph->tot_len = htons(skb->len - ETH_HLEN);
+			ip_send_check(iph);
 		}
 	}
 	return 1;
-- 
cgit v1.2.3

From 239c78db9c41a8f524cce60507440d72229d73bc Mon Sep 17 00:00:00 2001
From: Hannes Frederic Sowa
Date: Thu, 5 Dec 2013 23:29:19 +0100
Subject: net: clear local_df when passing skb between namespaces

We must clear local_df when passing the skb between namespaces as the
packet is not local to the new namespace any more and thus may not get
fragmented by local rules. Fred Templin noticed that other namespaces
do fragment IPv6 packets while forwarding. Instead they should have sent
back a PTB.

The same problem should be present when forwarding DF-IPv4 packets
between namespaces.

Reported-by: Templin, Fred L
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
---
 net/core/skbuff.c | 1 +
 1 file changed, 1 insertion(+)

(limited to 'net/core')

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2718fed53d8c..06e72d3cdf60 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3584,6 +3584,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
 	skb->tstamp.tv64 = 0;
 	skb->pkt_type = PACKET_HOST;
 	skb->skb_iif = 0;
+	skb->local_df = 0;
 	skb_dst_drop(skb);
 	skb->mark = 0;
 	secpath_reset(skb);
-- 
cgit v1.2.3

From d323e92cc3f4edd943610557c9ea1bb4bb5056e8 Mon Sep 17 00:00:00 2001
From: Changli Gao
Date: Sun, 8 Dec 2013 09:36:56 -0500
Subject: net: drop_monitor: fix the value of maxattr

maxattr in genl_family should be used to save the max attribute
type, not the max command type. Drop monitor doesn't support
any attributes, so we should leave it as zero.

Signed-off-by: David S. Miller
---
 net/core/drop_monitor.c | 1 -
 1 file changed, 1 deletion(-)

(limited to 'net/core')

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 95897183226e..e70301eb7a4a 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -64,7 +64,6 @@ static struct genl_family net_drop_monitor_family = {
 	.hdrsize = 0,
 	.name = "NET_DM",
 	.version = 2,
-	.maxattr = NET_DM_CMD_MAX,
 };
 
 static DEFINE_PER_CPU(struct per_cpu_dm_data, dm_cpu_data);
-- 
cgit v1.2.3

From 12663bfc97c8b3fdb292428105dd92d563164050 Mon Sep 17 00:00:00 2001
From: Sasha Levin
Date: Sat, 7 Dec 2013 17:26:27 -0500
Subject: net: unix: allow set_peek_off to fail

unix_dgram_recvmsg() will hold the readlock of the socket until recv
is complete.

At the same time, we may try to setsockopt(SO_PEEK_OFF), which will hang
until unix_dgram_recvmsg() completes (which can take a while) without
allowing us to break out of it, triggering a hung task spew.

Instead, allow set_peek_off to fail; this way userspace will not hang.

Signed-off-by: Sasha Levin
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
---
 net/core/sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net/core')

diff --git a/net/core/sock.c b/net/core/sock.c
index ab20ed9b0f31..5393b4b719d7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -882,7 +882,7 @@ set_rcvbuf:
 
 	case SO_PEEK_OFF:
 		if (sock->ops->set_peek_off)
-			sock->ops->set_peek_off(sk, val);
+			ret = sock->ops->set_peek_off(sk, val);
 		else
 			ret = -EOPNOTSUPP;
 		break;
-- 
cgit v1.2.3