<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/include/net/ipv6.h, branch v3.10.100</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>ipv6: distinguish frag queues by device for multicast and link-local packets</title>
<updated>2016-01-23T03:47:52+00:00</updated>
<author>
<name>Michal Kubeček</name>
<email>mkubecek@suse.cz</email>
</author>
<published>2015-11-24T14:07:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7cd9f6022097acdeb1f14bc9a5d44c40629d11a9'/>
<id>7cd9f6022097acdeb1f14bc9a5d44c40629d11a9</id>
<content type='text'>
[ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ]

If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.

To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.

Note: similar patch has been already submitted by Yoshifuji Hideaki in

  http://patchwork.ozlabs.org/patch/220979/

but got lost and forgotten for some reason.

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ]

If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.

To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.

Note: similar patch has been already submitted by Yoshifuji Hideaki in

  http://patchwork.ozlabs.org/patch/220979/

but got lost and forgotten for some reason.

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: get rid of ip_id_count</title>
<updated>2014-08-14T01:24:15+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-02T12:26:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ff1f69a89a613223c57c13190a6c9be928ac4b9d'/>
<id>ff1f69a89a613223c57c13190a6c9be928ac4b9d</id>
<content type='text'>
[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ]

Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ]

Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: fix addr_len/msg-&gt;msg_namelen assignment in recv_error and rxpmtu functions</title>
<updated>2013-12-08T15:29:25+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2013-11-22T23:46:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=08c62a109ed5f716556b2211f8cfd0d5fe6d18d2'/>
<id>08c62a109ed5f716556b2211f8cfd0d5fe6d18d2</id>
<content type='text'>
[ Upstream commit 85fbaa75037d0b6b786ff18658ddf0b4014ce2a4 ]

Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler &lt;spender@grsecurity.net&gt;
Reported-by: Tom Labanowski
Cc: mpb &lt;mpb.mail@gmail.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 85fbaa75037d0b6b786ff18658ddf0b4014ce2a4 ]

Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler &lt;spender@grsecurity.net&gt;
Reported-by: Tom Labanowski
Cc: mpb &lt;mpb.mail@gmail.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling</title>
<updated>2013-03-24T21:16:30+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2013-03-22T08:24:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=eec2e6185ff6eab18c2cae9b01a9fbc5c33248fc'/>
<id>eec2e6185ff6eab18c2cae9b01a9fbc5c33248fc</id>
<content type='text'>
Hello!

After patch 1 got accepted to net-next I will also send a patch to
netfilter-devel to make the corresponding changes to the netfilter
reassembly logic.

Thanks,

  Hannes

-- &gt;8 --
[PATCH 2/2] ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

This patch also ensures that INET_ECN_CE is propagated if one fragment
had the codepoint set.

Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jesper Dangaard Brouer &lt;jbrouer@redhat.com&gt;
Cc: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hello!

After patch 1 got accepted to net-next I will also send a patch to
netfilter-devel to make the corresponding changes to the netfilter
reassembly logic.

Thanks,

  Hannes

-- &gt;8 --
[PATCH 2/2] ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

This patch also ensures that INET_ECN_CE is propagated if one fragment
had the codepoint set.

Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jesper Dangaard Brouer &lt;jbrouer@redhat.com&gt;
Cc: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: introdcue __ipv6_addr_needs_scope_id and ipv6_iface_scope_id helper functions</title>
<updated>2013-03-08T17:29:22+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2013-03-08T02:07:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=b7ef213ef65256168df83ddfbb8131ed9adc10f9'/>
<id>b7ef213ef65256168df83ddfbb8131ed9adc10f9</id>
<content type='text'>
__ipv6_addr_needs_scope_id checks if an ipv6 address needs to supply
a 'sin6_scope_id != 0'. 'sin6_scope_id != 0' was enforced in case
of link-local addresses. To support interface-local multicast these
checks had to be enhanced and are now consolidated into these new helper
functions.

v2:
a) migrated to struct ipv6_addr_props

v3:
a) reverted changes for ipv6_addr_props
b) test for address type instead of comparing scope

v4:
a) unchanged

Suggested-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Cc: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Acked-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__ipv6_addr_needs_scope_id checks if an ipv6 address needs to supply
a 'sin6_scope_id != 0'. 'sin6_scope_id != 0' was enforced in case
of link-local addresses. To support interface-local multicast these
checks had to be enhanced and are now consolidated into these new helper
functions.

v2:
a) migrated to struct ipv6_addr_props

v3:
a) reverted changes for ipv6_addr_props
b) test for address type instead of comparing scope

v4:
a) unchanged

Suggested-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Cc: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Acked-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6 flowlabel: add __rcu annotations</title>
<updated>2013-03-07T21:33:10+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-03-07T04:20:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7f0e44ac9f7f12a2519bfed9ea4df3c1471bd8bb'/>
<id>7f0e44ac9f7f12a2519bfed9ea4df3c1471bd8bb</id>
<content type='text'>
Commit 18367681a10b (ipv6 flowlabel: Convert np-&gt;ipv6_fl_list to RCU.)
omitted proper __rcu annotations.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 18367681a10b (ipv6 flowlabel: Convert np-&gt;ipv6_fl_list to RCU.)
omitted proper __rcu annotations.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: use a stronger hash for tcp</title>
<updated>2013-02-21T23:15:58+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-02-21T12:18:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664'/>
<id>08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664</id>
<content type='text'>
It looks like its possible to open thousands of TCP IPv6
sessions on a server, all landing in a single slot of TCP hash
table. Incoming packets have to lookup sockets in a very
long list.

We should hash all bits from foreign IPv6 addresses, using
a salt and hash mix, not a simple XOR.

inet6_ehashfn() can also separately use the ports, instead
of xoring them.

Reported-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It looks like its possible to open thousands of TCP IPv6
sessions on a server, all landing in a single slot of TCP hash
table. Incoming packets have to lookup sockets in a very
long list.

We should hash all bits from foreign IPv6 addresses, using
a salt and hash mix, not a simple XOR.

inet6_ehashfn() can also separately use the ports, instead
of xoring them.

Reported-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6 flowlabel: Convert np-&gt;ipv6_fl_list to RCU.</title>
<updated>2013-01-31T03:41:13+00:00</updated>
<author>
<name>YOSHIFUJI Hideaki / 吉藤英明</name>
<email>yoshfuji@linux-ipv6.org</email>
</author>
<published>2013-01-30T09:27:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=18367681a10bd29c3f2305e6b7b984de5b33d548'/>
<id>18367681a10bd29c3f2305e6b7b984de5b33d548</id>
<content type='text'>
Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6 flowlabel: Convert hash list to RCU.</title>
<updated>2013-01-31T03:41:13+00:00</updated>
<author>
<name>YOSHIFUJI Hideaki / 吉藤英明</name>
<email>yoshfuji@linux-ipv6.org</email>
</author>
<published>2013-01-30T09:27:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d3aedd5ebd4b0b925b0bcda548066803e1318499'/>
<id>d3aedd5ebd4b0b925b0bcda548066803e1318499</id>
<content type='text'>
Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: frag helper functions for mem limit tracking</title>
<updated>2013-01-29T18:36:24+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2013-01-28T23:45:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d433673e5f9180e05a770c4b2ab18c08ad51cc21'/>
<id>d433673e5f9180e05a770c4b2ab18c08ad51cc21</id>
<content type='text'>
This change is primarily a preparation to ease the extension of memory
limit tracking.

The change does reduce the number atomic operation, during freeing of
a frag queue.  This does introduce a some performance improvement, as
these atomic operations are at the core of the performance problems
seen on NUMA systems.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change is primarily a preparation to ease the extension of memory
limit tracking.

The change does reduce the number atomic operation, during freeing of
a frag queue.  This does introduce a some performance improvement, as
these atomic operations are at the core of the performance problems
seen on NUMA systems.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
