<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv4/af_inet.c, branch v4.3</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>net: Make table id type u32</title>
<updated>2015-09-01T21:32:44+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsa@cumulusnetworks.com</email>
</author>
<published>2015-09-01T20:26:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=9b8ff51822893e743eee09350c1928daa3ef503f'/>
<id>9b8ff51822893e743eee09350c1928daa3ef503f</id>
<content type='text'>
A number of VRF patches used 'int' for table id. It should be u32 to be
consistent with the rest of the stack.

Fixes:
4e3c89920cd3a ("net: Introduce VRF related flags and helpers")
15be405eb2ea9 ("net: Add inet_addr lookup by table")
30bbaa1950055 ("net: Fix up inet_addr_type checks")
021dd3b8a142d ("net: Add routes to the table associated with the device")
dc028da54ed35 ("inet: Move VRF table lookup to inlined function")
f6d3c19274c74 ("net: FIB tracepoints")

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A number of VRF patches used 'int' for table id. It should be u32 to be
consistent with the rest of the stack.

Fixes:
4e3c89920cd3a ("net: Introduce VRF related flags and helpers")
15be405eb2ea9 ("net: Add inet_addr lookup by table")
30bbaa1950055 ("net: Fix up inet_addr_type checks")
021dd3b8a142d ("net: Add routes to the table associated with the device")
dc028da54ed35 ("inet: Move VRF table lookup to inlined function")
f6d3c19274c74 ("net: FIB tracepoints")

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fix 32b build</title>
<updated>2015-08-31T18:32:41+00:00</updated>
<author>
<name>Madalin Bucur</name>
<email>madalin.bucur@freescale.com</email>
</author>
<published>2015-08-31T12:46:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d1bfc62591a0a5144dc380976e737fbbb4f40f4f'/>
<id>d1bfc62591a0a5144dc380976e737fbbb4f40f4f</id>
<content type='text'>
Address remaining issue after 80ec192.

Signed-off-by: Madalin Bucur &lt;madalin.bucur@freescale.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Address remaining issue after 80ec192.

Signed-off-by: Madalin Bucur &lt;madalin.bucur@freescale.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: Fix 32-bit build.</title>
<updated>2015-08-31T05:40:44+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-08-31T05:40:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=80ec1927b102480f89436e7d7f961a263e916a44'/>
<id>80ec1927b102480f89436e7d7f961a263e916a44</id>
<content type='text'>
   net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64':
&gt;&gt; net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function)
      v = *(((u64 *)bhptr) + offt);
                             ^
   net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in
   net/ipv4/af_inet.c: In function 'snmp_fold_field64':
&gt;&gt; net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function)
      res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
                                          ^
&gt;&gt; net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field'
      res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
             ^
   net/ipv4/af_inet.c:1455:5: note: declared here
    u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt)
        ^

Reported-by: kbuild test robot &lt;fengguang.wu@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
   net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64':
&gt;&gt; net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function)
      v = *(((u64 *)bhptr) + offt);
                             ^
   net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in
   net/ipv4/af_inet.c: In function 'snmp_fold_field64':
&gt;&gt; net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function)
      res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
                                          ^
&gt;&gt; net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field'
      res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
             ^
   net/ipv4/af_inet.c:1455:5: note: declared here
    u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt)
        ^

Reported-by: kbuild test robot &lt;fengguang.wu@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Introduce helper functions to get the per cpu data</title>
<updated>2015-08-31T04:48:58+00:00</updated>
<author>
<name>Raghavendra K T</name>
<email>raghavendra.kt@linux.vnet.ibm.com</email>
</author>
<published>2015-08-30T05:59:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=c4c6bc314618f60ba69b0cbf93e506e4c38a11d2'/>
<id>c4c6bc314618f60ba69b0cbf93e506e4c38a11d2</id>
<content type='text'>
Signed-off-by: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: Move VRF table lookup to inlined function</title>
<updated>2015-08-17T22:58:57+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsa@cumulusnetworks.com</email>
</author>
<published>2015-08-16T23:13:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=dc028da54ed353edd44dca88b7eb19fd5126c354'/>
<id>dc028da54ed353edd44dca88b7eb19fd5126c354</id>
<content type='text'>
Table lookup compiles out when VRF is not enabled.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Table lookup compiles out when VRF is not enabled.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix up inet_addr_type checks</title>
<updated>2015-08-14T05:43:21+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsa@cumulusnetworks.com</email>
</author>
<published>2015-08-13T20:59:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=30bbaa19500559d7625c65632195413f639b3b97'/>
<id>30bbaa19500559d7625c65632195413f639b3b97</id>
<content type='text'>
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ip_tunnel: Call ip_tunnel_core_init() from inet_init()</title>
<updated>2015-07-23T08:28:21+00:00</updated>
<author>
<name>Thomas Graf</name>
<email>tgraf@suug.ch</email>
</author>
<published>2015-07-23T08:08:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=045a0fa0c5f5ea0f16c009f924ea579634afbba8'/>
<id>045a0fa0c5f5ea0f16c009f924ea579634afbba8</id>
<content type='text'>
Convert the module_init() to a invocation from inet_init() since
ip_tunnel_core is part of the INET built-in.

Fixes: 3093fbe7ff4 ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Convert the module_init() to a invocation from inet_init() since
ip_tunnel_core is part of the INET built-in.

Fixes: 3093fbe7ff4 ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-06-24T09:58:51+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-06-24T09:58:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=3a07bd6fead4f00f67b1bf5f551e686661c4f52c'/>
<id>3a07bd6fead4f00f67b1bf5f551e686661c4f52c</id>
<content type='text'>
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/main.c
	net/packet/af_packet.c

Both conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/main.c
	net/packet/af_packet.c

Both conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Do not call tcp_fastopen_reset_cipher from interrupt context</title>
<updated>2015-06-23T09:38:10+00:00</updated>
<author>
<name>Christoph Paasch</name>
<email>cpaasch@apple.com</email>
</author>
<published>2015-06-18T16:15:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=dfea2aa654243f70dc53b8648d0bbdeec55a7df1'/>
<id>dfea2aa654243f70dc53b8648d0bbdeec55a7df1</id>
<content type='text'>
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kind of stuff with
GFP_KERNEL.

Thus, we might sleep when the key-generation is triggered by an
incoming TFO cookie-request which would then happen in interrupt-
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:

[   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   36.008250]  00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[   36.009630]  ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[   36.011076]  0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[   36.012494] Call Trace:
[   36.012953]  &lt;IRQ&gt;  [&lt;ffffffff8171d53a&gt;] dump_stack+0x4f/0x6d
[   36.014085]  [&lt;ffffffff810967d3&gt;] ___might_sleep+0x103/0x170
[   36.015117]  [&lt;ffffffff81096892&gt;] __might_sleep+0x52/0x90
[   36.016117]  [&lt;ffffffff8118e887&gt;] kmem_cache_alloc_trace+0x47/0x190
[   36.017266]  [&lt;ffffffff81680d82&gt;] ? tcp_fastopen_reset_cipher+0x42/0x130
[   36.018485]  [&lt;ffffffff81680d82&gt;] tcp_fastopen_reset_cipher+0x42/0x130
[   36.019679]  [&lt;ffffffff81680f01&gt;] tcp_fastopen_init_key_once+0x61/0x70
[   36.020884]  [&lt;ffffffff81680f2c&gt;] __tcp_fastopen_cookie_gen+0x1c/0x60
[   36.022058]  [&lt;ffffffff816814ff&gt;] tcp_try_fastopen+0x58f/0x730
[   36.023118]  [&lt;ffffffff81671788&gt;] tcp_conn_request+0x3e8/0x7b0
[   36.024185]  [&lt;ffffffff810e3872&gt;] ? __module_text_address+0x12/0x60
[   36.025327]  [&lt;ffffffff8167b2e1&gt;] tcp_v4_conn_request+0x51/0x60
[   36.026410]  [&lt;ffffffff816727e0&gt;] tcp_rcv_state_process+0x190/0xda0
[   36.027556]  [&lt;ffffffff81661f97&gt;] ? __inet_lookup_established+0x47/0x170
[   36.028784]  [&lt;ffffffff8167c2ad&gt;] tcp_v4_do_rcv+0x16d/0x3d0
[   36.029832]  [&lt;ffffffff812e6806&gt;] ? security_sock_rcv_skb+0x16/0x20
[   36.030936]  [&lt;ffffffff8167cc8a&gt;] tcp_v4_rcv+0x77a/0x7b0
[   36.031875]  [&lt;ffffffff816af8c3&gt;] ? iptable_filter_hook+0x33/0x70
[   36.032953]  [&lt;ffffffff81657d22&gt;] ip_local_deliver_finish+0x92/0x1f0
[   36.034065]  [&lt;ffffffff81657f1a&gt;] ip_local_deliver+0x9a/0xb0
[   36.035069]  [&lt;ffffffff81657c90&gt;] ? ip_rcv+0x3d0/0x3d0
[   36.035963]  [&lt;ffffffff81657569&gt;] ip_rcv_finish+0x119/0x330
[   36.036950]  [&lt;ffffffff81657ba7&gt;] ip_rcv+0x2e7/0x3d0
[   36.037847]  [&lt;ffffffff81610652&gt;] __netif_receive_skb_core+0x552/0x930
[   36.038994]  [&lt;ffffffff81610a57&gt;] __netif_receive_skb+0x27/0x70
[   36.040033]  [&lt;ffffffff81610b72&gt;] process_backlog+0xd2/0x1f0
[   36.041025]  [&lt;ffffffff81611482&gt;] net_rx_action+0x122/0x310
[   36.042007]  [&lt;ffffffff81076743&gt;] __do_softirq+0x103/0x2f0
[   36.042978]  [&lt;ffffffff81723e3c&gt;] do_softirq_own_stack+0x1c/0x30

This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO-state, which always happens in
user-context (either from the setsockopt, or implicitly during the
listen()-call)

Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch &lt;cpaasch@apple.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kind of stuff with
GFP_KERNEL.

Thus, we might sleep when the key-generation is triggered by an
incoming TFO cookie-request which would then happen in interrupt-
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:

[   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   36.008250]  00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[   36.009630]  ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[   36.011076]  0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[   36.012494] Call Trace:
[   36.012953]  &lt;IRQ&gt;  [&lt;ffffffff8171d53a&gt;] dump_stack+0x4f/0x6d
[   36.014085]  [&lt;ffffffff810967d3&gt;] ___might_sleep+0x103/0x170
[   36.015117]  [&lt;ffffffff81096892&gt;] __might_sleep+0x52/0x90
[   36.016117]  [&lt;ffffffff8118e887&gt;] kmem_cache_alloc_trace+0x47/0x190
[   36.017266]  [&lt;ffffffff81680d82&gt;] ? tcp_fastopen_reset_cipher+0x42/0x130
[   36.018485]  [&lt;ffffffff81680d82&gt;] tcp_fastopen_reset_cipher+0x42/0x130
[   36.019679]  [&lt;ffffffff81680f01&gt;] tcp_fastopen_init_key_once+0x61/0x70
[   36.020884]  [&lt;ffffffff81680f2c&gt;] __tcp_fastopen_cookie_gen+0x1c/0x60
[   36.022058]  [&lt;ffffffff816814ff&gt;] tcp_try_fastopen+0x58f/0x730
[   36.023118]  [&lt;ffffffff81671788&gt;] tcp_conn_request+0x3e8/0x7b0
[   36.024185]  [&lt;ffffffff810e3872&gt;] ? __module_text_address+0x12/0x60
[   36.025327]  [&lt;ffffffff8167b2e1&gt;] tcp_v4_conn_request+0x51/0x60
[   36.026410]  [&lt;ffffffff816727e0&gt;] tcp_rcv_state_process+0x190/0xda0
[   36.027556]  [&lt;ffffffff81661f97&gt;] ? __inet_lookup_established+0x47/0x170
[   36.028784]  [&lt;ffffffff8167c2ad&gt;] tcp_v4_do_rcv+0x16d/0x3d0
[   36.029832]  [&lt;ffffffff812e6806&gt;] ? security_sock_rcv_skb+0x16/0x20
[   36.030936]  [&lt;ffffffff8167cc8a&gt;] tcp_v4_rcv+0x77a/0x7b0
[   36.031875]  [&lt;ffffffff816af8c3&gt;] ? iptable_filter_hook+0x33/0x70
[   36.032953]  [&lt;ffffffff81657d22&gt;] ip_local_deliver_finish+0x92/0x1f0
[   36.034065]  [&lt;ffffffff81657f1a&gt;] ip_local_deliver+0x9a/0xb0
[   36.035069]  [&lt;ffffffff81657c90&gt;] ? ip_rcv+0x3d0/0x3d0
[   36.035963]  [&lt;ffffffff81657569&gt;] ip_rcv_finish+0x119/0x330
[   36.036950]  [&lt;ffffffff81657ba7&gt;] ip_rcv+0x2e7/0x3d0
[   36.037847]  [&lt;ffffffff81610652&gt;] __netif_receive_skb_core+0x552/0x930
[   36.038994]  [&lt;ffffffff81610a57&gt;] __netif_receive_skb+0x27/0x70
[   36.040033]  [&lt;ffffffff81610b72&gt;] process_backlog+0xd2/0x1f0
[   36.041025]  [&lt;ffffffff81611482&gt;] net_rx_action+0x122/0x310
[   36.042007]  [&lt;ffffffff81076743&gt;] __do_softirq+0x103/0x2f0
[   36.042978]  [&lt;ffffffff81723e3c&gt;] do_softirq_own_stack+0x1c/0x30

This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO-state, which always happens in
user-context (either from the setsockopt, or implicitly during the
listen()-call)

Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch &lt;cpaasch@apple.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations</title>
<updated>2015-06-07T06:57:12+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-06-07T04:17:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=90c337da1524863838658078ec34241f45d8394d'/>
<id>90c337da1524863838658078ec34241f45d8394d</id>
<content type='text'>
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())

With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() show that the port is still 0 right after bind()
but properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &amp;sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &amp;sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &amp;sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &amp;sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &amp;
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())

With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() show that the port is still 0 right after bind()
but properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &amp;sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &amp;sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &amp;sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &amp;sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &amp;
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
