<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/netfilter/xt_CT.c, branch v4.3</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf</title>
<updated>2015-09-06T04:57:42+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-09-06T00:36:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=53cfd053e4d82b46dddc5d3522700263016e3156'/>
<id>53cfd053e4d82b46dddc5d3522700263016e3156</id>
<content type='text'>
Conflicts:
	include/net/netfilter/nf_conntrack.h

The conflict was an overlap between changing the type of the zone
argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free.

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net, they are:

1) Oneliner to restore maps in nf_tables since we support addressing registers
   at 32 bits level.

2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n,
   oneliner from Bernhard Thaler.

3) Out of bound access in ipset hash:net* set types, reported by Dave Jones'
   KASan utility, patch from Jozsef Kadlecsik.

4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of
   unnamed unions, patch from Elad Raz.

5) Add a workaround to address inconsistent endianess in the res_id field of
   nfnetlink batch messages, reported by Florian Westphal.

6) Fix error paths of CT/synproxy since the conntrack template was moved to use
   kmalloc, patch from Daniel Borkmann.

All of them look good to me to reach 4.2, I can route this to -stable myself
too, just let me know what you prefer.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	include/net/netfilter/nf_conntrack.h

The conflict was an overlap between changing the type of the zone
argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free.

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net, they are:

1) Oneliner to restore maps in nf_tables since we support addressing registers
   at 32 bits level.

2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n,
   oneliner from Bernhard Thaler.

3) Out of bound access in ipset hash:net* set types, reported by Dave Jones'
   KASan utility, patch from Jozsef Kadlecsik.

4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of
   unnamed unions, patch from Elad Raz.

5) Add a workaround to address inconsistent endianess in the res_id field of
   nfnetlink batch messages, reported by Florian Westphal.

6) Fix error paths of CT/synproxy since the conntrack template was moved to use
   kmalloc, patch from Daniel Borkmann.

All of them look good to me to reach 4.2, I can route this to -stable myself
too, just let me know what you prefer.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths</title>
<updated>2015-09-01T10:15:08+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-08-31T17:11:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=9cf94eab8b309e8bcc78b41dd1561c75b537dd0b'/>
<id>9cf94eab8b309e8bcc78b41dd1561c75b537dd0b</id>
<content type='text'>
Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack
templates") migrated templates to the new allocator api, but forgot to
update error paths for them in CT and synproxy to use nf_ct_tmpl_free()
instead of nf_conntrack_free().

Due to that, memory is being freed into the wrong kmemcache, but also
we drop the per net reference count of ct objects causing an imbalance.

In Brad's case, this leads to a wrap-around of net-&gt;ct.count and thus
lets __nf_conntrack_alloc() refuse to create a new ct object:

  [   10.340913] xt_addrtype: ipv6 does not support BROADCAST matching
  [   10.810168] nf_conntrack: table full, dropping packet
  [   11.917416] r8169 0000:07:00.0 eth0: link up
  [   11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
  [   12.815902] nf_conntrack: table full, dropping packet
  [   15.688561] nf_conntrack: table full, dropping packet
  [   15.689365] nf_conntrack: table full, dropping packet
  [   15.690169] nf_conntrack: table full, dropping packet
  [   15.690967] nf_conntrack: table full, dropping packet
  [...]

With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs.
nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus,
to fix the problem, export and use nf_ct_tmpl_free() instead.

Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates")
Reported-by: Brad Jackson &lt;bjackson0971@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack
templates") migrated templates to the new allocator api, but forgot to
update error paths for them in CT and synproxy to use nf_ct_tmpl_free()
instead of nf_conntrack_free().

Due to that, memory is being freed into the wrong kmemcache, but also
we drop the per net reference count of ct objects causing an imbalance.

In Brad's case, this leads to a wrap-around of net-&gt;ct.count and thus
lets __nf_conntrack_alloc() refuse to create a new ct object:

  [   10.340913] xt_addrtype: ipv6 does not support BROADCAST matching
  [   10.810168] nf_conntrack: table full, dropping packet
  [   11.917416] r8169 0000:07:00.0 eth0: link up
  [   11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
  [   12.815902] nf_conntrack: table full, dropping packet
  [   15.688561] nf_conntrack: table full, dropping packet
  [   15.689365] nf_conntrack: table full, dropping packet
  [   15.690169] nf_conntrack: table full, dropping packet
  [   15.690967] nf_conntrack: table full, dropping packet
  [...]

With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs.
nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus,
to fix the problem, export and use nf_ct_tmpl_free() instead.

Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates")
Reported-by: Brad Jackson &lt;bjackson0971@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next</title>
<updated>2015-08-21T04:09:05+00:00</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2015-08-20T23:10:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=81bf1c64e7fe08f956c74fe2b0f1fa6eb163bd91'/>
<id>81bf1c64e7fe08f956c74fe2b0f1fa6eb163bd91</id>
<content type='text'>
Resolve conflicts with conntrack template fixes.

Conflicts:
	net/netfilter/nf_conntrack_core.c
	net/netfilter/nf_synproxy_core.c
	net/netfilter/xt_CT.c

Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Resolve conflicts with conntrack template fixes.

Conflicts:
	net/netfilter/nf_conntrack_core.c
	net/netfilter/nf_synproxy_core.c
	net/netfilter/xt_CT.c

Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack: add efficient mark to zone mapping</title>
<updated>2015-08-17T23:24:05+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-08-14T14:03:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=5e8018fc61423e677398d4ad4d72df70b9788e77'/>
<id>5e8018fc61423e677398d4ad4d72df70b9788e77</id>
<content type='text'>
This work adds the possibility of deriving the zone id from the skb-&gt;mark
field in a scalable manner. This allows for having only a single template
serving hundreds/thousands of different zones, for example, instead of the
need to have one match for each zone as an extra CT jump target.

Note that we'd need to have this information attached to the template as at
the time when we're trying to lookup a possible ct object, we already need
to know zone information for a possible match when going into
__nf_conntrack_find_get(). This work provides a minimal implementation for
a possible mapping.

In order to not add/expose an extra ct-&gt;status bit, the zone structure has
been extended to carry a flag for deriving the mark.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This work adds the possibility of deriving the zone id from the skb-&gt;mark
field in a scalable manner. This allows for having only a single template
serving hundreds/thousands of different zones, for example, instead of the
need to have one match for each zone as an extra CT jump target.

Note that we'd need to have this information attached to the template as at
the time when we're trying to lookup a possible ct object, we already need
to know zone information for a possible match when going into
__nf_conntrack_find_get(). This work provides a minimal implementation for
a possible mapping.

In order to not add/expose an extra ct-&gt;status bit, the zone structure has
been extended to carry a flag for deriving the mark.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack: add direction support for zones</title>
<updated>2015-08-17T23:22:50+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-08-14T14:03:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=deedb59039f111c41aa5a54ee384c8e7c08bc78a'/>
<id>deedb59039f111c41aa5a54ee384c8e7c08bc78a</id>
<content type='text'>
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.

The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct-&gt;mark
approach we proposed initially.

As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.

Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone-&gt;id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.

If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).

Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with
netperf sessions:

  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:

  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o &lt;dev&gt; -j SNAT --to-source &lt;ip&gt; --random-fully

  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns

  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
                        src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
                        src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.

I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.

  [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
  [2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.

The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct-&gt;mark
approach we proposed initially.

As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.

Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone-&gt;id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.

If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).

Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with
netperf sessions:

  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:

  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o &lt;dev&gt; -j SNAT --to-source &lt;ip&gt; --random-fully

  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns

  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
                        src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
                        src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.

I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.

  [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
  [2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack: push zone object into functions</title>
<updated>2015-08-11T10:29:01+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-08-08T19:40:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=308ac9143ee2208f54d061eca54a89da509b5d92'/>
<id>308ac9143ee2208f54d061eca54a89da509b5d92</id>
<content type='text'>
This patch replaces the zone id which is pushed down into functions
with the actual zone object. It's a bigger one-time change, but
needed for later on extending zones with a direction parameter, and
thus decoupling this additional information from all call-sites.

No functional changes in this patch.

The default zone becomes a global const object, namely nf_ct_zone_dflt
and will be returned directly in various cases, one being, when there's
f.e. no zoning support.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch replaces the zone id which is pushed down into functions
with the actual zone object. It's a bigger one-time change, but
needed for later on extending zones with a direction parameter, and
thus decoupling this additional information from all call-sites.

No functional changes in this patch.

The default zone becomes a global const object, namely nf_ct_zone_dflt
and will be returned directly in various cases, one being, when there's
f.e. no zoning support.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack: checking for IS_ERR() instead of NULL</title>
<updated>2015-07-30T12:04:19+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2015-07-27T22:42:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=1a727c63612fc582370cf3dc01239d3d239743b5'/>
<id>1a727c63612fc582370cf3dc01239d3d239743b5</id>
<content type='text'>
We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc()
so the error handling needs to changed to check for NULL instead of
IS_ERR().

Fixes: 0838aa7fcfcd ('netfilter: fix netns dependencies with conntrack templates')
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc()
so the error handling needs to changed to check for NULL instead of
IS_ERR().

Fixes: 0838aa7fcfcd ('netfilter: fix netns dependencies with conntrack templates')
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: fix netns dependencies with conntrack templates</title>
<updated>2015-07-20T12:58:19+00:00</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2015-07-13T13:11:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=0838aa7fcfcd875caa7bcc5dab0c3fd40444553d'/>
<id>0838aa7fcfcd875caa7bcc5dab0c3fd40444553d</id>
<content type='text'>
Quoting Daniel Borkmann:

"When adding connection tracking template rules to a netns, f.e. to
configure netfilter zones, the kernel will endlessly busy-loop as soon
as we try to delete the given netns in case there's at least one
template present, which is problematic i.e. if there is such bravery that
the priviledged user inside the netns is assumed untrusted.

Minimal example:

  ip netns add foo
  ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
  ip netns del foo

What happens is that when nf_ct_iterate_cleanup() is being called from
nf_conntrack_cleanup_net_list() for a provided netns, we always end up
with a net-&gt;ct.count &gt; 0 and thus jump back to i_see_dead_people. We
don't get a soft-lockup as we still have a schedule() point, but the
serving CPU spins on 100% from that point onwards.

Since templates are normally allocated with nf_conntrack_alloc(), we
also bump net-&gt;ct.count. The issue why they are not yet nf_ct_put() is
because the per netns .exit() handler from x_tables (which would eventually
invoke xt_CT's xt_ct_tg_destroy() that drops reference on info-&gt;ct) is
called in the dependency chain at a *later* point in time than the per
netns .exit() handler for the connection tracker.

This is clearly a chicken'n'egg problem: after the connection tracker
.exit() handler, we've teared down all the connection tracking
infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
invoked at a later point in time during the netns cleanup, as that would
lead to a use-after-free. At the same time, we cannot make x_tables depend
on the connection tracker module, so that the xt_ct_tg_destroy() would
be invoked earlier in the cleanup chain."

Daniel confirms this has to do with the order in which modules are loaded or
having compiled nf_conntrack as modules while x_tables built-in. So we have no
guarantees regarding the order in which netns callbacks are executed.

Fix this by allocating the templates through kmalloc() from the respective
SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
is marked as unlikely since conntrack templates are rarely allocated and only
from the configuration plane path.

Note that templates are not kept in any list to avoid further dependencies with
nf_conntrack anymore, thus, the tmpl larval list is removed.

Reported-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Tested-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Quoting Daniel Borkmann:

"When adding connection tracking template rules to a netns, f.e. to
configure netfilter zones, the kernel will endlessly busy-loop as soon
as we try to delete the given netns in case there's at least one
template present, which is problematic i.e. if there is such bravery that
the priviledged user inside the netns is assumed untrusted.

Minimal example:

  ip netns add foo
  ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
  ip netns del foo

What happens is that when nf_ct_iterate_cleanup() is being called from
nf_conntrack_cleanup_net_list() for a provided netns, we always end up
with a net-&gt;ct.count &gt; 0 and thus jump back to i_see_dead_people. We
don't get a soft-lockup as we still have a schedule() point, but the
serving CPU spins on 100% from that point onwards.

Since templates are normally allocated with nf_conntrack_alloc(), we
also bump net-&gt;ct.count. The issue why they are not yet nf_ct_put() is
because the per netns .exit() handler from x_tables (which would eventually
invoke xt_CT's xt_ct_tg_destroy() that drops reference on info-&gt;ct) is
called in the dependency chain at a *later* point in time than the per
netns .exit() handler for the connection tracker.

This is clearly a chicken'n'egg problem: after the connection tracker
.exit() handler, we've teared down all the connection tracking
infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
invoked at a later point in time during the netns cleanup, as that would
lead to a use-after-free. At the same time, we cannot make x_tables depend
on the connection tracker module, so that the xt_ct_tg_destroy() would
be invoked earlier in the cleanup chain."

Daniel confirms this has to do with the order in which modules are loaded or
having compiled nf_conntrack as modules while x_tables built-in. So we have no
guarantees regarding the order in which netns callbacks are executed.

Fix this by allocating the templates through kmalloc() from the respective
SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
is marked as unlikely since conntrack templates are rarely allocated and only
from the configuration plane path.

Note that templates are not kept in any list to avoid further dependencies with
nf_conntrack anymore, thus, the tmpl larval list is removed.

Reported-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Tested-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt</title>
<updated>2014-02-05T16:46:06+00:00</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2014-02-03T19:01:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e53376bef2cd97d3e3f61fdc677fb8da7d03d0da'/>
<id>e53376bef2cd97d3e3f61fdc677fb8da7d03d0da</id>
<content type='text'>
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.

Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().

A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.

CPU1                                    CPU2
____nf_conntrack_find()
                                        nf_ct_put()
                                         destroy_conntrack()
                                        ...
                                        init_conntrack
                                         __nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&amp;ct-&gt;use) (use = 2)
                                         if (!l4proto-&gt;new(ct, skb, dataoff, timeouts))
                                          nf_conntrack_free(ct); (use = 2 !!!)
                                        ...
                                        __nf_conntrack_alloc (set use = 1)
 if (!nf_ct_key_equal(h, tuple, zone))
  nf_ct_put(ct); (use = 0)
   destroy_conntrack()
                                        /* continue to work with CT */

After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():

&lt;4&gt;[67096.759334] ------------[ cut here ]------------
&lt;2&gt;[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
&lt;4&gt;[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
&lt;4&gt;[67096.759932] RIP: 0010:[&lt;ffffffffa03d99ac&gt;]  [&lt;ffffffffa03d99ac&gt;] destroy_conntrack+0x15c/0x190 [nf_conntrack]
&lt;4&gt;[67096.760255] Call Trace:
&lt;4&gt;[67096.760255]  [&lt;ffffffff814844a7&gt;] nf_conntrack_destroy+0x17/0x30
&lt;4&gt;[67096.760255]  [&lt;ffffffffa03d9bb5&gt;] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
&lt;4&gt;[67096.760255]  [&lt;ffffffffa03d9fb2&gt;] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
&lt;4&gt;[67096.760255]  [&lt;ffffffffa048c771&gt;] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
&lt;4&gt;[67096.760255]  [&lt;ffffffff81484419&gt;] nf_iterate+0x69/0xb0
&lt;4&gt;[67096.760255]  [&lt;ffffffff814b5b00&gt;] ? dst_output+0x0/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff814845d4&gt;] nf_hook_slow+0x74/0x110
&lt;4&gt;[67096.760255]  [&lt;ffffffff814b5b00&gt;] ? dst_output+0x0/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff814b66d5&gt;] raw_sendmsg+0x775/0x910
&lt;4&gt;[67096.760255]  [&lt;ffffffff8104c5a8&gt;] ? flush_tlb_others_ipi+0x128/0x130
&lt;4&gt;[67096.760255]  [&lt;ffffffff8100bc4e&gt;] ? apic_timer_interrupt+0xe/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff8100bc4e&gt;] ? apic_timer_interrupt+0xe/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff814c136a&gt;] inet_sendmsg+0x4a/0xb0
&lt;4&gt;[67096.760255]  [&lt;ffffffff81444e93&gt;] ? sock_sendmsg+0x13/0x140
&lt;4&gt;[67096.760255]  [&lt;ffffffff81444f97&gt;] sock_sendmsg+0x117/0x140
&lt;4&gt;[67096.760255]  [&lt;ffffffff8102e299&gt;] ? native_smp_send_reschedule+0x49/0x60
&lt;4&gt;[67096.760255]  [&lt;ffffffff81519beb&gt;] ? _spin_unlock_bh+0x1b/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff8109d930&gt;] ? autoremove_wake_function+0x0/0x40
&lt;4&gt;[67096.760255]  [&lt;ffffffff814960f0&gt;] ? do_ip_setsockopt+0x90/0xd80
&lt;4&gt;[67096.760255]  [&lt;ffffffff8100bc4e&gt;] ? apic_timer_interrupt+0xe/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff8100bc4e&gt;] ? apic_timer_interrupt+0xe/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff814457c9&gt;] sys_sendto+0x139/0x190
&lt;4&gt;[67096.760255]  [&lt;ffffffff810efa77&gt;] ? audit_syscall_entry+0x1d7/0x200
&lt;4&gt;[67096.760255]  [&lt;ffffffff810ef7c5&gt;] ? __audit_syscall_exit+0x265/0x290
&lt;4&gt;[67096.760255]  [&lt;ffffffff81474daf&gt;] compat_sys_socketcall+0x13f/0x210
&lt;4&gt;[67096.760255]  [&lt;ffffffff8104dea3&gt;] ia32_sysret+0x0/0x5

I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.

Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Andrew Vagin &lt;avagin@parallels.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Reported-by: Andrew Vagin &lt;avagin@parallels.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Andrew Vagin &lt;avagin@parallels.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.

Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().

A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.

CPU1                                    CPU2
____nf_conntrack_find()
                                        nf_ct_put()
                                         destroy_conntrack()
                                        ...
                                        init_conntrack
                                         __nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&amp;ct-&gt;use) (use = 2)
                                         if (!l4proto-&gt;new(ct, skb, dataoff, timeouts))
                                          nf_conntrack_free(ct); (use = 2 !!!)
                                        ...
                                        __nf_conntrack_alloc (set use = 1)
 if (!nf_ct_key_equal(h, tuple, zone))
  nf_ct_put(ct); (use = 0)
   destroy_conntrack()
                                        /* continue to work with CT */

After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():

&lt;4&gt;[67096.759334] ------------[ cut here ]------------
&lt;2&gt;[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
&lt;4&gt;[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
&lt;4&gt;[67096.759932] RIP: 0010:[&lt;ffffffffa03d99ac&gt;]  [&lt;ffffffffa03d99ac&gt;] destroy_conntrack+0x15c/0x190 [nf_conntrack]
&lt;4&gt;[67096.760255] Call Trace:
&lt;4&gt;[67096.760255]  [&lt;ffffffff814844a7&gt;] nf_conntrack_destroy+0x17/0x30
&lt;4&gt;[67096.760255]  [&lt;ffffffffa03d9bb5&gt;] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
&lt;4&gt;[67096.760255]  [&lt;ffffffffa03d9fb2&gt;] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
&lt;4&gt;[67096.760255]  [&lt;ffffffffa048c771&gt;] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
&lt;4&gt;[67096.760255]  [&lt;ffffffff81484419&gt;] nf_iterate+0x69/0xb0
&lt;4&gt;[67096.760255]  [&lt;ffffffff814b5b00&gt;] ? dst_output+0x0/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff814845d4&gt;] nf_hook_slow+0x74/0x110
&lt;4&gt;[67096.760255]  [&lt;ffffffff814b5b00&gt;] ? dst_output+0x0/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff814b66d5&gt;] raw_sendmsg+0x775/0x910
&lt;4&gt;[67096.760255]  [&lt;ffffffff8104c5a8&gt;] ? flush_tlb_others_ipi+0x128/0x130
&lt;4&gt;[67096.760255]  [&lt;ffffffff8100bc4e&gt;] ? apic_timer_interrupt+0xe/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff8100bc4e&gt;] ? apic_timer_interrupt+0xe/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff814c136a&gt;] inet_sendmsg+0x4a/0xb0
&lt;4&gt;[67096.760255]  [&lt;ffffffff81444e93&gt;] ? sock_sendmsg+0x13/0x140
&lt;4&gt;[67096.760255]  [&lt;ffffffff81444f97&gt;] sock_sendmsg+0x117/0x140
&lt;4&gt;[67096.760255]  [&lt;ffffffff8102e299&gt;] ? native_smp_send_reschedule+0x49/0x60
&lt;4&gt;[67096.760255]  [&lt;ffffffff81519beb&gt;] ? _spin_unlock_bh+0x1b/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff8109d930&gt;] ? autoremove_wake_function+0x0/0x40
&lt;4&gt;[67096.760255]  [&lt;ffffffff814960f0&gt;] ? do_ip_setsockopt+0x90/0xd80
&lt;4&gt;[67096.760255]  [&lt;ffffffff8100bc4e&gt;] ? apic_timer_interrupt+0xe/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff8100bc4e&gt;] ? apic_timer_interrupt+0xe/0x20
&lt;4&gt;[67096.760255]  [&lt;ffffffff814457c9&gt;] sys_sendto+0x139/0x190
&lt;4&gt;[67096.760255]  [&lt;ffffffff810efa77&gt;] ? audit_syscall_entry+0x1d7/0x200
&lt;4&gt;[67096.760255]  [&lt;ffffffff810ef7c5&gt;] ? __audit_syscall_exit+0x265/0x290
&lt;4&gt;[67096.760255]  [&lt;ffffffff81474daf&gt;] compat_sys_socketcall+0x13f/0x210
&lt;4&gt;[67096.760255]  [&lt;ffffffff8104dea3&gt;] ia32_sysret+0x0/0x5

I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.

Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Andrew Vagin &lt;avagin@parallels.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Reported-by: Andrew Vagin &lt;avagin@parallels.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Andrew Vagin &lt;avagin@parallels.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: xt_CT: fix error value in xt_ct_tg_check()</title>
<updated>2014-01-03T22:41:39+00:00</updated>
<author>
<name>Eric Leblond</name>
<email>eric@regit.org</email>
</author>
<published>2014-01-02T09:03:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=14abfa161d256c60f3ea6ba494704ac634b94f63'/>
<id>14abfa161d256c60f3ea6ba494704ac634b94f63</id>
<content type='text'>
If setting event mask fails then we were returning 0 for success.
This patch updates return code to -EINVAL in case of problem.

Signed-off-by: Eric Leblond &lt;eric@regit.org&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If setting event mask fails then we were returning 0 for success.
This patch updates return code to -EINVAL in case of problem.

Signed-off-by: Eric Leblond &lt;eric@regit.org&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
