Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables

Pablo Neira Ayuso says: ==================== netfilter updates: nf_tables pull request The following patchset contains the current original nf_tables tree condensed in 17 patches. I have organized them by chronogical order since the original nf_tables code was released in 2009 and by dependencies between the different patches. The patches are: 1) Adapt all existing hooks in the tree to pass hook ops to the hook callback function, required by nf_tables, from Patrick McHardy. 2) Move alloc_null_binding to nf_nat_core, as it is now also needed by nf_tables and ip_tables, original patch from Patrick McHardy but required major changes to adapt it to the current tree that I made. 3) Add nf_tables core, including the netlink API, the packet filtering engine, expressions and built-in tables, from Patrick McHardy. This patch includes accumulated fixes since 2009 and minor enhancements. The patch description contains a list of references to the original patches for the record. For those that are not familiar to the original work, see [1], [2] and [3]. 4) Add netlink set API, this replaces the original set infrastructure to introduce a netlink API to add/delete sets and to add/delete set elements. This includes two set types: the hash and the rb-tree sets (used for interval based matching). The main difference with ipset is that this infrastructure is data type agnostic. Patch from Patrick McHardy. 5) Allow expression operation overload, this API change allows us to provide define expression subtypes depending on the configuration that is received from user-space via Netlink. It is used by follow up patches to provide optimized versions of the payload and cmp expressions and the x_tables compatibility layer, from Patrick McHardy. 6) Add optimized data comparison operation, it requires the previous patch, from Patrick McHardy. 7) Add optimized payload implementation, it requires patch 5, from Patrick McHardy. 8) Convert built-in tables to chain types. Each chain type have special semantics (filter, route and nat) that are used by userspace to configure the chain behaviour. The main chain regarding iptables is that tables become containers of chain, with no specific semantics. However, you may still configure your tables and chains to retain iptables like semantics, patch from me. 9) Add compatibility layer for x_tables. This patch adds support to use all existing x_tables extensions from nf_tables, this is used to provide a userspace utility that accepts iptables syntax but used internally the nf_tables kernel core. This patch includes missing features in the nf_tables core such as the per-chain stats, default chain policy and number of chain references, which are required by the iptables compatibility userspace tool. Patch from me. 10) Fix transport protocol matching, this fix is a side effect of the x_tables compatibility layer, which now provides a pointer to the transport header, from me. 11) Add support for dormant tables, this feature allows you to disable all chains and rules that are contained in one table, from me. 12) Add IPv6 NAT support. At the time nf_tables was made, there was no NAT IPv6 support yet, from Tomasz Bursztyka. 13) Complete net namespace support. This patch register the protocol family per net namespace, so tables (thus, other objects contained in tables such as sets, chains and rules) are only visible from the corresponding net namespace, from me. 14) Add the insert operation to the nf_tables netlink API, this requires adding a new position attribute that allow us to locate where in the ruleset a rule needs to be inserted, from Eric Leblond. 15) Add rule batching support, including atomic rule-set updates by using rule-set generations. This patch includes a change to nfnetlink to include two new control messages to indicate the beginning and the end of a batch. The end message is interpreted as the commit message, if it's missing, then the rule-set updates contained in the batch are aborted, from me. 16) Add trace support to the nf_tables packet filtering core, from me. 17) Add ARP filtering support, original patch from Patrick McHardy, but adapted to fit into the chain type infrastructure. This was recovered to be used by nft userspace tool and our compatibility arptables userspace tool. There is still work to do to fully replace x_tables [4] [5] but that can be done incrementally by extending our netlink API. Moreover, looking at netfilter-devel and the amount of contributions to nf_tables we've been getting, I think it would be good to have it mainstream to avoid accumulating large patchsets skip continuous rebases. I tried to provide a reasonable patchset, we have more than 100 accumulated patches in the original nf_tables tree, so I collapsed many of the small fixes to the main patch we had since 2009 and provide a small batch for review to netdev, while trying to retain part of the history. For those who didn't give a try to nf_tables yet, there's a quick howto available from Eric Leblond that describes how to get things working [6]. Comments/reviews welcome. Thanks! [1] http://lwn.net/Articles/324251/ [2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf [3] http://lwn.net/Articles/564095/ [4] http://people.netfilter.org/pablo/map-pending-work.txt [4] http://people.netfilter.org/pablo/nftables-todo.txt [5] https://home.regit.org/netfilter-en/nftables-quick-howto/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
author: David S. Miller <davem@davemloft.net> 2013-10-17 15:22:05 -0400
committer: David S. Miller <davem@davemloft.net> 2013-10-17 15:22:05 -0400
commit: da33edccebcc36d387423dcdb557094fbda55994 (patch)
tree: 9f426a52f875169ae24e54a395beedc697c96e02 /net/netfilter/nfnetlink.c
parent: 78dea8cc4942c6adbcccc8f483463906a078f039 (diff)
parent: ed683f138b3dbc8a5e878e24a0bfa0bb61043a09 (diff)
download: linux-da33edccebcc36d387423dcdb557094fbda55994.tar.gz
linux-da33edccebcc36d387423dcdb557094fbda55994.tar.bz2
linux-da33edccebcc36d387423dcdb557094fbda55994.zip
1 files changed, 171 insertions, 4 deletions
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 572d87dc116f..027f16af51a0 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -147,9 +147,6 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	const struct nfnetlink_subsystem *ss;
 	int type, err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
-		return -EPERM;
-
 	/* All the messages must at least contain nfgenmsg */
 	if (nlmsg_len(nlh) < sizeof(struct nfgenmsg))
 		return 0;
@@ -217,9 +214,179 @@ replay:
 	}
 }
 
+static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
+				u_int16_t subsys_id)
+{
+	struct sk_buff *nskb, *oskb = skb;
+	struct net *net = sock_net(skb->sk);
+	const struct nfnetlink_subsystem *ss;
+	const struct nfnl_callback *nc;
+	bool success = true, done = false;
+	int err;
+
+	if (subsys_id >= NFNL_SUBSYS_COUNT)
+		return netlink_ack(skb, nlh, -EINVAL);
+replay:
+	nskb = netlink_skb_clone(oskb, GFP_KERNEL);
+	if (!nskb)
+		return netlink_ack(oskb, nlh, -ENOMEM);
+
+	nskb->sk = oskb->sk;
+	skb = nskb;
+
+	nfnl_lock(subsys_id);
+	ss = rcu_dereference_protected(table[subsys_id].subsys,
+				       lockdep_is_held(&table[subsys_id].mutex));
+	if (!ss) {
+#ifdef CONFIG_MODULES
+		nfnl_unlock(subsys_id);
+		request_module("nfnetlink-subsys-%d", subsys_id);
+		nfnl_lock(subsys_id);
+		ss = rcu_dereference_protected(table[subsys_id].subsys,
+					       lockdep_is_held(&table[subsys_id].mutex));
+		if (!ss)
+#endif
+		{
+			nfnl_unlock(subsys_id);
+			kfree_skb(nskb);
+			return netlink_ack(skb, nlh, -EOPNOTSUPP);
+		}
+	}
+
+	if (!ss->commit || !ss->abort) {
+		nfnl_unlock(subsys_id);
+		kfree_skb(nskb);
+		return netlink_ack(skb, nlh, -EOPNOTSUPP);
+	}
+
+	while (skb->len >= nlmsg_total_size(0)) {
+		int msglen, type;
+
+		nlh = nlmsg_hdr(skb);
+		err = 0;
+
+		if (nlh->nlmsg_len < NLMSG_HDRLEN) {
+			err = -EINVAL;
+			goto ack;
+		}
+
+		/* Only requests are handled by the kernel */
+		if (!(nlh->nlmsg_flags & NLM_F_REQUEST)) {
+			err = -EINVAL;
+			goto ack;
+		}
+
+		type = nlh->nlmsg_type;
+		if (type == NFNL_MSG_BATCH_BEGIN) {
+			/* Malformed: Batch begin twice */
+			success = false;
+			goto done;
+		} else if (type == NFNL_MSG_BATCH_END) {
+			done = true;
+			goto done;
+		} else if (type < NLMSG_MIN_TYPE) {
+			err = -EINVAL;
+			goto ack;
+		}
+
+		/* We only accept a batch with messages for the same
+		 * subsystem.
+		 */
+		if (NFNL_SUBSYS_ID(type) != subsys_id) {
+			err = -EINVAL;
+			goto ack;
+		}
+
+		nc = nfnetlink_find_client(type, ss);
+		if (!nc) {
+			err = -EINVAL;
+			goto ack;
+		}
+
+		{
+			int min_len = nlmsg_total_size(sizeof(struct nfgenmsg));
+			u_int8_t cb_id = NFNL_MSG_TYPE(nlh->nlmsg_type);
+			struct nlattr *cda[ss->cb[cb_id].attr_count + 1];
+			struct nlattr *attr = (void *)nlh + min_len;
+			int attrlen = nlh->nlmsg_len - min_len;
+
+			err = nla_parse(cda, ss->cb[cb_id].attr_count,
+					attr, attrlen, ss->cb[cb_id].policy);
+			if (err < 0)
+				goto ack;
+
+			if (nc->call_batch) {
+				err = nc->call_batch(net->nfnl, skb, nlh,
+						     (const struct nlattr **)cda);
+			}
+
+			/* The lock was released to autoload some module, we
+			 * have to abort and start from scratch using the
+			 * original skb.
+			 */
+			if (err == -EAGAIN) {
+				ss->abort(skb);
+				nfnl_unlock(subsys_id);
+				kfree_skb(nskb);
+				goto replay;
+			}
+		}
+ack:
+		if (nlh->nlmsg_flags & NLM_F_ACK || err) {
+			/* We don't stop processing the batch on errors, thus,
+			 * userspace gets all the errors that the batch
+			 * triggers.
+			 */
+			netlink_ack(skb, nlh, err);
+			if (err)
+				success = false;
+		}
+
+		msglen = NLMSG_ALIGN(nlh->nlmsg_len);
+		if (msglen > skb->len)
+			msglen = skb->len;
+		skb_pull(skb, msglen);
+	}
+done:
+	if (success && done)
+		ss->commit(skb);
+	else
+		ss->abort(skb);
+
+	nfnl_unlock(subsys_id);
+	kfree_skb(nskb);
+}
+
 static void nfnetlink_rcv(struct sk_buff *skb)
 {
-	netlink_rcv_skb(skb, &nfnetlink_rcv_msg);
+	struct nlmsghdr *nlh = nlmsg_hdr(skb);
+	struct net *net = sock_net(skb->sk);
+	int msglen;
+
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		return netlink_ack(skb, nlh, -EPERM);
+
+	if (nlh->nlmsg_len < NLMSG_HDRLEN ||
+	    skb->len < nlh->nlmsg_len)
+		return;
+
+	if (nlh->nlmsg_type == NFNL_MSG_BATCH_BEGIN) {
+		struct nfgenmsg *nfgenmsg;
+
+		msglen = NLMSG_ALIGN(nlh->nlmsg_len);
+		if (msglen > skb->len)
+			msglen = skb->len;
+
+		if (nlh->nlmsg_len < NLMSG_HDRLEN ||
+		    skb->len < NLMSG_HDRLEN + sizeof(struct nfgenmsg))
+			return;
+
+		nfgenmsg = nlmsg_data(nlh);
+		skb_pull(skb, msglen);
+		nfnetlink_rcv_batch(skb, nlh, nfgenmsg->res_id);
+	} else {
+		netlink_rcv_skb(skb, &nfnetlink_rcv_msg);
+	}
 }
 
 #ifdef CONFIG_MODULES
author	David S. Miller <davem@davemloft.net>	2013-10-17 15:22:05 -0400
committer	David S. Miller <davem@davemloft.net>	2013-10-17 15:22:05 -0400
commit	da33edccebcc36d387423dcdb557094fbda55994 (patch)
tree	9f426a52f875169ae24e54a395beedc697c96e02 /net/netfilter/nfnetlink.c
parent	78dea8cc4942c6adbcccc8f483463906a078f039 (diff)
parent	ed683f138b3dbc8a5e878e24a0bfa0bb61043a09 (diff)
download	linux-da33edccebcc36d387423dcdb557094fbda55994.tar.gz linux-da33edccebcc36d387423dcdb557094fbda55994.tar.bz2 linux-da33edccebcc36d387423dcdb557094fbda55994.zip