linux.git/net/core, branch v3.15-rc6

rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set

2014-04-24T17:52:54+00:00

Since 115c9b81928360d769a76c632bae62d15206a94a (rtnetlink: Fix problem with
buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST
attribute if they were solicited by a GETLINK message containing an
IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag.

That was done because some user programs broke when they received more data
than expected - because IFLA_VFINFO_LIST contains information for each VF
it can become large if there are many VFs.

However, the IFLA_VF_PORTS attribute, supplied for devices which implement
ndo_get_vf_port (currently the 'enic' driver only), has the same problem.
It supplies per-VF information and can therefore become large, but it is
not currently conditional on the IFLA_EXT_MASK value.

Worse, it interacts badly with the existing EXT_MASK handling.  When
IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at
NLMSG_GOODSIZE.  If the information for IFLA_VF_PORTS exceeds this, then
rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet.
netlink_dump() will misinterpret this as having finished the listing and
omit data for this interface and all subsequent ones.  That can cause
getifaddrs(3) to enter an infinite loop.

This patch addresses the problem by only supplying IFLA_VF_PORTS when
IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set.

Signed-off-by: David Gibson 
Reviewed-by: Jiri Pirko 
Signed-off-by: David S. Miller

rtnetlink: Warn when interface's information won't fit in our packet

2014-04-24T17:52:54+00:00

Without IFLA_EXT_MASK specified, the information reported for a single
interface in response to RTM_GETLINK is expected to fit within a netlink
packet of NLMSG_GOODSIZE.

If it doesn't, however, things will go badly wrong,  When listing all
interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
message in a packet as the end of the listing and omit information for
that interface and all subsequent ones.  This can cause getifaddrs(3) to
enter an infinite loop.

This patch won't fix the problem, but it will WARN_ON() making it easier to
track down what's going wrong.

Signed-off-by: David Gibson 
Reviewed-by: Jiri Pirko 
Signed-off-by: David S. Miller

net: Use netlink_ns_capable to verify the permisions of netlink messages

2014-04-24T17:44:54+00:00

It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski 
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: David S. Miller

net: Add variants of capable for use on on sockets

2014-04-24T17:44:53+00:00

sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.

Signed-off-by: "Eric W. Biederman" 
Signed-off-by: David S. Miller

net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump

2014-04-24T17:44:53+00:00

The permission check in sock_diag_put_filterinfo is wrong, and it is so removed
from it's sources it is not clear why it is wrong.  Move the computation
into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo.

This does not yet correct the capability check but instead simply moves it to make
it clear what is going on.

Reported-by: Andy Lutomirski 
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: David S. Miller

net: filter: initialize A and X registers

2014-04-23T19:34:41+00:00

exisiting BPF verifier allows uninitialized access to registers,
'ret A' is considered to be a valid filter.
So initialize A and X to zero to prevent leaking kernel memory
In the future BPF verifier will be rejecting such filters

Signed-off-by: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Daniel Borkmann 
Signed-off-by: David S. Miller

net: Fix ns_capable check in sock_diag_put_filterinfo

2014-04-22T16:49:39+00:00

The caller needs capabilities on the namespace being queried, not on
their own namespace.  This is a security bug, although it likely has
only a minor impact.

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski 
Acked-by: Nicolas Dichtel 
Signed-off-by: David S. Miller

vlan: Fix lockdep warning when vlan dev handle notification

2014-04-18T21:48:30+00:00

When I open the LOCKDEP config and run these steps:

modprobe 8021q
vconfig add eth2 20
vconfig add eth2.20 30
ifconfig eth2 xx.xx.xx.xx

then the Call Trace happened:

[32524.386288] =============================================
[32524.386293] [ INFO: possible recursive locking detected ]
[32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G           O
[32524.386302] ---------------------------------------------
[32524.386306] ifconfig/3103 is trying to acquire lock:
[32524.386310]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [] dev_mc_sync+0x64/0xb0
[32524.386326]
[32524.386326] but task is already holding lock:
[32524.386330]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [] dev_set_rx_mode+0x23/0x40
[32524.386341]
[32524.386341] other info that might help us debug this:
[32524.386345]  Possible unsafe locking scenario:
[32524.386345]
[32524.386350]        CPU0
[32524.386352]        ----
[32524.386354]   lock(&vlan_netdev_addr_lock_key/1);
[32524.386359]   lock(&vlan_netdev_addr_lock_key/1);
[32524.386364]
[32524.386364]  *** DEADLOCK ***
[32524.386364]
[32524.386368]  May be due to missing lock nesting notation
[32524.386368]
[32524.386373] 2 locks held by ifconfig/3103:
[32524.386376]  #0:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x12/0x20
[32524.386387]  #1:  (&vlan_netdev_addr_lock_key/1){+.....}, at: [] dev_set_rx_mode+0x23/0x40
[32524.386398]
[32524.386398] stack backtrace:
[32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G           O 3.14.0-rc2-0.7-default+ #35
[32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[32524.386414]  ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8
[32524.386421]  ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0
[32524.386428]  000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000
[32524.386435] Call Trace:
[32524.386441]  [] dump_stack+0x6a/0x78
[32524.386448]  [] __lock_acquire+0x7ab/0x1940
[32524.386454]  [] ? __lock_acquire+0x3ea/0x1940
[32524.386459]  [] lock_acquire+0xe4/0x110
[32524.386464]  [] ? dev_mc_sync+0x64/0xb0
[32524.386471]  [] _raw_spin_lock_nested+0x2a/0x40
[32524.386476]  [] ? dev_mc_sync+0x64/0xb0
[32524.386481]  [] dev_mc_sync+0x64/0xb0
[32524.386489]  [] vlan_dev_set_rx_mode+0x2b/0x50 [8021q]
[32524.386495]  [] __dev_set_rx_mode+0x5f/0xb0
[32524.386500]  [] dev_set_rx_mode+0x2b/0x40
[32524.386506]  [] __dev_open+0xef/0x150
[32524.386511]  [] __dev_change_flags+0xa7/0x190
[32524.386516]  [] dev_change_flags+0x32/0x80
[32524.386524]  [] devinet_ioctl+0x7d6/0x830
[32524.386532]  [] ? dev_ioctl+0x34b/0x660
[32524.386540]  [] inet_ioctl+0x80/0xa0
[32524.386550]  [] sock_do_ioctl+0x2d/0x60
[32524.386558]  [] sock_ioctl+0x82/0x2a0
[32524.386568]  [] do_vfs_ioctl+0x93/0x590
[32524.386578]  [] ? rcu_read_lock_held+0x45/0x50
[32524.386586]  [] ? __fget_light+0x105/0x110
[32524.386594]  [] SyS_ioctl+0x91/0xb0
[32524.386604]  [] system_call_fastpath+0x16/0x1b

========================================================================

The reason is that all of the addr_lock_key for vlan dev have the same class,
so if we change the status for vlan dev, the vlan dev and its real dev will
hold the same class of addr_lock_key together, so the warning happened.

we should distinguish the lock depth for vlan dev and its real dev.

v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which
	could support to add 8 vlan id on a same vlan dev, I think it is enough for current
	scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id,
	and the vlan dev would not meet the same class key with its real dev.

	The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan
	dev could get a suitable class key.

v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev
	and its real dev, but it make no sense, because the difference for subclass in the
	lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth
	to distinguish the different depth for every vlan dev, the same depth of the vlan dev
	could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h,
	I think it is enough here, the lockdep should never exceed that value.

v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method,
	we could use _nested() variants to fix the problem, calculate the depth for every vlan dev,
	and use the depth as the subclass for addr_lock_key.

Signed-off-by: Ding Tianhong 
Signed-off-by: David S. Miller

ipv4: add a sock pointer to dst->output() path.

2014-04-15T17:47:15+00:00

In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.

The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.

Fixes: 8f646c922d55 ("vxlan: keep original skb ownership")
Reported-by: lucien xin 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

net: Start with correct mac_len in skb_network_protocol

2014-04-14T22:58:58+00:00

Sometimes, when the packet arrives at skb_mac_gso_segment()
its skb->mac_len already accounts for some of the mac lenght
headers in the packet.  This seems to happen when forwarding
through and OpenSSL tunnel.

When we start looking for any vlan headers in skb_network_protocol()
we seem to ignore any of the already known mac headers and start
with an ETH_HLEN.  This results in an incorrect offset, dropped
TSO frames and general slowness of the connection.

We can start counting from the known skb->mac_len
and return at least that much if all mac level headers
are known and accounted for.

Fixes: 53d6471cef17262d3ad1c7ce8982a234244f68ec (net: Account for all vlan headers in skb_mac_gso_segment)
CC: Eric Dumazet 
CC: Daniel Borkman 
Tested-by: Martin Filip 
Signed-off-by: Vlad Yasevich 
Signed-off-by: David S. Miller