linux.git/include/net, branch v3.10.94

net: avoid NULL deref in inet_ctl_sock_destroy()

2015-12-09T18:40:06+00:00

[ Upstream commit 8fa677d2706d325d71dab91bf6e6512c05214e37 ]

Under low memory conditions, tcp_sk_init() and icmp_sk_init()
can both iterate on all possible cpus and call inet_ctl_sock_destroy(),
with eventual NULL pointer.

Signed-off-by: Eric Dumazet 
Reported-by: Dmitry Vyukov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: add pfmemalloc check in sk_add_backlog()

2015-10-27T00:44:48+00:00

[ Upstream commit c7c49b8fde26b74277188bdc6c9dca38db6fa35b ]

Greg reported crashes hitting the following check in __sk_backlog_rcv()

	BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));

The pfmemalloc bit is currently checked in sk_filter().

This works correctly for TCP, because sk_filter() is ran in
tcp_v[46]_rcv() before hitting the prequeue or backlog checks.

For UDP or other protocols, this does not work, because the sk_filter()
is ran from sock_queue_rcv_skb(), which might be called _after_ backlog
queuing if socket is owned by user by the time packet is processed by
softirq handler.

Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
Signed-off-by: Eric Dumazet 
Reported-by: Greg Thelen 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

af_unix: Convert the unix_sk macro to an inline function for type safety

2015-10-27T00:44:47+00:00

[ Upstream commit 4613012db1d911f80897f9446a49de817b2c4c47 ]

As suggested by Eric Dumazet this change replaces the
#define with a static inline function to enjoy
complaints by the compiler when misusing the API.

Signed-off-by: Aaron Conole 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv6: lock socket in ip6_datagram_connect()

2015-10-01T10:07:37+00:00

[ Upstream commit 03645a11a570d52e70631838cb786eb4253eb463 ]

ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

Signed-off-by: Eric Dumazet 
Acked-by: Herbert Xu 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sctp: fix ASCONF list handling

2015-10-01T10:07:34+00:00

commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 upstream.

->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.

Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.

This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().

Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().

Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.

Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen 
Suggested-by: Neil Horman 
Suggested-by: Hannes Frederic Sowa 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller 
[wangkai: backport to 3.10: adjust context]
Signed-off-by: Wang Kai 
Signed-off-by: Greg Kroah-Hartman

ipv6: prevent fib6_run_gc() contention

2015-07-04T02:48:09+00:00

commit 2ac3ac8f86f2fe065d746d9a9abaca867adec577 upstream.

On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.

Signed-off-by: Michal Kubecek 
Signed-off-by: David S. Miller 
Cc: Konstantin Khlebnikov 
Signed-off-by: Greg Kroah-Hartman

ipv4: tcp: get rid of ugly unicast_sock

2015-02-27T01:48:48+00:00

[ Upstream commit bdbbb8527b6f6a358dbcb70dac247034d665b8e4 ]

In commit be9f4a44e7d41 ("ipv4: tcp: remove per net tcp_sock")
I tried to address contention on a socket lock, but the solution
I chose was horrible :

commit 3a7c384ffd57e ("ipv4: tcp: unicast_sock should not land outside
of TCP stack") addressed a selinux regression.

commit 0980e56e506b ("ipv4: tcp: set unicast_sock uc_ttl to -1")
took care of another regression.

commit b5ec8eeac46 ("ipv4: fix ip_send_skb()") fixed another regression.

commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
was another shot in the dark.

Really, just use a proper socket per cpu, and remove the skb_orphan()
call, to re-enable flow control.

This solves a serious problem with FQ packet scheduler when used in
hostile environments, as we do not want to allocate a flow structure
for every RST packet sent in response to a spoofed packet.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv4: try to cache dst_entries which would cause a redirect

2015-02-27T01:48:48+00:00

[ Upstream commit df4d92549f23e1c037e83323aff58a21b3de7fe0 ]

Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU.  Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov 
Signed-off-by: Marcelo Leitner 
Signed-off-by: Florian Westphal 
Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: Julian Anastasov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks

2014-11-21T17:22:55+00:00

commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 
 [] skb_put+0x5c/0x70
 [] sctp_addto_chunk+0x63/0xd0 [sctp]
 [] sctp_process_asconf+0x1af/0x540 [sctp]
 [] ? _read_unlock_bh+0x15/0x20
 [] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [] sctp_do_sm+0x71/0x1210 [sctp]
 [] ? fib_rules_lookup+0xad/0xf0
 [] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [] sctp_inq_push+0x56/0x80 [sctp]
 [] sctp_rcv+0x982/0xa10 [sctp]
 [] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [] ? nf_iterate+0x69/0xb0
 [] ? ip_local_deliver_finish+0x0/0x2d0
 [] ? nf_hook_slow+0x76/0x120
 [] ? ip_local_deliver_finish+0x0/0x2d0
 [] ip_local_deliver_finish+0xdd/0x2d0
 [] ip_local_deliver+0x98/0xa0
 [] ip_rcv_finish+0x12d/0x440
 [] ip_rcv+0x275/0x350
 [] __netif_receive_skb+0x4ab/0x750
 [] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann 
Signed-off-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Cc: Josh Boyer 
Signed-off-by: Greg Kroah-Hartman

net: sctp: fix panic on duplicate ASCONF chunks

2014-11-21T17:22:55+00:00

commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.

When receiving a e.g. semi-good formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann 
Signed-off-by: Vlad Yasevich 
Signed-off-by: David S. Miller 
Cc: Josh Boyer 
Signed-off-by: Greg Kroah-Hartman