linux.git/kernel/bpf, branch v4.18.19

bpf: wait for running BPF programs when updating map-in-map

2018-11-13T19:12:59+00:00

commit 1ae80cf31938c8f77c37a29bbe29e7f1cd492be8 upstream.

The map-in-map frequently serves as a mechanism for atomic
snapshotting of state that a BPF program might record.  The current
implementation is dangerous to use in this way, however, since
userspace has no way of knowing when all programs that might have
retrieved the "old" value of the map may have completed.

This change ensures that map update operations on map-in-map map types
always wait for all references to the old map to drop before returning
to userspace.
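The userspace sketch below shows what "wait for all references to the old map to drop" buys the caller. It is a toy under stated assumptions, not the kernel code. A single outer slot holds an inner-map pointer, and readers bump a counter while they hold it. The updater swaps the pointer, then spins until no reader remains. The counter is a crude stand-in for an RCU grace period, which the kernel can use because BPF programs run inside RCU read-side sections. All toy_* names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical toy outer map with a single inner-map slot. */
struct toy_outer_map {
    _Atomic(void *) inner;      /* current inner map */
    atomic_int active_readers;  /* crude stand-in for RCU read-side sections */
};

/* Reader side: roughly what a BPF program does while it uses the inner map. */
static void *toy_reader_enter(struct toy_outer_map *m)
{
    atomic_fetch_add(&m->active_readers, 1);  /* announce before loading */
    return atomic_load(&m->inner);
}

static void toy_reader_exit(struct toy_outer_map *m)
{
    atomic_fetch_sub(&m->active_readers, 1);
}

/* Writer side: publish the new inner map, then wait until no reader can
 * still hold the old one. Returns the old inner map. */
static void *toy_update_and_wait(struct toy_outer_map *m, void *new_inner)
{
    void *old = atomic_exchange(&m->inner, new_inner);

    while (atomic_load(&m->active_readers) != 0)
        ;  /* spin: a toy substitute for waiting out a grace period */
    return old;  /* safe to read as a stable snapshot now */
}
```

Once toy_update_and_wait() returns, no reader can still be using the old inner map, so userspace may treat it as a consistent snapshot. The toy also waits for readers that picked up the new map, which is conservative and could starve under constant load; real RCU does not have that problem.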

Signed-off-by: Daniel Colascione 
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Chenbo Feng 
Signed-off-by: Greg Kroah-Hartman

bpf/verifier: fix verifier instability

2018-11-13T19:12:25+00:00

[ Upstream commit a9c676bc8fc58d00eea9836fb14ee43c0346416a ]

Edward Cree says:
In check_mem_access(), for the PTR_TO_CTX case, after check_ctx_access()
has supplied a reg_type, the other members of the register state are set
appropriately.  Previously reg.range was set to 0, but as it is in a
union with reg.map_ptr, which is larger, upper bytes of the latter were
left in place.  This then caused the memcmp() in regsafe() to fail,
preventing some branches from being pruned (and occasionally causing the
same program to take a varying number of processed insns on repeated
verifier runs).

Fix the instability by clearing bpf_reg_state in __mark_reg_[un]known().
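The union problem can be reproduced outside the kernel with a simplified register state. In the sketch below, struct toy_reg_state and its helpers are hypothetical stand-ins, not the real bpf_reg_state. Resetting only the narrow range member leaves the remaining bytes of map_ptr in place, so a byte-wise memcmp() (in the spirit of regsafe()) reports two logically identical states as different. Clearing the whole state first, shown here with memset(), makes the comparison stable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct bpf_reg_state. */
struct toy_reg_state {
    int type;
    union {
        uint16_t range;  /* narrow member */
        void *map_ptr;   /* wider member sharing the same storage */
    };
};

enum { TOY_SCALAR = 1 };

/* Pre-fix pattern: resetting only the narrow member leaves the other
 * bytes of the union (the rest of map_ptr) untouched. */
static void toy_mark_unknown_buggy(struct toy_reg_state *reg)
{
    reg->type = TOY_SCALAR;
    reg->range = 0;
}

/* Post-fix pattern: clear every byte of the state, then set what we need. */
static void toy_mark_unknown_fixed(struct toy_reg_state *reg)
{
    memset(reg, 0, sizeof(*reg));
    reg->type = TOY_SCALAR;
}

/* Byte-wise comparison, like the memcmp() used for pruning. */
static int toy_states_equal(const struct toy_reg_state *a,
                            const struct toy_reg_state *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```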

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Debugged-by: Edward Cree 
Acked-by: Edward Cree 
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

xsk: do not call synchronize_net() under RCU read lock

2018-11-13T19:12:16+00:00

[ Upstream commit cee271678d0e3177a25d0fcb2fa5e051d48e4262 ]

The XSKMAP update and delete functions called synchronize_net(), which
can sleep. It is not allowed to sleep during an RCU read section.

Instead we need to make sure that the sock sk_destruct (xsk_destruct)
function is asynchronously called after an RCU grace period. Setting
the SOCK_RCU_FREE flag for XDP sockets takes care of this.

Fixes: fbfc504a24f5 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
Reported-by: Eric Dumazet 
Signed-off-by: Björn Töpel 
Acked-by: Song Liu 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bpf: fix partial copy of map_ptr when dst is scalar

2018-11-10T15:49:45+00:00

commit 0962590e553331db2cc0aef2dc35c57f6300dbbe upstream.

ALU operations on pointers such as scalar_reg += map_value_ptr are
handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr
and range in the register state share a union, so transferring state
through dst_reg->range = ptr_reg->range is just buggy as any new
map_ptr in the dst_reg is then truncated (or null) for subsequent
checks. Fix this by adding a raw member and using it for copying state
over to dst_reg.
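The truncation can be shown with a toy union that mirrors the fix. It assumes a raw member at least as wide as the pointer. Copying through range moves only two bytes, so dst's map_ptr no longer matches src's. Copying through raw moves the whole union. struct toy_reg_raw and both helpers are hypothetical; the real code copies between bpf_reg_state registers in adjust_ptr_min_max_vals().

```c
#include <stdint.h>

/* Hypothetical stand-in for the register-state union after the fix. */
struct toy_reg_raw {
    int type;
    union {
        uint16_t range;
        void *map_ptr;
        uintptr_t raw;  /* widest member: covers all of map_ptr */
    };
};

_Static_assert(sizeof(uintptr_t) >= sizeof(void *),
               "raw must cover map_ptr");

/* Pre-fix: copying through the narrow member truncates map_ptr. */
static void toy_copy_buggy(struct toy_reg_raw *dst,
                           const struct toy_reg_raw *src)
{
    dst->range = src->range;
}

/* Post-fix: copying through raw transfers every byte of the union. */
static void toy_copy_fixed(struct toy_reg_raw *dst,
                           const struct toy_reg_raw *src)
{
    dst->raw = src->raw;
}
```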

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann 
Cc: Edward Cree 
Acked-by: Alexei Starovoitov 
Signed-off-by: Alexei Starovoitov 
Acked-by: Edward Cree 
Signed-off-by: Sasha Levin

bpf: sockmap, fix transition through disconnect without close

2018-10-20T07:47:08+00:00

[ Upstream commit b05545e15e1ff1d6a6a8593971275f9cc3e6b92b ]

It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE
state via tcp_disconnect() without actually calling tcp_close(), which
would then call our bpf_tcp_close() callback. Because of this, a user
could disconnect a socket and then put it in a LISTEN state, which would
break our assumption that sockets are always in ESTABLISHED state.

To resolve this, rely on the unhash hook, which is called in the
disconnect case, to remove the sock from the sockmap.
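A small state model helps show why the close hook alone is not enough. This is a hypothetical toy, not kernel code. toy_disconnect() stands for the tcp_disconnect() path, where the sock is unhashed but the close hook never runs. The remove_on_unhash flag switches between pre-fix and post-fix behavior. The invariant checked is the one sockmap assumes: every sock in the map is ESTABLISHED.

```c
#include <stdbool.h>

/* Hypothetical toy model: only the states needed for this path. */
enum toy_state { TOY_ESTABLISHED, TOY_CLOSE, TOY_LISTEN };

struct toy_sock {
    enum toy_state state;
    bool in_sockmap;
    bool remove_on_unhash;  /* false: pre-fix (close hook only); true: post-fix */
};

/* close() path: the close hook removes the sock from the map. */
static void toy_close(struct toy_sock *sk)
{
    sk->in_sockmap = false;
    sk->state = TOY_CLOSE;
}

/* shutdown()/disconnect path: the sock is unhashed and moves to CLOSE,
 * but the close hook never runs. */
static void toy_disconnect(struct toy_sock *sk)
{
    if (sk->remove_on_unhash)
        sk->in_sockmap = false;  /* the unhash hook does the removal */
    sk->state = TOY_CLOSE;
}

static void toy_listen(struct toy_sock *sk)
{
    sk->state = TOY_LISTEN;
}

/* Invariant sockmap relies on: every sock in the map is ESTABLISHED. */
static bool toy_invariant_holds(const struct toy_sock *sk)
{
    return !sk->in_sockmap || sk->state == TOY_ESTABLISHED;
}
```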

Reported-by: Eric Dumazet 
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: John Fastabend 
Acked-by: Yonghong Song 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bpf: sockmap only allow ESTABLISHED sock state

2018-10-20T07:47:08+00:00

[ Upstream commit 5607fff303636d48b88414c6be353d9fed700af2 ]

After this patch we only allow socks that are in ESTABLISHED state or
are being added via a sock_ops event that is transitioning into an
ESTABLISHED state. By allowing sock_ops events we allow users to
manage sockmaps directly from sock ops programs. The two supported
sock_ops ops are BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB and
BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB.

Similar to the TLS ULP, this ensures sk_user_data is correct.
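The admission rule above can be sketched as a predicate. The enums are hypothetical stand-ins for TCP_ESTABLISHED and the BPF_SOCK_OPS_*_ESTABLISHED_CB values, and the rejection is modeled as -EOPNOTSUPP. A syscall-side update requires an ESTABLISHED sock. A sock_ops-side update is accepted only from the two callbacks that fire as the connection becomes established.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TCP states and sock_ops callback ids. */
enum toy_tcp_state { TOY_TCP_ESTABLISHED, TOY_TCP_SYN_SENT, TOY_TCP_LISTEN };
enum toy_sock_op {
    TOY_OP_NONE,                  /* update from the bpf() syscall */
    TOY_OP_PASSIVE_ESTABLISHED_CB,
    TOY_OP_ACTIVE_ESTABLISHED_CB,
    TOY_OP_OTHER_CB,
};

/* Only the two "becoming ESTABLISHED" callbacks may add a sock. */
static bool toy_is_valid_sock_op(enum toy_sock_op op)
{
    return op == TOY_OP_PASSIVE_ESTABLISHED_CB ||
           op == TOY_OP_ACTIVE_ESTABLISHED_CB;
}

/* Returns 0 if the sock may be added, else an error (modeled as -EOPNOTSUPP). */
static int toy_sockmap_may_add(enum toy_tcp_state state, enum toy_sock_op op)
{
    if (op == TOY_OP_NONE)
        return state == TOY_TCP_ESTABLISHED ? 0 : -EOPNOTSUPP;
    return toy_is_valid_sock_op(op) ? 0 : -EOPNOTSUPP;
}
```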

Reported-by: Eric Dumazet 
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: John Fastabend 
Acked-by: Yonghong Song 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bpf: btf: Fix end boundary calculation for type section

2018-10-18T07:18:15+00:00

[ Upstream commit 4b1c5d917d34f705096bb7dd8a2bd19b0881970e ]

The end boundary math for the type section is incorrect in
btf_check_all_metas().  It just happens that hdr->type_off
is always 0 for now because there are only two sections
(type and string) and the string section must be at the end (ensured
in btf_parse_str_sec).

However, type_off may not be 0 if a new section is added later.
This patch fixes it.
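The arithmetic difference can be sketched with plain offsets. This assumes the pre-fix code measured type_len from the start of the post-header data rather than from the start of the type section. The two hypothetical helpers agree only while type_off is 0.

```c
#include <stddef.h>

/* Offsets are relative to the data following the BTF header, as with
 * hdr->type_off and hdr->type_len. Both helpers are hypothetical. */
static size_t toy_type_end_buggy(size_t type_off, size_t type_len)
{
    (void)type_off;  /* the bug: the section's own start is ignored */
    return type_len;
}

static size_t toy_type_end_fixed(size_t type_off, size_t type_len)
{
    return type_off + type_len;  /* end = section start + section length */
}
```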

Fixes: f80442a4cd18 ("bpf: btf: Change how section is supported in btf_header")
Reported-by: Dmitry Vyukov 
Signed-off-by: Martin KaFai Lau 
Acked-by: Yonghong Song 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bpf: 32-bit RSH verification must truncate input before the ALU op

2018-10-10T06:55:58+00:00

commit b799207e1e1816b09e7a5920fbb2d5fcf6edd681 upstream.

When I wrote commit 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification"), I
assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
is sufficient to just truncate the output to 32 bits; and so I just moved
the register size coercion that used to be at the start of the function to
the end of the function.

That assumption is true for almost every op, but not for 32-bit right
shifts, because those can propagate information towards the least
significant bit. Fix it by always truncating inputs for 32-bit ops to 32
bits.

Also get rid of the coerce_reg_to_size() after the ALU op, since that has
no effect.
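The failure mode is easy to see with concrete values. BPF 32-bit ALU ops operate on the low 32 bits of their operands. For addition, truncating only the result gives the same answer, because the discarded high bits never affect the low 32 bits of a sum. A right shift, however, moves high bits down, so the input must be truncated first. The helpers below are illustrative only (shift counts assumed to be below 32), not verifier code.

```c
#include <stdint.h>

/* Emulation that truncates only the output (the flawed assumption). */
static uint32_t toy_rsh32_truncate_output(uint64_t x, unsigned int shift)
{
    return (uint32_t)(x >> shift);
}

/* Correct 32-bit semantics: truncate the input, then shift. */
static uint32_t toy_rsh32_truncate_input(uint64_t x, unsigned int shift)
{
    return (uint32_t)x >> shift;
}

/* For addition, both orders agree modulo 2^32. */
static uint32_t toy_add32_truncate_output(uint64_t a, uint64_t b)
{
    return (uint32_t)(a + b);
}

static uint32_t toy_add32_truncate_input(uint64_t a, uint64_t b)
{
    return (uint32_t)a + (uint32_t)b;
}
```

With x = 0x100000000, the correct 32-bit result of x >> 1 is 0, while truncating afterwards yields 0x80000000: bit 32 has been shifted into the low word.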

Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification")
Acked-by: Daniel Borkmann 
Signed-off-by: Jann Horn 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Greg Kroah-Hartman

bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP

2018-10-10T06:55:55+00:00

[ Upstream commit 597222f72a94118f593e4f32bf58ae7e049a0df1 ]

Currently we check that sk_user_data is non-NULL to determine if the sk
exists in a map. However, this is not sufficient to ensure the psock
or the ULP ops are not in use by another user, such as kcm or TLS. To
avoid this, when adding a sock to a map also verify it is of the
correct ULP type. Additionally, when releasing a psock, verify that
it is the TCP_ULP_BPF type before releasing the ULP. The error path
taken when an update is aborted due to a ULP collision reaches this
release code.

For example,

  __sock_map_ctx_update_elem()
     [...]
     err = tcp_set_ulp_id(sock, TCP_ULP_BPF) <- collides with TLS
     if (err)                                <- so err out here
        goto out_free
     [...]
  out_free:
     smap_release_sock() <- calling tcp_cleanup_ulp releases the
                            TLS ULP incorrectly.
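The guarded release can be modeled with a single ULP slot. Everything here is hypothetical: the enum values, the helpers, and the -EBUSY used for the collision. The update flow follows the example above. On collision it still reaches the release path, but the release tears down the ULP only when it belongs to BPF, so the TLS ULP survives.

```c
#include <errno.h>

/* Hypothetical toy model of a socket's ULP slot. */
enum toy_ulp { TOY_ULP_NONE, TOY_ULP_TLS, TOY_ULP_KCM, TOY_ULP_BPF };

struct toy_sock {
    enum toy_ulp ulp;
};

/* Attach the BPF ULP unless a different ULP already owns the socket. */
static int toy_set_bpf_ulp(struct toy_sock *sk)
{
    if (sk->ulp != TOY_ULP_NONE && sk->ulp != TOY_ULP_BPF)
        return -EBUSY;  /* collision; error value chosen for illustration */
    sk->ulp = TOY_ULP_BPF;
    return 0;
}

/* Release path: only tear down the ULP if it is ours. */
static void toy_release_sock(struct toy_sock *sk)
{
    if (sk->ulp == TOY_ULP_BPF)
        sk->ulp = TOY_ULP_NONE;
}

/* Mirrors the update flow above: on collision we still hit out_free. */
static int toy_map_update(struct toy_sock *sk)
{
    int err = toy_set_bpf_ulp(sk);

    if (err)
        goto out_free;
    return 0;
out_free:
    toy_release_sock(sk);
    return err;
}
```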

Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
Signed-off-by: John Fastabend 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bpf: sockmap, decrement copied count correctly in redirect error case

2018-10-10T06:55:51+00:00

[ Upstream commit 501ca81760c204ec59b73e4a00bee5971fc0f1b1 ]

Currently, when a redirect occurs in sockmap and an error occurs in
the redirect call, we unwind the scatterlist once in the error path
of bpf_tcp_sendmsg_do_redirect() and then again in sendmsg(). Then
in the error path of sendmsg we decrement the copied count by the
send size.

However, it's possible we partially sent data before the error was
generated. This can happen if do_tcp_sendpages() partially sends the
scatterlist before encountering a memory pressure error. If this
happens we need to decrement the copied value (the value tracking
how many bytes were actually sent to TCP stack) by the number of
remaining bytes _not_ the entire send size. Otherwise we risk
confusing userspace.

Also, we don't need two calls to free the scatterlist; one is
good enough. So remove the one in bpf_tcp_sendmsg_do_redirect() and
then properly reduce copied by the number of remaining bytes, which
may in fact be the entire send size if no bytes were sent.

To do this, pass a bool to free_start_sg() indicating whether it
should do memory accounting.
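The accounting change reduces to one subtraction. In this hypothetical model, copied already includes the send bytes handed to the redirect, and sent is how many of them reached TCP before the error (sent <= send). Before the fix the whole send size was backed out. After the fix only the unsent remainder is, which equals the whole send size when nothing was sent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of sendmsg() accounting after a failed redirect. */
static size_t toy_copied_after_redirect_error(size_t copied, size_t send,
                                              size_t sent, bool fixed)
{
    if (!fixed)
        return copied - send;       /* pre-fix: back out the whole send */
    return copied - (send - sent);  /* post-fix: back out only unsent bytes */
}
```

For example, if 100 bytes were queued and 40 reached TCP, the fixed accounting reports 40 bytes copied instead of 0.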

Signed-off-by: John Fastabend 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman