linux.git/kernel/bpf, branch v4.19.205

bpf: Fix leakage under speculation on mispredicted branches

2021-08-15T11:05:04+00:00

commit 9183671af6dbf60a1219371d4ed73e23f43b49db upstream.

The verifier only enumerates valid control-flow paths and skips paths that
are unreachable in the non-speculative domain. And so it can miss issues
under speculative execution on mispredicted branches.

For example, a type confusion has been demonstrated with the following
crafted program:

  // r0 = pointer to a map array entry
  // r6 = pointer to readable stack slot
  // r9 = scalar controlled by attacker
  1: r0 = *(u64 *)(r0) // cache miss
  2: if r0 != 0x0 goto line 4
  3: r6 = r9
  4: if r0 != 0x1 goto line 6
  5: r9 = *(u8 *)(r6)
  6: // leak r9

Since line 3 runs iff r0 == 0 and line 5 runs iff r0 == 1, the verifier
concludes that the pointer dereference on line 5 is safe. But: if the
attacker trains both the branches to fall-through, such that the following
is speculatively executed ...

  r6 = r9
  r9 = *(u8 *)(r6)
  // leak r9

... then the program will dereference an attacker-controlled value and could
leak its content under speculative execution via side-channel. This requires
to mistrain the branch predictor, which can be rather tricky, because the
branches are mutually exclusive. However such training can be done at
congruent addresses in user space using different branches that are not
mutually exclusive. That is, by training branches in user space ...

  A:  if r0 != 0x0 goto line C
  B:  ...
  C:  if r0 != 0x0 goto line D
  D:  ...

... such that addresses A and C collide to the same CPU branch prediction
entries in the PHT (pattern history table) as those of the BPF program's
lines 2 and 4, respectively. A non-privileged attacker could simply brute
force such collisions in the PHT until observing the attack succeeding.

Alternative methods to mistrain the branch predictor are also possible that
avoid brute forcing the collisions in the PHT. A reliable attack has been
demonstrated, for example, using the following crafted program:

  // r0 = pointer to a [control] map array entry
  // r7 = *(u64 *)(r0 + 0), training/attack phase
  // r8 = *(u64 *)(r0 + 8), oob address
  // [...]
  // r0 = pointer to a [data] map array entry
  1: if r7 == 0x3 goto line 3
  2: r8 = r0
  // crafted sequence of conditional jumps to separate the conditional
  // branch in line 193 from the current execution flow
  3: if r0 != 0x0 goto line 5
  4: if r0 == 0x0 goto exit
  5: if r0 != 0x0 goto line 7
  6: if r0 == 0x0 goto exit
  [...]
  187: if r0 != 0x0 goto line 189
  188: if r0 == 0x0 goto exit
  // load any slowly-loaded value (due to cache miss in phase 3) ...
  189: r3 = *(u64 *)(r0 + 0x1200)
  // ... and turn it into known zero for verifier, while preserving slowly-
  // loaded dependency when executing:
  190: r3 &= 1
  191: r3 &= 2
  // speculatively bypassed phase dependency
  192: r7 += r3
  193: if r7 == 0x3 goto exit
  194: r4 = *(u8 *)(r8 + 0)
  // leak r4

As can be seen, in training phase (phase != 0x3), the condition in line 1
turns into false and therefore r8 with the oob address is overridden with
the valid map value address, which in line 194 we can read out without
issues. However, in attack phase, line 2 is skipped, and due to the cache
miss in line 189 where the map value is (zeroed and later) added to the
phase register, the condition in line 193 takes the fall-through path due
to prior branch predictor training, where under speculation, it'll load the
byte at oob address r8 (unknown scalar type at that point) which could then
be leaked via side-channel.

One way to mitigate these is to 'branch off' an unreachable path, meaning,
the current verification path keeps following the is_branch_taken() path
and we push the other branch to the verification stack. Given this is
unreachable from the non-speculative domain, this branch's vstate is
explicitly marked as speculative. This is needed for two reasons: i) if
this path is solely seen from speculative execution, then we later on still
want the dead code elimination to kick in in order to sanitize these
instructions with jmp-1s, and ii) to ensure that paths walked in the
non-speculative domain are not pruned from earlier walks of paths walked in
the speculative domain. Additionally, for robustness, we mark the registers
which have been part of the conditional as unknown in the speculative path
given there should be no assumptions made on their content.

The fix in here mitigates type confusion attacks described earlier due to
i) all code paths in the BPF program being explored and ii) existing
verifier logic already ensuring that given memory access instruction
references one specific data structure.

An alternative to this fix that has also been looked at in this scope was to
mark aux->alu_state at the jump instruction with a BPF_JMP_TAKEN state as
well as direction encoding (always-goto, always-fallthrough, unknown), such
that mixing of different always-* directions themselves as well as mixing of
always-* with unknown directions would cause a program rejection by the
verifier, e.g. programs with constructs like 'if ([...]) { x = 0; } else
{ x = 1; }' with subsequent 'if (x == 1) { [...] }'. For unprivileged, this
would result in only single direction always-* taken paths, and unknown taken
paths being allowed, such that the former could be patched from a conditional
jump to an unconditional jump (ja). Compared to this approach here, it would
have two downsides: i) valid programs that otherwise are not performing any
pointer arithmetic, etc, would potentially be rejected/broken, and ii) we are
required to turn off path pruning for unprivileged, where both can be avoided
in this work through pushing the invalid branch to the verification stack.

The issue was originally discovered by Adam and Ofek, and later independently
discovered and reported as a result of Benedict and Piotr's research work.

Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Reported-by: Adam Morrison 
Reported-by: Ofek Kirzner 
Reported-by: Benedict Schlueter 
Reported-by: Piotr Krysiuk 
Signed-off-by: Daniel Borkmann 
Reviewed-by: John Fastabend 
Reviewed-by: Benedict Schlueter 
Reviewed-by: Piotr Krysiuk 
Acked-by: Alexei Starovoitov 
[OP: use allow_ptr_leaks instead of bypass_spec_v1]
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman

bpf: Do not mark insn as seen under speculative path verification

2021-08-15T11:05:04+00:00

commit fe9a5ca7e370e613a9a75a13008a3845ea759d6e upstream.

... in such circumstances, we do not want to mark the instruction as seen given
the goal is still to jmp-1 rewrite/sanitize dead code, if it is not reachable
from the non-speculative path verification. We do however want to verify it for
safety regardless.

With the patch as-is all the insns that have been marked as seen before the
patch will also be marked as seen after the patch (just with a potentially
different non-zero count). An upcoming patch will also verify paths that are
unreachable in the non-speculative domain, hence this extension is needed.

Signed-off-by: Daniel Borkmann 
Reviewed-by: John Fastabend 
Reviewed-by: Benedict Schlueter 
Reviewed-by: Piotr Krysiuk 
Acked-by: Alexei Starovoitov 
[OP: - env->pass_cnt is not used in 4.19, so adjust sanitize_mark_insn_seen()
       to assign "true" instead
     - drop sanitize_insn_aux_data() comment changes, as the function is not
       present in 4.19]
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman

bpf: Inherit expanded/patched seen count from old aux data

2021-08-15T11:05:04+00:00

commit d203b0fd863a2261e5d00b97f3d060c4c2a6db71 upstream.

Instead of relying on current env->pass_cnt, use the seen count from the
old aux data in adjust_insn_aux_data(), and expand it to the new range of
patched instructions. This change is valid given we always expand 1:n
with n>=1, so what applies to the old/original instruction needs to apply
for the replacement as well.

Not relying on env->pass_cnt is a prerequisite for a later change where we
want to avoid marking an instruction seen when verified under speculative
execution path.

Signed-off-by: Daniel Borkmann 
Reviewed-by: John Fastabend 
Reviewed-by: Benedict Schlueter 
Reviewed-by: Piotr Krysiuk 
Acked-by: Alexei Starovoitov 
[OP: - declare old_data as bool instead of u32 (struct bpf_insn_aux_data.seen
     is bool in 5.4)
     - adjusted context for 4.19]
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman

bpf: Add BPF_F_ANY_ALIGNMENT.

2021-06-10T11:24:08+00:00

commit e9ee9efc0d176512cdce9d27ff8549d7ffa2bfcd upstream

Often we want to write tests cases that check things like bad context
offset accesses.  And one way to do this is to use an odd offset on,
for example, a 32-bit load.

This unfortunately triggers the alignment checks first on platforms
that do not set CONFIG_EFFICIENT_UNALIGNED_ACCESS.  So the test
case see the alignment failure rather than what it was testing for.

It is often not completely possible to respect the original intention
of the test, or even test the same exact thing, while solving the
alignment issue.

Another option could have been to check the alignment after the
context and other validations are performed by the verifier, but
that is a non-trivial change to the verifier.

Signed-off-by: David S. Miller 
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Tiezhu Yang 
Signed-off-by: Greg Kroah-Hartman

bpf: No need to simulate speculative domain for immediates

2021-06-03T06:38:07+00:00

commit a7036191277f9fa68d92f2071ddc38c09b1e5ee5 upstream

In 801c6058d14a ("bpf: Fix leakage of uninitialized bpf stack under
speculation") we replaced masking logic with direct loads of immediates
if the register is a known constant. Given in this case we do not apply
any masking, there is also no reason for the operation to be truncated
under the speculative domain.

Therefore, there is also zero reason for the verifier to branch-off and
simulate this case, it only needs to do it for unknown but bounded scalars.
As a side-effect, this also enables few test cases that were previously
rejected due to simulation under zero truncation.

Signed-off-by: Daniel Borkmann 
Reviewed-by: Piotr Krysiuk 
Acked-by: Alexei Starovoitov 
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman

bpf: Fix mask direction swap upon off reg sign change

2021-06-03T06:38:07+00:00

commit bb01a1bba579b4b1c5566af24d95f1767859771e upstream

Masking direction as indicated via mask_to_left is considered to be
calculated once and then used to derive pointer limits. Thus, this
needs to be placed into bpf_sanitize_info instead so we can pass it
to sanitize_ptr_alu() call after the pointer move. Piotr noticed a
corner case where the off reg causes masking direction change which
then results in an incorrect final aux->alu_limit.

Fixes: 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask")
Reported-by: Piotr Krysiuk 
Signed-off-by: Daniel Borkmann 
Reviewed-by: Piotr Krysiuk 
Acked-by: Alexei Starovoitov 
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman

bpf: Wrap aux data inside bpf_sanitize_info container

2021-06-03T06:38:07+00:00

commit 3d0220f6861d713213b015b582e9f21e5b28d2e0 upstream

Add a container structure struct bpf_sanitize_info which holds
the current aux info, and update call-sites to sanitize_ptr_alu()
to pass it in. This is needed for passing in additional state
later on.

Signed-off-by: Daniel Borkmann 
Reviewed-by: Piotr Krysiuk 
Acked-by: Alexei Starovoitov 
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman

bpf: Fix leakage of uninitialized bpf stack under speculation

2021-06-03T06:38:07+00:00

commit 801c6058d14a82179a7ee17a4b532cac6fad067f upstream.

The current implemented mechanisms to mitigate data disclosure under
speculation mainly address stack and map value oob access from the
speculative domain. However, Piotr discovered that uninitialized BPF
stack is not protected yet, and thus old data from the kernel stack,
potentially including addresses of kernel structures, could still be
extracted from that 512 bytes large window. The BPF stack is special
compared to map values since it's not zero initialized for every
program invocation, whereas map values /are/ zero initialized upon
their initial allocation and thus cannot leak any prior data in either
domain. In the non-speculative domain, the verifier ensures that every
stack slot read must have a prior stack slot write by the BPF program
to avoid such data leaking issue.

However, this is not enough: for example, when the pointer arithmetic
operation moves the stack pointer from the last valid stack offset to
the first valid offset, the sanitation logic allows for any intermediate
offsets during speculative execution, which could then be used to
extract any restricted stack content via side-channel.

Given for unprivileged stack pointer arithmetic the use of unknown
but bounded scalars is generally forbidden, we can simply turn the
register-based arithmetic operation into an immediate-based arithmetic
operation without the need for masking. This also gives the benefit
of reducing the needed instructions for the operation. Given after
the work in 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic
mask"), the aux->alu_limit already holds the final immediate value for
the offset register with the known scalar. Thus, a simple mov of the
immediate to AX register with using AX as the source for the original
instruction is sufficient and possible now in this case.

Reported-by: Piotr Krysiuk 
Signed-off-by: Daniel Borkmann 
Tested-by: Piotr Krysiuk 
Reviewed-by: Piotr Krysiuk 
Reviewed-by: John Fastabend 
Acked-by: Alexei Starovoitov 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman

bpf: Tighten speculative pointer arithmetic mask

2021-06-03T06:38:07+00:00

commit 7fedb63a8307dda0ec3b8969a3b233a1dd7ea8e0 upstream.

This work tightens the offset mask we use for unprivileged pointer arithmetic
in order to mitigate a corner case reported by Piotr and Benedict where in
the speculative domain it is possible to advance, for example, the map value
pointer by up to value_size-1 out-of-bounds in order to leak kernel memory
via side-channel to user space.

Before this change, the computed ptr_limit for retrieve_ptr_limit() helper
represents largest valid distance when moving pointer to the right or left
which is then fed as aux->alu_limit to generate masking instructions against
the offset register. After the change, the derived aux->alu_limit represents
the largest potential value of the offset register which we mask against which
is just a narrower subset of the former limit.

For minimal complexity, we call sanitize_ptr_alu() from 2 observation points
in adjust_ptr_min_max_vals(), that is, before and after the simulated alu
operation. In the first step, we retieve the alu_state and alu_limit before
the operation as well as we branch-off a verifier path and push it to the
verification stack as we did before which checks the dst_reg under truncation,
in other words, when the speculative domain would attempt to move the pointer
out-of-bounds.

In the second step, we retrieve the new alu_limit and calculate the absolute
distance between both. Moreover, we commit the alu_state and final alu_limit
via update_alu_sanitation_state() to the env's instruction aux data, and bail
out from there if there is a mismatch due to coming from different verification
paths with different states.

Reported-by: Piotr Krysiuk 
Reported-by: Benedict Schlueter 
Signed-off-by: Daniel Borkmann 
Reviewed-by: John Fastabend 
Acked-by: Alexei Starovoitov 
Tested-by: Benedict Schlueter 
[fllinden@amazon.com: backported to 5.4]
Signed-off-by: Frank van der Linden 
Signed-off-by: Greg Kroah-Hartman 
[OP: backport to 4.19]
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman

bpf: Move sanitize_val_alu out of op switch

2021-06-03T06:38:07+00:00

commit f528819334881fd622fdadeddb3f7edaed8b7c9b upstream.

Add a small sanitize_needed() helper function and move sanitize_val_alu()
out of the main opcode switch. In upcoming work, we'll move sanitize_ptr_alu()
as well out of its opcode switch so this helps to streamline both.

Signed-off-by: Daniel Borkmann 
Reviewed-by: John Fastabend 
Acked-by: Alexei Starovoitov 
[fllinden@amazon.com: backported to 5.4]
Signed-off-by: Frank van der Linden 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ovidiu Panait 
Signed-off-by: Greg Kroah-Hartman