linux.git/kernel/events, branch v4.19.205

perf: Fix data race between pin_count increment/decrement

2021-06-16T09:55:01+00:00

commit 6c605f8371159432ec61cbb1488dcf7ad24ad19a upstream.

KCSAN reports a data race between increment and decrement of pin_count:

  write to 0xffff888237c2d4e0 of 4 bytes by task 15740 on cpu 1:
   find_get_context		kernel/events/core.c:4617
   __do_sys_perf_event_open	kernel/events/core.c:12097 [inline]
   __se_sys_perf_event_open	kernel/events/core.c:11933
   ...
  read to 0xffff888237c2d4e0 of 4 bytes by task 15743 on cpu 0:
   perf_unpin_context		kernel/events/core.c:1525 [inline]
   __do_sys_perf_event_open	kernel/events/core.c:12328 [inline]
   __se_sys_perf_event_open	kernel/events/core.c:11933
   ...

Because neither read-modify-write here is atomic, this can lead to one
of the operations being lost, resulting in an inconsistent pin_count.
Fix it by adding the missing locking in the CPU-event case.

Fixes: fe4b04fa31a6 ("perf: Cure task_oncpu_function_call() races")
Reported-by: syzbot+142c9018f5962db69c7e@syzkaller.appspotmail.com
Signed-off-by: Marco Elver 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210527104711.2671610-1-elver@google.com
Signed-off-by: Greg Kroah-Hartman

perf/core: Fix endless multiplex timer

2021-06-16T09:54:56+00:00

commit 90c91dfb86d0ff545bd329d3ddd72c147e2ae198 upstream.

Kan and Andi reported that we fail to kill rotation when the flexible
events go empty, but the context does not. XXX moar

Fixes: fd7d55172d1e ("perf/cgroups: Don't rotate events for cgroups unnecessarily")
Reported-by: Andi Kleen 
Reported-by: Kan Liang 
Tested-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Wen Yang 
Link: https://lkml.kernel.org/r/20200305123851.GX2596@hirez.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman

perf/core: Fix corner case in perf_rotate_context()

2021-06-10T11:24:08+00:00

commit 7fa343b7fdc4f351de4e3f28d5c285937dd1f42f upstream.

In perf_rotate_context(), when the first cpu flexible event fail to
schedule, cpu_rotate is 1, while cpu_event is NULL. Since cpu_event is
NULL, perf_rotate_context will _NOT_ call cpu_ctx_sched_out(), thus
cpuctx->ctx.is_active will have EVENT_FLEXIBLE set. Then, the next
perf_event_sched_in() will skip all cpu flexible events because of the
EVENT_FLEXIBLE bit.

In the next call of perf_rotate_context(), cpu_rotate stays 1, and
cpu_event stays NULL, so this process repeats. The end result is, flexible
events on this cpu will not be scheduled (until another event being added
to the cpuctx).

Here is an easy repro of this issue. On Intel CPUs, where ref-cycles
could only use one counter, run one pinned event for ref-cycles, one
flexible event for ref-cycles, and one flexible event for cycles. The
flexible ref-cycles is never scheduled, which is expected. However,
because of this issue, the cycles event is never scheduled either.

 $ perf stat -e ref-cycles:D,ref-cycles,cycles -C 5 -I 1000

           time             counts unit events
    1.000152973         15,412,480      ref-cycles:D
    1.000152973            ref-cycles     (0.00%)
    1.000152973            cycles         (0.00%)
    2.000486957         18,263,120      ref-cycles:D
    2.000486957            ref-cycles     (0.00%)
    2.000486957            cycles         (0.00%)

To fix this, when the flexible_active list is empty, try rotate the
first event in the flexible_groups. Also, rename ctx_first_active() to
ctx_event_to_rotate(), which is more accurate.

Signed-off-by: Song Liu 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Sasha Levin 
Cc: Thomas Gleixner 
Fixes: 8d5bce0c37fa ("perf/core: Optimize perf_rotate_context() event scheduling")
Link: https://lkml.kernel.org/r/20191008165949.920548-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Wen Yang 
Signed-off-by: Greg Kroah-Hartman

perf/cgroups: Don't rotate events for cgroups unnecessarily

2021-06-10T11:24:08+00:00

commit fd7d55172d1e2e501e6da0a5c1de25f06612dc2e upstream.

Currently perf_rotate_context assumes that if the context's nr_events !=
nr_active a rotation is necessary for perf event multiplexing. With
cgroups, nr_events is the total count of events for all cgroups and
nr_active will not include events in a cgroup other than the current
task's. This makes rotation appear necessary for cgroups when it is not.

Add a perf_event_context flag that is set when rotation is necessary.
Clear the flag during sched_out and set it when a flexible sched_in
fails due to resources.

Signed-off-by: Ian Rogers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnaldo Carvalho de Melo 
Cc: Borislav Petkov 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Linus Torvalds 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: https://lkml.kernel.org/r/20190601082722.44543-1-irogers@google.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Wen Yang 
Signed-off-by: Greg Kroah-Hartman

perf/core: Fix race in the perf_mmap_close() function

2020-11-18T18:18:52+00:00

commit f91072ed1b7283b13ca57fcfbece5a3b92726143 upstream.

There's a possible race in perf_mmap_close() when checking ring buffer's
mmap_count refcount value. The problem is that the mmap_count check is
not atomic because we call atomic_dec() and atomic_read() separately.

  perf_mmap_close:
  ...
   atomic_dec(&rb->mmap_count);
   ...
   if (atomic_read(&rb->mmap_count))
      goto out_put;

   
   free_uid

out_put:
  ring_buffer_put(rb); /* could be last */

The race can happen when we have two (or more) events sharing same ring
buffer and they go through atomic_dec() and then they both see 0 as refcount
value later in atomic_read(). Then both will go on and execute code which
is meant to be run just once.

The code that detaches ring buffer is probably fine to be executed more
than once, but the problem is in calling free_uid(), which will later on
demonstrate in related crashes and refcount warnings, like:

  refcount_t: addition on 0; use-after-free.
  ...
  RIP: 0010:refcount_warn_saturate+0x6d/0xf
  ...
  Call Trace:
  prepare_creds+0x190/0x1e0
  copy_creds+0x35/0x172
  copy_process+0x471/0x1a80
  _do_fork+0x83/0x3a0
  __do_sys_wait4+0x83/0x90
  __do_sys_clone+0x85/0xa0
  do_syscall_64+0x5b/0x1e0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Using atomic decrease and check instead of separated calls.

Tested-by: Michael Petlan 
Signed-off-by: Jiri Olsa 
Signed-off-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Acked-by: Namhyung Kim 
Acked-by: Wade Mealing 
Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole");
Link: https://lore.kernel.org/r/20200916115311.GE2301783@krava
[sudip: used ring_buffer]
Signed-off-by: Sudip Mukherjee 
Signed-off-by: Greg Kroah-Hartman

perf: Fix get_recursion_context()

2020-11-18T18:18:48+00:00

[ Upstream commit ce0f17fc93f63ee91428af10b7b2ddef38cd19e5 ]

One should use in_serving_softirq() to detect SoftIRQ context.

Fixes: 96f6d4444302 ("perf_counter: avoid recursion")
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20201030151955.120572175@infradead.org
Signed-off-by: Sasha Levin

perf/core: Fix a memory leak in perf_event_parse_addr_filter()

2020-11-10T11:36:02+00:00

commit 7bdb157cdebbf95a1cd94ed2e01b338714075d00 upstream.

As shown through runtime testing, the "filename" allocation is not
always freed in perf_event_parse_addr_filter().

There are three possible ways that this could happen:

 - It could be allocated twice on subsequent iterations through the loop,
 - or leaked on the success path,
 - or on the failure path.

Clean up the code flow to make it obvious that 'filename' is always
freed in the reallocation path and in the two return paths as well.

We rely on the fact that kfree(NULL) is NOP and filename is initialized
with NULL.

This fixes the leak. No other side effects expected.

[ Dan Carpenter: cleaned up the code flow & added a changelog. ]
[ Ingo Molnar: updated the changelog some more. ]

Fixes: 375637bc5249 ("perf/core: Introduce address range filtering")
Signed-off-by: "kiyin(尹亮)" 
Signed-off-by: Dan Carpenter 
Signed-off-by: Ingo Molnar 
Cc: "Srivatsa S. Bhat" 
Cc: Anthony Liguori 
--
 kernel/events/core.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
Signed-off-by: Greg Kroah-Hartman

perf: Fix task_function_call() error handling

2020-10-14T08:31:26+00:00

[ Upstream commit 6d6b8b9f4fceab7266ca03d194f60ec72bd4b654 ]

The error handling introduced by commit:

  2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")

looses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -EXNIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.

Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
Reported-by: Srikar Dronamraju 
Signed-off-by: Kajol Jain 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Reviewed-by: Barret Rhoden 
Tested-by: Srikar Dronamraju 
Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com
Signed-off-by: Sasha Levin

uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix GDB regression

2020-07-29T08:16:47+00:00

commit fe5ed7ab99c656bd2f5b79b49df0e9ebf2cead8a upstream.

If a tracee is uprobed and it hits int3 inserted by debugger, handle_swbp()
does send_sig(SIGTRAP, current, 0) which means si_code == SI_USER. This used
to work when this code was written, but then GDB started to validate si_code
and now it simply can't use breakpoints if the tracee has an active uprobe:

	# cat test.c
	void unused_func(void)
	{
	}
	int main(void)
	{
		return 0;
	}

	# gcc -g test.c -o test
	# perf probe -x ./test -a unused_func
	# perf record -e probe_test:unused_func gdb ./test -ex run
	GNU gdb (GDB) 10.0.50.20200714-git
	...
	Program received signal SIGTRAP, Trace/breakpoint trap.
	0x00007ffff7ddf909 in dl_main () from /lib64/ld-linux-x86-64.so.2
	(gdb)

The tracee hits the internal breakpoint inserted by GDB to monitor shared
library events but GDB misinterprets this SIGTRAP and reports a signal.

Change handle_swbp() to use force_sig(SIGTRAP), this matches do_int3_user()
and fixes the problem.

This is the minimal fix for -stable, arch/x86/kernel/uprobes.c is equally
wrong; it should use send_sigtrap(TRAP_TRACE) instead of send_sig(SIGTRAP),
but this doesn't confuse GDB and needs another x86-specific patch.

Reported-by: Aaron Merey 
Signed-off-by: Oleg Nesterov 
Signed-off-by: Ingo Molnar 
Reviewed-by: Srikar Dronamraju 
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200723154420.GA32043@redhat.com
Signed-off-by: Greg Kroah-Hartman

perf: Add cond_resched() to task_function_call()

2020-06-22T07:05:08+00:00

commit 2ed6edd33a214bca02bd2b45e3fc3038a059436b upstream.

Under rare circumstances, task_function_call() can repeatedly fail and
cause a soft lockup.

There is a slight race where the process is no longer running on the cpu
we targeted by the time remote_function() runs.  The code will simply
try again.  If we are very unlucky, this will continue to fail, until a
watchdog fires.  This can happen in a heavily loaded, multi-core virtual
machine.

Reported-by: syzbot+bb4935a5c09b5ff79940@syzkaller.appspotmail.com
Signed-off-by: Barret Rhoden 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20200414222920.121401-1-brho@google.com
Signed-off-by: Greg Kroah-Hartman