summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-05-27sched/fair: Fix enqueue_task_fair() warning some morePhil Auld1-0/+7
[ Upstream commit b34cb07dde7c2346dec73d053ce926aeaa087303 ] sched/fair: Fix enqueue_task_fair warning some more The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning) did not fully resolve the issues with the rq->tmp_alone_branch != &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where the first for_each_sched_entity loop exits due to on_rq, having incompletely updated the list. In this case the second for_each_sched_entity loop can further modify se. The later code to fix up the list management fails to do what is needed because se does not point to the sched_entity which broke out of the first loop. The list is not fixed up because the throttled parent was already added back to the list by a task enqueue in a parallel child hierarchy. Address this by calling list_add_leaf_cfs_rq if there are throttled parents while doing the second for_each_sched_entity loop. Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning") Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20200512135222.GC2201@lorien.usersys.redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27sched/fair: Fix reordering of enqueue/dequeue_task_fair()Vincent Guittot1-8/+9
[ Upstream commit 5ab297bab984310267734dfbcc8104566658ebef ] Even when a cgroup is throttled, the group se of a child cgroup can still be enqueued and its gse->on_rq stays true. When a task is enqueued on such child, we still have to update the load_avg and increase h_nr_running of the throttled cfs. Nevertheless, the 1st for_each_sched_entity() loop is skipped because of gse->on_rq == true and the 2nd loop because the cfs is throttled whereas we have to update both load_avg with the old h_nr_running and increase h_nr_running in such case. The same sequence can happen during dequeue when se moves to parent before breaking in the 1st loop. Note that the update of load_avg will effectively happen only once in order to sync up to the throttled time. Next call for updating load_avg will stop early because the clock stays unchanged. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Fixes: 6d4d22468dae ("sched/fair: Reorder enqueue/dequeue_task_fair path") Link: https://lkml.kernel.org/r/20200306084208.12583-1-vincent.guittot@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27sched/fair: Reorder enqueue/dequeue_task_fair pathVincent Guittot1-22/+20
[ Upstream commit 6d4d22468dae3d8757af9f8b81b848a76ef4409d ] The walk through the cgroup hierarchy during the enqueue/dequeue of a task is split in 2 distinct parts for throttled cfs_rq without any added value but making code less readable. Change the code ordering such that everything related to a cfs_rq (throttled or not) will be done in the same loop. In addition, the same steps ordering is used when updating a cfs_rq: - update_load_avg - update_cfs_group - update *h_nr_running This reordering enables the use of h_nr_running in PELT algorithm. No functional and performance changes are expected and have been noticed during tests. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>" Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-5-mgorman@techsingularity.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20Stop the ad-hoc games with -Wno-maybe-initializedLinus Torvalds1-1/+0
commit 78a5255ffb6a1af189a83e493d916ba1c54d8c75 upstream. We have some rather random rules about when we accept the "maybe-initialized" warnings, and when we don't. For example, we consider it unreliable for gcc versions < 4.9, but also if -O3 is enabled, or if optimizing for size. And then various kernel config options disabled it, because they know that they trigger that warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES). And now gcc-10 seems to be introducing a lot of those warnings too, so it falls under the same heading as 4.9 did. At the same time, we have a very straightforward way to _enable_ that warning when wanted: use "W=2" to enable more warnings. So stop playing these ad-hoc games, and just disable that warning by default, with the known and straight-forward "if you want to work on the extra compiler warnings, use W=123". Would it be great to have code that is always so obvious that it never confuses the compiler whether a variable is used initialized or not? Yes, it would. In a perfect world, the compilers would be smarter, and our source code would be simpler. That's currently not the world we live in, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20fork: prevent accidental access to clone3 featuresChristian Brauner1-6/+7
[ Upstream commit 3f2c788a13143620c5471ac96ac4f033fc9ac3f3 ] Jan reported an issue where an interaction between sign-extending clone's flag argument on ppc64le and the new CLONE_INTO_CGROUP feature causes clone() to consistently fail with EBADF. The whole story is a little longer. The legacy clone() syscall is odd in a bunch of ways and here two things interact. First, legacy clone's flag argument is word-size dependent, i.e. it's an unsigned long whereas most system calls with flag arguments use int or unsigned int. Second, legacy clone() ignores unknown and deprecated flags. The two of them taken together means that users on 64bit systems can pass garbage for the upper 32bit of the clone() syscall since forever and things would just work fine. Just try this on a 64bit kernel prior to v5.7-rc1 where this will succeed and on v5.7-rc1 where this will fail with EBADF: int main(int argc, char *argv[]) { pid_t pid; /* Note that legacy clone() has different argument ordering on * different architectures so this won't work everywhere. * * Only set the upper 32 bits. */ pid = syscall(__NR_clone, 0xffffffff00000000 | SIGCHLD, NULL, NULL, NULL, NULL); if (pid < 0) exit(EXIT_FAILURE); if (pid == 0) exit(EXIT_SUCCESS); if (wait(NULL) != pid) exit(EXIT_FAILURE); exit(EXIT_SUCCESS); } Since legacy clone() couldn't be extended this was not a problem so far and nobody really noticed or cared since nothing in the kernel ever bothered to look at the upper 32 bits. But once we introduced clone3() and expanded the flag argument in struct clone_args to 64 bit we opened this can of worms. With the first flag-based extension to clone3() making use of the upper 32 bits of the flag argument we've effectively made it possible for the legacy clone() syscall to reach clone3() only flags. The sign extension scenario is just the odd corner-case that we needed to figure this out. The reason we just realized this now and not already when we introduced CLONE_CLEAR_SIGHAND was that CLONE_INTO_CGROUP assumes that a valid cgroup file descriptor has been given. So the sign extension (or the user accidently passing garbage for the upper 32 bits) caused the CLONE_INTO_CGROUP bit to be raised and the kernel to error out when it didn't find a valid cgroup file descriptor. Let's fix this by always capping the upper 32 bits for all codepaths that are not aware of clone3() features. This ensures that we can't reach clone3() only features by accident via legacy clone as with the sign extension case and also that legacy clone() works exactly like before, i.e. ignoring any unknown flags. This solution risks no regressions and is also pretty clean. Fixes: 7f192e3cd316 ("fork: add clone3") Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups") Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitry V. Levin <ldv@altlinux.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: libc-alpha@sourceware.org Cc: stable@vger.kernel.org # 5.3+ Link: https://sourceware.org/pipermail/libc-alpha/2020-May/113596.html Link: https://lore.kernel.org/r/20200507103214.77218-1-christian.brauner@ubuntu.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20bpf: Fix error return code in map_lookup_and_delete_elem()Wei Yongjun1-1/+3
[ Upstream commit 7f645462ca01d01abb94d75e6768c8b3ed3a188b ] Fix to return negative error code -EFAULT from the copy_to_user() error handling case instead of 0, as done elsewhere in this function. Fixes: bd513cd08f10 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200430081851.166996-1-weiyongjun1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20umh: fix memory leak on execve failureVincent Minet1-0/+6
[ Upstream commit db803036ada7d61d096783726f9771b3fc540370 ] If a UMH process created by fork_usermode_blob() fails to execute, a pair of struct file allocated by umh_pipe_setup() will leak. Under normal conditions, the caller (like bpfilter) needs to manage the lifetime of the UMH and its two pipes. But when fork_usermode_blob() fails, the caller doesn't really have a way to know what needs to be done. It seems better to do the cleanup ourselves in this case. Fixes: 449325b52b7a ("umh: introduce fork_usermode_blob() helper") Signed-off-by: Vincent Minet <v.minet@criteo.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14coredump: fix crash when umh is disabledLuis Chamberlain1-0/+5
commit 3740d93e37902b31159a82da2d5c8812ed825404 upstream. Commit 64e90a8acb859 ("Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()") added the optiont to disable all call_usermodehelper() calls by setting STATIC_USERMODEHELPER_PATH to an empty string. When this is done, and crashdump is triggered, it will crash on null pointer dereference, since we make assumptions over what call_usermodehelper_exec() did. This has been reported by Sergey when one triggers a a coredump with the following configuration: ``` CONFIG_STATIC_USERMODEHELPER=y CONFIG_STATIC_USERMODEHELPER_PATH="" kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e ``` The way disabling the umh was designed was that call_usermodehelper_exec() would just return early, without an error. But coredump assumes certain variables are set up for us when this happens, and calls ile_start_write(cprm.file) with a NULL file. [ 2.819676] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 2.819859] #PF: supervisor read access in kernel mode [ 2.820035] #PF: error_code(0x0000) - not-present page [ 2.820188] PGD 0 P4D 0 [ 2.820305] Oops: 0000 [#1] SMP PTI [ 2.820436] CPU: 2 PID: 89 Comm: a Not tainted 5.7.0-rc1+ #7 [ 2.820680] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 [ 2.821150] RIP: 0010:do_coredump+0xd80/0x1060 [ 2.821385] Code: e8 95 11 ed ff 48 c7 c6 cc a7 b4 81 48 8d bd 28 ff ff ff 89 c2 e8 70 f1 ff ff 41 89 c2 85 c0 0f 84 72 f7 ff ff e9 b4 fe ff ff <48> 8b 57 20 0f b7 02 66 25 00 f0 66 3d 00 8 0 0f 84 9c 01 00 00 44 [ 2.822014] RSP: 0000:ffffc9000029bcb8 EFLAGS: 00010246 [ 2.822339] RAX: 0000000000000000 RBX: ffff88803f860000 RCX: 000000000000000a [ 2.822746] RDX: 0000000000000009 RSI: 0000000000000282 RDI: 0000000000000000 [ 2.823141] RBP: ffffc9000029bde8 R08: 0000000000000000 R09: ffffc9000029bc00 [ 2.823508] R10: 0000000000000001 R11: ffff88803dec90be R12: ffffffff81c39da0 [ 2.823902] R13: ffff88803de84400 R14: 0000000000000000 R15: 0000000000000000 [ 2.824285] FS: 00007fee08183540(0000) GS:ffff88803e480000(0000) knlGS:0000000000000000 [ 2.824767] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.825111] CR2: 0000000000000020 CR3: 000000003f856005 CR4: 0000000000060ea0 [ 2.825479] Call Trace: [ 2.825790] get_signal+0x11e/0x720 [ 2.826087] do_signal+0x1d/0x670 [ 2.826361] ? force_sig_info_to_task+0xc1/0xf0 [ 2.826691] ? force_sig_fault+0x3c/0x40 [ 2.826996] ? do_trap+0xc9/0x100 [ 2.827179] exit_to_usermode_loop+0x49/0x90 [ 2.827359] prepare_exit_to_usermode+0x77/0xb0 [ 2.827559] ? invalid_op+0xa/0x30 [ 2.827747] ret_from_intr+0x20/0x20 [ 2.827921] RIP: 0033:0x55e2c76d2129 [ 2.828107] Code: 2d ff ff ff e8 68 ff ff ff 5d c6 05 18 2f 00 00 01 c3 0f 1f 80 00 00 00 00 c3 0f 1f 80 00 00 00 00 e9 7b ff ff ff 55 48 89 e5 <0f> 0b b8 00 00 00 00 5d c3 66 2e 0f 1f 84 0 0 00 00 00 00 0f 1f 40 [ 2.828603] RSP: 002b:00007fffeba5e080 EFLAGS: 00010246 [ 2.828801] RAX: 000055e2c76d2125 RBX: 0000000000000000 RCX: 00007fee0817c718 [ 2.829034] RDX: 00007fffeba5e188 RSI: 00007fffeba5e178 RDI: 0000000000000001 [ 2.829257] RBP: 00007fffeba5e080 R08: 0000000000000000 R09: 00007fee08193c00 [ 2.829482] R10: 0000000000000009 R11: 0000000000000000 R12: 000055e2c76d2040 [ 2.829727] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 2.829964] CR2: 0000000000000020 [ 2.830149] ---[ end trace ceed83d8c68a1bf1 ]--- ``` Cc: <stable@vger.kernel.org> # v4.11+ Fixes: 64e90a8acb85 ("Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199795 Reported-by: Tony Vroon <chainsaw@gentoo.org> Reported-by: Sergey Kvachonok <ravenexp@gmail.com> Tested-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20200416162859.26518-1-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14tracing: Add a vmalloc_sync_mappings() for safe measureSteven Rostedt (VMware)1-0/+13
commit 11f5efc3ab66284f7aaacc926e9351d658e2577b upstream. x86_64 lazily maps in the vmalloc pages, and the way this works with per_cpu areas can be complex, to say the least. Mappings may happen at boot up, and if nothing synchronizes the page tables, those page mappings may not be synced till they are used. This causes issues for anything that might touch one of those mappings in the path of the page fault handler. When one of those unmapped mappings is touched in the page fault handler, it will cause another page fault, which in turn will cause a page fault, and leave us in a loop of page faults. Commit 763802b53a42 ("x86/mm: split vmalloc_sync_all()") split vmalloc_sync_all() into vmalloc_sync_unmappings() and vmalloc_sync_mappings(), as on system exit, it did not need to do a full sync on x86_64 (although it still needed to be done on x86_32). By chance, the vmalloc_sync_all() would synchronize the page mappings done at boot up and prevent the per cpu area from being a problem for tracing in the page fault handler. But when that synchronization in the exit of a task became a nop, it caused the problem to appear. Link: https://lore.kernel.org/r/20200429054857.66e8e333@oasis.local.home Cc: stable@vger.kernel.org Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Reported-by: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> Suggested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14tracing/kprobes: Fix a double initialization typoMasami Hiramatsu1-1/+1
[ Upstream commit dcbd21c9fca5e954fd4e3d91884907eb6d47187e ] Fix a typo that resulted in an unnecessary double initialization to addr. Link: http://lkml.kernel.org/r/158779374968.6082.2337484008464939919.stgit@devnote2 Cc: Tom Zanussi <zanussi@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Fixes: c7411a1a126f ("tracing/kprobe: Check whether the non-suffixed symbol is notrace") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-10tracing: Fix memory leaks in trace_events_hist.cVamshi K Sthambamkadi1-0/+7
[ Upstream commit 9da73974eb9c965dd9989befb593b8c8da9e4bdc ] kmemleak report 1: [<9092c50b>] kmem_cache_alloc_trace+0x138/0x270 [<05a2c9ed>] create_field_var+0xcf/0x180 [<528a2d68>] action_create+0xe2/0xc80 [<63f50b61>] event_hist_trigger_func+0x15b5/0x1920 [<28ea5d3d>] trigger_process_regex+0x7b/0xc0 [<3138e86f>] event_trigger_write+0x4d/0xb0 [<ffd66c19>] __vfs_write+0x30/0x200 [<4f424a0d>] vfs_write+0x96/0x1b0 [<da59a290>] ksys_write+0x53/0xc0 [<3717101a>] __ia32_sys_write+0x15/0x20 [<c5f23497>] do_fast_syscall_32+0x70/0x250 [<46e2629c>] entry_SYSENTER_32+0xaf/0x102 This is because save_vars[] of struct hist_trigger_data are not destroyed kmemleak report 2: [<9092c50b>] kmem_cache_alloc_trace+0x138/0x270 [<6e5e97c5>] create_var+0x3c/0x110 [<de82f1b9>] create_field_var+0xaf/0x180 [<528a2d68>] action_create+0xe2/0xc80 [<63f50b61>] event_hist_trigger_func+0x15b5/0x1920 [<28ea5d3d>] trigger_process_regex+0x7b/0xc0 [<3138e86f>] event_trigger_write+0x4d/0xb0 [<ffd66c19>] __vfs_write+0x30/0x200 [<4f424a0d>] vfs_write+0x96/0x1b0 [<da59a290>] ksys_write+0x53/0xc0 [<3717101a>] __ia32_sys_write+0x15/0x20 [<c5f23497>] do_fast_syscall_32+0x70/0x250 [<46e2629c>] entry_SYSENTER_32+0xaf/0x102 struct hist_field allocated through create_var() do not initialize "ref" field to 1. The code in __destroy_hist_field() does not destroy object if "ref" is initialized to zero, the condition if (--hist_field->ref > 1) always passes since unsigned int wraps. kmemleak report 3: [<f8666fcc>] __kmalloc_track_caller+0x139/0x2b0 [<bb7f80a5>] kstrdup+0x27/0x50 [<39d70006>] init_var_ref+0x58/0xd0 [<8ca76370>] create_var_ref+0x89/0xe0 [<f045fc39>] action_create+0x38f/0xc80 [<7c146821>] event_hist_trigger_func+0x15b5/0x1920 [<07de3f61>] trigger_process_regex+0x7b/0xc0 [<e87daf8f>] event_trigger_write+0x4d/0xb0 [<19bf1512>] __vfs_write+0x30/0x200 [<64ce4d27>] vfs_write+0x96/0x1b0 [<a6f34170>] ksys_write+0x53/0xc0 [<7d4230cd>] __ia32_sys_write+0x15/0x20 [<8eadca00>] do_fast_syscall_32+0x70/0x250 [<235cf985>] entry_SYSENTER_32+0xaf/0x102 hist_fields (system & event_name) are not freed Link: http://lkml.kernel.org/r/20200422061503.GA5151@cosmos Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-06PM: hibernate: Freeze kernel threads in software_resume()Dexuan Cui1-0/+7
commit 2351f8d295ed63393190e39c2f7c1fee1a80578f upstream. Currently the kernel threads are not frozen in software_resume(), so between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(), system_freezable_power_efficient_wq can still try to submit SCSI commands and this can cause a panic since the low level SCSI driver (e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept any SCSI commands: https://lkml.org/lkml/2020/4/10/47 At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying to resolve the issue from hv_storvsc, but with the help of Bart Van Assche, I realized it's better to fix software_resume(), since this looks like a generic issue, not only pertaining to SCSI. Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02signal: check sig before setting info in kill_pid_usb_asyncioZhiqiang Liu1-3/+3
[ Upstream commit eaec2b0bd30690575c581eebffae64bfb7f684ac ] In kill_pid_usb_asyncio, if signal is not valid, we do not need to set info struct. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/f525fd08-1cf7-fb09-d20c-4359145eb940@huawei.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-02perf/core: fix parent pid/tid in task exit eventsIan Rogers1-3/+10
commit f3bed55e850926614b9898fe982f66d2541a36a5 upstream. Current logic yields the child task as the parent. Before: $ perf record bash -c "perf list > /dev/null" $ perf script -D |grep 'FORK\|EXIT' 4387036190981094 0x5a70 [0x30]: PERF_RECORD_FORK(10472:10472):(10470:10470) 4387036606207580 0xf050 [0x30]: PERF_RECORD_EXIT(10472:10472):(10472:10472) 4387036607103839 0x17150 [0x30]: PERF_RECORD_EXIT(10470:10470):(10470:10470) ^ Note the repeated values here -------------------/ After: 383281514043 0x9d8 [0x30]: PERF_RECORD_FORK(2268:2268):(2266:2266) 383442003996 0x2180 [0x30]: PERF_RECORD_EXIT(2268:2268):(2266:2266) 383451297778 0xb70 [0x30]: PERF_RECORD_EXIT(2266:2266):(2265:2265) Fixes: 94d5d1b2d891 ("perf_counter: Report the cloning task as parent on perf_counter_fork()") Reported-by: KP Singh <kpsingh@google.com> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200417182842.12522-1-irogers@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02sched/core: Fix reset-on-fork from RT with uclampQuentin Perret1-7/+2
commit eaf5a92ebde5bca3bb2565616115bd6d579486cd upstream. uclamp_fork() resets the uclamp values to their default when the reset-on-fork flag is set. It also checks whether the task has a RT policy, and sets its uclamp.min to 1024 accordingly. However, during reset-on-fork, the task's policy is lowered to SCHED_NORMAL right after, hence leading to an erroneous uclamp.min setting for the new task if it was forked from RT. Fix this by removing the unnecessary check on rt_task() in uclamp_fork() as this doesn't make sense if the reset-on-fork flag is set. Fixes: 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks") Reported-by: Chitti Babu Theegala <ctheegal@codeaurora.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Patrick Bellasi <patrick.bellasi@matbug.net> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20200416085956.217587-1-qperret@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02cpumap: Avoid warning when CONFIG_DEBUG_PER_CPU_MAPS is enabledToke Høiland-Jørgensen1-1/+1
commit bc23d0e3f717ced21fbfacab3ab887d55e5ba367 upstream. When the kernel is built with CONFIG_DEBUG_PER_CPU_MAPS, the cpumap code can trigger a spurious warning if CONFIG_CPUMASK_OFFSTACK is also set. This happens because in this configuration, NR_CPUS can be larger than nr_cpumask_bits, so the initial check in cpu_map_alloc() is not sufficient to guard against hitting the warning in cpumask_check(). Fix this by explicitly checking the supplied key against the nr_cpumask_bits variable before calling cpu_possible(). Fixes: 6710e1126934 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Xiumei Mu <xmu@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200416083120.453718-1-toke@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02bpf: Forbid XADD on spilled pointers for unprivileged usersJann Horn1-9/+19
commit 6e7e63cbb023976d828cdb22422606bf77baa8a9 upstream. When check_xadd() verifies an XADD operation on a pointer to a stack slot containing a spilled pointer, check_stack_read() verifies that the read, which is part of XADD, is valid. However, since the placeholder value -1 is passed as `value_regno`, check_stack_read() can only return a binary decision and can't return the type of the value that was read. The intent here is to verify whether the value read from the stack slot may be used as a SCALAR_VALUE; but since check_stack_read() doesn't check the type, and the type information is lost when check_stack_read() returns, this is not enforced, and a malicious user can abuse XADD to leak spilled kernel pointers. Fix it by letting check_stack_read() verify that the value is usable as a SCALAR_VALUE if no type information is passed to the caller. To be able to use __is_pointer_value() in check_stack_read(), move it up. Fix up the expected unprivileged error message for a BPF selftest that, until now, assumed that unprivileged users can use XADD on stack-spilled pointers. This also gives us a test for the behavior introduced in this patch for free. In theory, this could also be fixed by forbidding XADD on stack spills entirely, since XADD is a locked operation (for operations on memory with concurrency) and there can't be any concurrency on the BPF stack; but Alexei has said that he wants to keep XADD on stack slots working to avoid changes to the test suite [1]. The following BPF program demonstrates how to leak a BPF map pointer as an unprivileged user using this bug: // r7 = map_pointer BPF_LD_MAP_FD(BPF_REG_7, small_map), // r8 = launder(map_pointer) BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_7, -8), BPF_MOV64_IMM(BPF_REG_1, 0), ((struct bpf_insn) { .code = BPF_STX | BPF_DW | BPF_XADD, .dst_reg = BPF_REG_FP, .src_reg = BPF_REG_1, .off = -8 }), BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_FP, -8), // store r8 into map BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_7), BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_FP), BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, -4), BPF_ST_MEM(BPF_W, BPF_REG_ARG2, 0, 0), BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), BPF_EXIT_INSN(), BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_8, 0), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN() [1] https://lore.kernel.org/bpf/20200416211116.qxqcza5vo2ddnkdq@ast-mbp.dhcp.thefacebook.com/ Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200417000007.10734-1-jannh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02printk: queue wake_up_klogd irq_work only if per-CPU areas are readySergey Senozhatsky3-10/+40
commit ab6f762f0f53162d41497708b33c9a3236d3609e upstream. printk_deferred(), similarly to printk_safe/printk_nmi, does not immediately attempt to print a new message on the consoles, avoiding calls into non-reentrant kernel paths, e.g. scheduler or timekeeping, which potentially can deadlock the system. Those printk() flavors, instead, rely on per-CPU flush irq_work to print messages from safer contexts. For same reasons (recursive scheduler or timekeeping calls) printk() uses per-CPU irq_work in order to wake up user space syslog/kmsg readers. However, only printk_safe/printk_nmi do make sure that per-CPU areas have been initialised and that it's safe to modify per-CPU irq_work. This means that, for instance, should printk_deferred() be invoked "too early", that is before per-CPU areas are initialised, printk_deferred() will perform illegal per-CPU access. Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random: silence a lockdep splat with printk()") user-space syslog/kmsg readers are not able to read new kernel messages. The reason is printk_deferred() being called too early (as was pointed out by Petr and John). Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU areas are initialized. Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/ Reported-by: Lech Perczak <l.perczak@camlintechnologies.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Tested-by: Jann Horn <jannh@google.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: John Ogness <john.ogness@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29audit: check the length of userspace generated audit recordsPaul Moore1-0/+3
commit 763dafc520add02a1f4639b500c509acc0ea8e5b upstream. Commit 756125289285 ("audit: always check the netlink payload length in audit_receive_msg()") fixed a number of missing message length checks, but forgot to check the length of userspace generated audit records. The good news is that you need CAP_AUDIT_WRITE to submit userspace audit records, which is generally only given to trusted processes, so the impact should be limited. Cc: stable@vger.kernel.org Fixes: 756125289285 ("audit: always check the netlink payload length in audit_receive_msg()") Reported-by: syzbot+49e69b4d71a420ceda3e@syzkaller.appspotmail.com Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29signal: Avoid corrupting si_pid and si_uid in do_notify_parentEric W. Biederman1-1/+5
commit 61e713bdca3678e84815f2427f7a063fc353a1fc upstream. Christof Meerwald <cmeerw@cmeerw.org> writes: > Hi, > > this is probably related to commit > 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace > fixups of si_pid and si_uid). > > With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a > properly set si_pid field - this seems to happen for multi-threaded > child processes. > > A simple test program (based on the sample from the signalfd man page): > > #include <sys/signalfd.h> > #include <signal.h> > #include <unistd.h> > #include <spawn.h> > #include <stdlib.h> > #include <stdio.h> > > #define handle_error(msg) \ > do { perror(msg); exit(EXIT_FAILURE); } while (0) > > int main(int argc, char *argv[]) > { > sigset_t mask; > int sfd; > struct signalfd_siginfo fdsi; > ssize_t s; > > sigemptyset(&mask); > sigaddset(&mask, SIGCHLD); > > if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1) > handle_error("sigprocmask"); > > pid_t chldpid; > char *chldargv[] = { "./sfdclient", NULL }; > posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL); > > sfd = signalfd(-1, &mask, 0); > if (sfd == -1) > handle_error("signalfd"); > > for (;;) { > s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo)); > if (s != sizeof(struct signalfd_siginfo)) > handle_error("read"); > > if (fdsi.ssi_signo == SIGCHLD) { > printf("Got SIGCHLD %d %d %d %d\n", > fdsi.ssi_status, fdsi.ssi_code, > fdsi.ssi_uid, fdsi.ssi_pid); > return 0; > } else { > printf("Read unexpected signal\n"); > } > } > } > > > and a multi-threaded client to test with: > > #include <unistd.h> > #include <pthread.h> > > void *f(void *arg) > { > sleep(100); > } > > int main() > { > pthread_t t[8]; > > for (int i = 0; i != 8; ++i) > { > pthread_create(&t[i], NULL, f, NULL); > } > } > > I tried to do a bit of debugging and what seems to be happening is > that > > /* From an ancestor pid namespace? */ > if (!task_pid_nr_ns(current, task_active_pid_ns(t))) { > > fails inside task_pid_nr_ns because the check for "pid_alive" fails. > > This code seems to be called from do_notify_parent and there we > actually have "tsk != current" (I am assuming both are threads of the > current process?) I instrumented the code with a warning and received the following backtrace: > WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15 > Modules linked in: > CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15 > Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd > RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046 > RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29 > RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001 > R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140 > R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140 > FS: 0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0 > Call Trace: > send_signal+0x1c8/0x310 > do_notify_parent+0x50f/0x550 > release_task.part.21+0x4fd/0x620 > do_exit+0x6f6/0xaf0 > do_group_exit+0x42/0xb0 > get_signal+0x13b/0xbb0 > do_signal+0x2b/0x670 > ? __audit_syscall_exit+0x24d/0x2b0 > ? rcu_read_lock_sched_held+0x4d/0x60 > ? kfree+0x24c/0x2b0 > do_syscall_64+0x176/0x640 > ? trace_hardirqs_off_thunk+0x1a/0x1c > entry_SYSCALL_64_after_hwframe+0x49/0xb3 The immediate problem is as Christof noticed that "pid_alive(current) == false". This happens because do_notify_parent is called from the last thread to exit in a process after that thread has been reaped. The bigger issue is that do_notify_parent can be called from any process that manages to wait on a thread of a multi-threaded process from wait_task_zombie. So any logic based upon current for do_notify_parent is just nonsense, as current can be pretty much anything. So change do_notify_parent to call __send_signal directly. Inspecting the code it appears this problem has existed since the pid namespace support started handling this case in 2.6.30. This fix only backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid") where the problem logic was moved out of __send_signal and into send_signal. Cc: stable@vger.kernel.org Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary") Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals") Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/ Reported-by: Christof Meerwald <cmeerw@cmeerw.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29perf/core: Disable page faults when getting phys addressJiri Olsa1-3/+6
[ Upstream commit d3296fb372bf7497b0e5d0478c4e7a677ec6f6e9 ] We hit following warning when running tests on kernel compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y: WARNING: CPU: 19 PID: 4472 at mm/gup.c:2381 __get_user_pages_fast+0x1a4/0x200 CPU: 19 PID: 4472 Comm: dummy Not tainted 5.6.0-rc6+ #3 RIP: 0010:__get_user_pages_fast+0x1a4/0x200 ... Call Trace: perf_prepare_sample+0xff1/0x1d90 perf_event_output_forward+0xe8/0x210 __perf_event_overflow+0x11a/0x310 __intel_pmu_pebs_event+0x657/0x850 intel_pmu_drain_pebs_nhm+0x7de/0x11d0 handle_pmi_common+0x1b2/0x650 intel_pmu_handle_irq+0x17b/0x370 perf_event_nmi_handler+0x40/0x60 nmi_handle+0x192/0x590 default_do_nmi+0x6d/0x150 do_nmi+0x2f9/0x3c0 nmi+0x8e/0xd7 While __get_user_pages_fast() is IRQ-safe, it calls access_ok(), which warns on: WARN_ON_ONCE(!in_task() && !pagefault_disabled()) Peter suggested disabling page faults around __get_user_pages_fast(), which gets rid of the warning in access_ok() call. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200407141427.3184722-1-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29kernel/gcov/fs.c: gcov_seq_next() should increase position indexVasily Averin1-1/+1
[ Upstream commit f4d74ef6220c1eda0875da30457bef5c7111ab06 ] If seq_file .next function does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: NeilBrown <neilb@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <longman@redhat.com> Link: http://lkml.kernel.org/r/f65c6ee7-bd00-f910-2f8a-37cc67e4ff88@virtuozzo.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29dma-direct: fix data truncation in dma_direct_get_required_mask()Kishon Vijay Abraham I1-1/+2
[ Upstream commit cdcda0d1f8f4ab84efe7cd9921c98364398aefd7 ] The upper 32-bit physical address gets truncated inadvertently when dma_direct_get_required_mask() invokes phys_to_dma_direct(). This results in dma_addressing_limited() return incorrect value when used in platforms with LPAE enabled. Fix it here by explicitly type casting 'max_pfn' to phys_addr_t in order to prevent overflow of intermediate value while evaluating '(max_pfn - 1) << PAGE_SHIFT'. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23bpf: fix buggy r0 retval refinement for tracing helpersDaniel Borkmann1-11/+34
[ no upstream commit ] See the glory details in 100605035e15 ("bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectly") for why 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") is buggy. The whole series however is not suitable for stable since it adds significant amount [0] of verifier complexity in order to add 32bit subreg tracking. Something simpler is needed. Unfortunately, reverting 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") or just cherry-picking 100605035e15 ("bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectly") is not an option since it will break existing tracing programs badly (at least those that are using bpf_get_stack() and bpf_probe_read_str() helpers). Not fixing it in stable is also not an option since on 4.19 kernels an error will cause a soft-lockup due to hitting dead-code sanitized branch since we don't hard-wire such branches in old kernels yet. But even then for 5.x 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") would cause wrong bounds on the verifier simluation when an error is hit. In one of the earlier iterations of mentioned patch series for upstream there was the concern that just using smax_value in do_refine_retval_range() would nuke bounds by subsequent <<32 >>32 shifts before the comparison against 0 [1] which eventually led to the 32bit subreg tracking in the first place. While I initially went for implementing the idea [1] to pattern match the two shift operations, it turned out to be more complex than actually needed, meaning, we could simply treat do_refine_retval_range() similarly to how we branch off verification for conditionals or under speculation, that is, pushing a new reg state to the stack for later verification. This means, instead of verifying the current path with the ret_reg in [S32MIN, msize_max_value] interval where later bounds would get nuked, we split this into two: i) for the success case where ret_reg can be in [0, msize_max_value], and ii) for the error case with ret_reg known to be in interval [S32MIN, -1]. Latter will preserve the bounds during these shift patterns and can match reg < 0 test. test_progs also succeed with this approach. [0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/ [1] https://lore.kernel.org/bpf/158015334199.28573.4940395881683556537.stgit@john-XPS-13-9370/T/#m2e0ad1d5949131014748b6daa48a3495e7f0456d Fixes: 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") Reported-by: Lorenzo Fontana <fontanalorenz@gmail.com> Reported-by: Leonardo Di Donato <leodidonato@gmail.com> Reported-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23locktorture: Print ratio of acquisitions, not failuresPaul E. McKenney1-4/+4
commit 80c503e0e68fbe271680ab48f0fe29bc034b01b7 upstream. The __torture_print_stats() function in locktorture.c carefully initializes local variable "min" to statp[0].n_lock_acquired, but then compares it to statp[i].n_lock_fail. Given that the .n_lock_fail field should normally be zero, and given the initialization, it seems reasonable to display the maximum and minimum number acquisitions instead of miscomputing the maximum and minimum number of failures. This commit therefore switches from failures to acquisitions. And this turns out to be not only a day-zero bug, but entirely my own fault. I hate it when that happens! Fixes: 0af3fe1efa53 ("locktorture: Add a lock-torture kernel module") Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23dma-debug: fix displaying of dma allocation typeGrygorii Strashko1-3/+6
commit 9bb50ed7470944238ec8e30a94ef096caf9056ee upstream. The commit 2e05ea5cdc1a ("dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs") removed "dma_debug_page" enum, but missed to update type2name string table. This causes incorrect displaying of dma allocation type. Fix it by removing "page" string from type2name string table and switch to use named initializers. Before (dma_alloc_coherent()): k3-ringacc 4b800000.ringacc: scather-gather idx 2208 P=d1140000 N=d114 D=d1140000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable k3-ringacc 4b800000.ringacc: scather-gather idx 2216 P=d1150000 N=d115 D=d1150000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable After: k3-ringacc 4b800000.ringacc: coherent idx 2208 P=d1140000 N=d114 D=d1140000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable k3-ringacc 4b800000.ringacc: coherent idx 2216 P=d1150000 N=d115 D=d1150000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable Fixes: 2e05ea5cdc1a ("dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23dma-coherent: fix integer overflow in the reserved-memory dma allocationKevin Grandemange1-6/+7
[ Upstream commit 286c21de32b904131f8cf6a36ce40b8b0c9c5da3 ] pageno is an int and the PAGE_SHIFT shift is done on an int, overflowing if the memory is bigger than 2G This can be reproduced using for example a reserved-memory of 4G reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; reserved_dma: buffer@0 { compatible = "shared-dma-pool"; no-map; reg = <0x5 0x00000000 0x1 0x0>; }; }; Signed-off-by: Kevin Grandemange <kevin.grandemange@allegrodvt.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-21tracing: Fix the race between registering 'snapshot' event trigger and ↵Xiao Yang1-7/+3
triggering 'snapshot' operation commit 0bbe7f719985efd9adb3454679ecef0984cb6800 upstream. Traced event can trigger 'snapshot' operation(i.e. calls snapshot_trigger() or snapshot_count_trigger()) when register_snapshot_trigger() has completed registration but doesn't allocate buffer for 'snapshot' event trigger. In the rare case, 'snapshot' operation always detects the lack of allocated buffer so make register_snapshot_trigger() allocate buffer first. trigger-snapshot.tc in kselftest reproduces the issue on slow vm: ----------------------------------------------------------- cat trace ... ftracetest-3028 [002] .... 236.784290: sched_process_fork: comm=ftracetest pid=3028 child_comm=ftracetest child_pid=3036 <...>-2875 [003] .... 240.460335: tracing_snapshot_instance_cond: *** SNAPSHOT NOT ALLOCATED *** <...>-2875 [003] .... 240.460338: tracing_snapshot_instance_cond: *** stopping trace here! *** ----------------------------------------------------------- Link: http://lkml.kernel.org/r/20200414015145.66236-1-yangx.jy@cn.fujitsu.com Cc: stable@vger.kernel.org Fixes: 93e31ffbf417a ("tracing: Add 'snapshot' event trigger command") Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17bpf: Fix tnum constraints for 32-bit comparisonsJann Horn1-36/+72
[ Upstream commit 604dca5e3af1db98bd123b7bfc02b017af99e3a0 ] The BPF verifier tried to track values based on 32-bit comparisons by (ab)using the tnum state via 581738a681b6 ("bpf: Provide better register bounds after jmp32 instructions"). The idea is that after a check like this: if ((u32)r0 > 3) exit We can't meaningfully constrain the arithmetic-range-based tracking, but we can update the tnum state to (value=0,mask=0xffff'ffff'0000'0003). However, the implementation from 581738a681b6 didn't compute the tnum constraint based on the fixed operand, but instead derives it from the arithmetic-range-based tracking. This means that after the following sequence of operations: if (r0 >= 0x1'0000'0001) exit if ((u32)r0 > 7) exit The verifier assumed that the lower half of r0 is in the range (0, 0) and apply the tnum constraint (value=0,mask=0xffff'ffff'0000'0000) thus causing the overall tnum to be (value=0,mask=0x1'0000'0000), which was incorrect. Provide a fixed implementation. Fixes: 581738a681b6 ("bpf: Provide better register bounds after jmp32 instructions") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330160324.15259-3-daniel@iogearbox.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17kmod: make request_module() return an error when autoloading is disabledEric Biggers1-2/+2
commit d7d27cfc5cf0766a26a8f56868c5ad5434735126 upstream. Patch series "module autoloading fixes and cleanups", v5. This series fixes a bug where request_module() was reporting success to kernel code