summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2023-02-25uaccess: Add speculation barrier to copy_from_user()Dave Hansen1-0/+4
commit 74e19ef0ff8061ef55957c3abd71614ef0f42f47 upstream. The results of "access_ok()" can be mis-speculated. The result is that you can end speculatively: if (access_ok(from, size)) // Right here even for bad from/size combinations. On first glance, it would be ideal to just add a speculation barrier to "access_ok()" so that its results can never be mis-speculated. But there are lots of system calls just doing access_ok() via "copy_to_user()" and friends (example: fstat() and friends). Those are generally not problematic because they do not _consume_ data from userspace other than the pointer. They are also very quick and common system calls that should not be needlessly slowed down. "copy_from_user()" on the other hand uses a user-controller pointer and is frequently followed up with code that might affect caches. Take something like this: if (!copy_from_user(&kernelvar, uptr, size)) do_something_with(kernelvar); If userspace passes in an evil 'uptr' that *actually* points to a kernel addresses, and then do_something_with() has cache (or other) side-effects, it could allow userspace to infer kernel data values. Add a barrier to the common copy_from_user() code to prevent mis-speculated values which happen after the copy. Also add a stub for architectures that do not define barrier_nospec(). This makes the macro usable in generic code. Since the barrier is now usable in generic code, the x86 #ifdef in the BPF code can also go away. Reported-by: Jordy Zomer <jordyzomer@google.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-25random: always mix cycle counter in add_latent_entropy()Jason A. Donenfeld1-3/+3
[ Upstream commit d7bf7f3b813e3755226bcb5114ad2ac477514ebf ] add_latent_entropy() is called every time a process forks, in kernel_clone(). This in turn calls add_device_randomness() using the latent entropy global state. add_device_randomness() does two things: 2) Mixes into the input pool the latent entropy argument passed; and 1) Mixes in a cycle counter, a sort of measurement of when the event took place, the high precision bits of which are presumably difficult to predict. (2) is impossible without CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y. But (1) is always possible. However, currently CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n disables both (1) and (2), instead of just (2). This commit causes the CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n case to still do (1) by passing NULL (len 0) to add_device_randomness() when add_latent_ entropy() is called. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: PaX Team <pageexec@freemail.hu> Cc: Emese Revfy <re.emese@gmail.com> Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-22hugetlb: check for undefined shift on 32 bit architecturesMike Kravetz1-1/+4
commit ec4288fe63966b26d53907212ecd05dfa81dd2cc upstream. Users can specify the hugetlb page size in the mmap, shmget and memfd_create system calls. This is done by using 6 bits within the flags argument to encode the base-2 logarithm of the desired page size. The routine hstate_sizelog() uses the log2 value to find the corresponding hugetlb hstate structure. Converting the log2 value (page_size_log) to potential hugetlb page size is the simple statement: 1UL << page_size_log Because only 6 bits are used for page_size_log, the left shift can not be greater than 63. This is fine on 64 bit architectures where a long is 64 bits. However, if a value greater than 31 is passed on a 32 bit architecture (where long is 32 bits) the shift will result in undefined behavior. This was generally not an issue as the result of the undefined shift had to exactly match hugetlb page size to proceed. Recent improvements in runtime checking have resulted in this undefined behavior throwing errors such as reported below. Fix by comparing page_size_log to BITS_PER_LONG before doing shift. Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4MiPgVg@mail.gmail.com/ Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Sasha Levin <sashal@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22net: phy: add macros for PHYID matchingHeiner Kallweit1-0/+4
[ Upstream commit aa2af2eb447c9a21c8c9e8d2336672bb620cf900 ] Add macros for PHYID matching to be used in PHY driver configs. By using these macros some boilerplate code can be avoided. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 69ff53e4a4c9 ("net: phy: meson-gxl: use MMD access dummy stubs for GXL, internal PHY") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-22mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smapsMike Kravetz1-0/+13
commit 3489dbb696d25602aea8c3e669a6d43b76bd5358 upstream. Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs". This issue of mapcount in hugetlb pages referenced by shared PMDs was discussed in [1]. The following two patches address user visible behavior caused by this issue. [1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/ This patch (of 2): A hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. This is because only the first process increases the map count, and subsequent processes just add the shared PMD page to their page table. page_mapcount is being used to decide if a hugetlb page is shared or private in /proc/PID/smaps. Pages referenced via a shared PMD were incorrectly being counted as private. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found count the hugetlb page as shared. A new helper to check for a shared PMD is added. [akpm@linux-foundation.org: simplification, per David] [akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()] Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-06panic: Consolidate open-coded panic_on_warn checksKees Cook1-0/+1
commit 79cc1ba7badf9e7a12af99695a557e9ce27ee967 upstream. Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll their own warnings, and each check "panic_on_warn". Consolidate this into a single function so that future instrumentation can be added in a single location. Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Gow <davidgow@google.com> Cc: tangmeng <tangmeng@uniontech.com> Cc: Jann Horn <jannh@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-06exit: Add and use make_task_dead.Eric W. Biederman1-0/+1
commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream. There are two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initialy exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastraphic task cleanup with make_task_dead to make it clear what the code is doing. As part of this rename rewind_stack_do_exit rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-06sysctl: add a new register_sysctl_init() interfaceXiaoming Ni1-0/+3
commit 3ddd9a808cee7284931312f2f3e854c9617f44b2 upstream. Patch series "sysctl: first set of kernel/sysctl cleanups", v2. Finally had time to respin the series of the work we had started last year on cleaning up the kernel/sysct.c kitchen sink. People keeps stuffing their sysctls in that file and this creates a maintenance burden. So this effort is aimed at placing sysctls where they actually belong. I'm going to split patches up into series as there is quite a bit of work. This first set adds register_sysctl_init() for uses of registerting a sysctl on the init path, adds const where missing to a few places, generalizes common values so to be more easy to share, and starts the move of a few kernel/sysctl.c out where they belong. The majority of rework on v2 in this first patch set is 0-day fixes. Eric Biederman's feedback is later addressed in subsequent patch sets. I'll only post the first two patch sets for now. We can address the rest once the first two patch sets get completely reviewed / Acked. This patch (of 9): The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. Today though folks heavily rely on tables on kernel/sysctl.c so they can easily just extend this table with their needed sysctls. In order to help users move their sysctls out we need to provide a helper which can be used during code initialization. We special-case the initialization use of register_sysctl() since it *is* safe to fail, given all that sysctls do is provide a dynamic interface to query or modify at runtime an existing variable. So the use case of register_sysctl() on init should *not* stop if the sysctls don't end up getting registered. It would be counter productive to stop boot if a simple sysctl registration failed. Provide a helper for init then, and document the recommended init levels to use for callers of this routine. We will later use this in subsequent patches to start slimming down kernel/sysctl.c tables and moving sysctl registration to the code which actually needs these sysctls. [mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ] Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Paul Turner <pjt@google.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Qing Wang <wangqing@vivo.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Stephen Kitt <steve@sk2.org> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18quota: Factor out setup of quota inodeJan Kara1-0/+2
[ Upstream commit c7d3d28360fdb3ed3a5aa0bab19315e0fdc994a1 ] Factor out setting up of quota inode and eventual error cleanup from vfs_load_quota_inode(). This will simplify situation for filesystems that don't have any quota inodes. Signed-off-by: Jan Kara <jack@suse.cz> Stable-dep-of: d32387748476 ("ext4: fix bug_on in __es_tree_search caused by bad quota inode") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18SUNRPC: ensure the matching upcall is in-flight upon downcallminoura makoto1-0/+5
[ Upstream commit b18cba09e374637a0a3759d856a6bca94c133952 ] Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid but different gss service") introduced `auth` argument to __gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL since it (and auth->service) was not (yet) determined. When multiple upcalls with the same uid and different service are ongoing, it could happen that __gss_find_upcall(), which returns the first match found in the pipe->in_downcall list, could not find the correct gss_msg corresponding to the downcall we are looking for. Moreover, it might return a msg which is not sent to rpc.gssd yet. We could see mount.nfs process hung in D state with multiple mount.nfs are executed in parallel. The call trace below is of CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/ elrepo kernel-ml-6.0.7-1.el7. PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs" #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss] #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc] #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss] #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc] #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc] #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc] #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc] #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc] The scenario is like this. Let's say there are two upcalls for services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe. When rpc.gssd reads pipe to get the upcall msg corresponding to service B from pipe->pipe and then writes the response, in gss_pipe_downcall the msg corresponding to service A will be picked because only uid is used to find the msg and it is before the one for B in pipe->in_downcall. And the process waiting for the msg corresponding to service A will be woken up. Actual scheduing of that process might be after rpc.gssd processes the next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A). The process is scheduled to see gss_msg->ctx == NULL and gss_msg->msg.errno == 0, therefore it cannot break the loop in gss_create_upcall and is never woken up after that. This patch adds a simple check to ensure that a msg which is not sent to rpc.gssd yet is not chosen as the matching upcall upon receiving a downcall. Signed-off-by: minoura makoto <minoura@valinux.co.jp> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com> Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com> Cc: Trond Myklebust <trondmy@hammerspace.com> Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18ext4: fix deadlock due to mbcache entry corruptionJan Kara1-2/+7
[ Upstream commit a44e84a9b7764c72896f7241a0ec9ac7e7ef38dd ] When manipulating xattr blocks, we can deadlock infinitely looping inside ext4_xattr_block_set() where we constantly keep finding xattr block for reuse in mbcache but we are unable to reuse it because its reference count is too big. This happens because cache entry for the xattr block is marked as reusable (e_reusable set) although its reference count is too big. When this inconsistency happens, this inconsistent state is kept indefinitely and so ext4_xattr_block_set() keeps retrying indefinitely. The inconsistent state is caused by non-atomic update of e_reusable bit. e_reusable is part of a bitfield and e_reusable update can race with update of e_referenced bit in the same bitfield resulting in loss of one of the updates. Fix the problem by using atomic bitops instead. This bug has been around for many years, but it became *much* easier to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr blocks"). Cc: stable@vger.kernel.org Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries") Fixes: 65f8b80053a1 ("ext4: fix race when reusing xattr blocks") Reported-and-tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: Thilo Fromm <t-lo@linux.microsoft.com> Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microsoft.com/ Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20221123193950.16758-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18mbcache: automatically delete entries from cache on freeingJan Kara1-8/+16
[ Upstream commit 307af6c879377c1c63e71cbdd978201f9c7ee8df ] Use the fact that entries with elevated refcount are not removed from the hash and just move removal of the entry from the hash to the entry freeing time. When doing this we also change the generic code to hold one reference to the cache entry, not two of them, which makes code somewhat more obvious. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: a44e84a9b776 ("ext4: fix deadlock due to mbcache entry corruption") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18mbcache: add functions to delete entry if unusedJan Kara1-1/+9
[ Upstream commit 3dc96bba65f53daa217f0a8f43edad145286a8f5 ] Add function mb_cache_entry_delete_or_get() to delete mbcache entry if it is unused and also add a function to wait for entry to become unused - mb_cache_entry_wait_unused(). We do not share code between the two deleting function as one of them will go away soon. CC: stable@vger.kernel.org Fixes: 82939d7999df ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: a44e84a9b776 ("ext4: fix deadlock due to mbcache entry corruption") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write()David Howells1-0/+2
[ Upstream commit c3d96f690a790074b508fe183a41e36a00cd7ddd ] Provide a CONFIG_PROC_FS=n fallback for proc_create_net_single_write(). Also provide a fallback for proc_create_net_data_write(). Fixes: 564def71765c ("proc: Add a way to make network proc files writable") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFDZhang Qilong1-1/+1
[ Upstream commit fd4e60bf0ef8eb9edcfa12dda39e8b6ee9060492 ] Commit ee62c6b2dc93 ("eventfd: change int to __u64 in eventfd_signal()") forgot to change int to __u64 in the CONFIG_EVENTFD=n stub function. Link: https://lkml.kernel.org/r/20221124140154.104680-1-zhangqilong3@huawei.com Fixes: ee62c6b2dc93 ("eventfd: change int to __u64 in eventfd_signal()") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Cc: Dylan Yudaken <dylany@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sha Zhengju <handai.szj@taobao.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18debugfs: fix error when writing negative value to atomic_t debugfs fileAkinobu Mita1-2/+17
[ Upstream commit d472cf797c4e268613dbce5ec9b95d0bcae19ecb ] The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), so we have to use a 64-bit value to write a negative value for a debugfs file created by debugfs_create_atomic_t(). This restores the previous behaviour by introducing DEFINE_DEBUGFS_ATTRIBUTE_SIGNED for a signed value. Link: https://lkml.kernel.org/r/20220919172418.45257-4-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed valueAkinobu Mita1-2/+10
[ Upstream commit 2e41f274f9aa71cdcc69dc1f26a3f9304a651804 ] Patch series "fix error when writing negative value to simple attribute files". The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), but some attribute files want to accept a negative value. This patch (of 3): The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), so we have to use a 64-bit value to write a negative value. This adds DEFINE_SIMPLE_ATTRIBUTE_SIGNED for a signed value. Link: https://lkml.kernel.org/r/20220919172418.45257-1-akinobu.mita@gmail.com Link: https://lkml.kernel.org/r/20220919172418.45257-2-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18timerqueue: Use rb_entry_safe() in timerqueue_getnext()Barnabás Pőcze1-1/+1
[ Upstream commit 2f117484329b233455ee278f2d9b0a4356835060 ] When `timerqueue_getnext()` is called on an empty timer queue, it will use `rb_entry()` on a NULL pointer, which is invalid. Fix that by using `rb_entry_safe()` which handles NULL pointers. This has not caused any issues so far because the offset of the `rb_node` member in `timerqueue_node` is 0, so `rb_entry()` is essentially a no-op. Fixes: 511885d7061e ("lib/timerqueue: Rely on rbtree semantics for next timer") Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221114195421.342929-1-pobrn@protonmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18can: sja1000: fix size of OCR_MODE_MASK defineHeiko Schocher1-1/+1
[ Upstream commit 26e8f6a75248247982458e8237b98c9fb2ffcf9d ] bitfield mode in ocr register has only 2 bits not 3, so correct the OCR_MODE_MASK define. Signed-off-by: Heiko Schocher <hs@denx.de> Link: https://lore.kernel.org/all/20221123071636.2407823-1-hs@denx.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-14memcg: fix possible use-after-free in memcg_write_event_control()Tejun Heo1-0/+1
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream. memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> [3.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08mm: Fix '.data.once' orphan section warningNathan Chancellor1-1/+1
Portions of upstream commit a4055888629b ("mm/memcg: warning on !memcg after readahead page charged") were backported as commit cfe575954ddd ("mm: add VM_WARN_ON_ONCE_PAGE() macro"). Unfortunately, the backport did not account for the lack of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") in kernels prior to 5.10, resulting in the following orphan section warnings on PowerPC clang builds with CONFIG_DEBUG_VM=y: powerpc64le-linux-gnu-ld: warning: orphan section `".data.once"' from `mm/huge_memory.o' being placed in section `".data.once"' powerpc64le-linux-gnu-ld: warning: orphan section `".data.once"' from `mm/huge_memory.o' being placed in section `".data.once"' powerpc64le-linux-gnu-ld: warning: orphan section `".data.once"' from `mm/huge_memory.o' being placed in section `".data.once"' This is a difference between how clang and gcc handle macro stringification, which was resolved for the kernel by not stringifying the argument to the __section() macro. Since that change was deemed not suitable for the stable kernels by commit 59f89518f510 ("once: fix section mismatch on clang builds"), do that same thing as that change and remove the quotes from the argument to __section(). Fixes: cfe575954ddd ("mm: add VM_WARN_ON_ONCE_PAGE() macro") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23x86/bugs: Report AMD retbleed vulnerabilityAlexandre Chartre1-0/+2
commit 6b80b59b3555706508008f1f127b5412c89c7fd8 upstream. Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary Speculative Code Execution with Return Instructions) attack. [peterz: add hygon] [kim: invert parity; fam15h] Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> [cascardo: adjusted BUG numbers to match upstream] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> [suleiman: Remove hygon] Signed-off-by: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23x86/cpu: Add a steppings field to struct x86_cpu_idMark Gross2-1/+7
commit e9d7144597b10ff13ff2264c059f7d4a7fbc89ac upstream. Intel uses the same family/model for several CPUs. Sometimes the stepping must be checked to tell them apart. On x86 there can be at most 16 steppings. Add a steppings bitmask to x86_cpu_id and a X86_MATCH_VENDOR_FAMILY_MODEL_STEPPING_FEATURE macro and support for matching against family/model/stepping. [ bp: Massage. ] Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> [cascardo: have steppings be the last member as there are initializers that don't use named members] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> [suleiman: vmx.c moved] Signed-off-by: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23x86/devicetable: Move x86 specific macro out of generic codeThomas Gleixner1-3/+1
commit ba5bade4cc0d2013cdf5634dae554693c968a090 upstream. There is no reason that this gunk is in a generic header file. The wildcard defines need to stay as they are required by file2alias. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lkml.kernel.org/r/20200320131508.736205164@linutronix.de Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [suleiman: vmx.c moved] Signed-off-by: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23Revert "x86/cpu: Add a steppings field to struct x86_cpu_id"Suleiman Souhlal1-6/+0
This reverts commit 6f2f28e71e6af993761b7a70bd2402a8d2096acf. This is commit e9d7144597b10ff13ff2264c059f7d4a7fbc89ac upstream. Reverting this commit makes the following patches apply cleanly. This patch is then reapplied. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10linux/bits.h: make BIT(), GENMASK(), and friends available in assemblyMasahiro Yamada1-7/+10
commit 95b980d62d52c4c1768ee719e8db3efe27ef52b2 upstream. BIT(), GENMASK(), etc. are useful to define register bits of hardware. However, low-level code is often written in assembly, where they are not available due to the hard-coded 1UL, 0UL. In fact, in-kernel headers such as arch/arm64/include/asm/sysreg.h use _BITUL() instead of BIT() so that the register bit macros are available in assembly. Using macros in include/uapi/linux/const.h have two reasons: [1] For use in uapi headers We should use underscore-prefixed variants for user-space. [2] For use in assembly code Since _BITUL() uses UL(1) instead of 1UL, it can be used as an alternative of BIT(). For [2], it is pretty easy to change BIT() etc. for use in assembly. This allows to replace _BUTUL() in kernel-space headers with BIT(). Link: http://lkml.kernel.org/r/20190609153941.17249-1-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10efi: random: reduce seed size to 32 bytesArd Biesheuvel1-1/+1
commit 161a438d730dade2ba2b1bf8785f0759aba4ca5f upstream. We no longer need at least 64 bytes of random seed to permit the early crng init to complete. The RNG is now based on Blake2s, so reduce the EFI seed size to the Blake2s hash size, which is sufficient for our purposes. While at it, drop the READ_ONCE(), which was supposed to prevent size from being evaluated after seed was unmapped. However, this cannot actually happen, so READ_ONCE() is unnecessary here. Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-01once: fix section mismatch on clang buildsGreg Kroah-Hartman1-1/+1
On older kernels (5.4 and older), building the kernel with clang can cause the section name to end up with "" in them, which can cause lots of runtime issues as that is not normally a valid portion of the string. This was fixed up in newer kernels with commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") but that's too heavy-handed for older kernels. So for now, fix up the problem that commit 62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts") caused by being backported by removing the "" characters from the section definition. Reported-by: Oleksandr Tymoshenko <ovt@google.com> Reported-by: Yongqin Liu <yongqin.liu@linaro.org> Tested-by: Yongqin Liu <yongqin.liu@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/r/20221029011211.4049810-1-ovt@google.com Link: https://lore.kernel.org/r/CAMSo37XApZ_F5nSQYWFsSqKdMv_gBpfdKG3KN1TDB+QNXqSh0A@mail.gmail.com Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David S. Miller <davem@davemloft.net> Cc: Sasha Levin <sashal@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26iommu/iova: Fix module config properlyRobin Murphy1-1/+1
[ Upstream commit 4f58330fcc8482aa90674e1f40f601e82f18ed4a ] IOMMU_IOVA is intended to be an optional library for users to select as and when they desire. Since it can be a module now, this means that built-in code which has chosen not to select it should not fail to link if it happens to have selected as a module by someone else. Replace IS_ENABLED() with IS_REACHABLE() to do the right thing. CC: Thierry Reding <thierry.reding@gmail.com> Reported-by: John Garry <john.garry@huawei.com> Fixes: 15bbdec3931e ("iommu: Make the iova library a module") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/548c2f683ca379aface59639a8f0cccc3a1ac050.1663069227.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26ata: fix ata_id_has_dipm()Niklas Cassel1-11/+4
[ Upstream commit 630624cb1b5826d753ac8e01a0e42de43d66dedf ] ACS-5 section 7.13.6.36 Word 78: Serial ATA features supported states that: If word 76 is not 0000h or FFFFh, word 78 reports the features supported by the device. If this word is not supported, the word shall be cleared to zero. (This text also exists in really old ACS standards, e.g. ACS-3.) The problem with ata_id_has_dipm() is that the while it performs a check against 0 and 0xffff, it performs the check against ATA_ID_FEATURE_SUPP (word 78), the same word where the feature bit is stored. Fix this by performing the check against ATA_ID_SATA_CAPABILITY (word 76), like required by the spec. The feature bit check itself is of course still performed against ATA_ID_FEATURE_SUPP (word 78). Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros (which already have this check), thus making it more likely that the next ATA_ID_FEATURE_SUPP macro that is added will include this check. Fixes: ca77329fb713 ("[libata] Link power management infrastructure") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26ata: fix ata_id_has_ncq_autosense()Niklas Cassel1-2/+4
[ Upstream commit a5fb6bf853148974dbde092ec1bde553bea5e49f ] ACS-5 section 7.13.6.36 Word 78: Serial ATA features supported states that: If word 76 is not 0000h or FFFFh, word 78 reports the features supported by the device. If this word is not supported, the word shall be cleared to zero. (This text also exists in really old ACS standards, e.g. ACS-3.) Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros (which already have this check), thus making it more likely that the next ATA_ID_FEATURE_SUPP macro that is added will include this check. Fixes: 5b01e4b9efa0 ("libata: Implement NCQ autosense") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26ata: fix ata_id_has_devslp()Niklas Cassel1-1/+4
[ Upstream commit 9c6e09a434e1317e09b78b3b69cd384022ec9a03 ] ACS-5 section 7.13.6.36 Word 78: Serial ATA features supported states that: If word 76 is not 0000h or FFFFh, word 78 reports the features supported by the device. If this word is not supported, the word shall be cleared to zero. (This text also exists in really old ACS standards, e.g. ACS-3.) Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros (which already have this check), thus making it more likely that the next ATA_ID_FEATURE_SUPP macro that is added will include this check. Fixes: 65fe1f0f66a5 ("ahci: implement aggressive SATA device sleep support") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting()Niklas Cassel1-4/+9
[ Upstream commit 690aa8c3ae308bc696ec8b1b357b995193927083 ] ACS-5 section 7.13.6.41 Words 85..87, 120: Commands and feature sets supported or enabled states that: If bit 15 of word 86 is set to one, bit 14 of word 119 is set to one, and bit 15 of word 119 is cleared to zero, then word 119 is valid. If bit 15 of word 86 is set to one, bit 14 of word 120 is set to one, and bit 15 of word 120 is cleared to zero, then word 120 is valid. (This text also exists in really old ACS standards, e.g. ACS-3.) Currently, ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting() both check bit 15 of word 86, but neither of them check that bit 14 of word 119 is set to one, or that bit 15 of word 119 is cleared to zero. Additionally, make ata_id_sense_reporting_enabled() return false if !ata_id_has_sense_reporting(), similar to how e.g. ata_id_flush_ext_enabled() returns false if !ata_id_has_flush_ext(). Fixes: e87fd28cf9a2 ("libata: Implement support for sense data reporting") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26dyndbg: fix module.dyndbg handlingJim Cromie1-1/+1
[ Upstream commit 85d6b66d31c35158364058ee98fb69ab5bb6a6b1 ] For CONFIG_DYNAMIC_DEBUG=N, the ddebug_dyndbg_module_param_cb() stub-fn is too permissive: bash-5.1# modprobe drm JUNKdyndbg bash-5.1# modprobe drm dyndbgJUNK [ 42.933220] dyndbg param is supported only in CONFIG_DYNAMIC_DEBUG builds [ 42.937484] ACPI: bus type drm_connector registered This caused no ill effects, because unknown parameters are either ignored by default with an "unknown parameter" warning, or ignored because dyndbg allows its no-effect use on non-dyndbg builds. But since the code has an explicit feedback message, it should be issued accurately. Fix with strcmp for exact param-name match. Fixes: b48420c1d301 dynamic_debug: make dynamic-debug work for module initialization Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Jason Baron <jbaron@akamai.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20220904214134.408619-3-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26once: add DO_ONCE_SLOW() for sleepable contextsEric Dumazet1-0/+28
[ Upstream commit 62c07983bef9d3e78e71189441e1a470f0d1e653 ] Christophe Leroy reported a ~80ms latency spike happening at first TCP connect() time. This is because __inet_hash_connect() uses get_random_once() to populate a perturbation table which became quite big after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") get_random_once() uses DO_ONCE(), which block hard irqs for the duration of the operation. This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock for operations where we prefer to stay in process context. Then __inet_hash_connect() can use get_random_slow_once() to populate its perturbation table. Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time") Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-o