| Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull more clk updates from Stephen Boyd:
"This is the final part of the clk patches for this merge window.
The clk rate range series needed another week to fully bake. Maxime
fixed the bug that broke clk notifiers and prevented this from being
included in the first pull request. He also added a unit test on top
to make sure it doesn't break so easily again. The majority of the
series fixes up how the clk_set_rate_*() APIs work, particularly
around when the rate constraints are dropped and how they move around
when reparenting clks. Overall it's a much needed improvement to the
clk rate range APIs that used to be pretty broken if you looked
sideways.
Beyond the core changes there are a few driver fixes for a compilation
issue or improper data causing clks to fail to register or have the
wrong parents. These are good to get in before the first -rc so that
the system actually boots on the affected devices"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits)
clk: tegra: Fix Tegra PWM parent clock
clk: at91: fix the build with binutils 2.27
clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks
clk: mediatek: clk-mux: Add .determine_rate() callback
clk: tests: Add tests for notifiers
clk: Update req_rate on __clk_recalc_rates()
clk: tests: Add missing test case for ranges
clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
clk: Introduce the clk_hw_get_rate_range function
clk: Zero the clk_rate_request structure
clk: Stop forwarding clk_rate_requests to the parent
clk: Constify clk_has_parent()
clk: Introduce clk_core_has_parent()
clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
clk: Add our request boundaries in clk_core_init_rate_req
clk: Introduce clk_hw_init_rate_request()
clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
clk: Change clk_core_init_rate_req prototype
clk: Set req_rate on reparenting
clk: Take into account uncached clocks in clk_set_rate_range()
...
|
|
This reverts commit 78e5a3399421 ("cpumask: fix checking valid cpu range").
syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at
cpu_max_bits_warn() [1], for commit 78e5a3399421 ("cpumask: fix checking
valid cpu range") is broken. Obviously that patch hits WARN_ON_ONCE()
when e.g. reading /proc/cpuinfo because passing "cpu + 1" instead of
"cpu" will trivially hit cpu == nr_cpumask_bits condition.
Although syzbot found this problem in linux-next.git on 2022/09/27 [2],
this problem was not fixed immediately. As a result, that patch was
sent to linux.git before the patch author recognizes this problem, and
syzbot started failing to test changes in linux.git since 2022/10/10
[3].
Andrew Jones proposed a fix for x86 and riscv architectures [4]. But
[2] and [5] indicate that affected locations are not limited to arch
code. More delay before we find and fix affected locations, less tested
kernel (and more difficult to bisect and fix) before release.
We should have inspected and fixed basically all cpumask users before
applying that patch. We should not crash kernels in order to ask
existing cpumask users to update their code, even if limited to
CONFIG_DEBUG_PER_CPU_MAPS=y case.
Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1]
Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2]
Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3]
Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4]
Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5]
Reported-by: Andrew Jones <ajones@ventanamicro.com>
Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab hotfix from Vlastimil Babka:
"A single fix for the common-kmalloc series, for warnings on mips and
sparc64 reported by Guenter Roeck"
* tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
|
|
After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than
order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2)
requests to buddy like SLUB does.
SLAB has been using kmalloc caches to allocate freelist_idx_t array for
off slab caches. But after the commit, freelist_size can be bigger than
KMALLOC_MAX_CACHE_SIZE.
Instead of using pointer to kmalloc cache, use kmalloc_node() and only
check if the kmalloc cache is off slab during calculate_slab_order().
If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens
as it allocates freelist_idx_t array directly from buddy.
Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are three fixes for build warnings that came in during the merge
window"
* tag 'arm-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: mmp: Make some symbols static
ARM: spear6xx: Staticize few definitions
clk: spear: Move prototype to accessible header
|
|
- Various clk rate range fixes
- Drop clk rate range constraints on clk_put() (redux)
* clk-rate-range: (28 commits)
clk: mediatek: clk-mux: Add .determine_rate() callback
clk: tests: Add tests for notifiers
clk: Update req_rate on __clk_recalc_rates()
clk: tests: Add missing test case for ranges
clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
clk: Introduce the clk_hw_get_rate_range function
clk: Zero the clk_rate_request structure
clk: Stop forwarding clk_rate_requests to the parent
clk: Constify clk_has_parent()
clk: Introduce clk_core_has_parent()
clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
clk: Add our request boundaries in clk_core_init_rate_req
clk: Introduce clk_hw_init_rate_request()
clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
clk: Change clk_core_init_rate_req prototype
clk: Set req_rate on reparenting
clk: Take into account uncached clocks in clk_set_rate_range()
clk: tests: Add some tests for orphan with multiple parents
clk: tests: Add tests for mux with multiple parents
clk: tests: Add tests for single parent mux
...
|
|
There is an issue when build with older versions of binutils 2.27.0,
arch/arm/mach-at91/pm_suspend.S: Assembler messages:
arch/arm/mach-at91/pm_suspend.S:1086: Error: garbage following instruction -- `ldr tmp1,=0x00020010UL'
Use UL() macro to fix the issue in assembly file.
Fixes: 4fd36e458392 ("ARM: at91: pm: add plla disable/enable support for sam9x60")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20221012030635.13140-1-wangkefeng.wang@huawei.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull PSI updates from Ingo Molnar:
- Various performance optimizations, resulting in a 4%-9% speedup in
the mmtests/config-scheduler-perfpipe micro-benchmark.
- New interface to turn PSI on/off on a per cgroup level.
* tag 'sched-psi-2022-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/psi: Per-cgroup PSI accounting disable/re-enable interface
sched/psi: Cache parent psi_group to speed up group iteration
sched/psi: Consolidate cgroup_psi()
sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure
sched/psi: Remove NR_ONCPU task accounting
sched/psi: Optimize task switch inside shared cgroups again
sched/psi: Move private helpers to sched/stats.h
sched/psi: Save percpu memory when !psi_cgroups_enabled
sched/psi: Don't create cgroup PSI files when psi_disabled
sched/psi: Fix periodic aggregation shut off
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- fix a race which causes page refcounting errors in ZONE_DEVICE pages
(Alistair Popple)
- fix userfaultfd test harness instability (Peter Xu)
- various other patches in MM, mainly fixes
* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
highmem: fix kmap_to_page() for kmap_local_page() addresses
mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
mm/selftest: uffd: explain the write missing fault check
mm/hugetlb: use hugetlb_pte_stable in migration race check
mm/hugetlb: fix race condition of uffd missing/minor handling
zram: always expose rw_page
LoongArch: update local TLB if PTE entry exists
mm: use update_mmu_tlb() on the second thread
kasan: fix array-bounds warnings in tests
hmm-tests: add test for migrate_device_range()
nouveau/dmem: evict device private memory during release
nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
mm/migrate_device.c: add migrate_device_range()
mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
mm/memremap.c: take a pgmap reference on page allocation
mm: free device private pages have zero refcount
mm/memory.c: fix race when faulting a device private page
mm/damon: use damon_sz_region() in appropriate place
mm/damon: move sz_damon_region to damon_sz_region
lib/test_meminit: add checks for the allocation functions
...
|
|
Pull more block updates from Jens Axboe:
"Fixes that ended up landing later than the initial block pull request.
Nothing really major in here:
- NVMe pull request via Christoph:
- add NVME_QUIRK_BOGUS_NID for Lexar NM760 (Abhijit)
- add NVME_QUIRK_NO_DEEPEST_PS to avoid the deepest sleep state
on ZHITAI TiPro5000 SSDs (Xi Ruoyao)
- fix possible hang caused during ctrl deletion (Sagi Grimberg)
- fix possible hang in live ns resize with ANA access (Sagi
Grimberg)
- Proactively avoid a sign extension issue with the queue flags
(Brian)
- Regression fix for hidden disks (Christoph)
- Update OPAL maintainers entry (Jonathan)
- blk-wbt regression initialization fix (Yu)"
* tag 'block-6.1-2022-10-13' of git://git.kernel.dk/linux:
nvme-multipath: fix possible hang in live ns resize with ANA access
nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760
nvme-tcp: fix possible hang caused during ctrl deletion
nvme-rdma: fix possible hang caused during ctrl deletion
block: fix leaking minors of hidden disks
block: avoid sign extend problem with default queue flags mask
blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
block: Remove the repeat word 'can'
MAINTAINERS: Update SED-Opal Maintainers
|
|
Pull more io_uring updates from Jens Axboe:
"A collection of fixes that ended up either being later than the
initial pull, or dependent on multiple branches (6.0-late being one of
them) and hence deferred purposely. This contains:
- Cleanup fixes for the single submitter late 6.0 change, which we
pushed to 6.1 to keep the 6.0 changes small (Dylan, Pavel)
- Fix for IORING_OP_CONNECT not handling -EINPROGRESS correctly (me)
- Ensure that the zc sendmsg variant gets audited correctly (me)
- Regression fix from this merge window where kiocb_end_write()
doesn't always gets called, which can cause issues with fs freezing
(me)
- Registered files SCM handling fix (Pavel)
- Regression fix for big sqe dumping in fdinfo (Pavel)
- Registered buffers accounting fix (Pavel)
- Remove leftover notification structures, we killed them off late in
6.0 (Pavel)
- Minor optimizations (Pavel)
- Cosmetic variable shadowing fix (Stefan)"
* tag 'io_uring-6.1-2022-10-13' of git://git.kernel.dk/linux:
io_uring/rw: ensure kiocb_end_write() is always called
io_uring: fix fdinfo sqe offsets calculation
io_uring: local variable rw shadows outer variable in io_write
io_uring/opdef: remove 'audit_skip' from SENDMSG_ZC
io_uring: optimise locking for local tw with submit_wait
io_uring: remove redundant memory barrier in io_req_local_work_add
io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT
io_uring: remove notif leftovers
io_uring: correct pinned_vm accounting
io_uring/af_unix: defer registered files gc to io_uring release
io_uring: limit registration w/ SINGLE_ISSUER
io_uring: remove io_register_submitter
io_uring: simplify __io_uring_add_tctx_node
|
|
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Add SD card quirk for broken discard
MMC host:
- renesas_sdhi: Fix clock rounding errors
- sdhci-sprd: Fix minimum clock limit to detect cards
- sdhci-tegra: Use actual clock rate for SW tuning correction"
* tag 'mmc-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-sprd: Fix minimum clock limit
mmc: sdhci-tegra: Use actual clock rate for SW tuning correction
mmc: renesas_sdhi: Fix rounding errors
mmc: core: Add SD card quirk for broken discard
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, and wifi.
Current release - regressions:
- Revert "net/sched: taprio: make qdisc_leaf() see the
per-netdev-queue pfifo child qdiscs", it may cause crashes when the
qdisc is reconfigured
- inet: ping: fix splat due to packet allocation refactoring in inet
- tcp: clean up kernel listener's reqsk in inet_twsk_purge(), fix UAF
due to races when per-netns hash table is used
Current release - new code bugs:
- eth: adin1110: check in netdev_event that netdev belongs to driver
- fixes for PTR_ERR() vs NULL bugs in driver code, from Dan and co.
Previous releases - regressions:
- ipv4: handle attempt to delete multipath route when fib_info
contains an nh reference, avoid oob access
- wifi: fix handful of bugs in the new Multi-BSSID code
- wifi: mt76: fix rate reporting / throughput regression on mt7915
and newer, fix checksum offload
- wifi: iwlwifi: mvm: fix double list_add at
iwl_mvm_mac_wake_tx_queue (other cases)
- wifi: mac80211: do not drop packets smaller than the LLC-SNAP
header on fast-rx
Previous releases - always broken:
- ieee802154: don't warn zero-sized raw_sendmsg()
- ipv6: ping: fix wrong checksum for large frames
- mctp: prevent double key removal and unref
- tcp/udp: fix memory leaks and races around IPV6_ADDRFORM
- hv_netvsc: fix race between VF offering and VF association message
Misc:
- remove -Warray-bounds silencing in the drivers, compilers fixed"
* tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
sunhme: fix an IS_ERR() vs NULL check in probe
net: marvell: prestera: fix a couple NULL vs IS_ERR() checks
kcm: avoid potential race in kcm_tx_work
tcp: Clean up kernel listener's reqsk in inet_twsk_purge()
net: phy: micrel: Fixes FIELD_GET assertion
openvswitch: add nf_ct_is_confirmed check before assigning the helper
tcp: Fix data races around icsk->icsk_af_ops.
ipv6: Fix data races around sk->sk_prot.
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
tcp/udp: Fix memory leak in ipv6_renew_options().
mctp: prevent double key removal and unref
selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1
netfilter: rpfilter/fib: Populate flowic_l3mdev field
selftests: netfilter: Test reverse path filtering
net/mlx5: Make ASO poll CQ usable in atomic context
tcp: cdg: allow tcp_cdg_release() to be called multiple times
inet: ping: fix recent breakage
ipv6: ping: fix wrong checksum for large frames
net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports
...
|
|
Pull ceph updates from Ilya Dryomov:
"A quiet round this time: several assorted filesystem fixes, the most
noteworthy one being some additional wakeups in cap handling code, and
a messenger cleanup"
* tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-client:
ceph: remove Sage's git tree from documentation
ceph: fix incorrectly showing the .snap size for stat
ceph: fail the open_by_handle_at() if the dentry is being unlinked
ceph: increment i_version when doing a setattr with caps
ceph: Use kcalloc for allocating multiple elements
ceph: no need to wait for transition RDCACHE|RD -> RD
ceph: fail the request if the peer MDS doesn't support getvxattr op
ceph: wake up the waiters if any new caps comes
libceph: drop last_piece flag from ceph_msg_data_cursor
|
|
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add NFSv4.2 xattr tracepoints
- Replace xprtiod WQ in rpcrdma
- Flexfiles cancels I/O on layout recall or revoke
Bugfixes and Cleanups:
- Directly use ida_alloc() / ida_free()
- Don't open-code max_t()
- Prefer using strscpy over strlcpy
- Remove unused forward declarations
- Always return layout states on flexfiles layout return
- Have LISTXATTR treat NFS4ERR_NOXATTR as an empty reply instead of
error
- Allow more xprtrdma memory allocations to fail without triggering a
reclaim
- Various other xprtrdma clean ups
- Fix rpc_killall_tasks() races"
* tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked
SUNRPC: Add API to force the client to disconnect
SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls
SUNRPC: Fix races with rpc_killall_tasks()
xprtrdma: Fix uninitialized variable
xprtrdma: Prevent memory allocations from driving a reclaim
xprtrdma: Memory allocation should be allowed to fail during connect
xprtrdma: MR-related memory allocation should be allowed to fail
xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc()
xprtrdma: Clean up synopsis of rpcrdma_req_create()
svcrdma: Clean up RPCRDMA_DEF_GFP
SUNRPC: Replace the use of the xprtiod WQ in rpcrdma
NFSv4.2: Add a tracepoint for listxattr
NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattr
NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2
NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTR
nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration
NFSv4: remove nfs4_renewd_prepare_shutdown() declaration
fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in comment
NFSv4/pNFS: Always return layout stats on layout return for flexfiles
...
|
|
Device drivers can use the migrate_vma family of functions to migrate
existing private anonymous mappings to device private pages. These pages
are backed by memory on the device with drivers being responsible for
copying data to and from device memory.
Device private pages are freed via the pgmap->page_free() callback when
they are unmapped and their refcount drops to zero. Alternatively they
may be freed indirectly via migration back to CPU memory in response to a
pgmap->migrate_to_ram() callback called whenever the CPU accesses an
address mapped to a device private page.
In other words drivers cannot control the lifetime of data allocated on
the devices and must wait until these pages are freed from userspace.
This causes issues when memory needs to reclaimed on the device, either
because the device is going away due to a ->release() callback or because
another user needs to use the memory.
Drivers could use the existing migrate_vma functions to migrate data off
the device. However this would require them to track the mappings of each
page which is both complicated and not always possible. Instead drivers
need to be able to migrate device pages directly so they can free up
device memory.
To allow that this patch introduces the migrate_device family of functions
which are functionally similar to migrate_vma but which skips the initial
lookup based on mapping.
Link: https://lkml.kernel.org/r/868116aab70b0c8ee467d62498bb2cf0ef907295.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
refcount") device private pages have no longer had an extra reference
count when the page is in use. However before handing them back to the
owning device driver we add an extra reference count such that free pages
have a reference count of one.
This makes it difficult to tell if a page is free or not because both free
and in use pages will have a non-zero refcount. Instead we should return
pages to the drivers page allocator with a zero reference count. Kernel
code can then safely use kernel functions such as get_page_unless_zero().
Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fix several device private page reference counting issues",
v2
This series aims to fix a number of page reference counting issues in
drivers dealing with device private ZONE_DEVICE pages. These result in
use-after-free type bugs, either from accessing a struct page which no
longer exists because it has been removed or accessing fields within the
struct page which are no longer valid because the page has been freed.
During normal usage it is unlikely these will cause any problems. However
without these fixes it is possible to crash the kernel from userspace.
These crashes can be triggered either by unloading the kernel module or
unbinding the device from the driver prior to a userspace task exiting.
In modules such as Nouveau it is also possible to trigger some of these
issues by explicitly closing the device file-descriptor prior to the task
exiting and then accessing device private memory.
This involves some minor changes to both PowerPC and AMD GPU code.
Unfortunately I lack hardware to test either of those so any help there
would be appreciated. The changes mimic what is done in for both Nouveau
and hmm-tests though so I doubt they will cause problems.
This patch (of 8):
When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called. However no
reference is taken on the faulting page. Therefore a concurrent migration
of the device private page can free the page and possibly the underlying
pgmap. This results in a race which can crash the kernel due to the
migrate_to_ram() function pointer becoming invalid. It also means drivers
can't reliably read the zone_device_data field because the page may have
been freed with memunmap_pages().
Close the race by getting a reference on the page while holding the ptl to
ensure it has not been freed. Unfortunately the elevated reference count
will cause the migration required to handle the fault to fail. To avoid
this failure pass the faulting page into the migrate_vma functions so that
if an elevated reference count is found it can be checked to see if it's
expected or not.
[mpe@ellerman.id.au: fix build]
Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Rename sz_damon_region() to damon_sz_region(), and move it to
"include/linux/damon.h", because in many places, we can to use this func.
Link: https://lkml.kernel.org/r/20220927001946.85375-1-xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Suggested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Link: https://lkml.kernel.org/r/Y0WuE3Riv4iy5Jx8@localhost.localdomain
Fixes: 7964cf8caa4d ("mm: remove vmacache")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Notifications were killed but there is a couple of fields and struct
declarations left, remove them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8df8877d677be5a2b43afd936d600e60105ea960.1664849941.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of putting io_uring's registered files in unix_gc() we want it
to be done by io_uring itself. The trick here is to consider io_uring
registered files for cycle detection but not actually putting them down.
Because io_uring can't register other ring instances, this will remove
all refs to the ring file triggering the ->release path and clean up
with io_ring_ctx_free().
Cc: stable@vger.kernel.org
Fixes: 6b06314c47e1 ("io_uring: add file set registration")
Reported-and-tested-by: David Bouman <dbouman03@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[axboe: add kerneldoc comment to skb, fold in skb leak fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull VFIO updates from Alex Williamson:
- Prune private items from vfio_pci_core.h to a new internal header,
fix missed function rename, and refactor vfio-pci interrupt defines
(Jason Gunthorpe)
- Create consistent naming and handling of ioctls with a function per
ioctl for vfio-pci and vfio group handling, use proper type args
where available (Jason Gunthorpe)
- Implement a set of low power device feature ioctls allowing userspace
to make use of power states such as D3cold where supported (Abhishek
Sahu)
- Remove device counter on vfio groups, which had restricted the page
pinning interface to singleton groups to account for limitations in
the type1 IOMMU backend. Document usage as limited to emulated IOMMU
devices, ie. traditional mdev devices where this restriction is
consistent (Jason Gunthorpe)
- Correct function prefix in hisi_acc driver incurred during previous
refactoring (Shameer Kolothum)
- Correct typo and remove redundant warning triggers in vfio-fsl driver
(Christophe JAILLET)
- Introduce device level DMA dirty tracking uAPI and implementation in
the mlx5 variant driver (Yishai Hadas & Joao Martins)
- Move much of the vfio_device life cycle management into vfio core,
simplifying and avoiding duplication across drivers. This also
facilitates adding a struct device to vfio_device which begins the
introduction of device rather than group level user support and fills
a gap allowing userspace identify devices as vfio capable without
implicit knowledge of the driver (Kevin Tian & Yi Liu)
- Split vfio container handling to a separate file, creating a more
well defined API between the core and container code, masking IOMMU
backend implementation from the core, allowing for an easier future
transition to an iommufd based implementation of the same (Jason
Gunthorpe)
- Attempt to resolve race accessing the iommu_group for a device
between vfio releasing DMA ownership and removal of the device from
the IOMMU driver. Follow-up with support to allow vfio_group to exist
with NULL iommu_group pointer to support existing userspace use cases
of holding the group file open (Jason Gunthorpe)
- Fix error code and hi/lo register manipulation issues in the hisi_acc
variant driver, along with various code cleanups (Longfang Liu)
- Fix a prior regression in GVT-g group teardown, resulting in
unreleased resources (Jason Gunthorpe)
- A significant cleanup and simplification of the mdev interface,
consolidating much of the open coded per driver sysfs interface
support into the mdev core (Christoph Hellwig)
- Simplification of tracking and locking around vfio_groups that fall
out from previous refactoring (Jason Gunthorpe)
- Replace trivial open coded f_ops tests with new helper (Alex
Williamson)
* tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio: (77 commits)
vfio: More vfio_file_is_group() use cases
vfio: Make the group FD disassociate from the iommu_group
vfio: Hold a reference to the iommu_group in kvm for SPAPR
vfio: Add vfio_file_is_group()
vfio: Change vfio_group->group_rwsem to a mutex
vfio: Remove the vfio_group->users and users_comp
vfio/mdev: add mdev available instance checking to the core
vfio/mdev: consolidate all the description sysfs into the core code
vfio/mdev: consolidate all the available_instance sysfs into the core code
vfio/mdev: consolidate all the name sysfs into the core code
vfio/mdev: consolidate all the device_api sysfs into the core code
vfio/mdev: remove mtype_get_parent_dev
vfio/mdev: remove mdev_parent_dev
vfio/mdev: unexport mdev_bus_type
vfio/mdev: remove mdev_from_dev
vfio/mdev: simplify mdev_type handling
vfio/mdev: embedd struct mdev_parent in the parent data structure
vfio/mdev: make mdev.h standalone includable
drm/i915/gvt: simplify vgpu configuration management
drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_types
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one
is not"
* tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
nilfs2: fix use-after-free bug of struct nilfs_root
mm/damon/core: initialize damon_target->list in damon_new_target()
mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt updates from Thomas Gleixner:
"Core code:
- Provide a generic wrapper which can be utilized in drivers to
handle the problem of force threaded demultiplex interrupts on RT
enabled kernels. This avoids conditionals and horrible quirks in
drivers all over the place
- Fix up affected pinctrl and GPIO drivers to make them cleanly RT
safe
Interrupt drivers:
- A new driver for the FSL MU platform specific MSI implementation
- Make irqchip_init() available for pure ACPI based systems
- Provide a functional DT binding for the Realtek RTL interrupt chip
- The usual DT updates and small code improvements all over the
place"
* tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
irqchip: IMX_MU_MSI should depend on ARCH_MXC
irqchip/imx-mu-msi: Fix wrong register offset for 8ulp
irqchip/ls-extirq: Fix invalid wait context by avoiding to use regmap
dt-bindings: irqchip: Describe the IMX MU block as a MSI controller
irqchip: Add IMX MU MSI controller driver
dt-bindings: irqchip: renesas,irqc: Add r8a779g0 support
irqchip/gic-v3: Fix typo in comment
dt-bindings: interrupt-controller: ti,sci-intr: Fix missing reg property in the binding
dt-bindings: irqchip: ti,sci-inta: Fix warning for missing #interrupt-cells
irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END
platform-msi: Export symbol platform_msi_create_irq_domain()
irqchip/realtek-rtl: use parent interrupts
dt-bindings: interrupt-controller: realtek,rtl-intc: require parents
irqchip/realtek-rtl: use irq_domain_add_linear()
irqchip: Make irqchip_init() usable on pure ACPI systems
bcma: gpio: Use generic_handle_irq_safe()
gpio: mlxbf2: Use generic_handle_irq_safe()
platform/x86: intel_int0002_vgpio: Use generic_handle_irq_safe()
ssb: gpio: Use generic_handle_irq_safe()
pinctrl: amd: Use generic_handle_irq_safe()
...
|
|
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by move_pages() syscall under the lock, however antoher thread B can
migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page, if thread A also wants to do page
migration, then data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f2d for the Fixes: targe.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
arch_do_signal_or_restart() prototype
The argument has_signal of arch_do_signal_or_restart() has been removed in
commit 8ba62d37949e ("task_work: Call tracehook_notify_signal from
get_signal on all architectures"), let us remove the related comment.
Link: https://lkml.kernel.org/r/1662090106-5545-1-git-send-email-yangtiezhu@loongson.cn
Fixes: 8ba62d37949e ("task_work: Call tracehook_notify_signal from get_signal on all architectures")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
With no callers left of prandom_u32() and prandom_bytes(), as well as
get_random_int(), remove these deprecated wrappers, in favor of
get_random_u32() and get_random_bytes().
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:
@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)
@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@
- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);
// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@
((T)get_random_u32()@p & (LITERAL))
// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@
value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@
- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)
@collapse_ret@
type T;
identifier VAR;
expression E;
@@
{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}
@drop_var@
type T;
identifier VAR;
@@
{
- T VAR;
... when != VAR
}
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Resource management:
- Distribute spare |