summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2024-05-19Merge tag 'mm-nonmm-stable-2024-05-19-11-56' of ↵Linus Torvalds6-11/+61
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-mm updates from Andrew Morton: "Mainly singleton patches, documented in their respective changelogs. Notable series include: - Some maintenance and performance work for ocfs2 in Heming Zhao's series "improve write IO performance when fragmentation is high". - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes exposed by fstests". - kfifo header rework from Andy Shevchenko in the series "kfifo: Clean up kfifo.h". - GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes for $lx_current and $lx_per_cpu". - After much discussion, a coding-style update from Barry Song explaining one reason why inline functions are preferred over macros. The series is "codingstyle: avoid unused parameters for a function-like macro"" * tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits) fs/proc: fix softlockup in __read_vmcore nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON() scripts: checkpatch: check unused parameters for function-like macro Documentation: coding-style: ask function-like macros to evaluate parameters nilfs2: use __field_struct() for a bitwise field selftests/kcmp: remove unused open mode nilfs2: remove calls to folio_set_error() and folio_clear_error() kernel/watchdog_perf.c: tidy up kerneldoc watchdog: allow nmi watchdog to use raw perf event watchdog: handle comma separated nmi_watchdog command line nilfs2: make superblock data array index computation sparse friendly squashfs: remove calls to set the folio error flag squashfs: convert squashfs_symlink_read_folio to use folio APIs scripts/gdb: fix detection of current CPU in KGDB scripts/gdb: make get_thread_info accept pointers scripts/gdb: fix parameter handling in $lx_per_cpu scripts/gdb: fix failing KGDB detection during probe kfifo: don't use "proxy" headers media: stih-cec: add missing io.h media: rc: add missing io.h ...
2024-05-19Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefsLinus Torvalds1-0/+12
Pull bcachefs updates from Kent Overstreet: - More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions * tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits) bcachefs: add no_invalid_checks flag bcachefs: add counters for failed shrinker reclaim bcachefs: Fix sb_field_downgrade validation bcachefs: Plumb bch_validate_flags to sb_field_ops.validate() bcachefs: s/bkey_invalid_flags/bch_validate_flags bcachefs: fsync() should not return -EROFS bcachefs: Invalid devices are now checked for by fsck, not .invalid methods bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs() bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio() bcachefs: bch2_dev_get_ioref() checks for device not present bcachefs: bch2_dev_get_ioref2(); io_read.c bcachefs: bch2_dev_get_ioref2(); debug.c bcachefs: bch2_dev_get_ioref2(); journal_io.c bcachefs: bch2_dev_get_ioref2(); io_write.c bcachefs: bch2_dev_get_ioref2(); btree_io.c bcachefs: bch2_dev_get_ioref2(); backpointers.c bcachefs: bch2_dev_get_ioref2(); alloc_background.c bcachefs: for_each_bset() declares loop iter bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h bcachefs: Improve sysfs internal/btree_cache ...
2024-05-19Merge tag 'mm-stable-2024-05-17-19-19' of ↵Linus Torvalds56-798/+1539
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
2024-05-18Merge tag 'ext4_for_linus-6.10-rc1' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - more folio conversion patches - add support for FS_IOC_GETFSSYSFSPATH - mballoc cleaups and add more kunit tests - sysfs cleanups and bug fixes - miscellaneous bug fixes and cleanups * tag 'ext4_for_linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: fix error pointer dereference in ext4_mb_load_buddy_gfp() jbd2: add prefix 'jbd2' for 'shrink_type' jbd2: use shrink_type type instead of bool type for __jbd2_journal_clean_checkpoint_list() ext4: fix uninitialized ratelimit_state->lock access in __ext4_fill_super() ext4: remove calls to to set/clear the folio error flag ext4: propagate errors from ext4_sb_bread() in ext4_xattr_block_cache_find() ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find() jbd2: remove redundant assignement to variable err ext4: remove the redundant folio_wait_stable() ext4: fix potential unnitialized variable ext4: convert ac_buddy_page to ac_buddy_folio ext4: convert ac_bitmap_page to ac_bitmap_folio ext4: convert ext4_mb_init_cache() to take a folio ext4: convert bd_buddy_page to bd_buddy_folio ext4: convert bd_bitmap_page to bd_bitmap_folio ext4: open coding repeated check in next_linear_group ext4: use correct criteria name instead stale integer number in comment ext4: call ext4_mb_mark_free_simple to free continuous bits in found chunk ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used ext4: keep "prefetch_grp" and "nr" consistent ...
2024-05-18Merge tag 'nfsd-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds2-0/+11
Pull nfsd updates from Chuck Lever: "This is a light release containing mostly optimizations, code clean- ups, and minor bug fixes. This development cycle has focused on non- upstream kernel work: 1. Continuing to build upstream CI for NFSD, based on kdevops 2. Backporting NFSD filecache-related fixes to selected LTS kernels One notable new feature in v6.10 NFSD is the addition of a new netlink protocol dedicated to configuring NFSD. A new user space tool, nfsdctl, is to be added to nfs-utils. Lots more to come here. As always I am very grateful to NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (29 commits) NFSD: Force all NFSv4.2 COPY requests to be synchronous SUNRPC: Fix gss_free_in_token_pages() NFS/knfsd: Remove the invalid NFS error 'NFSERR_OPNOTSUPP' knfsd: LOOKUP can return an illegal error value nfsd: set security label during create operations NFSD: Add COPY status code to OFFLOAD_STATUS response NFSD: Record status of async copy operation in struct nfsd4_copy SUNRPC: Remove comment for sp_lock NFSD: add listener-{set,get} netlink command SUNRPC: add a new svc_find_listener helper SUNRPC: introduce svc_xprt_create_from_sa utility routine NFSD: add write_version to netlink command NFSD: convert write_threads to netlink command NFSD: allow callers to pass in scope string to nfsd_svc NFSD: move nfsd_mutex handling into nfsd_svc callers lockd: host: Remove unnecessary statements'host = NULL;' nfsd: don't create nfsv4recoverydir in nfsdfs when not used. nfsd: optimise recalculate_deny_mode() for a common case nfsd: add tracepoint in mark_client_expired_locked nfsd: new tracepoint for check_slot_seqid ...
2024-05-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+7
Pull rdma updates from Jason Gunthorpe: "Aside from the usual things this has an arch update for __iowrite64_copy() used by the RDMA drivers. This API was intended to generate large 64 byte MemWr TLPs on PCI. These days most processors had done this by just repeating writel() in a loop. S390 and some new ARM64 designs require a special helper to get this to generate. - Small improvements and fixes for erdma, efa, hfi1, bnxt_re - Fix a UAF crash after module unload on leaking restrack entry - Continue adding full RDMA support in mana with support for EQs, GID's and CQs - Improvements to the mkey cache in mlx5 - DSCP traffic class support in hns and several bug fixes - Cap the maximum number of MADs in the receive queue to avoid OOM - Another batch of rxe bug fixes from large scale testing - __iowrite64_copy() optimizations for write combining MMIO memory - Remove NULL checks before dev_put/hold() - EFA support for receive with immediate - Fix a recent memleaking regression in a cma error path" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (70 commits) RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw RDMA/IPoIB: Fix format truncation compilation errors bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq RDMA/efa: Support QP with unsolicited write w/ imm. receive IB/hfi1: Remove generic .ndo_get_stats64 IB/hfi1: Do not use custom stat allocator RDMA/hfi1: Use RMW accessors for changing LNKCTL2 RDMA/mana_ib: implement uapi for creation of rnic cq RDMA/mana_ib: boundary check before installing cq callbacks RDMA/mana_ib: introduce a helper to remove cq callbacks RDMA/mana_ib: create and destroy RNIC cqs RDMA/mana_ib: create EQs for RNIC CQs RDMA/core: Remove NULL check before dev_{put, hold} RDMA/ipoib: Remove NULL check before dev_{put, hold} RDMA/mlx5: Remove NULL check before dev_{put, hold} RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources. RDMA/core: Add an option to display driver-specific QPs in the rdmatool RDMA/efa: Add shutdown notifier RDMA/mana_ib: Fix missing ret value IB/mlx5: Use __iowrite64_copy() for write combining stores ...
2024-05-18Merge tag 'for-v6.10' of ↵Linus Torvalds1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: - core: simplify POWER_SUPPLY_PROP_CHARGE_BEHAVIOUR handling - test-power: add POWER_SUPPLY_PROP_CHARGE_BEHAVIOUR support - chrome EC drivers: add ID based probing - bq27xxx: simplify update loop to reduce I2C traffic - max8903 binding: fix GPIO polarity description * tag 'for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: dt-bindings: power: supply: max8903: specify flt-gpios as input power: supply: bq27xxx: Move health reading out of update loop power: supply: bq27xxx: Move cycle count reading out of update loop power: supply: bq27xxx: Move energy reading out of update loop power: supply: bq27xxx: Move charge reading out of update loop power: supply: bq27xxx: Move time reading out of update loop power: supply: bq27xxx: Move temperature reading out of update loop power: supply: cros_pchg: provide ID table for avoiding fallback match power: supply: cros_usbpd: provide ID table for avoiding fallback match power: supply: core: simplify charge_behaviour formatting power: supply: test-power: implement charge_behaviour property
2024-05-18Merge tag 'iommu-updates-v6.10' of ↵Linus Torvalds6-18/+33
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core: - IOMMU memory usage observability - This will make the memory used for IO page tables explicitly visible. - Simplify arch_setup_dma_ops() Intel VT-d: - Consolidate domain cache invalidation - Remove private data from page fault message - Allocate DMAR fault interrupts locally - Cleanup and refactoring ARM-SMMUv2: - Support for fault debugging hardware on Qualcomm implementations - Re-land support for the ->domain_alloc_paging() callback ARM-SMMUv3: - Improve handling of MSI allocation failure - Drop support for the "disable_bypass" cmdline option - Major rework of the CD creation code, following on directly from the STE rework merged last time around. - Add unit tests for the new STE/CD manipulation logic AMD-Vi: - Final part of SVA changes with generic IO page fault handling Renesas IPMMU: - Add support for R8A779H0 hardware ... and a couple smaller fixes and updates across the sub-tree" * tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (80 commits) iommu/arm-smmu-v3: Make the kunit into a module arm64: Properly clean up iommu-dma remnants iommu/amd: Enable Guest Translation after reading IOMMU feature register iommu/vt-d: Decouple igfx_off from graphic identity mapping iommu/amd: Fix compilation error iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() iommu/arm-smmu-v3: Move the CD generation for SVA into a function iommu/arm-smmu-v3: Allocate the CD table entry in advance iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr() iommu/arm-smmu-v3: Consolidate clearing a CD table entry iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() iommu/arm-smmu-v3: Add an ops indirection to the STE code iommu/arm-smmu-qcom: Don't build debug features as a kernel module iommu/amd: Add SVA domain support iommu: Add ops->domain_alloc_sva() iommu/amd: Initial SVA support for AMD IOMMU iommu/amd: Add support for enable/disable IOPF iommu/amd: Add IO page fault notifier handler ...
2024-05-18Merge tag 'net-accept-more-20240515' of git://git.kernel.dk/linuxLinus Torvalds2-2/+5
Pull more io_uring updates from Jens Axboe: "This adds support for IORING_CQE_F_SOCK_NONEMPTY for io_uring accept requests. This is very similar to previous work that enabled the same hint for doing receives on sockets. By far the majority of the work here is refactoring to enable the networking side to pass back whether or not the socket had more pending requests after accepting the current one, the last patch just wires it up for io_uring. Not only does this enable applications to know whether there are more connections to accept right now, it also enables smarter logic for io_uring multishot accept on whether to retry immediately or wait for a poll trigger" * tag 'net-accept-more-20240515' of git://git.kernel.dk/linux: io_uring/net: wire up IORING_CQE_F_SOCK_NONEMPTY for accept net: pass back whether socket was empty post accept net: have do_accept() take a struct proto_accept_arg argument net: change proto and proto_ops accept type
2024-05-17Merge tag 'trace-ringbuffer-v6.10' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing ring buffer updates from Steven Rostedt: "Add ring_buffer memory mappings. The tracing ring buffer was created based on being mostly used with the splice system call. It is broken up into page ordered sub-buffers and the reader swaps a new sub-buffer with an existing sub-buffer that's part of the write buffer. It then has total access to the swapped out sub-buffer and can do copyless movements of the memory into other mediums (file system, network, etc). The buffer is great for passing around the ring buffer contents in the kernel, but is not so good for when the consumer is the user space task itself. A new interface is added that allows user space to memory map the ring buffer. It will get all the write sub-buffers as well as reader sub-buffer (that is not written to). It can send an ioctl to change which sub-buffer is the new reader sub-buffer. The ring buffer is read only to user space. It only needs to call the ioctl when it is finished with a sub-buffer and needs a new sub-buffer that the writer will not write over. A self test program was also created for testing and can be used as an example for the interface to user space. The libtracefs (external to the kernel) also has code that interacts with this, although it is disabled until the interface is in a official release. It can be enabled by compiling the library with a special flag. This was used for testing applications that perform better with the buffer being mapped. Memory mapped buffers have limitations. The main one is that it can not be used with the snapshot logic. If the buffer is mapped, snapshots will be disabled. If any logic is set to trigger snapshots on a buffer, that buffer will not be allowed to be mapped" * tag 'trace-ringbuffer-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Add cast to unsigned long addr passed to virt_to_page() ring-buffer: Have mmapped ring buffer keep track of missed events ring-buffer/selftest: Add ring-buffer mapping test Documentation: tracing: Add ring-buffer mapping tracing: Allow user-space mapping of the ring-buffer ring-buffer: Introducing ring-buffer mapping functions ring-buffer: Allocate sub-buffers with __GFP_COMP
2024-05-17Merge tag 'trace-v6.10' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Remove unused ftrace_direct_funcs variables - Fix a possible NULL pointer dereference race in eventfs - Update do_div() usage in trace event benchmark test - Speedup direct function registration with asynchronous RCU callback. The synchronization was done in the registration code and this caused delays when registering direct callbacks. Move the freeing to a call_rcu() that will prevent delaying of the registering. - Replace simple_strtoul() usage with kstrtoul() * tag 'trace-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix a possible null pointer dereference in eventfs_find_events() ftrace: Fix possible use-after-free issue in ftrace_location() ftrace: Remove unused global 'ftrace_direct_func_count' ftrace: Remove unused list 'ftrace_direct_funcs' tracing: Improve benchmark test performance by using do_div() ftrace: Use asynchronous grace period for register_ftrace_direct() ftrace: Replaces simple_strtoul in ftrace
2024-05-17Merge tag 'probes-v6.10' of ↵Linus Torvalds4-11/+121
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - tracing/probes: Add new pseudo-types %pd and %pD support for dumping dentry name from 'struct dentry *' and file name from 'struct file *' - uprobes performance optimizations: - Speed up the BPF uprobe event by delaying the fetching of the uprobe event arguments that are not used in BPF - Avoid locking by speculatively checking whether uprobe event is valid - Reduce lock contention by using read/write_lock instead of spinlock for uprobe list operation. This improved BPF uprobe benchmark result 43% on average - rethook: Remove non-fatal warning messages when tracing stack from BPF and skip rcu_is_watching() validation in rethook if possible - objpool: Optimize objpool (which is used by kretprobes and fprobe as rethook backend storage) by inlining functions and avoid caching nr_cpu_ids because it is a const value - fprobe: Add entry/exit callbacks types (code cleanup) - kprobes: Check ftrace was killed in kprobes if it uses ftrace * tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobe/ftrace: bail out if ftrace was killed selftests/ftrace: Fix required features for VFS type test case objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids objpool: enable inlining objpool_push() and objpool_pop() operations rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get() ftrace: make extra rcu_is_watching() validation check optional uprobes: reduce contention on uprobes_tree access rethook: Remove warning messages printed for finding return address of a frame. fprobe: Add entry/exit callbacks types selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD" selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD" Documentation: tracing: add new type '%pd' and '%pD' for kprobe tracing/probes: support '%pD' type for print struct file's name tracing/probes: support '%pd' type for print struct dentry's name uprobes: add speculative lockless system-wide uprobe filter check uprobes: prepare uprobe args buffer lazily uprobes: encapsulate preparation of uprobe args buffer
2024-05-17Merge tag 'sysctl-6.10-rc1' of ↵Linus Torvalds1-13/+12
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Remove sentinel elements from ctl_table structs in kernel/* Removing sentinels in ctl_table arrays reduces the build time size and runtime memory consumed by ~64 bytes per array. Removals for net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline through their respective subsystems making the next release the most likely place where the final series that removes the check for proc_name == NULL will land. This adds to removals already in arch/, drivers/ and fs/. - Adjust ctl_table definitions and references to allow constification - Remove unused ctl_table function arguments - Move non-const elements from ctl_table to ctl_table_header - Make ctl_table pointers const in ctl_table_root structure Making the static ctl_table structs const will increase safety by keeping the pointers to proc_handler functions in .rodata. Though no ctl_tables where made const in this PR, the ground work for making that possible has started with these changes sent by Thomas Weißschuh. * tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: drop now unnecessary out-of-bounds check sysctl: move sysctl type to ctl_table_header sysctl: drop sysctl_is_perm_empty_ctl_table sysctl: treewide: constify argument ctl_table_root::permissions(table) sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table) bpf: Remove the now superfluous sentinel elements from ctl_table array delayacct: Remove the now superfluous sentinel elements from ctl_table array kprobes: Remove the now superfluous sentinel elements from ctl_table array printk: Remove the now superfluous sentinel elements from ctl_table array scheduler: Remove the now superfluous sentinel elements from ctl_table array seccomp: Remove the now superfluous sentinel elements from ctl_table array timekeeping: Remove the now superfluous sentinel elements from ctl_table array ftrace: Remove the now superfluous sentinel elements from ctl_table array umh: Remove the now superfluous sentinel elements from ctl_table array kernel misc: Remove the now superfluous sentinel elements from ctl_table array
2024-05-17Merge tag 'devicetree-for-6.10' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DT Bindings: - Convert samsung,exynos5-dp, atmel,lcdc, aspeed,ast2400-wdt bindings to schemas - Add bindings for Allwinner H616 NMI controller, Renesas r8a779g0 irqc, Renesas R-Car V4M TMU and CMT timers, Freescale S32G3 linflexuart, and Mediatek MT7988 XHCI - Add 'reg' constraints on DSI and SPI display panels - More dropping of unnecessary quotes in schemas - Use full paths rather than relative paths in schema $refs - Drop redundant storing of phandle for reserved memory DT Core: - Use scope based cleanups for kfree() and of_node_put() - Track interrupt-map and power-supplies for fw_devlink - Add buffer overflow check in of_modalias() - Add and use __of_prop_free() helper for freeing struct property" * tag 'devicetree-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (25 commits) of: property: Add fw_devlink support for interrupt-map property dt-bindings: display: panel: constrain 'reg' in DSI panels dt-bindings: display: panel: constrain 'reg' in SPI panels dt-bindings: display: samsung,ams495qa01: add missing SPI properties ref dt-bindings: Use full path to other schemas dt-bindings: PCI: qcom,pcie-sm8350: Drop redundant 'oneOf' sub-schema of: module: add buffer overflow check in of_modalias() dt-bindings: PCI: microchip: increase number of items in ranges property dt-bindings: Drop unnecessary quotes on keys dt-bindings: interrupt-controller: mediatek,mt6577-sysirq: Drop unnecessary quotes of: property: Use scope based cleanup on port_node of: reserved_mem: Remove the use of phandle from the reserved_mem APIs of: property: fw_devlink: Add support for "power-supplies" binding dt-bindings: watchdog: aspeed,ast2400-wdt: Convert to DT schema dt-bindings: irq: sun7i-nmi: Add binding for the H616 NMI controller dt-bindings: interrupt-controller: renesas,irqc: Add r8a779g0 support dt-bindings: timer: renesas,tmu: Add R-Car V4M support dt-bindings: timer: renesas,cmt: Add R-Car V4M support of: Use scope based of_node_put() cleanups of: Use scope based kfree() cleanups ...
2024-05-17Merge tag 'powerpc-6.10-1' of ↵Linus Torvalds2-12/+14
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT. - Allow per-process DEXCR (Dynamic Execution Control Register) settings via prctl, notably NPHIE which controls hashst/hashchk for ROP protection. - Install powerpc selftests in sub-directories. Note this changes the way run_kselftest.sh needs to be invoked for powerpc selftests. - Change fadump (Firmware Assisted Dump) to better handle memory add/remove. - Add support for passing additional parameters to the fadump kernel. - Add support for updating the kdump image on CPU/memory add/remove events. - Other small features, cleanups and fixes. Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui. * tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits) powerpc/fadump: Fix section mismatch warning powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP powerpc/fadump: update documentation about bootargs_append powerpc/fadump: pass additional parameters when fadump is active powerpc/fadump: setup additional parameters for dump capture kernel powerpc/pseries/fadump: add support for multiple boot memory regions selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction" KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info() KVM: PPC: Fix documentation for ppc mmu caps KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#" powerpc/code-patching: Use dedicated memory routines for patching powerpc/code-patching: Test patch_instructions() during boot powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region() powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX powerpc: Fix typos powerpc/eeh: Fix spelling of the word "auxillary" and update comment macintosh/ams: Fix unused variable warning powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large ...
2024-05-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linuxLinus Torvalds1-2/+9
Pull ARM updates from Russell King: - Updates to AMBA bus subsystem to drop .owner struct device_driver initialisations, moving that to code instead. - Add LPAE privileged-access-never support - Add support for Clang CFI - clkdev: report over-sized device or connection strings * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: (36 commits) ARM: 9398/1: Fix userspace enter on LPAE with CC_OPTIMIZE_FOR_SIZE=y clkdev: report over-sized strings when creating clkdev entries ARM: 9393/1: mm: Use conditionals for CFI branches ARM: 9392/2: Support CLANG CFI ARM: 9391/2: hw_breakpoint: Handle CFI breakpoints ARM: 9390/2: lib: Annotate loop delay instructions for CFI ARM: 9389/2: mm: Define prototypes for all per-processor calls ARM: 9388/2: mm: Type-annotate all per-processor assembly routines ARM: 9387/2: mm: Rewrite cacheflush vtables in CFI safe C ARM: 9386/2: mm: Use symbol alias for cache functions ARM: 9385/2: mm: Type-annotate all cache assembly routines ARM: 9384/2: mm: Make tlbflush routines CFI safe ARM: 9382/1: ftrace: Define ftrace_stub_graph ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement ARM: 9357/2: Reduce the number of #ifdef CONFIG_CPU_SW_DOMAIN_PAN ARM: 9356/2: Move asm statements accessing TTBCR into C functions ARM: 9355/2: Add TTBCR_* definitions to pgtable-3level-hwdef.h ARM: 9379/1: coresight: tpda: drop owner assignment ARM: 9378/1: coresight: etm4x: drop owner assignment ARM: 9377/1: hwrng: nomadik: drop owner assignment ...
2024-05-16Merge tag 'platform-drivers-x86-v6.10-1' of ↵Linus Torvalds4-5/+18
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: - New drivers/platform/arm64 directory for arm64 embedded-controller drivers - New drivers: - Acer Aspire 1 embedded controllers (for arm64 models) - ACPI quickstart PNP0C32 buttons - Dell All-In-One backlight support (dell-uart-backlight) - Lenovo WMI camera buttons - Lenovo Yoga Tablet 2 Pro 1380F/L fast charging - MeeGoPad ANX7428 Type-C Cross Switch (power sequencing only) - MSI WMI sensors (fan speed sensors only for now) - Asus WMI: - 2024 ROG Mini-LED support - MCU powersave support - Vivobook GPU MUX support - Misc. other improvements - Ideapad laptop: - Export FnLock LED as LED class device - Switch platform profiles using thermal management key - Intel drivers: - IFS: various improvements - PMC: Lunar Lake support - SDSI: various improvements - TPMI/ISST: various improvements - tools: intel-speed-select: various improvements - MS Surface drivers: - Fan profile switching support - Surface Pro thermal sensors support - ThinkPad ACPI: - Reworked hotkey support to use sparse keymaps - Add support for new trackpoint-doubletap, Fn+N and Fn+G hotkeys - WMI core: - New WMI driver development guide - x86 Android tablets: - Lenovo Yoga Tablet 2 Pro 1380F/L support - Xiaomi MiPad 2 status LED and bezel touch buttons backlight support - Miscellaneous cleanups / fixes / improvements * tag 'platform-drivers-x86-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (128 commits) platform/x86: Add new MeeGoPad ANX7428 Type-C Cross Switch driver devm-helpers: Fix a misspelled cancellation in the comments tools arch x86: Add dell-uart-backlight-emulator platform/x86: Add new Dell UART backlight driver platform/x86: x86-android-tablets: Create LED device for Xiaomi Pad 2 bottom bezel touch buttons platform/x86: x86-android-tablets: Xiaomi pad2 RGB LED fwnode updates platform/x86: x86-android-tablets: Pass struct device to init() platform/x86/amd: pmc: Add new ACPI ID AMDI000B platform/x86/amd: pmf: Add new ACPI ID AMDI0105 platform/x86: p2sb: Don't init until unassigned resources have been assigned platform/surface: aggregator: Log critical errors during SAM probing platform/x86: ISST: Support SST-BF and SST-TF per level platform/x86/fujitsu-laptop: Replace sprintf() with sysfs_emit() tools/power/x86/intel-speed-select: v1.19 release tools/power/x86/intel-speed-select: Display CPU as None for -1 tools/power/x86/intel-speed-select: SST BF/TF support per level tools/power/x86/intel-speed-select: Increase number of CPUs displayed tools/power/x86/intel-speed-select: Present all TRL levels for turbo-freq tools/power/x86/intel-speed-select: Fix display for unsupported levels tools/power/x86/intel-speed-select: Support multiple dies ...
2024-05-16Merge tag 'mmc-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds2-4/+7
Pull MMC updates from Ulf Hansson: "MMC core: - Increase the timeout period of the ACMD41 command - Add card entry for quirks to debugfs - Add mmc_gpiod_set_cd_config() function - Store owner from SDIO modules with sdio_register_driver() MMC host: - atmel-mci: Some cleanups and a switch to use dev_err_probe() - renesas_sdhi: - Add support for RZ/G2L, RZ/G3S and RZ/V2M variants - Set the SDBUF after reset - sdhci: Add support for "Tuning Error" interrupts - sdhci-acpi: - Add quirk to enable pull-up on the card-detect GPIO on Asus T100TA - Disable write protect detection on Toshiba WT10-A - Fix Lenovo Yoga Tablet 2 Pro 1380 sdcard slot not working - sdhci_am654: - Re-work and fix the tuning support for multiple speed-modes - Add tuning algorithm for delay chain - sdhci-esdhc-imx: Add NXP S32G3 support - sdhci-of-dwcmshc: - Add tuning support for Sophgo CV1800B and SG200X - Implement SDHCI CQE support - sdhci-pci-gli: Use the proper pci_set_power_state() instead of PMCSR writes" MEMSTICK: - Convert a couple of drivers to use the ->remove_new() callback" * tag 'mmc-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (59 commits) mmc: renesas_sdhi: Add compatible string for RZ/G2L family, RZ/G3S, and RZ/V2M SoCs dt-bindings: mmc: renesas,sdhi: Document RZ/G2L family compatibility dt-bindings: mmc: renesas,sdhi: Group single const value items into an enum list mmc: renesas_sdhi: Set the SDBUF after reset mmc: core: Increase the timeout period of the ACMD41 command mmc: core: Convert to use __mmc_poll_for_busy() SD_APP_OP_COND too mmc: atmel-mci: Switch to use dev_err_probe() mmc: atmel-mci: Incapsulate used to be a platform data into host structure mmc: atmel-mci: Replace platform device pointer by generic one mmc: atmel-mci: Use temporary variable for struct device mmc: atmel-mci: Get rid of platform data leftovers mmc: sdhci-of-dwcmshc: Add tuning support for Sophgo CV1800B and SG200X mmc: sdhci-of-dwcmshc: Remove useless "&" of th1520_execute_tuning mmc: sdhci-s3c: Choose sdhci_ops based on variant mmc: sdhci_am654: Constify struct sdhci_ops mmc: sdhci-sprd: Constify struct sdhci_ops mmc: sdhci-omap: Constify struct sdhci_ops mmc: sdhci-esdhc-mcf: Constify struct sdhci_ops mmc: slot-gpio: Use irq_handler_t type mmc: sdhci-acpi: Add quirk to enable pull-up on the card-detect GPIO on Asus T100TA ...
2024-05-15Merge tag 'wq-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-6/+48
Pull workqueue updates from Tejun Heo: - Work items can now be disabled and enabled, and cancel_work_sync() and disable_work() can be called form atomic contexts for BH work items. This closes feature gap with tasklet and should allow converting all existing tasklet users to BH workqueues. - Improve pool sharing for unbound workqueues with strict affinity. - Misc changes including doc updates, improved debug annotations and cleanups. * tag 'wq-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Use "@..." in function comment to describe variable length argument workqueue: Add destroy_work_on_stack() in workqueue_softirq_dead() workqueue: remove unnecessary import and function in wq_monitor.py workqueue: Introduce enable_and_queue_work() convenience function workqueue: add function in event of workqueue_activate_work workqueue: Cleanup subsys attribute registration workqueue: Use list_last_entry() to get the last idle worker workqueue: Move attrs->cpumask out of worker_pool's properties when attrs->affn_strict workqueue: Use INIT_WORK_ONSTACK in workqueue_softirq_dead() workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items workqueue: Remember whether a work item was on a BH workqueue workqueue: Remove WORK_OFFQ_CANCELING workqueue: Implement disable/enable for (delayed) work items workqueue: Preserve OFFQ bits in cancel[_sync] paths
2024-05-15Merge tag 'cgroup-for-6.10' of ↵Linus Torvalds2-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - The locking around cpuset hotplug processing has always been a bit of mess which was worked around by making hotplug processing asynchronous. The asynchronity isn't great and led to other issues. We tried to make the behavior synchronous a while ago but that led to lockdep splats. Waiman took another stab at cleaning up and making it synchronous. The patch has been in -next for well over a month and there haven't been any complaints, so fingers crossed. - Tracepoints added to help understanding rstat lock contentions. - A bunch of minor changes - doc updates, code cleanups and selftests. * tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits) cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints selftests/cgroup: Drop define _GNU_SOURCE docs: cgroup-v1: Update page cache removal functions selftests/cgroup: fix uninitialized variables in test_zswap.c selftests/cgroup: cpu_hogger init: use {} instead of {NULL} selftests/cgroup: fix clang warnings: uninitialized fd variable selftests/cgroup: fix clang build failures for abs() calls cgroup/cpuset: Remove outdated comment in sched_partition_write() cgroup/cpuset: Fix incorrect top_cpuset flags cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice cgroup/cpuset: Statically initialize more members of top_cpuset cgroup: Avoid unnecessary looping in cgroup_no_v1() cgroup, legacy_freezer: update comment for freezer_css_offline() docs, cgroup: add entries for pids to cgroup-v2.rst cgroup: don't call cgroup1_pidlist_destroy_all() for v2 cgroup_freezer: update comment for freezer_css_online() cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints cgroup/pids: Remove superfluous zeroing docs: cgroup-v1: Fix description for css_online ...
2024-05-16kprobe/ftrace: bail out if ftrace was killedStephen Brennan1-0/+7
If an error happens in ftrace, ftrace_kill() will prevent disarming kprobes. Eventually, the ftrace_ops associated with the kprobes will be freed, yet the kprobes will still be active, and when triggered, they will use the freed memory, likely resulting in a page fault and panic. This behavior can be reproduced quite easily, by creating a kprobe and then triggering a ftrace_kill(). For simplicity, we can simulate an ftrace error with a kernel module like [1]: [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer sudo perf probe --add commit_creds sudo perf trace -e probe:commit_creds # In another terminal make sudo insmod ftrace_killer.ko # calls ftrace_kill(), simulating bug # Back to perf terminal # ctrl-c sudo perf probe --del commit_creds After a short period, a page fault and panic would occur as the kprobe continues to execute and uses the freed ftrace_ops. While ftrace_kill() is supposed to be used only in extreme circumstances, it is invoked in FTRACE_WARN_ON() and so there are many places where an unexpected bug could be triggered, yet the system may continue operating, possibly without the administrator noticing. If ftrace_kill() does not panic the system, then we should do everything we can to continue operating, rather than leave a ticking time bomb. Link: https://lore.kernel.org/all/20240501162956.229427-1-stephen.s.brennan@oracle.com/ Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-48/+9
Pull KVM updates from Paolo Bonzini: "ARM: - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure. - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions al