summaryrefslogtreecommitdiff
path: root/include/trace
AgeCommit message (Collapse)AuthorFilesLines
2025-08-01Merge tag 'trace-v6.17' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Deprecate auto-mounting tracefs to /sys/kernel/debug/tracing When tracefs was first introduced back in 2014, the directory /sys/kernel/tracing was added and is the designated location to mount tracefs. To keep backward compatibility, tracefs was auto-mounted in /sys/kernel/debug/tracing as well. All distros now mount tracefs on /sys/kernel/tracing. Having it seen in two different locations has lead to various issues and inconsistencies. The VFS folks have to also maintain debugfs_create_automount() for this single user. It's been over 10 years. Tooling and scripts should start replacing the debugfs location with the tracefs one. The reason tracefs was created in the first place was to allow access to the tracing facilities without the need to configure debugfs into the kernel. Using tracefs should now be more robust. A new config is created: CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED which is default y, so that the kernel is still built with the automount. This config allows those that want to remove the automount from debugfs to do so. When tracefs is accessed from /sys/kernel/debug/tracing, the following printk is triggerd: pr_warn("NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030\n"); This gives users another 5 years to fix their scripts. - Use queue_rcu_work() instead of call_rcu() for freeing event filters The number of filters to be free can be many depending on the number of events within an event system. Freeing them from softirq context can potentially cause undesired latency. Use the RCU workqueue to free them instead. - Remove pointless memory barriers in latency code Memory barriers were added to some of the latency code a long time ago with the idea of "making them visible", but that's not what memory barriers are for. They are to synchronize access between different variables. There was no synchronization here making them pointless. - Remove "__attribute__()" from the type field of event format When LLVM is used to compile the kernel with CONFIG_DEBUG_INFO_BTF=y and PAHOLE_HAS_BTF_TAG=y, some of the format fields get expanded with the following: field:const char * filename; offset:24; size:8; signed:0; Turns into: field:const char __attribute__((btf_type_tag("user"))) * filename; offset:24; size:8; signed:0; This confuses parsers. Add code to strip these tags from the strings. - Add eprobe config option CONFIG_EPROBE_EVENTS Eprobes were added back in 5.15 but were only enabled when another probe was enabled (kprobe, fprobe, uprobe, etc). The eprobes had no config option of their own. Add one as they should be a separate entity. It's default y to keep with the old kernels but still has dependencies on TRACING and HAVE_REGS_AND_STACK_ACCESS_API. - Add eprobe documentation When eprobes were added back in 5.15 no documentation was added to describe them. This needs to be rectified. - Replace open coded cpumask_next_wrap() in move_to_next_cpu() - Have preemptirq_delay_run() use off-stack CPU mask - Remove obsolete comment about pelt_cfs event DECLARE_TRACE() appends "_tp" to trace events now, but the comment above pelt_cfs still mentioned appending it manually. - Remove EVENT_FILE_FL_SOFT_MODE flag The SOFT_MODE flag was required when the soft enabling and disabling of trace events was first introduced. But there was a bug with this approach as it only worked for a single instance. When multiple users required soft disabling and disabling the code was changed to have a ref count. The SOFT_MODE flag is now set iff the ref count is non zero. This is redundant and just reading the ref count is good enough. - Fix typo in comment * tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation: tracing: Add documentation about eprobes tracing: Have eprobes have their own config option tracing: Remove "__attribute__()" from the type field of event format tracing: Deprecate auto-mounting tracefs in debugfs tracing: Fix comment in trace_module_remove_events() tracing: Remove EVENT_FILE_FL_SOFT_MODE flag tracing: Remove pointless memory barriers tracing/sched: Remove obsolete comment on suffixes kernel: trace: preemptirq_delay_test: use offstack cpu mask tracing: Use queue_rcu_work() to free filters tracing: Replace opencoded cpumask_next_wrap() in move_to_next_cpu()
2025-07-31Merge tag 'cgroup-for-6.17' of ↵Linus Torvalds1-47/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Allow css_rstat_updated() in NMI context to enable memory accounting for allocations in NMI context. - /proc/cgroups doesn't contain useful information for cgroup2 and was updated to only show v1 controllers. This unfortunately broke something in the wild. Add an option to bring back the old behavior to ease transition. - selftest updates and other cleanups. * tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Add compatibility option for content of /proc/cgroups selftests/cgroup: fix cpu.max tests cgroup: llist: avoid memory tears for llist_node selftests: cgroup: Fix missing newline in test_zswap_writeback_one selftests: cgroup: Allow longer timeout for kmem_dead_cgroups cleanup memcg: cgroup: call css_rstat_updated irrespective of in_nmi() cgroup: remove per-cpu per-subsystem locks cgroup: make css_rstat_updated nmi safe cgroup: support to enable nmi-safe css_rstat_updated selftests: cgroup: Fix compilation on pre-cgroupns kernels selftests: cgroup: Optionally set up v1 environment selftests: cgroup: Add support for named v1 hierarchies in test_core selftests: cgroup_util: Add helpers for testing named v1 hierarchies Documentation: cgroup: add section explaining controller availability cgroup: Drop sock_cgroup_classid() dummy implementation
2025-07-31Merge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds6-104/+48
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
2025-07-31Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-2/+11
Pull SCSI updates from James Bottomley: "Smaller set of driver updates than usual (ufs, lpfc, mpi3mr). The rest (including the core file changes) are doc updates and some minor bug fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (49 commits) scsi: libiscsi: Initialize iscsi_conn->dd_data only if memory is allocated scsi: scsi_transport_fc: Add comments to describe added 'rport' parameter scsi: bfa: Double-free fix scsi: isci: Fix dma_unmap_sg() nents value scsi: mvsas: Fix dma_unmap_sg() nents value scsi: elx: efct: Fix dma_unmap_sg() nents value scsi: scsi_transport_fc: Change to use per-rport devloss_work_q scsi: ufs: exynos: Fix programming of HCI_UTRL_NEXUS_TYPE scsi: core: Fix kernel doc for scsi_track_queue_full() scsi: ibmvscsi_tgt: Fix dma_unmap_sg() nents value scsi: ibmvscsi_tgt: Fix typo in comment scsi: mpi3mr: Update driver version to 8.14.0.5.50 scsi: mpi3mr: Serialize admin queue BAR writes on 32-bit systems scsi: mpi3mr: Drop unnecessary volatile from __iomem pointers scsi: mpi3mr: Fix race between config read submit and interrupt completion scsi: ufs: ufs-qcom: Enable QUnipro Internal Clock Gating scsi: ufs: core: Add ufshcd_dme_rmw() to modify DME attributes scsi: ufs: ufs-qcom: Update esi_vec_mask for HW major version >= 6 scsi: core: Use scsi_cmd_priv() instead of open-coding it scsi: qla2xxx: Remove firmware URL ...
2025-07-31Merge tag 'ext4_for_linus_6.17-rc1' of ↵Linus Torvalds1-11/+39
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major ext4 changes for 6.17: - Better scalability for ext4 block allocation - Fix insufficient credits when writing back large folios Miscellaneous bug fixes, especially when handling exteded attriutes, inline data, and fast commit" * tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (39 commits) ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr ext4: implement linear-like traversal across order xarrays ext4: refactor choose group to scan group ext4: convert free groups order lists to xarrays ext4: factor out ext4_mb_scan_group() ext4: factor out ext4_mb_might_prefetch() ext4: factor out __ext4_mb_scan_group() ext4: fix largest free orders lists corruption on mb_optimize_scan switch ext4: fix zombie groups in average fragment size lists ext4: merge freed extent with existing extents before insertion ext4: convert sbi->s_mb_free_pending to atomic_t ext4: fix typo in CR_GOAL_LEN_SLOW comment ext4: get rid of some obsolete EXT4_MB_HINT flags ext4: utilize multiple global goals to reduce contention ext4: remove unnecessary s_md_lock on update s_mb_last_group ext4: remove unnecessary s_mb_last_start ext4: separate stream goal hits from s_bal_goals for better tracking ext4: add ext4_try_lock_group() to skip busy groups ext4: initialize superblock fields in the kballoc-test.c kunit tests ext4: refactor the inline directory conversion and new directory codepaths ...
2025-07-30Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds1-4/+34
Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...
2025-07-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-82/+29
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
2025-07-30Merge tag 'trace-unused-v6.17' of ↵Linus Torvalds5-51/+41
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracepoint cleanup from Steven Rostedt: "Remove or hide unused tracepoints Tracepoints take up memory (around 5K per tracepoint) even when they are unused. Changes are being made to detect when a tracepoint is defined but unused and a warning is shown at build. But those changes are not yet ready for inclusion. - Fix some of the unused tracepoints that it detected Some tracepoints were removed and others were hidden by config settings to match the config settings of where they are instantiated. Some tracepoints were moved into architecture specific code as only one architecture used them. - Call the ftrace_test_filter tracepoint in an unreachable if statement The ftrace_test_filter tracepoint which is defined when ftrace selftests are configured and is used to test the filter logic, but the tracepoint is not actually called. It is put into an if statement to not have it get compiled out, but also not warn for not being used" * tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING powerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64 tracing: Call trace_ftrace_test_filter() for the event tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit binder: Remove unused binder lock events PM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUS PM: tracing: Hide device_pm_callback events under PM_SLEEP PM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLE PM: cpufreq: powernv/tracing: Move powernv_throttle trace event alarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configured tracing, AER: Hide PCIe AER event when PCIEAER is not configured
2025-07-30Merge tag 'trace-rv-6.17' of ↵Linus Torvalds1-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verification updates from Steven Rostedt: - Added Linear temporal logic monitors for RT application Real-time applications may have design flaws causing them to have unexpected latency. For example, the applications may raise page faults, or may be blocked trying to take a mutex without priority inheritance. However, while attempting to implement DA monitors for these real-time rules, deterministic automaton is found to be inappropriate as the specification language. The automaton is complicated, hard to understand, and error-prone. For these cases, linear temporal logic is found to be more suitable. The LTL is more concise and intuitive. - Make printk_deferred() public The new monitors needed access to printk_deferred(). Make them visible for the entire kernel. - Add a vpanic() to allow for va_list to be passed to panic. - Add rtapp container monitor. A collection of monitors that check for common problems with real-time applications that cause unexpected latency. - Add page fault tracepoints to risc-v These tracepoints are necessary to for the RV monitor to run on risc-v. - Fix the behaviour of the rv tool with -s and idle tasks. - Allow the rv tool to gracefully terminate with SIGTERM - Adjusts dot2c not to create lines over 100 columns - Properly order nested monitors in the RV Kconfig file - Return the registration error in all DA monitor instead of 0 - Update and add new sched collection monitors Replace tss and sncid monitors with more complete sts: Not only prove that switches occur in scheduling context and scheduling needs interrupt disabled but also that each call to the scheduler disables interrupts to (optionally) switch. New monitor: nrp Preemption requires need resched which is cleared by any switch (includes a non optimal workaround for /nested/ preemptions) New monitor: sssw suspension requires setting the task to sleepable and, after the switch occurs, the task requires a wakeup to come back to runnable New monitor: opid waking and need-resched operations occur with interrupts and preemption disabled or in IRQ without explicitly disabling preemption" * tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (48 commits) rv: Add opid per-cpu monitor rv: Add nrp and sssw per-task monitors rv: Replace tss and sncid monitors with more complete sts sched: Adapt sched tracepoints for RV task model rv: Retry when da monitor detects race conditions rv: Adjust monitor dependencies rv: Use strings in da monitors tracepoints rv: Remove trailing whitespace from tracepoint string rv: Add da_handle_start_run_event_ to per-task monitors rv: Fix wrong type cast in reactors_show() and monitor_reactor_show() rv: Fix wrong type cast in monitors_show() rv: Remove struct rv_monitor::reacting rv: Remove rv_reactor's reference counter rv: Merge struct rv_reactor_def into struct rv_reactor rv: Merge struct rv_monitor_def into struct rv_monitor rv: Remove unused field in struct rv_monitor_def rv: Return init error when registering monitors verification/rvgen: Organise Kconfig entries for nested monitors tools/dot2c: Fix generated files going over 100 column limit tools/rv: Stop gracefully also on SIGTERM ...
2025-07-30Merge tag 'net-next-6.17' of ↵Linus Torvalds2-37/+13
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
2025-07-29Merge tag 'soc-drivers-6.17' of ↵Linus Torvalds1-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "Changes are all over the place, but very little sticks out as noteworthy. There is a new misc driver for the Raspberry Pi 5's RP1 multifunction I/O chip, along with hooking it up to the pinctrl and clk frameworks. The reset controller and memory subsystems have mainly small updates, but there are two new reset drivers for the K230 and VC1800B SoCs, and new memory driver support for Tegra264. The ARM SMCCC and SCMI firmware drivers gain a few more features that should help them be supported across more environments. Similarly, the SoC specific firmware on Tegra and Qualcomm get minor enhancements and chip support. In the drivers/soc/ directory, the ASPEED LPC snoop driver gets an overhaul for code robustness, the Tegra and Qualcomm and NXP drivers grow to support more chips, while the Hisilicon, Mediatek and Renesas drivers see mostly janitorial fixes" * tag 'soc-drivers-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (100 commits) bus: del unnecessary init var soc: fsl: qe: convert set_multiple() to returning an integer pinctrl: rp1: use new GPIO line value setter callbacks soc: hisilicon: kunpeng_hccs: Fix incorrect log information dt-bindings: soc: qcom: qcom,pmic-glink: document Milos compatible dt-bindings: soc: qcom,aoss-qmp: document the Milos Always-On Subsystem side channel dt-bindings: firmware: qcom,scm: document Milos SCM Firmware Interface soc: qcom: socinfo: Add support to retrieve APPSBL build details soc: qcom: pmic_glink: fix OF node leak soc: qcom: spmi-pmic: add more PMIC SUBTYPE IDs soc: qcom: socinfo: Add PM7550 & PMIV0108 PMICs soc: qcom: socinfo: Add SoC IDs for SM7635 family dt-bindings: arm: qcom,ids: Add SoC IDs for SM7635 family firmware: qcom: scm: request the waitqueue irq *after* initializing SCM firmware: qcom: scm: initialize tzmem before marking SCM as available firmware: qcom: scm: take struct device as argument in SHM bridge enable firmware: qcom: scm: remove unused arguments from SHM bridge routines soc: qcom: rpmh-rsc: Add RSC version 4 support memory: tegra: Add Tegra264 MC and EMC support firmware: tegra: bpmp: Fix build failure for tegra264-only config ...
2025-07-29Merge tag 'kvm-x86-generic-6.17' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-0/+27
KVM generic changes for 6.17 - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions. - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL.
2025-07-29Merge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-82/+2
KVM IRQ changes for 6.17 - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI. - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand. - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time. - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c. - Fix a variety of flaws and bugs in the AVIC device posted IRQ code. - Inhibited AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation. - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry. - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs. - Dedup x86's device posted IRQ code, as the vast majority of functionality can be shared verbatime between SVM and VMX. - Harden the device posted IRQ code against bugs and runtime errors. - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n). - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU. - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs. - Clean up and document/comment the irqfd assignment code. - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique.
2025-07-28Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linuxLinus Torvalds1-1/+90
Pull block up