summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2024-07-24Merge tag 'random-6.11-rc1-for-linus' of ↵Linus Torvalds2-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This adds getrandom() support to the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages anytime under memory pressure, which enables allocating memory that never gets swapped to disk but also doesn't count as being mlocked. Then, the vDSO implementation of getrandom() is introduced in a generic manner and hooked into random.c. Next, this is implemented on x86. (Also, though it's not ready for this pull, somebody has begun an arm64 implementation already) Finally, two vDSO selftests are added. There are also two housekeeping cleanup commits" * tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: MAINTAINERS: add random.h headers to RNG subsection random: note that RNDGETPOOL was removed in 2.6.9-rc2 selftests/vDSO: add tests for vgetrandom x86: vdso: Wire up getrandom() vDSO implementation random: introduce generic vDSO getrandom() implementation mm: add MAP_DROPPABLE for designating always lazily freeable mappings
2024-07-24io_uring: kill REQ_F_CANCEL_SEQPavel Begunkov1-3/+0
We removed the reliance on the flag by the cancellation path and now it's unused. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e57afe566bbe4fefeb44daffb08900f2a4756577.1721819383.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-23Merge tag 'f2fs-for-6.11-rc1' of ↵Linus Torvalds1-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "A pretty small update including mostly minor bug fixes in zoned storage along with the large section support. Enhancements: - add support for FS_IOC_GETFSSYSFSPATH - enable atgc dynamically if conditions are met - use new ioprio Macro to get ckpt thread ioprio level - remove unreachable lazytime mount option parsing Bug fixes: - fix null reference error when checking end of zone - fix start segno of large section - fix to cover read extent cache access with lock - don't dirty inode for readonly filesystem - allocate a new section if curseg is not the first seg in its zone - only fragment segment in the same section - truncate preallocated blocks in f2fs_file_open() - fix to avoid use SSR allocate when do defragment - fix to force buffered IO on inline_data inode And some minor code clean-ups and sanity checks" * tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (26 commits) f2fs: clean up addrs_per_{inode,block}() f2fs: clean up F2FS_I() f2fs: use meta inode for GC of COW file f2fs: use meta inode for GC of atomic file f2fs: only fragment segment in the same section f2fs: fix to update user block counts in block_operations() f2fs: remove unreachable lazytime mount option parsing f2fs: fix null reference error when checking end of zone f2fs: fix start segno of large section f2fs: remove redundant sanity check in sanity_check_inode() f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid f2fs: fix to use mnt_{want,drop}_write_file replace file_{start,end}_wrtie f2fs: clean up set REQ_RAHEAD given rac f2fs: enable atgc dynamically if conditions are met f2fs: fix to truncate preallocated blocks in f2fs_file_open() f2fs: fix to cover read extent cache access with lock f2fs: fix return value of f2fs_convert_inline_inode() f2fs: use new ioprio Macro to get ckpt thread ioprio level f2fs: fix to don't dirty inode for readonly filesystem f2fs: fix to avoid use SSR allocate when do defragment ...
2024-07-23Merge tag 'jfs-6.11' of github.com:kleikamp/linux-shaggyLinus Torvalds1-6/+0
Pull jfs updates from David Kleikamp: "Folio conversion from Matthew Wilcox and a few various fixes" * tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy: jfs: don't walk off the end of ealist jfs: Fix shift-out-of-bounds in dbDiscardAG jfs: Fix array-index-out-of-bounds in diFree jfs: fix null ptr deref in dtInsertEntry jfs: Remove use of folio error flag fs: Remove i_blocks_per_page jfs: Change metapage->page to metapage->folio jfs: Convert force_metapage to use a folio jfs: Convert inc_io to take a folio jfs: Convert page_to_mp to folio_to_mp jfs; Convert __invalidate_metapages to use a folio jfs: Convert dec_io to take a folio jfs: Convert drop_metapage and remove_metapage to take a folio jfs; Convert release_metapage to use a folio jfs: Convert insert_metapage() to take a folio jfs: Convert __get_metapage to use a folio jfs: Convert metapage_writepage to metapage_write_folio jfs: Convert metapage_read_folio to use folio APIs
2024-07-23Merge tag 'rproc-v6.11' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux Pull remoteproc updates from Bjorn Andersson: - The maximum amount of DDR memory used by the Mediatek MT8188/MT8195 SCP is increased to handle new use cases. Handling of optional L1TCM memory is made actually optional. - An optimization is introduced to only clear the unused portion of IPI shared buffers, rather than the entire buffer before writing the message. - Detection for IPC-only mode in the TI K3 DSP remoteproc driver is corrected. The loglevel of a debug print in the same is lowered from error. - Support for attaching to an running remote processor is added to the Xilinx R5F. - An in-kernel implementation of the Qualcomm "protected domain mapper" (aka service registry) service is introduced, to remove the dependency on a userspace implementation to detect when the battery monitor and USB Type-C port manager becomes available. This is then integrated with the Qualcomm remoteproc driver. - The Qualcomm PAS remoteproc driver gains support for attempting to bust hwspinlocks held by the remote processor when it crashed/stopped. - The TI OMAP remoteproc driver is transitioned to use devres helpers for various forms of allocations. - Parsing of memory-regions in the i.MX remoteproc driver is improved to avoid a NULL pointer dereference if the phandle reference is empty. of_node reference counting is corrected in the same. * tag 'rproc-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: remoteproc: mediatek: Increase MT8188/MT8195 SCP core0 DRAM size remoteproc: k3-dsp: Fix log levels where appropriate remoteproc: xlnx: Add attach detach support remoteproc: qcom: select AUXILIARY_BUS remoteproc: k3-r5: Fix IPC-only mode detection remoteproc: mediatek: Don't attempt to remap l1tcm memory if missing remoteproc: qcom: enable in-kernel PD mapper dt-bindings: remoteproc: imx_rproc: Add minItems for power-domain remoteproc: imx_rproc: Fix refcount mistake in imx_rproc_addr_init remoteproc: omap: Use devm_rproc_add() helper remoteproc: omap: Use devm action to release reserved memory remoteproc: omap: Use devm_rproc_alloc() helper remoteproc: imx_rproc: Skip over memory region when node value is NULL dt-bindings: remoteproc: k3-dsp: Correct optional sram properties for AM62A SoCs remoteproc: qcom_q6v5_pas: Add hwspinlock bust on stop soc: qcom: smem: Add qcom_smem_bust_hwspin_lock_by_host() remoteproc: mediatek: Zero out only remaining bytes of IPI buffer
2024-07-23Merge tag 'hwlock-v6.11' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux Pull hwspinlock updates from Bjorn Andersson: "This introduces a mechanism in the hardware spinlock framework, and the Qualcomm TCSR mutex driver, for allowing clients to bust locks held by a remote processor in the event that this enters a faulty state while holding the shared lock" * tag 'hwlock-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: hwspinlock: qcom: implement bust operation hwspinlock: Introduce hwspin_lock_bust()
2024-07-23Merge tag 'modules-6.11-rc1' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module update from Luis Chamberlain: "This is a super boring development cycle this time around for modules, there is only one patch in this pull request. The patch deals with a corner case set of dependencies which is not resolved today to ensure users get the module they need on initramfs. Currently only one module is known to exist which needs this, however this can grow to capture other corner cases likely escaped and not reported before. The kernel change is just a section update, the real work is done and merged already on upstream kmod. This has been on linux-next for 3 weeks now" * tag 'modules-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: create weak dependecies
2024-07-23Merge tag 'i2c-for-6.11-rc1-second-batch' of ↵Linus Torvalds2-14/+9
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull more i2c updates from Wolfram Sang: "The I2C core has two header documentation updates as the dependecies are in now. The I2C host drivers add some patches which nearly fell through the cracks: - Added descriptions in the DTS for the Qualcomm SM8650 and SM8550 Camera Control Interface (CCI). - Added support for the "settle-time-us" property, which allows the gpio-mux device to switch from one bus to another with a configurable delay. The time can be set in the DTS. The latest change also includes file sorting. - Fixed slot numbering in the SMBus framework to prevent failures when more than 8 slots are occupied. It now enforces a a maximum of 8 slots to be used. This ensures that the Intel PIIX4 device can register the SPDs correctly without failure, even if other slots are populated but not used" * tag 'i2c-for-6.11-rc1-second-batch' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: header: improve kdoc for i2c_algorithm i2c: header: remove unneeded stuff regarding i2c_algorithm i2c: piix4: Register SPDs i2c: smbus: remove i801 assumptions from SPD probing i2c: mux: gpio: Add support for the 'settle-time-us' property i2c: mux: gpio: Re-order #include to match alphabetic order dt-bindings: i2c: mux-gpio: Add 'settle-time-us' property dt-bindings: i2c: qcom-cci: Document sm8650 compatible dt-bindings: i2c: qcom-cci: Document sm8550 compatible
2024-07-23Merge tag 'for-v6.11' of ↵Linus Torvalds1-13/+6
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: "Power-supply core: - new charging_orange_full_green RGB LED trigger - simplify and cleanup power-supply LED trigger code - expose power information via hwmon compatibility layer New hardware support: - enable battery support for Qualcomm Snapdragon X Elite - new battery driver for Maxim MAX17201/MAX17205 - new battery driver for Lenovo Yoga C630 laptop (custom EC) Cleanups: - cleanup 'struct i2c_device_id' initializations - misc small battery driver cleanups and fixes" * tag 'for-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power: supply: sysfs: use power_supply_property_is_writeable() power: supply: qcom_battmgr: Enable battery support on x1e80100 power: supply: add support for MAX1720x standalone fuel gauge dt-bindings: power: supply: add support for MAX17201/MAX17205 fuel gauge power: reset: piix4: add missing MODULE_DESCRIPTION() macro power: supply: samsung-sdi-battery: Constify struct power_supply_maintenance_charge_table power: supply: samsung-sdi-battery: Constify struct power_supply_vbat_ri_table power: supply: lenovo_yoga_c630_battery: add Lenovo C630 driver power: supply: ingenic: Fix some error handling paths in ingenic_battery_get_property() power: supply: ab8500: Clean some error messages power: supply: ab8500: Use iio_read_channel_processed_scale() power: supply: ab8500: Fix error handling when calling iio_read_channel_processed() power: supply: hwmon: Add support for power sensors power: supply: ab8500: remove unused struct 'inst_curr_result_list' power: supply: bd99954: remove unused struct 'battery_data' power: supply: leds: Add activate() callback to triggers power: supply: leds: Share trig pointer for online and charging_full power: supply: leds: Add power_supply_[un]register_led_trigger() power: supply: Drop explicit initialization of struct i2c_device_id::driver_data to 0
2024-07-22Merge tag 'irq-msi-2024-07-22' of ↵Linus Torvalds1-50/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MSI interrupt updates from Thomas Gleixner: "Switch ARM/ARM64 over to the modern per device MSI domains. This simplifies the handling of platform MSI and wire to MSI controllers and removes about 500 lines of legacy code. Aside of that it paves the way for ARM/ARM64 to utilize the dynamic allocation of PCI/MSI interrupts and to support the upcoming non standard IMS (Interrupt Message Store) mechanism on PCIe devices" * tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) irqchip/gic-v3-its: Correctly fish out the DID for platform MSI irqchip/gic-v3-its: Correctly honor the RID remapping genirq/msi: Move msi_device_data to core genirq/msi: Remove platform MSI leftovers irqchip/irq-mvebu-icu: Remove platform MSI leftovers irqchip/irq-mvebu-sei: Switch to MSI parent irqchip/mvebu-odmi: Switch to parent MSI irqchip/mvebu-gicp: Switch to MSI parent irqchip/irq-mvebu-icu: Prepare for real per device MSI irqchip/imx-mu-msi: Switch to MSI parent irqchip/gic-v2m: Switch to device MSI irqchip/gic_v3_mbi: Switch over to parent domain genirq/msi: Remove platform_msi_create_device_domain() irqchip/mbigen: Remove platform_msi_create_device_domain() fallback irqchip/gic-v3-its: Switch platform MSI to MSI parent irqchip/irq-msi-lib: Prepare for DOMAIN_BUS_WIRED_TO_MSI irqchip/mbigen: Prepare for real per device MSI irqchip/irq-msi-lib: Prepare for DEVICE MSI to replace platform MSI irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X] irqchip/irq-msi-lib: Prepare for PCI MSI/MSIX ...
2024-07-22Merge tag 'irq-core-2024-07-15' of ↵Linus Torvalds5-13/+179
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: "Core: - Provide a new mechanism to create interrupt domains. The existing interfaces have already too many parameters and it's a pain to expand any of this for new required functionality. The new function takes a pointer to a data structure as argument. The data structure combines all existing parameters and allows for easy extension. The first extension for this is to handle the instantiation of generic interrupt chips at the core level and to allow drivers to provide extra init/exit callbacks. This is necessary to do the full interrupt chip initialization before the new domain is published, so that concurrent usage sites won't see a half initialized interrupt domain. Similar problems exist on teardown. This has turned out to be a real problem due to the deferred and parallel probing which was added in recent years. Handling this at the core level allows to remove quite some accrued boilerplate code in existing drivers and avoids horrible workarounds at the driver level. - The usual small improvements all over the place Drivers: - Add support for LAN966x OIC and RZ/Five SoC - Split the STM ExtI driver into a microcontroller and a SMP version to allow building the latter as a module for multi-platform kernels - Enable MSI support for Armada 370XP on platforms which do not support IPIs - The usual small fixes and enhancements all over the place" * tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits) irqdomain: Fix the kernel-doc and plug it into Documentation genirq: Set IRQF_COND_ONESHOT in request_irq() irqchip/imx-irqsteer: Handle runtime power management correctly irqchip/gic-v3: Pass #redistributor-regions to gic_of_setup_kvm_info() irqchip/bcm2835: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock irqchip/gic-v4: Always configure affinity on VPE activation Revert "irqchip/dw-apb-ictl: Support building as module" Revert "Loongarch: Support loongarch avec" arm64: Kconfig: Allow build irq-stm32mp-exti driver as module ARM: stm32: Allow build irq-stm32mp-exti driver as module irqchip/stm32mp-exti: Allow building as module irqchip/stm32mp-exti: Rename internal symbols irqchip/stm32-exti: Split MCU and MPU code arm64: Kconfig: Select STM32MP_EXTI on STM32 platforms ARM: stm32: Use different EXTI driver on ARMv7m and ARMv7a irqchip/stm32-exti: Add CONFIG_STM32MP_EXTI irqchip/dw-apb-ictl: Support building as module irqchip/riscv-aplic: Simplify the initialization code ...
2024-07-22Merge tag 'for-6.11/block-20240722' of git://git.kernel.dk/linuxLinus Torvalds5-67/+100
Pull more block updates from Jens Axboe: - MD fixes via Song: - md-cluster fixes (Heming Zhao) - raid1 fix (Mateusz Jończyk) - s390/dasd module description (Jeff) - Series cleaning up and hardening the blk-mq debugfs flag handling (John, Christoph) - blk-cgroup cleanup (Xiu) - Error polled IO attempts if backend doesn't support it (hexue) - Fix for an sbitmap hang (Yang) * tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux: (23 commits) blk-cgroup: move congestion_count to struct blkcg sbitmap: fix io hung due to race on sbitmap_word::cleared block: avoid polling configuration errors block: Catch possible entries missing from rqf_name[] block: Simplify definition of RQF_NAME() block: Use enum to define RQF_x bit indexes block: Catch possible entries missing from cmd_flag_name[] block: Catch possible entries missing from alloc_policy_name[] block: Catch possible entries missing from hctx_flag_name[] block: Catch possible entries missing from hctx_state_name[] block: Catch possible entries missing from blk_queue_flag_name[] block: Make QUEUE_FLAG_x as an enum block: Relocate BLK_MQ_MAX_DEPTH block: Relocate BLK_MQ_CPU_WORK_BATCH block: remove QUEUE_FLAG_STOPPED block: Add missing entry to hctx_flag_name[] block: Add zone write plugging entry to rqf_name[] block: Add missing entries from cmd_flag_name[] s390/dasd: fix error checks in dasd_copy_pair_store() s390/dasd: add missing MODULE_DESCRIPTION() macros ...
2024-07-22Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linuxLinus Torvalds3-156/+153
Pull block integrity mapping updates from Jens Axboe: "A set of cleanups and fixes for the block integrity support. Sent separately from the main block changes from last week, as they depended on later fixes in the 6.10-rc cycle" * tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux: block: don't free the integrity payload in bio_integrity_unmap_free_user block: don't free submitter owned integrity payload on I/O completion block: call bio_integrity_unmap_free_user from blk_rq_unmap_user block: don't call bio_uninit from bio_endio block: also return bio_integrity_payload * from stubs block: split integrity support out of bio.h
2024-07-22Merge patch series "Add ACPI NUMA support for RISC-V"Palmer Dabbelt1-0/+6
Haibo Xu <haibo1.xu@intel.com> says: This patch series enable RISC-V ACPI NUMA support which was based on the recently approved ACPI ECR[1]. Patch 1/4 add RISC-V specific acpi_numa.c file to parse NUMA information from SRAT and SLIT ACPI tables. Patch 2/4 add the common SRAT RINTC affinity structure handler. Patch 3/4 change the ACPI_NUMA to a hidden option since it would be selected by default on all supported platform. Patch 4/4 replace pr_info with pr_debug in arch_acpi_numa_init() to avoid potential boot noise on ACPI platforms that are not NUMA. Based-on: https://github.com/linux-riscv/linux-riscv/tree/for-next [1] https://drive.google.com/file/d/1YTdDx2IPm5IeZjAW932EYU-tUtgS08tX/view?usp=sharing Testing: Since the ACPI AIA/PLIC support patch set is still under upstream review, hence it is tested using the poll based HVC SBI console and RAM disk. 1) Build latest Qemu with the following patch backported https://github.com/vlsunil/qemu/commit/42bd4eeefd5d4410a68f02d54fee406d8a1269b0 2) Build latest EDK-II https://github.com/tianocore/edk2/blob/master/OvmfPkg/RiscVVirt/README.md 3) Build Linux with the following configs enabled CONFIG_RISCV_SBI_V01=y CONFIG_SERIAL_EARLYCON_RISCV_SBI=y CONFIG_NONPORTABLE=y CONFIG_HVC_RISCV_SBI=y CONFIG_NUMA=y CONFIG_ACPI_NUMA=y 4) Build buildroot rootfs.cpio 5) Launch the Qemu machine qemu-system-riscv64 -nographic \ -machine virt,pflash0=pflash0,pflash1=pflash1 -smp 4 -m 8G \ -blockdev node-name=pflash0,driver=file,read-only=on,filename=RISCV_VIRT_CODE.fd \ -blockdev node-name=pflash1,driver=file,filename=RISCV_VIRT_VARS.fd \ -object memory-backend-ram,size=4G,id=m0 \ -object memory-backend-ram,size=4G,id=m1 \ -numa node,memdev=m0,cpus=0-1,nodeid=0 \ -numa node,memdev=m1,cpus=2-3,nodeid=1 \ -numa dist,src=0,dst=1,val=30 \ -kernel linux/arch/riscv/boot/Image \ -initrd buildroot/output/images/rootfs.cpio \ -append "root=/dev/ram ro console=hvc0 earlycon=sbi" [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x80000000-0x17fffffff] [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x180000000-0x27fffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x17fe3bc40-0x17fe3cfff] [ 0.000000] NUMA: NODE_DATA [mem 0x27fff4c40-0x27fff5fff] ... [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> HARTID 0x0 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> HARTID 0x1 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> HARTID 0x2 -> Node 1 [ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> HARTID 0x3 -> Node 1 * b4-shazam-merge: ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init ACPI: NUMA: change the ACPI_NUMA to a hidden option ACPI: NUMA: Add handler for SRAT RINTC affinity structure ACPI: RISCV: Add NUMA support based on SRAT and SLIT Link: https://lore.kernel.org/r/cover.1718268003.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22timers/migration: Move hierarchy setup into cpuhotplug prepare callbackAnna-Maria Behnsen1-0/+1
When a CPU comes online the first time, it is possible that a new top level group will be created. In general all propagation is done from the bottom to top. This minimizes complexity and prevents possible races. But when a new top level group is created, the formely top level group needs to be connected to the new level. This is the only time, when the direction to propagate changes is changed: the changes are propagated from top (new top level group) to bottom (formerly top level group). This introduces two races (see (A) and (B)) as reported by Frederic: (A) This race happens, when marking the formely top level group as active, but the last active CPU of the formerly top level group goes idle. Then it's likely that formerly group is no longer active, but marked nevertheless as active in new top level group: [GRP0:0] migrator = 0 active = 0 nextevt = KTIME_MAX / \ 0 1 .. 7 active idle 0) Hierarchy has for now only 8 CPUs and CPU 0 is the only active CPU. [GRP1:0] migrator = TMIGR_NONE active = NONE nextevt = KTIME_MAX \ [GRP0:0] [GRP0:1] migrator = 0 migrator = TMIGR_NONE active = 0 active = NONE nextevt = KTIME_MAX nextevt = KTIME_MAX / \ 0 1 .. 7 8 active idle !online 1) CPU 8 is booting and creates a new group in first level GRP0:1 and therefore also a new top group GRP1:0. For now the setup code proceeded only until the connected between GRP0:1 to the new top group. The connection between CPU8 and GRP0:1 is not yet established and CPU 8 is still !online. [GRP1:0] migrator = TMIGR_NONE active = NONE nextevt = KTIME_MAX / \ [GRP0:0] [GRP0:1] migrator = 0 migrator = TMIGR_NONE active = 0 active = NONE nextevt = KTIME_MAX nextevt = KTIME_MAX / \ 0 1 .. 7 8 active idle !online 2) Setup code now connects GRP0:0 to GRP1:0 and observes while in tmigr_connect_child_parent() that GRP0:0 is not TMIGR_NONE. So it prepares to call tmigr_active_up() on it. It hasn't done it yet. [GRP1:0] migrator = TMIGR_NONE active = NONE nextevt = KTIME_MAX / \ [GRP0:0] [GRP0:1] migrator = TMIGR_NONE migrator = TMIGR_NONE active = NONE active = NONE nextevt = KTIME_MAX nextevt = KTIME_MAX / \ 0 1 .. 7 8 idle idle !online 3) CPU 0 goes idle. Since GRP0:0->parent has been updated by CPU 8 with GRP0:0->lock held, CPU 0 observes GRP1:0 after calling tmigr_update_events() and it propagates the change to the top (no change there and no wakeup programmed since there is no timer). [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = KTIME_MAX / \ [GRP0:0] [GRP0:1] migrator = TMIGR_NONE migrator = TMIGR_NONE active = NONE active = NONE nextevt = KTIME_MAX nextevt = KTIME_MAX / \ 0 1 .. 7 8 idle idle !online 4) Now the setup code finally calls tmigr_active_up() to and sets GRP0:0 active in GRP1:0 [GRP1:0] migrator = GRP0:0 active = GRP0:0, GRP0:1 nextevt = KTIME_MAX / \ [GRP0:0] [GRP0:1] migrator = TMIGR_NONE migrator = 8 active = NONE active = 8 nextevt = KTIME_MAX nextevt = KTIME_MAX / \ | 0 1 .. 7 8 idle idle active 5) Now CPU 8 is connected with GRP0:1 and CPU 8 calls tmigr_active_up() out of tmigr_cpu_online(). [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T8 / \ [GRP0:0] [GRP0:1] migrator = TMIGR_NONE migrator = TMIGR_NONE active = NONE active = NONE nextevt = KTIME_MAX nextevt = T8 / \ | 0 1 .. 7 8 idle idle idle 5) CPU 8 goes idle with a timer T8 and relies on GRP0:0 as the migrator. But it's not really active, so T8 gets ignored. --> The update which is done in third step is not noticed by setup code. So a wrong migrator is set to top level group and a timer could get ignored. (B) Reading group->parent and group->childmask when an hierarchy update is ongoing and reaches the formerly top level group is racy as those values could be inconsistent. (The notation of migrator and active now slightly changes in contrast to the above example, as now the childmasks are used.) [GRP1:0] migrator = TMIGR_NONE active = 0x00 nextevt = KTIME_MAX \ [GRP0:0] [GRP0:1] migrator = TMIGR_NONE migrator = TMIGR_NONE active = 0x00 active = 0x00 nextevt = KTIME_MAX nextevt = KTIME_MAX childmask= 0 childmask= 1 parent = NULL parent = GRP1:0 / \ 0 1 .. 7 8 idle idle !online childmask=1 1) Hierarchy has 8 CPUs. CPU 8 is at the moment in the process of onlining but did not yet connect GRP0:0 to GRP1:0. [GRP1:0] migrator = TMIGR_NONE active = 0x00 nextevt = KTIME_MAX / \ [GRP0:0] [GRP0:1] migrator = TMIGR_NONE migrator = TMIGR_NONE active = 0x00 active = 0x00 nextevt = KTIME_MAX nextevt = KTIME_MAX childmask= 0 childmask= 1 parent = GRP1:0 parent = GRP1:0 / \ 0 1 .. 7 8 idle idle !online childmask=1 2) Setup code (running on CPU 8) now connects GRP0:0 to GRP1:0, updates parent pointer of GRP0:0 and ... [GRP1:0] migrator = TMIGR_NONE active = 0x00 nextevt = KTIME_MAX / \ [GRP0:0] [GRP0:1] migrator = 0x01 migrator = TMIGR_NONE active = 0x01 active = 0x00 nextevt = KTIME_MAX nextevt = KTIME_MAX childmask= 0 childmask= 1 parent = GRP1:0 parent = GRP1:0 / \ 0 1 .. 7 8 active idle !online childmask=1 tmigr_walk.childmask = 0 3) ... CPU 0 comes active in the same time. As migrator in GRP0:0 was TMIGR_NONE, childmask of GRP0:0 is stored in update propagation data structure tmigr_walk (as update of childmask is not yet visible/updated). And now ... [GRP1:0] migrator = TMIGR_NONE active = 0x00 nextevt = KTIME_MAX / \ [GRP0:0] [GRP0:1] migrator = 0x01 migrator = TMIGR_NONE active = 0x01 active = 0x00 nextevt = KTIME_MAX nextevt = KTIME_MAX childmask= 2 childmask= 1 parent = GRP1:0 parent = GRP1:0 / \ 0 1 .. 7 8 active idle !online childmask=1 tmigr_walk.childmask = 0 4) ... childmask of GRP0:0 is updated by CPU 8 (still part of setup code). [GRP1:0] migrator = 0x00 active = 0x00 nextevt = KTIME_MAX / \ [GRP0:0] [GRP0:1] migrator = 0x01 migrator = TMIGR_NONE active = 0x01 active = 0x00 nextevt = KTIME_MAX nextevt = KTIME_MAX childmask= 2 childmask= 1 parent = GRP1:0 parent = GRP1:0 / \ 0 1 .. 7 8 active idle !online childmask=1 tmigr_walk.childmask = 0 5) CPU 0 sees the connection to GRP1:0 and now propagates active state to GRP1:0 but with childmask = 0 as stored in propagation data structure. --> Now GRP1:0 always has a migrator as 0x00 != TMIGR_NONE and for all CPUs it looks like GRP1:0 is always active. To prevent those races, the setup of the hierarchy is moved into the cpuhotplug prepare callback. The prepare callback is not executed by the CPU which will come online, it is executed by the CPU which prepares onlining of the other CPU. This CPU is active while it is connecting the formerly top level to the new one. This prevents from (A) to happen and it also prevents from any further walk above the formerly top level until that active CPU becomes inactive, releasing the new ->parent and ->childmask updates to be visible by any subsequent walk up above the formerly top level hierarchy. This prevents from (B) to happen. The direction for the updates is now forced to look like "from bottom to top". However if the active CPU prevents from tmigr_cpu_(in)active() to walk up with the update not-or-half visible, nothing prevents walking up to the new top with a 0 childmask in tmigr_handle_remote_up() or tmigr_requires_handle_remote_up() if the active CPU doing the prepare is not the migrator. But then it looks fine because: * tmigr_check_migrator() should just return false * The migrator is active and should eventually observe the new childmask at some point in a future tick. Split setup functionality of online callback into the cpuhotplug prepare callback and setup hotplug state. Change init call into early_initcall() to make sure an already active CPU prepares everything for newly upcoming CPUs. Reorder the code, that all prepare related functions are close to each other and online and offline callbacks are also close together. Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model") Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20240717094940.18687-1-anna-maria@linutronix.de
2024-07-22ACPI: RISCV: Add NUMA support based on SRAT and SLITHaibo Xu1-0/+6
Add acpi_numa.c file to enable parse NUMA information from ACPI SRAT and SLIT tables. SRAT table provide CPUs(Hart) and memory nodes to proximity domain mapping, while SLIT table provide the distance metrics between proximity domains. Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/65dbad1fda08a32922c44886e4581e49b4a2fecc.1718268003.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-21Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of ↵Linus Torvalds31-150/+247
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - In the series "treewide: Refactor heap related implementation", Kuan-Wei Chiu has significantly reworked the min_heap library code and has taught bcachefs to use the new more generic implementation. - Yury Norov's series "Cleanup cpumask.h inclusion in core headers" reworks the cpumask and nodemask headers to make things generally more rational. - Kuan-Wei Chiu has sent along some maintenance work against our sorting library code in the series "lib/sort: Optimizations and cleanups". - More library maintainance work from Christophe Jaillet in the series "Remove usage of the deprecated ida_simple_xx() API". - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the series "nilfs2: eliminate the call to inode_attach_wb()". - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB command error". - Plus the usual shower of singleton patches all over the place. Please see the relevant changelogs for details. * tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits) ia64: scrub ia64 from poison.h watchdog/perf: properly initialize the turbo mode timestamp and rearm counter tsacct: replace strncpy() with strscpy() lib/bch.c: use swap() to improve code test_bpf: convert comma to semicolon init/modpost: conditionally check section mismatch to __meminit* init: remove unused __MEMINIT* macros nilfs2: Constify struct kobj_type nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro math: rational: add missing MODULE_DESCRIPTION() macro lib/zlib: add missing MODULE_DESCRIPTION() macro fs: ufs: add MODULE_DESCRIPTION() lib/rbtree.c: fix the example typo ocfs2: add bounds checking to ocfs2_check_dir_entry() fs: add kernel-doc comments to ocfs2_prepare_orphan_dir() coredump: simplify zap_process() selftests/fpu: add missing MODULE_DESCRIPTION() macro compiler.h: simplify data_race() macro build-id: require program headers to be right after ELF header resource: add missing MODULE_DESCRIPTION() ...
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds43-456/+601
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-11/+76
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...