linux.git - Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

Age	Commit message (Collapse)	Author	Files	Lines
2024-10-10	arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()	Fares Mehanna	1	-2/+4
	[ Upstream commit 7eced90b202d63cdc1b9b11b1353adb1389830f9 ] The reasons for PTEs in the kernel direct map to be marked invalid are not limited to kfence / debug pagealloc machinery. In particular, memfd_secret() also steals pages with set_direct_map_invalid_noflush(). When building the transitional page tables for kexec from the current kernel's page tables, those pages need to become regular writable pages, otherwise, if the relocation places kexec segments over such pages, a fault will occur during kexec, leading to host going dark during kexec. This patch addresses the kexec issue by marking any PTE as valid if it is not none. While this fixes the kexec crash, it does not address the security concern that if processes owning secret memory are not terminated before kexec, the secret content will be mapped in the new kernel without being scrubbed. Suggested-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Fares Mehanna <faresx@amazon.de> Link: https://lore.kernel.org/r/20240902163309.97113-1-faresx@amazon.de Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10	virt: sev-guest: Ensure the SNP guest messages do not exceed a page	Nikunj A Dadhania	1	-1/+1
	[ Upstream commit 2b9ac0b84c2cae91bbaceab62df4de6d503421ec ] Currently, struct snp_guest_msg includes a message header (96 bytes) and a payload (4000 bytes). There is an implicit assumption here that the SNP message header will always be 96 bytes, and with that assumption the payload array size has been set to 4000 bytes - a magic number. If any new member is added to the SNP message header, the SNP guest message will span more than a page. Instead of using a magic number for the payload, declare struct snp_guest_msg in a way that payload plus the message header do not exceed a page. [ bp: Massage. ] Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240731150811.156771-5-nikunj@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10	crypto: simd - Do not call crypto_alloc_tfm during registration	Herbert Xu	2	-2/+2
	[ Upstream commit 3c44d31cb34ce4eb8311a2e73634d57702948230 ] Algorithm registration is usually carried out during module init, where as little work as possible should be carried out. The SIMD code violated this rule by allocating a tfm, this then triggers a full test of the algorithm which may dead-lock in certain cases. SIMD is only allocating the tfm to get at the alg object, which is in fact already available as it is what we are registering. Use that directly and remove the crypto_alloc_tfm call. Also remove some obsolete and unused SIMD API. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10	crypto: x86/sha256 - Add parentheses around macros' single arguments	Fangrui Song	1	-8/+8
	[ Upstream commit 3363c460ef726ba693704dbcd73b7e7214ccc788 ] The macros FOUR_ROUNDS_AND_SCHED and DO_4ROUNDS rely on an unexpected/undocumented behavior of the GNU assembler, which might change in the future (https://sourceware.org/bugzilla/show_bug.cgi?id=32073). M (1) (2) // 1 arg !? Future: 2 args M 1 + 2 // 1 arg !? Future: 3 args M 1 2 // 2 args Add parentheses around the single arguments to support future GNU assembler and LLVM integrated assembler (when the IsOperator hack from the following link is dropped). Link: https://github.com/llvm/llvm-project/commit/055006475e22014b28a070db1bff41ca15f322f0 Signed-off-by: Fangrui Song <maskray@google.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	s390/ftrace: Avoid calling unwinder in ftrace_return_address()	Vasily Gorbik	2	-20/+16
	commit a84dd0d8ae24bdc6da341187fc4c1a0adfce2ccc upstream. ftrace_return_address() is called extremely often from performance-critical code paths when debugging features like CONFIG_TRACE_IRQFLAGS are enabled. For example, with debug_defconfig, ftrace selftests on my LPAR currently execute ftrace_return_address() as follows: ftrace_return_address(0) - 0 times (common code uses __builtin_return_address(0) instead) ftrace_return_address(1) - 2,986,805,401 times (with this patch applied) ftrace_return_address(2) - 140 times ftrace_return_address(>2) - 0 times The use of __builtin_return_address(n) was replaced by return_address() with an unwinder call by commit cae74ba8c295 ("s390/ftrace: Use unwinder instead of __builtin_return_address()") because __builtin_return_address(n) simply walks the stack backchain and doesn't check for reaching the stack top. For shallow stacks with fewer than "n" frames, this results in reads at low addresses and random memory accesses. While calling the fully functional unwinder "works", it is very slow for this purpose. Moreover, potentially following stack switches and walking past IRQ context is simply wrong thing to do for ftrace_return_address(). Reimplement return_address() to essentially be __builtin_return_address(n) with checks for reaching the stack top. Since the ftrace_return_address(n) argument is always a constant, keep the implementation in the header, allowing both GCC and Clang to unroll the loop and optimize it to the bare minimum. Fixes: cae74ba8c295 ("s390/ftrace: Use unwinder instead of __builtin_return_address()") Cc: stable@vger.kernel.org Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	ARM: dts: imx6ull-seeed-npi: fix fsl,pins property in tscgrp pinctrl	Krzysztof Kozlowski	1	-6/+6
	commit 3dedd4889cfc2851444a1f7626b293c0bfd1e42c upstream. The property is "fsl,pins", not "fsl,pin". Wrong property means the pin configuration was not applied. Fixes dtbs_check warnings: imx6ull-seeed-npi-dev-board-emmc.dtb: pinctrl@20e0000: uart1grp: 'fsl,pins' is a required property imx6ull-seeed-npi-dev-board-emmc.dtb: pinctrl@20e0000: uart1grp: 'fsl,pin' does not match any of the regexes: 'pinctrl-[0-9]+' Cc: stable@vger.kernel.org Fixes: e3b5697195c8 ("ARM: dts: imx6ull: add seeed studio NPi dev board") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Parthiban Nallathambi <parthiban@linumiz.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	ARM: dts: imx6ul-geam: fix fsl,pins property in tscgrp pinctrl	Krzysztof Kozlowski	1	-1/+1
	commit 1b0e32753d8550908dff8982410357b5114be78c upstream. The property is "fsl,pins", not "fsl,pin". Wrong property means the pin configuration was not applied. Fixes dtbs_check warnings: imx6ul-geam.dtb: pinctrl@20e0000: tscgrp: 'fsl,pins' is a required property imx6ul-geam.dtb: pinctrl@20e0000: tscgrp: 'fsl,pin' does not match any of the regexes: 'pinctrl-[0-9]+' Cc: stable@vger.kernel.org Fixes: a58e4e608bc8 ("ARM: dts: imx6ul-geam: Add Engicam IMX6UL GEA M6UL initial support") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Michael Trimarchi <michael@amarulasolutions.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	arm64: dts: rockchip: Correct the Pinebook Pro battery design capacity	Dragan Simic	1	-1/+1
	commit def33fb1191207f5afa6dcb681d71fef2a6c1293 upstream. All batches of the Pine64 Pinebook Pro, except the latest batch (as of 2024) whose hardware design was revised due to the component shortage, use a 1S lithium battery whose nominal/design capacity is 10,000 mAh, according to the battery datasheet. [1][2] Let's correct the design full-charge value in the Pinebook Pro board dts, to improve the accuracy of the hardware description, and to hopefully improve the accuracy of the fuel gauge a bit on all units that don't belong to the latest batch. The above-mentioned latest batch uses a different 1S lithium battery with a slightly lower capacity, more precisely 9,600 mAh. To make the fuel gauge work reliably on the latest batch, a sample battery would need to be sent to CellWise, to obtain its proprietary battery profile, whose data goes into "cellwise,battery-profile" in the Pinebook Pro board dts. Without that data, the fuel gauge reportedly works unreliably, so changing the design capacity won't have any negative effects on the already unreliable operation of the fuel gauge in the Pinebook Pros that belong to the latest batch. According to the battery datasheet, its voltage can go as low as 2.75 V while discharging, but it's better to leave the current 3.0 V value in the dts file, because of the associated Pinebook Pro's voltage regulation issues. [1] https://wiki.pine64.org/index.php/Pinebook_Pro#Battery [2] https://files.pine64.org/doc/datasheet/pinebook/40110175P%203.8V%2010000mAh%E8%A7%84%E6%A0%BC%E4%B9%A6-14.pdf Fixes: c7c4d698cd28 ("arm64: dts: rockchip: add fuel gauge to Pinebook Pro dts") Cc: stable@vger.kernel.org Cc: Marek Kraus <gamiee@pine64.org> Signed-off-by: Dragan Simic <dsimic@manjaro.org> Link: https://lore.kernel.org/r/731f8ef9b1a867bcc730d19ed277c8c0534c0842.1721065172.git.dsimic@manjaro.org Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	arm64: dts: qcom: sa8775p: Mark APPS and PCIe SMMUs as DMA coherent	Qingqing Zhou	1	-0/+2
	commit 421688265d7f5d3ff4211982e7231765378bb64f upstream. The SMMUs on sa8775p are cache-coherent. GPU SMMU is marked as such, mark the APPS and PCIe ones as well. Fixes: 603f96d4c9d0 ("arm64: dts: qcom: add initial support for qcom sa8775p-ride") Fixes: 2dba7a613a6e ("arm64: dts: qcom: sa8775p: add the pcie smmu node") Cc: stable@vger.kernel.org Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Qingqing Zhou <quic_qqzhou@quicinc.com> Rule: add Link: https://lore.kernel.org/stable/20240723075948.9545-1-quic_qqzhou%40quicinc.com Link: https://lore.kernel.org/r/20240725072117.22425-1-quic_qqzhou@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	arm64: dts: rockchip: Raise Pinebook Pro's panel backlight PWM frequency	Dragan Simic	1	-1/+1
	commit 8c51521de18755d4112a77a598a348b38d0af370 upstream. Increase the frequency of the PWM signal that drives the LED backlight of the Pinebook Pro's panel, from about 1.35 KHz (which equals to the PWM period of 740,740 ns), to exactly 8 kHz (which equals to the PWM period of 125,000 ns). Using a higher PWM frequency for the panel backlight, which reduces the flicker, can only be beneficial to the end users' eyes. On top of that, increasing the backlight PWM signal frequency reportedly eliminates the buzzing emitted from the Pinebook Pro's built-in speakers when certain backlight levels are set, which cause some weird interference with some of the components of the Pinebook Pro's audio chain. The old value for the backlight PWM period, i.e. 740,740 ns, is pretty much an arbitrary value that was selected during the very early bring-up of the Pinebook Pro, only because that value seemed to minimize horizontal line distortion on the display, which resulted from the old X.org drivers causing screen tearing when dragging windows around. That's no longer an issue, so there are no reasons to stick with the old PWM period value. The lower and the upper backlight PWM frequency limits for the Pinebook Pro's panel, according to its datasheet, are 200 Hz and 10 kHz, respectively. [1] These changes still leave some headroom, which may have some positive effects on the lifetime expectancy of the panel's backlight LEDs. [1] https://files.pine64.org/doc/datasheet/PinebookPro/NV140FHM-N49_Rev.P0_20160804_201710235838.pdf Fixes: 5a65505a6988 ("arm64: dts: rockchip: Add initial support for Pinebook Pro") Cc: stable@vger.kernel.org Reported-by: Nikola Radojevic <nikola@radojevic.rs> Signed-off-by: Dragan Simic <dsimic@manjaro.org> Tested-by: Nikola Radojević <nikola@radojevic.rs> Link: https://lore.kernel.org/r/2a23b6cfd8c0513e5b233b4006ee3d3ed09b824f.1722805655.git.dsimic@manjaro.org Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	arm64: dts: mediatek: mt8186-corsola: Disable DPI display interface	Chen-Yu Tsai	1	-1/+2
	commit 3079fb09ddac159bd8bb87f6f15b924e265f8d4d upstream. The DPI display interface feeds the external display pipeline. However the pipeline representation is currently incomplete. Efforts are still under way to come up with a way to represent the "creative" repurposing of the DP bridge chip's internal output mux, which is meant to support USB type-C orientation changes, to output to one of two type-C ports. Until that is finalized, the external display can't be fully described, and thus won't work. Even worse, the half complete graph potentially confuses the OS, breaking the internal display as well. Disable the external display interface across the whole Corsola family until the DP / USB Type-C muxing graph binding is ready. Reported-by: Alper Nebi Yasak <alpernebiyasak@gmail.com> Closes: https://lore.kernel.org/linux-mediatek/38a703a9-6efb-456a-a248-1dd3687e526d@gmail.com/ Fixes: 8855d01fb81f ("arm64: dts: mediatek: Add MT8186 Krabby platform based Tentacruel / Tentacool") Cc: <stable@vger.kernel.org> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Tested-by: Alper Nebi Yasak <alpernebiyasak@gmail.com> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20240821042836.2631815-1-wenst@chromium.org Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	arm64: errata: Enable the AC03_CPU_38 workaround for ampere1a	D Scott Phillips	3	-2/+12
	commit db0d8a84348b876df7c4276f0cbce5df3b769f5f upstream. The ampere1a cpu is affected by erratum AC04_CPU_10 which is the same bug as AC03_CPU_38. Add ampere1a to the AC03_CPU_38 workaround midr list. Cc: <stable@vger.kernel.org> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827211701.2216719-1-scott@os.amperecomputing.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	arm64: esr: Define ESR_ELx_EC_* constants as UL	Anastasia Belova	1	-44/+44
	commit b6db3eb6c373b97d9e433530d748590421bbeea7 upstream. Add explicit casting to prevent expantion of 32th bit of u32 into highest half of u64 in several places. For example, in inject_abt64: ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT = 0x24 << 26. This operation's result is int with 1 in 32th bit. While casting this value into u64 (esr is u64) 1 fills 32 highest bits. Found by Linux Verification Center (linuxtesting.org) with SVACE. Cc: <stable@vger.kernel.org> Fixes: aa8eff9bfbd5 ("arm64: KVM: fault injection into a guest") Signed-off-by: Anastasia Belova <abelova@astralinux.ru> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/stable/20240910085016.32120-1-abelova%40astralinux.ru Link: https://lore.kernel.org/r/20240910085016.32120-1-abelova@astralinux.ru Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	perf/x86/intel/pt: Fix sampling synchronization	Adrian Hunter	1	-8/+7
	commit d92792a4b26e50b96ab734cbe203d8a4c932a7a9 upstream. pt_event_snapshot_aux() uses pt->handle_nmi to determine if tracing needs to be stopped, however tracing can still be going because pt->handle_nmi is set to zero before tracing is stopped in pt_event_stop, whereas pt_event_snapshot_aux() requires that tracing must be stopped in order to copy a sample of trace from the buffer. Instead call pt_config_stop() always, which anyway checks config for RTIT_CTL_TRACEEN and does nothing if it is already clear. Note pt_event_snapshot_aux() can continue to use pt->handle_nmi to determine if the trace needs to be restarted afterwards. Fixes: 25e8920b301c ("perf/x86/intel/pt: Add sampling support") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240715160712.127117-2-adrian.hunter@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	perf/x86/intel: Allow to setup LBR for counting event for BPF	Kan Liang	1	-2/+6
	commit ef493f4b122d6b14a6de111d1acac1eab1d673b0 upstream. The BPF subsystem may capture LBR data on a counting event. However, the current implementation assumes that LBR can/should only be used with sampling events. For instance, retsnoop tool ([0]) makes an extensive use of this functionality and sets up perf event as follows: struct perf_event_attr attr; memset(&attr, 0, sizeof(attr)); attr.size = sizeof(attr); attr.type = PERF_TYPE_HARDWARE; attr.config = PERF_COUNT_HW_CPU_CYCLES; attr.sample_type = PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL; To limit the LBR for a sampling event is to avoid unnecessary branch stack setup for a counting event in the sample read. Because LBR is only read in the sampling event's overflow. Although in most cases LBR is used in sampling, there is no HW limit to bind LBR to the sampling mode. Allow an LBR setup for a counting event unless in the sample read mode. Fixes: 85846b27072d ("perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag") Closes: https://lore.kernel.org/lkml/20240905180055.1221620-1-andrii@kernel.org/ Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240909155848.326640-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	x86/entry: Remove unwanted instrumentation in common_interrupt()	Dmitry Vyukov	2	-5/+9
	commit 477d81a1c47a1b79b9c08fc92b5dea3c5143800b upstream. common_interrupt() and related variants call kvm_set_cpu_l1tf_flush_l1d(), which is neither marked noinstr nor __always_inline. So compiler puts it out of line and adds instrumentation to it. Since the call is inside of instrumentation_begin/end(), objtool does not warn about it. The manifestation is that KCOV produces spurious coverage in kvm_set_cpu_l1tf_flush_l1d() in random places because the call happens when preempt count is not yet updated to say that the kernel is in an interrupt. Mark kvm_set_cpu_l1tf_flush_l1d() as __always_inline and move it out of the instrumentation_begin/end() section. It only calls __this_cpu_write() which is already safe to call in noinstr contexts. Fixes: 6368558c3710 ("x86/entry: Provide IDTENTRY_SYSVEC") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/3f9a1de9e415fcb53d07dc9e19fa8481bb021b1b.1718092070.git.dvyukov@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	arm64: dts: mediatek: mt8395-nio-12l: Mark USB 3.0 on xhci1 as disabled	Chen-Yu Tsai	1	-0/+1
	commit be985531a5dd9ca50fc9f3f85b8adeb2a4a75a58 upstream. USB 3.0 on xhci1 is not used, as the controller shares the same PHY as pcie1. The latter is enabled to support the M.2 PCIe WLAN card on this design. Mark USB 3.0 as disabled on this controller using the "mediatek,u3p-dis-msk" property. Fixes: 96564b1e2ea4 ("arm64: dts: mediatek: Introduce the MT8395 Radxa NIO 12L board") Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Link: https://lore.kernel.org/r/20240731034411.371178-3-wenst@chromium.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	arm64: dts: mediatek: mt8195-cherry: Mark USB 3.0 on xhci1 as disabled	Chen-Yu Tsai	1	-0/+1
	commit 09d385679487c58f0859c1ad4f404ba3df2f8830 upstream. USB 3.0 on xhci1 is not used, as the controller shares the same PHY as pcie1. The latter is enabled to support the M.2 PCIe WLAN card on this design. Mark USB 3.0 as disabled on this controller using the "mediatek,u3p-dis-msk" property. Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> #KernelCI Closes: https://lore.kernel.org/all/9fce9838-ef87-4d1b-b3df-63e1ddb0ec51@notapiano/ Fixes: b6267a396e1c ("arm64: dts: mediatek: cherry: Enable T-PHYs and USB XHCI controllers") Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Link: https://lore.kernel.org/r/20240731034411.371178-2-wenst@chromium.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	powerpc/atomic: Use YZ constraints for DS-form instructions	Michael Ellerman	3	-8/+10
	commit 39190ac7cff1fd15135fa8e658030d9646fdb5f2 upstream. The 'ld' and 'std' instructions require a 4-byte aligned displacement because they are DS-form instructions. But the "m" asm constraint doesn't enforce that. That can lead to build errors if the compiler chooses a non-aligned displacement, as seen with GCC 14: /tmp/ccuSzwiR.s: Assembler messages: /tmp/ccuSzwiR.s:2579: Error: operand out of domain (39 is not a multiple of 4) make[5]: *** [scripts/Makefile.build:229: net/core/page_pool.o] Error 1 Dumping the generated assembler shows: ld 8,39(8) # MEM[(const struct atomic64_t *)_29].counter, t Use the YZ constraints to tell the compiler either to generate a DS-form displacement, or use an X-form instruction, either of which prevents the build error. See commit 2d43cc701b96 ("powerpc/uaccess: Fix build errors seen with GCC 13/14") for more details on the constraint letters. Fixes: 9f0cbea0d8cc ("[POWERPC] Implement atomic{, 64}_{read, write}() without volatile") Cc: stable@vger.kernel.org # v2.6.24+ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/all/20240913125302.0a06b4c7@canb.auug.org.au Tested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240916120510.2017749-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	x86/tdx: Fix "in-kernel MMIO" check	Alexey Gladkov (Intel)	1	-0/+6
	commit d4fc4d01471528da8a9797a065982e05090e1d81 upstream. TDX only supports kernel-initiated MMIO operations. The handle_mmio() function checks if the #VE exception occurred in the kernel and rejects the operation if it did not. However, userspace can deceive the kernel into performing MMIO on its behalf. For example, if userspace can point a syscall to an MMIO address, syscall does get_user() or put_user() on it, triggering MMIO #VE. The kernel will treat the #VE as in-kernel MMIO. Ensure that the target MMIO address is within the kernel before decoding instruction. Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO") Signed-off-by: Alexey Gladkov (Intel) <legion@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.1726237595.git.legion%40kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)	Sean Christopherson	4	-12/+36
	commit 73b42dc69be8564d4951a14d00f827929fe5ef79 upstream. Re-introduce the "split" x2APIC ICR storage that KVM used prior to Intel's IPI virtualization support, but only for AMD. While not stated anywhere in the APM, despite stating the ICR is a single 64-bit register, AMD CPUs store the 64-bit ICR as two separate 32-bit values in ICR and ICR2. When IPI virtualization (IPIv on Intel, all AVIC flavors on AMD) is enabled, KVM needs to match CPU behavior as some ICR ICR writes will be handled by the CPU, not by KVM. Add a kvm_x86_ops knob to control the underlying format used by the CPU to store the x2APIC ICR, and tune it to AMD vs. Intel regardless of whether or not x2AVIC is enabled. If KVM is handling all ICR writes, the storage format for x2APIC mode doesn't matter, and having the behavior follow AMD versus Intel will provide better test coverage and ease debugging. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20240719235107.3023592-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode()	Sean Christopherson	1	-23/+23
	commit d33234342f8b468e719e05649fd26549fb37ef8a upstream. Hoist kvm_x2apic_icr_write() above kvm_apic_write_nodecode() so that a local helper to _read_ the x2APIC ICR can be added and used in the nodecode path without needing a forward declaration. No functional change intended. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240719235107.3023592-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits	Sean Christopherson	1	-1/+14
	commit 71bf395a276f0578d19e0ae137a7d1d816d08e0e upstream. Inject a #GP on a WRMSR(ICR) that attempts to set any reserved bits that are must-be-zero on both Intel and AMD, i.e. any reserved bits other than the BUSY bit, which Intel ignores and basically says is undefined. KVM's xapic_state_test selftest has been fudging the bug since commit 4b88b1a518b3 ("KVM: selftests: Enhance handling WRMSR ICR register in x2APIC mode"), which essentially removed the testcase instead of fixing the bug. WARN if the nodecode path triggers a #GP, as the CPU is supposed to check reserved bits for ICR when it's partially virtualized. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240719235107.3023592-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer	Snehal Koukuntla	1	-6/+15
	commit f26a525b77e040d584e967369af1e018d2d59112 upstream. When we share memory through FF-A and the description of the buffers exceeds the size of the mapped buffer, the fragmentation API is used. The fragmentation API allows specifying chunks of descriptors in subsequent FF-A fragment calls and no upper limit has been established for this. The entire memory region transferred is identified by a handle which can be used to reclaim the transferred memory. To be able to reclaim the memory, the description of the buffers has to fit in the ffa_desc_buf. Add a bounds check on the FF-A sharing path to prevent the memory reclaim from failing. Also do_ffa_mem_xfer() does not need __always_inline, except for the BUILD_BUG_ON() aspect, which gets moved to a macro. [maz: fixed the BUILD_BUG_ON() breakage with LLVM, thanks to Wei-Lin Chang for the timely report] Fixes: 634d90cf0ac65 ("KVM: arm64: Handle FFA_MEM_LEND calls from the host") Cc: stable@vger.kernel.org Reviewed-by: Sebastian Ene <sebastianene@google.com> Signed-off-by: Snehal Koukuntla <snehalreddy@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240909180154.3267939-1-snehalreddy@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	xen: allow mapping ACPI data using a different physical address	Juergen Gross	8	-1/+59
	commit 9221222c717dbddac1e3c49906525475d87a3a44 upstream. When running as a Xen PV dom0 the system needs to map ACPI data of the host using host physical addresses, while those addresses can conflict with the guest physical addresses of the loaded linux kernel. The same problem might apply in case a PV guest is configured to use the host memory map. This conflict can be solved by mapping the ACPI data to a different guest physical address, but mapping the data via acpi_os_ioremap() must still be possible using the host physical address, as this address might be generated by AML when referencing some of the ACPI data. When configured to support running as a Xen PV domain, have an implementation of acpi_os_ioremap() being aware of the possibility to need above mentioned translation of a host physical address to the guest physical address. This modification requires to #include linux/acpi.h in some sources which need to include asm/acpi.h directly. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	xen: move checks for e820 conflicts further up	Juergen Gross	1	-22/+22
	commit c4498ae316da5b5786ccd448fc555f3339b8e4ca upstream. Move the checks for e820 memory map conflicts using the xen_chk_is_e820_usable() helper further up in order to prepare resolving some of the possible conflicts by doing some e820 map modifications, which must happen before evaluating the RAM layout. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04	um: remove ARCH_NO_PREEMPT_DYNAMIC	Johannes Berg	1	-1/+0
	[ Upstream commit 64dcf0b8779363ca07dfb5649a4cc71f9fdf390b ] There's no such symbol and we currently don't have any of the mechanisms to make boot-time selection cheap enough, so we can't have HAVE_PREEMPT_DYNAMIC_CALL or HAVE_PREEMPT_DYNAMIC_KEY. Remove the select statement. Reported-by: Lukas Bulwahn <lbulwahn@redhat.com> Fixes: cd01672d64a3 ("um: Enable preemption in UML") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	ep93xx: clock: Fix off by one in ep93xx_div_recalc_rate()	Dan Carpenter	1	-1/+1
	[ Upstream commit c7f06284a6427475e3df742215535ec3f6cd9662 ] The psc->div[] array has psc->num_div elements. These values come from when we call clk_hw_register_div(). It's adc_divisors and ARRAY_SIZE(adc_divisors)) and so on. So this condition needs to be >= instead of > to prevent an out of bounds read. Fixes: 9645ccc7bd7a ("ep93xx: clock: convert in-place to COMMON_CLK") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Reviewed-by: Nikita Shubin <nikita.shubin@maquefel.me> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Link: https://lore.kernel.org/r/1caf01ad4c0a8069535813c26c7f0b8ea011155e.camel@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	crypto: powerpc/p10-aes-gcm - Disable CRYPTO_AES_GCM_P10	Danny Tsen	1	-0/+1
	[ Upstream commit 44ac4625ea002deecd0c227336c95b724206c698 ] Data mismatch found when testing ipsec tunnel with AES/GCM crypto. Disabling CRYPTO_AES_GCM_P10 in Kconfig for this feature. Fixes: fd0e9b3e2ee6 ("crypto: p10-aes-gcm - An accelerated AES/GCM stitched implementation") Fixes: cdcecfd9991f ("crypto: p10-aes-gcm - Glue code for AES/GCM stitched implementation") Fixes: 45a4672b9a6e2 ("crypto: p10-aes-gcm - Update Kconfig and Makefile") Signed-off-by: Danny Tsen <dtsen@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	riscv: Fix fp alignment bug in perf_callchain_user()	Jinjie Ruan	1	-1/+1
	[ Upstream commit 22ab08955ea13be04a8efd20cc30890e0afaa49c ] The standard RISC-V calling convention said: "The stack grows downward and the stack pointer is always kept 16-byte aligned". So perf_callchain_user() should check whether 16-byte aligned for fp. Link: https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf Fixes: dbeb90b0c1eb ("riscv: Add perf callchain support") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240708032847.2998158-2-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	x86/PCI: Check pcie_find_root_port() return for NULL	Samasth Norway Ananda	1	-2/+2
	[ Upstream commit dbc3171194403d0d40e4bdeae666f6e76e428b53 ] If pcie_find_root_port() is unable to locate a Root Port, it will return NULL. Check the pointer for NULL before dereferencing it. This particular case is in a quirk for devices that are always below a Root Port, so this won't avoid a problem and doesn't need to be backported, but check as a matter of style and to prevent copy/paste mistakes. Link: https://lore.kernel.org/r/20240812202659.1649121-1-samasth.norway.ananda@oracle.com Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> [bhelgaas: drop Fixes: and explain why there's no problem in this case] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	s390/entry: Make early program check handler relocated lowcore aware	Heiko Carstens	2	-7/+11
	[ Upstream commit f101b305a7b9513a8042a2cf09018de4ff371af2 ] Add the missing pieces so the early program check handler also works with a relocated lowcore. Right now the result of an early program check in case of a relocated lowcore would be a program check loop. Fixes: 8f1e70adb1a3 ("s390/boot: Add cmdline option to relocate lowcore") Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	s390/entry: Move early program check handler to entry.S	Heiko Carstens	3	-24/+16
	[ Upstream commit f2bb5b97b51ce5425e8f59f899643ce4eadba667 ] Have all program check handlers in one file to make future changes easy. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Stable-dep-of: f101b305a7b9 ("s390/entry: Make early program check handler relocated lowcore aware") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	bpf, arm64: Fix tailcall hierarchy	Leon Hwang	1	-16/+41
	[ Upstream commit 66ff4d61dc124eafe9efaeaef696a09b7f236da2 ] This patch fixes a tailcall issue caused by abusing the tailcall in bpf2bpf feature on arm64 like the way of "bpf, x64: Fix tailcall hierarchy". On arm64, when a tail call happens, it uses tail_call_cnt_ptr to increment tail_call_cnt, too. At the prologue of main prog, it has to initialize tail_call_cnt and prepare tail_call_cnt_ptr. At the prologue of subprog, it pushes x26 register twice, and does not initialize tail_call_cnt. At the epilogue, it pops x26 twice, no matter whether it is main prog or subprog. Fixes: d4609a5d8c70 ("bpf, arm64: Keep tail call count across bpf2bpf calls") Acked-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Link: https://lore.kernel.org/r/20240714123902.32305-3-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	bpf, x64: Fix tailcall hierarchy	Leon Hwang	1	-28/+79
	[ Upstream commit 116e04ba1459fc08f80cf27b8c9f9f188be0fcb2 ] This patch fixes a tailcall issue caused by abusing the tailcall in bpf2bpf feature. As we know, tail_call_cnt propagates by rax from caller to callee when to call subprog in tailcall context. But, like the following example, MAX_TAIL_CALL_CNT won't work because of missing tail_call_cnt back-propagation from callee to caller. \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail1(struct __sk_buff skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } static __noinline int subprog_tail2(struct __sk_buff skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("tc") int entry(struct __sk_buff skb) { volatile int ret = 1; count++; subprog_tail1(skb); subprog_tail2(skb); return ret; } char __license[] SEC("license") = "GPL"; At run time, the tail_call_cnt in entry() will be propagated to subprog_tail1() and subprog_tail2(). But, when the tail_call_cnt in subprog_tail1() updates when bpf_tail_call_static(), the tail_call_cnt in entry() won't be updated at the same time. As a result, in entry(), when tail_call_cnt in entry() is less than MAX_TAIL_CALL_CNT and subprog_tail1() returns because of MAX_TAIL_CALL_CNT limit, bpf_tail_call_static() in suprog_tail2() is able to run because the tail_call_cnt in subprog_tail2() propagated from entry() is less than MAX_TAIL_CALL_CNT. So, how many tailcalls are there for this case if no error happens? From top-down view, does it look like hierarchy layer and layer? With this view, there will be 2+4+8+...+2^33 = 2^34 - 2 = 17,179,869,182 tailcalls for this case. How about there are N subprog_tail() in entry()? There will be almost N^34 tailcalls. Then, in this patch, it resolves this case on x86_64. In stead of propagating tail_call_cnt from caller to callee, it propagates its pointer, tail_call_cnt_ptr, tcc_ptr for short. However, where does it store tail_call_cnt? It stores tail_call_cnt on the stack of main prog. When tail call happens in subprog, it increments tail_call_cnt by tcc_ptr. Meanwhile, it stores tail_call_cnt_ptr on the stack of main prog, too. And, before jump to tail callee, it has to pop tail_call_cnt and tail_call_cnt_ptr. Then, at the prologue of subprog, it must not make rax as tail_call_cnt_ptr again. It has to reuse tail_call_cnt_ptr from caller. As a result, at run time, it has to recognize rax is tail_call_cnt or tail_call_cnt_ptr at prologue by: 1. rax is tail_call_cnt if rax is <= MAX_TAIL_CALL_CNT. 2. rax is tail_call_cnt_ptr if rax is > MAX_TAIL_CALL_CNT, because a pointer won't be <= MAX_TAIL_CALL_CNT. Here's an example to dump JITed. struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("tc") int entry(struct __sk_buff skb) { int ret = 1; count++; subprog_tail(skb); subprog_tail(skb); return ret; } When bpftool p d j id 42: int entry(struct __sk_buff skb): bpf_prog_0c0f4c2413ef19b1_entry: ; int entry(struct __sk_buff skb) 0: endbr64 4: nopl (%rax,%rax) 9: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 \| 1b: movq %rsp, %rax ;; rax = rbp - 8 \| 1e: jmp 0x21 ;; ---------+ \| 20: pushq %rax ;; <--------\|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 26: movabsq $-82417199407104, %rdi 30: movl (%rdi), %esi 33: addl $1, %esi 36: movl %esi, (%rdi) ; subprog_tail(skb); 39: movq %rbx, %rdi ;; rdi = skb 3c: movq -16(%rbp), %rax ;; rax = tcc_ptr 43: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 48: movq %rbx, %rdi ;; rdi = skb 4b: movq -16(%rbp), %rax ;; rax = tcc_ptr 52: callq 0x80 ;; call subprog_tail() ; return ret; 57: movl $1, %eax 5c: popq %rbx 5d: leave 5e: retq int subprog_tail(struct __sk_buff skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff skb) 0: endbr64 4: nopl (%rax,%rax) 9: nopl (%rax) ;; do not touch tail_call_cnt c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-105487587488768, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if tcc_ptr >= 33 goto 0x4e --------+ 3b: jmp 0x4e ;; jmp bypass, toggled by poking \| 40: addq $1, (%rax) ;; (*tcc_ptr)++ \| 44: popq %r13 ;; callee saved \| 46: popq %rbx ;; callee saved \| 47: popq %rax ;; undo rbp-16 push \| 48: popq %rax ;; undo rbp-8 push \| 49: nopl (%rax,%rax) ;; tail call target, toggled by poking \| ; return 0; ;; \| 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq Furthermore, when trampoline is the caller of bpf prog, which is tail_call_reachable, it is required to propagate rax through trampoline. Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT") Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Link: https://lore.kernel.org/r/20240714123902.32305-2-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	xen: tolerate ACPI NVS memory overlapping with Xen allocated memory	Juergen Gross	1	-1/+91
	[ Upstream commit be35d91c8880650404f3bf813573222dfb106935 ] In order to minimize required special handling for running as Xen PV dom0, the memory layout is modified to match that of the host. This requires to have only RAM at the locations where Xen allocated memory is living. Unfortunately there seem to be some machines, where ACPI NVS is located at 64 MB, resulting in a conflict with the loaded kernel or the initial page tables built by Xen. Avoid this conflict by swapping the ACPI NVS area in the memory map with unused RAM. This is possible via modification of the dom0 P2M map. Accesses to the ACPI NVS area are done either for saving and restoring it across suspend operations (this will work the same way as before), or by ACPI code when NVS memory is referenced from other ACPI tables. The latter case is handled by a Xen specific indirection of acpi_os_ioremap(). While the E820 map can (and should) be modified right away, the P2M map can be updated only after memory allocation is working, as the P2M map might need to be extended. Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout") Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	xen: add capability to remap non-RAM pages to different PFNs	Juergen Gross	2	-0/+66
	[ Upstream commit d05208cf7f05420ad10cc7f9550f91d485523659 ] When running as a Xen PV dom0 it can happen that the kernel is being loaded to a guest physical address conflicting with the host memory map. In order to be able to resolve this conflict, add the capability to remap non-RAM areas to different guest PFNs. A function to use this remapping information for other purposes than doing the remap will be added when needed. As the number of conflicts should be rather low (currently only machines with max. 1 conflict are known), save the remap data in a small statically allocated array. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Stable-dep-of: be35d91c8880 ("xen: tolerate ACPI NVS memory overlapping with Xen allocated memory") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04	xen: move max_pfn in xen_memory_setup() out of function scope	Juergen Gross	1	-26/+26
	[ Upstream commit 43dc2a0f479b9cd30f6674986d7a40517e999d31 ] Instead of having max_pfn as a local variable of xen_memory_setup(), make it a static variable in setup.c instead. This avoids having to pass it to subfunctions, which will be needed in more cases in future. Rename it to ini_nr_pages, as the value denotes the currently usable number of memory pages as passed from the hypervisor at boot time. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Stable-dep-of: be35d91c8880 (