linux.git - Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

Age	Commit message (Collapse)	Author	Files	Lines
2022-01-27	clocksource: Reduce clocksource-skew threshold	Paul E. McKenney	1	-0/+1
	[ Upstream commit 2e27e793e280ff12cb5c202a1214c08b0d3a0f26 ] Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in a 500-millisecond WATCHDOG_INTERVAL. This requires that clocks be skewed by more than 12.5% in order to be marked unstable. Except that a clock that is skewed by that much is probably destroying unsuspecting software right and left. And given that there are now checks for false-positive skews due to delays between reading the two clocks, it should be possible to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks such as TSC. Therefore, add a new uncertainty_margin field to the clocksource structure that contains the maximum uncertainty in nanoseconds for the corresponding clock. This field may be initialized manually, as it is for clocksource_tsc_early and clocksource_jiffies, which is copied to refined_jiffies. If the field is not initialized manually, it will be computed at clock-registry time as the period of the clock in question based on the scale and freq parameters to __clocksource_update_freq_scale() function. If either of those two parameters are zero, the tens-of-milliseconds WATCHDOG_THRESHOLD is used as a cowardly alternative to dividing by zero. No matter how the uncertainty_margin field is calculated, it is bounded below by twice WATCHDOG_MAX_SKEW, that is, by 100 microseconds. Note that manually initialized uncertainty_margin fields are not adjusted, but there is a WARN_ON_ONCE() that triggers if any such field is less than twice WATCHDOG_MAX_SKEW. This WARN_ON_ONCE() is intended to discourage production use of the one-nanosecond uncertainty_margin values that are used to test the clock-skew code itself. The actual clock-skew check uses the sum of the uncertainty_margin fields of the two clocksource structures being compared. Integer overflow is avoided because the largest computed value of the uncertainty_margin fields is one billion (10^9), and double that value fits into an unsigned int. However, if someone manually specifies (say) UINT_MAX, they will get what they deserve. Note that the refined_jiffies uncertainty_margin field is initialized to TICK_NSEC, which means that skew checks involving this clocksource will be sufficently forgiving. In a similar vein, the clocksource_tsc_early uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which replicates the current behavior and allows custom setting if needed in order to address the rare skews detected for this clocksource in current mainline. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-4-paulmck@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	powerpc/32s: Fix shift-out-of-bounds in KASAN init	Christophe Leroy	1	-1/+2
	[ Upstream commit af11dee4361b3519981fa04d014873f9d9edd6ac ] ================================================================================ UBSAN: shift-out-of-bounds in arch/powerpc/mm/kasan/book3s_32.c:22:23 shift exponent -1 is negative CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.5-gentoo-PowerMacG4 #9 Call Trace: [c214be60] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable) [c214be80] [c0b99288] ubsan_epilogue+0x10/0x5c [c214be90] [c0b98fe0] __ubsan_handle_shift_out_of_bounds+0x94/0x138 [c214bf00] [c1c0f010] kasan_init_region+0xd8/0x26c [c214bf30] [c1c0ed84] kasan_init+0xc0/0x198 [c214bf70] [c1c08024] setup_arch+0x18/0x54c [c214bfc0] [c1c037f0] start_kernel+0x90/0x33c [c214bff0] [00003610] 0x3610 ================================================================================ This happens when the directly mapped memory is a power of 2. Fix it by checking the shift and set the result to 0 when shift is -1 Fixes: 7974c4732642 ("powerpc/32s: Implement dedicated kasan_init_region()") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215169 Link: https://lore.kernel.org/r/15cbc3439d4ad988b225e2119ec99502a5cc6ad3.1638261744.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an ↵	Athira Rajeev	2	-1/+97
	overflown PMC [ Upstream commit 2c9ac51b850d84ee496b0a5d832ce66d411ae552 ] Running perf fuzzer showed below in dmesg logs: "Can't find PMC that caused IRQ" This means a PMU exception happened, but none of the PMC's (Performance Monitor Counter) were found to be overflown. There are some corner cases that clears the PMCs after PMI gets masked. In such cases, the perf interrupt handler will not find the active PMC values that had caused the overflow and thus leads to this message while replaying. Case 1: PMU Interrupt happens during replay of other interrupts and counter values gets cleared by PMU callbacks before replay: During replay of interrupts like timer, __do_irq() and doorbell exception, we conditionally enable interrupts via may_hard_irq_enable(). This could potentially create a window to generate a PMI. Since irq soft mask is set to ALL_DISABLED, the PMI will get masked here. We could get IPIs run before perf interrupt is replayed and the PMU events could be deleted or stopped. This will change the PMU SPR values and resets the counters. Snippet of ftrace log showing PMU callbacks invoked in __do_irq(): <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq <<>> <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable <<>> <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts Case 2: PMI's masked during local_* operations, example local_add(). If the local_add() operation happens within a local_irq_save(), replay of PMI will be during local_irq_restore(). Similar to case 1, this could also create a window before replay where PMU events gets deleted or stopped. Fix it by updating the PMU callback function power_pmu_disable() to check for pending perf interrupt. If there is an overflown PMC and pending perf interrupt indicated in paca, clear the PMI bit in paca to drop that sample. Clearing of PMI bit is done in power_pmu_disable() since disable is invoked before any event gets deleted/stopped. With this fix, if there are more than one event running in the PMU, there is a chance that we clear the PMI bit for the event which is not getting deleted/stopped. The other events may still remain active. Hence to make sure we don't drop valid sample in such cases, another check is added in power_pmu_enable. This checks if there is an overflown PMC found among the active events and if so enable back the PMI bit. Two new helper functions are introduced to clear/set the PMI, ie clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function pmi_irq_pending() is introduced to give a warning if there is pending PMI bit in paca, but no PMC is overflown. Also there are corner cases which result in performance monitor interrupts being triggered during power_pmu_disable(). This happens since PMXE bit is not cleared along with disabling of other MMCR0 bits in the pmu_disable. Such PMI's could leave the PMU running and could trigger PMI again which will set MMCR0 PMAO bit. This could lead to spurious interrupts in some corner cases. Example, a timer after power_pmu_del() which will re-enable interrupts and triggers a PMI again since PMAO bit is still set. But fails to find valid overflow since PMC was cleared in power_pmu_del(). Fix that by disabling PMXE along with disabling of other MMCR0 bits in power_pmu_disable(). We can't just replay PMI any time. Hence this approach is preferred rather than replaying PMI before resetting overflown PMC. Patch also documents core-book3s on a race condition which can trigger these PMC messages during idle path in PowerNV. Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them") Reported-by: Nageswara R Sastry <nasastry@in.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make pmi_irq_pending() return bool, reflow/reword some comments] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	powerpc/irq: Add helper to set regs->softe	Christophe Leroy	1	-2/+9
	[ Upstream commit fb5608fd117a8b48752d2b5a7e70847c1ed33d33 ] regs->softe doesn't exist on PPC32. Add irq_soft_mask_regs_set_state() helper to set regs->softe. This helper will void on PPC32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5f37d1177a751fdbca79df461d283850ca3a34a2.1612796617.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	powerpc/perf: move perf irq/nmi handling details into traps.c	Nicholas Piggin	3	-59/+32
	[ Upstream commit 156b5371a9c2482a9ad23ec82d1a4f89a3ab430d ] This is required in order to allow more significant differences between NMI type interrupt handlers and regular asynchronous handlers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-20-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	powerpc/perf: MMCR0 control for PMU registers under PMCC=00	Athira Rajeev	5	-0/+15
	[ Upstream commit 91668ab7db4bcfae332e561df1de2401f3f18553 ] PowerISA v3.1 introduces new control bit (PMCCEXT) for restricting access to group B PMU registers in problem state when MMCR0 PMCC=0b00. In problem state and when MMCR0 PMCC=0b00, setting the Monitor Mode Control Register bit 54 (MMCR0 PMCCEXT), will restrict read permission on Group B Performance Monitor Registers (SIER, SIAR, SDAR and MMCR1). When this bit is set to zero, group B registers will be readable. In other platforms (like power9), the older behaviour is retained where group B PMU SPRs are readable. Patch adds support for MMCR0 PMCCEXT bit in power10 by enabling this bit during boot and during the PMU event enable/disable callback functions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606409684-1589-8-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C	Jordan Niethe	4	-260/+287
	[ Upstream commit 344fbab991a568dc33ad90711b489d870e18d26d ] The only thing keeping the cpu_setup() and cpu_restore() functions used in the cputable entries for Power7, Power8, Power9 and Power10 in assembly was cpu_restore() being called before there was a stack in generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel stack for secondaries before cpu_restore()") means that it is now possible to use C. Rewrite the functions in C so they are a little bit easier to read. This is not changing their functionality. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Tweak copyright and authorship notes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-2-jniethe5@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	powerpc/prom_init: Fix improper check of prom_getprop()	Peiwei Hu	1	-1/+1
	[ Upstream commit 869fb7e5aecbc163003f93f36dcc26d0554319f6 ] prom_getprop() can return PROM_ERROR. Binary operator can not identify it. Fixes: 94d2dde738a5 ("[POWERPC] Efika: prune fixups and make them more carefull") Signed-off-by: Peiwei Hu <jlu.hpw@foxmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/tencent_BA28CC6897B7C95A92EB8C580B5D18589105@qq.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	x86/mce/inject: Avoid out-of-bounds write when setting flags	Zhang Zixun	1	-1/+1
	[ Upstream commit de768416b203ac84e02a757b782a32efb388476f ] A contrived zero-length write, for example, by using write(2): ... ret = write(fd, str, 0); ... to the "flags" file causes: BUG: KASAN: stack-out-of-bounds in flags_write Write of size 1 at addr ffff888019be7ddf by task writefile/3787 CPU: 4 PID: 3787 Comm: writefile Not tainted 5.16.0-rc7+ #12 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 due to accessing buf one char before its start. Prevent such out-of-bounds access. [ bp: Productize into a proper patch. Link below is the next best thing because the original mail didn't get archived on lore. ] Fixes: 0451d14d0561 ("EDAC, mce_amd_inj: Modify flags attribute to use string arguments") Signed-off-by: Zhang Zixun <zhang133010@icloud.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/linux-edac/YcnePfF1OOqoQwrX@zn.tnic/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	x86/boot/compressed: Move CLANG_FLAGS to beginning of KBUILD_CFLAGS	Nathan Chancellor	1	-2/+5
	[ Upstream commit 5fe392ff9d1f7254a1fbb3f72d9893088e4d23eb ] When cross compiling i386_defconfig on an arm64 host with clang, there are a few instances of '-Waddress-of-packed-member' and '-Wgnu-variable-sized-type-not-at-end' in arch/x86/boot/compressed/, which should both be disabled with the cc-disable-warning calls in that directory's Makefile, which indicates that cc-disable-warning is failing at the point of testing these flags. The cc-disable-warning calls fail because at the point that the flags are tested, KBUILD_CFLAGS has '-march=i386' without $(CLANG_FLAGS), which has the '--target=' flag to tell clang what architecture it is targeting. Without the '--target=' flag, the host architecture (arm64) is used and i386 is not a valid value for '-march=' in that case. This error can be seen by adding some logging to try-run: clang-14: error: the clang compiler does not support '-march=i386' Invoking the compiler has to succeed prior to calling cc-option or cc-disable-warning in order to accurately test whether or not the flag is supported; if it doesn't, the requested flag can never be added to the compiler flags. Move $(CLANG_FLAGS) to the beginning of KBUILD_FLAGS so that any new flags that might be added in the future can be accurately tested. Fixes: d5cbd80e302d ("x86/boot: Add $(CLANG_FLAGS) to compressed KBUILD_CFLAGS") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211222163040.1961481-1-nathan@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	um: virtio_uml: Fix time-travel external time propagation	Johannes Berg	1	-0/+4
	[ Upstream commit 85e73968a040c642fd38f6cba5b73b61f5d0f052 ] When creating an external event, the current time needs to be propagated to other participants of a simulation. This is done in the places here where we kick a virtq etc. However, it must be done for _all_ external events, and that includes making the initial socket connection and later closing it. Call time_travel_propagate_time() to do this before making or closing the socket connection. Apparently, at least for the initial connection creation, due to the remote side in my use cases using microseconds (rather than nanoseconds), this wasn't a problem yet; only started failing between 5.14-rc1 and 5.15-rc1 (didn't test others much), or possibly depending on the configuration, where more delays happen before the virtio devices are initialized. Fixes: 88ce64249233 ("um: Implement time-travel=ext") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	um: fix ndelay/udelay defines	Johannes Berg	1	-2/+2
	[ Upstream commit 5f8539e2ff962e25b57742ca7106456403abbc94 ] Many places in the kernel use 'udelay' as an identifier, and are broken with the current "#define udelay um_udelay". Fix this by adding an argument to the macro, and do the same to 'ndelay' as well, just in case. Fixes: 0bc8fb4dda2b ("um: Implement ndelay/udelay in time-travel mode") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ARM: dts: armada-38x: Add generic compatible to UART nodes	Marek Behún	1	-2/+2
	[ Upstream commit 62480772263ab6b52e758f2346c70a526abd1d28 ] Add generic compatible string "ns16550a" to serial port nodes of Armada 38x. This makes it possible to use earlycon. Fixes: 0d3d96ab0059 ("ARM: mvebu: add Device Tree description of the Armada 380/385 SoCs") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: marvell: cn9130: enable CP0 GPIO controllers	Robert Marko	1	-0/+8
	[ Upstream commit 0734f8311ce72c9041e5142769eff2083889c172 ] CN9130 has a built-in CP115 which has 2 GPIO controllers, but unlike in Armada 7k and 8k both are left disabled by the SoC DTSI. This first of all makes no sense as they are always present due to being SoC built-in and its an issue as boards like CN9130-CRB use the CPO GPIO2 pins for regulators and SD card support without enabling them first. So, enable both of them like Armada 7k and 8k do. Fixes: 6b8970bd8d7a ("arm64: dts: marvell: Add support for Marvell CN9130 SoC support") Signed-off-by: Robert Marko <robert.marko@sartura.hr> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: marvell: cn9130: add GPIO and SPI aliases	Robert Marko	1	-0/+7
	[ Upstream commit effd42600b987c1e95f946b14fefc1c7639e7439 ] CN9130 has one CP115 built in, which like the CP110 has 2 GPIO and 2 SPI controllers built-in. However, unlike the Armada 7k and 8k the SoC DTSI doesn't add the required aliases as both the Orion SPI driver and MVEBU GPIO drivers require the aliases to be present. So add the required aliases for GPIO and SPI controllers. Fixes: 6b8970bd8d7a ("arm64: dts: marvell: Add support for Marvell CN9130 SoC support") Signed-off-by: Robert Marko <robert.marko@sartura.hr> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ARM: 9159/1: decompressor: Avoid UNPREDICTABLE NOP encoding	Andre Przywara	2	-9/+16
	[ Upstream commit a92882a4d270fbcc021ee6848de5e48b7f0d27f3 ] In the decompressor's head.S we need to start with an instruction that is some kind of NOP, but also mimics as the PE/COFF header, when the kernel is linked as an UEFI application. The clever solution here is "tstne r0, #0x4d000", which in the worst case just clobbers the condition flags, and bears the magic "MZ" signature in the lowest 16 bits. However the encoding used (0x13105a4d) is actually not valid, since bits [15:12] are supposed to be 0 (written as "(0)" in the ARM ARM). Violating this is UNPREDICTABLE, and can trigger an UNDEFINED exception. Common Cortex cores seem to ignore those bits, but QEMU chooses to trap, so the code goes fishing because of a missing exception handler at this point. We are just saved by the fact that commonly (with -kernel or when running from U-Boot) the "Z" bit is set, so the instruction is never executed. See [0] for more details. To make things more robust and avoid UNPREDICTABLE behaviour in the kernel code, lets replace this with a "two-instruction NOP": The first instruction is an exclusive OR, the effect of which the second instruction reverts. This does not leave any trace, neither in a register nor in the condition flags. Also it's a perfectly valid encoding. Kudos to Peter Maydell for coming up with this gem. [0] https://lore.kernel.org/qemu-devel/YTPIdbUCmwagL5%2FD@os.inf.tu-dresden.de/T/ Link: https://lore.kernel.org/linux-arm-kernel/20210908162617.104962-1-andre.przywara@arm.com/T/ Fixes: 81a0bc39ea19 ("ARM: add UEFI stub support") Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reported-by: Adam Lackorzynski <adam@l4re.org> Suggested-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: qcom: ipq6018: Fix gpio-ranges property	Baruch Siach	1	-1/+1
	[ Upstream commit 72cb4c48a46a7cfa58eb5842c0d3672ddd5bd9ad ] There must be three parameters in gpio-ranges property. Fixes this not very helpful error message: OF: /soc/pinctrl@1000000: (null) = 3 found 3 Fixes: 1e8277854b49 ("arm64: dts: Add ipq6018 SoC and CP01 board support") Cc: Sricharan R <sricharan@codeaurora.org> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/8a744cfd96aff5754bfdcf7298d208ddca5b319a.1638862030.git.baruch@tkos.co.il Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: qcom: c630: Fix soundcard setup	Srinivas Kandagatla	1	-0/+27
	[ Upstream commit c02b360ca67ebeb9de07b47b2fe53f964c2561d1 ] Currently Soundcard has 1 rx device for headset and SoundWire Speaker Playback. This setup has issues, ex if we try to play on headset the audio stream is also sent to SoundWire Speakers and we will hear sound in both headsets and speakers. Make a separate device for Speakers and Headset so that the streams are different and handled properly. Fixes: 45021d35fcb2 ("arm64: dts: qcom: c630: Enable audio support") Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20211209175342.20386-2-srinivas.kandagatla@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ARM: dts: gemini: NAS4220-B: fis-index-block with 128 KiB sectors	Christian Lamparter	1	-1/+1
	[ Upstream commit 4754eab7e5a78bdefe7a960c5c260c95ebbb5fa6 ] Steven Maddox reported in the OpenWrt bugzilla, that his RaidSonic IB-NAS4220-B was no longer booting with the new OpenWrt 21.02 (uses linux 5.10's device-tree). However, it was working with the previous OpenWrt 19.07 series (uses 4.14). \|[ 5.548038] No RedBoot partition table detected in 30000000.flash \|[ 5.618553] Searching for RedBoot partition table in 30000000.flash at offset 0x0 \|[ 5.739093] No RedBoot partition table detected in 30000000.flash \|... \|[ 7.039504] Waiting for root device /dev/mtdblock3... The provided bootlog shows that the RedBoot partition parser was looking for the partition table "at offset 0x0". Which is strange since the comment in the device-tree says it should be at 0xfe0000. Further digging on the internet led to a review site that took some useful PCB pictures of their review unit back in February 2009. Their picture shows a Spansion S29GL128N11TFI01 flash chip. >From Spansion's Datasheet: "S29GL128N: One hundred twenty-eight 64 Kword (128 Kbyte) sectors" Steven also provided a "cat /sys/class/mtd/mtd0/erasesize" from his unit: "131072". With the 128 KiB Sector/Erasesize in mind. This patch changes the fis-index-block property to (0xfe0000 / 0x20000) = 0x7f. Fixes: b5a923f8c739 ("ARM: dts: gemini: Switch to redboot partition parsing") Reported-by: Steven Maddox <s.maddox@lantizia.me.uk> Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Steven Maddox <s.maddox@lantizia.me.uk> Link: https://lore.kernel.org/r/20211206004334.4169408-1-linus.walleij@linaro.org' Bugzilla: https://bugs.openwrt.org/index.php?do=details&task_id=4137 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	x86/uaccess: Move variable into switch case statement	Kees Cook	1	-2/+3
	[ Upstream commit 61646ca83d3889696f2772edaff122dd96a2935e ] When building with automatic stack variable initialization, GCC 12 complains about variables defined outside of switch case statements. Move the variable into the case that uses it, which silences the warning: ./arch/x86/include/asm/uaccess.h:317:23: warning: statement will never be executed [-Wswitch-unreachable] 317 \| unsigned char x_u8__; \ \| ^~~~~~ Fixes: 865c50e1d279 ("x86/uaccess: utilize CONFIG_CC_HAS_ASM_GOTO_OUTPUT") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211209043456.1377875-1-keescook@chromium.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1	Reiji Watanabe	1	-0/+10
	[ Upstream commit f0616abd4e67143b45b04b565839148458857347 ] Currently, clear_page() uses DC ZVA instruction unconditionally. But it should make sure that DCZID_EL0.DZP, which indicates whether or not use of DC ZVA instruction is prohibited, is zero when using the instruction. Use STNP instead when DCZID_EL0.DZP == 1. Fixes: f27bb139c387 ("arm64: Miscellaneous library functions") Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20211206004736.1520989-2-reijiw@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: lib: Annotate {clear, copy}_page() as position-independent	Will Deacon	2	-4/+4
	[ Upstream commit 8d9902055c57548bb342dc3ca78caa21e9643024 ] clear_page() and copy_page() are suitable for use outside of the kernel address space, so annotate them as position-independent code. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-2-qperret@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: ti: k3-j7200: Correct the d-cache-sets info	Nishanth Menon	1	-2/+2
	[ Upstream commit a172c86931709d6663318609d71a811333bdf4b0 ] A72 Cluster (chapter 1.3.1 [1]) has 48KB Icache, 32KB Dcache and 1MB L2 Cache - ICache is 3-way set-associative - Dcache is 2-way set-associative - Line size are 64bytes 32KB (Dcache)/64 (fixed line length of 64 bytes) = 512 ways 512 ways / 2 (Dcache is 2-way per set) = 256 sets. So, correct the d-cache-sets info. [1] https://www.ti.com/lit/pdf/spruiu1 Fixes: d361ed88455f ("arm64: dts: ti: Add support for J7200 SoC") Reported-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Pratyush Yadav <p.yadav@ti.com> Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Link: https://lore.kernel.org/r/20211113042640.30955-1-nm@ti.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: ti: k3-j721e: Fix the L2 cache sets	Nishanth Menon	1	-1/+1
	[ Upstream commit e9ba3a5bc6fdc2c796c69fdaf5ed6c9957cf9f9d ] A72's L2 cache[1] on J721e[2] is 1MB. A72's L2 is fixed line length of 64 bytes and 16-way set-associative cache structure. 1MB of L2 / 64 (line length) = 16384 ways 16384 ways / 16 = 1024 sets Fix the l2 cache-sets. [1] https://developer.arm.com/documentation/100095/0003/Level-2-Memory-System/About-the-L2-memory-system [2] http://www.ti.com/lit/pdf/spruil1 Fixes: 2d87061e70de ("arm64: dts: ti: Add Support for J721E SoC") Reported-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Pratyush Yadav <p.yadav@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Link: https://lore.kernel.org/r/20211113043639.4413-1-nm@ti.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: ti: k3-j7200: Fix the L2 cache sets	Nishanth Menon	1	-1/+1
	[ Upstream commit d0c826106f3fc11ff97285102b576b65576654ae ] A72's L2 cache[1] on J7200[2] is 1MB. A72's L2 is fixed line length of 64 bytes and 16-way set-associative cache structure. 1MB of L2 / 64 (line length) = 16384 ways 16384 ways / 16 = 1024 sets Fix the l2 cache-sets. [1] https://developer.arm.com/documentation/100095/0003/Level-2-Memory-System/About-the-L2-memory-system [2] https://www.ti.com/lit/pdf/spruiu1 Fixes: d361ed88455f ("arm64: dts: ti: Add support for J7200 SoC") Reported-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Pratyush Yadav <p.yadav@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Link: https://lore.kernel.org/r/20211113043638.4358-1-nm@ti.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: qcom: msm8916: fix MMC controller aliases	Dmitry Baryshkov	1	-2/+2
	[ Upstream commit b0293c19d42f6d6951c2fab9a47fed50baf2c14d ] Change sdhcN aliases to mmcN to make them actually work. Currently the board uses non-standard aliases sdhcN, which do not work, resulting in mmc0 and mmc1 hosts randomly changing indices between boots. Fixes: c4da5a561627 ("arm64: dts: qcom: Add msm8916 sdhci configuration nodes") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20211201020559.1611890-1-dmitry.baryshkov@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: ti: k3-j721e: correct cache-sets info	Peng Fan	1	-2/+2
	[ Upstream commit 7a0df1f969c14939f60a7f9a6af72adcc314675f ] A72 Cluster has 48KB Icache, 32KB Dcache and 1MB L2 Cache - ICache is 3-way set-associative - Dcache is 2-way set-associative - Line size are 64bytes So correct the cache-sets info. Fixes: 2d87061e70dea ("arm64: dts: ti: Add Support for J721E SoC") Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Nishanth Menon <nm@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Link: https://lore.kernel.org/r/20211112063155.3485777-1-peng.fan@oss.nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	ARM: dts: stm32: fix dtbs_check warning on ili9341 dts binding on stm32f429 ↵	Dillon Min	1	-1/+1
	disco [ Upstream commit b046049e59dca5e5830dc75ed16acf7657a95161 ] Since the compatible string defined from ilitek,ili9341.yaml is "st,sf-tc240t-9370-t", "ilitek,ili9341" so, append "ilitek,ili9341" to avoid the below dtbs_check warning. arch/arm/boot/dts/stm32f429-disco.dt.yaml: display@1: compatible: ['st,sf-tc240t-9370-t'] is too short Fixes: a726e2f000ec ("ARM: dts: stm32: enable ltdc binding with ili9341, gyro l3gd20 on stm32429-disco board") Signed-off-by: Dillon Min <dillon.minfei@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: renesas: cat875: Add rx/tx delays	Biju Das	1	-0/+1
	[ Upstream commit e1a9faddffe7e555304dc2e3284c84fbee0679ee ] The CAT875 sub board from Silicon Linux uses a Realtek PHY. The phy driver commit bbc4d71d63549bcd003 ("net: phy: realtek: fix rtl8211e rx/tx delay config") introduced NFS mount failures. Now it needs both rx/tx delays for the NFS mount to work. This patch fixes the NFS mount failure issue by adding "rgmii-id" mode to the avb device node. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Fixes: bbc4d71d63549bcd ("net: phy: realtek: fix rtl8211e rx/tx delay config") Link: https://lore.kernel.org/r/20211115142830.12651-1-biju.das.jz@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: meson-gxbb-wetek: fix missing GPIO binding	Christian Hewitt	1	-0/+1
	[ Upstream commit c019abb2feba3cbbd7cf7178f8e6499c4fa6fced ] The absence of this binding appears to be harmless in Linux but it breaks Ethernet support in mainline u-boot. So add the binding (which is present in all other u-boot supported GXBB device-trees). Fixes: fb72c03e0e32 ("ARM64: dts: meson-gxbb-wetek: add a wetek specific dtsi to cleanup hub and play2") Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://lore.kernel.org/r/20211012052522.30873-3-christianshewitt@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: meson-gxbb-wetek: fix HDMI in early boot	Christian Hewitt	1	-0/+2
	[ Upstream commit 8182a35868db5f053111d5d9d4da8fcb3f99259d ] Mark the VDDIO_AO18 regulator always-on and set hdmi-supply for the hdmi_tx node to ensure HDMI is powered in the early stages of boot. Fixes: fb72c03e0e32 ("ARM64: dts: meson-gxbb-wetek: add a wetek specific dtsi to cleanup hub and play2") Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://lore.kernel.org/r/20211012052522.30873-2-christianshewitt@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: amlogic: Fix SPI NOR flash node name for ODROID N2/N2+	Alexander Stein	1	-1/+1
	[ Upstream commit 95d35256b564aca33fb661eac77dc94bfcffc8df ] Fix the schema warning: "spi-flash@0: $nodename:0: 'spi-flash@0' does not match '^flash(@.*)?$'" from jedec,spi-nor.yaml Fixes: a084eaf3096c ("arm64: dts: meson-g12b-odroid-n2: add SPIFC controller node") Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Alexander Stein <alexander.stein@mailbox.org> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://lore.kernel.org/r/20211026182813.900775-3-alexander.stein@mailbox.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	arm64: dts: amlogic: meson-g12: Fix GPU operating point table node name	Alexander Stein	1	-1/+1
	[ Upstream commit bb98a6fd0b0e227cefb2ba91cea2b55455f203b7 ] Starting with commit 94274f20f6bf ("dt-bindings: opp: Convert to DT schema") the opp node name has a mandatory pattern. This change fixes the dtbs_check warning: gpu-opp-table: $nodename:0: 'gpu-opp-table' does not match '^opp-table(-[a-z0-9]+)?$' Put the 'gpu' part at the end to match the pattern. Fixes: 916a0edc43f0 ("arm64: dts: amlogic: meson-g12: add the Mali OPP table and use DVFS") Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Alexander Stein <alexander.stein@mailbox.org> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://lore.kernel.org/r/20211026182813.900775-2-alexander.stein@mailbox.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27	x86/gpu: Reserve stolen memory for first integrated Intel GPU	Lucas De Marchi	1	-1/+9
	commit 9c494ca4d3a535f9ca11ad6af1813983c1c6cbdd upstream. "Stolen memory" is memory set aside for use by an Intel integrated GPU. The intel_graphics_quirks() early quirk reserves this memory when it is called for a GPU that appears in the intel_early_ids[] table of integrated GPUs. Previously intel_graphics_quirks() was marked as QFLAG_APPLY_ONCE, so it was called only for the first Intel GPU found. If a discrete GPU happened to be enumerated first, intel_graphics_quirks() was called for it but not for any integrated GPU found later. Therefore, stolen memory for such an integrated GPU was never reserved. For example, this problem occurs in this Alderlake-P (integrated) + DG2 (discrete) topology where the DG2 is found first, but stolen memory is associated with the integrated GPU: - 00:01.0 Bridge `- 03:00.0 DG2 discrete GPU - 00:02.0 Integrated GPU (with stolen memory) Remove the QFLAG_APPLY_ONCE flag and call intel_graphics_quirks() for every Intel GPU. Reserve stolen memory for the first GPU that appears in intel_early_ids[]. [bhelgaas: commit log, add code comment, squash in https://lore.kernel.org/r/20220118190558.2ququ4vdfjuahicm@ldmartin-desk2] Link: https://lore.kernel.org/r/20220114002843.2083382-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27	KVM: VMX: switch blocked_vcpu_on_cpu_lock to raw spinlock	Marcelo Tosatti	1	-8/+8
	commit 5f02ef741a785678930f3ff0a8b6b2b0ef1bb402 upstream. blocked_vcpu_on_cpu_lock is taken from hard interrupt context (pi_wakeup_handler), therefore it cannot sleep. Switch it to a raw spinlock. Fixes: [41297.066254] BUG: scheduling while atomic: CPU 0/KVM/635218/0x00010001 [41297.066323] Preemption disabled at: [41297.066324] [<ffffffff902ee47f>] irq_enter_rcu+0xf/0x60 [41297.066339] Call Trace: [41297.066342] <IRQ> [41297.066346] dump_stack_lvl+0x34/0x44 [41297.066353] ? irq_enter_rcu+0xf/0x60 [41297.066356] __schedule_bug.cold+0x7d/0x8b [41297.066361] __schedule+0x439/0x5b0 [41297.066365] ? task_blocks_on_rt_mutex.constprop.0.isra.0+0x1b0/0x440 [41297.066369] schedule_rtlock+0x1e/0x40 [41297.066371] rtlock_slowlock_locked+0xf1/0x260 [41297.066374] rt_spin_lock+0x3b/0x60 [41297.066378] pi_wakeup_handler+0x31/0x90 [kvm_intel] [41297.066388] sysvec_kvm_posted_intr_wakeup_ipi+0x9d/0xd0 [41297.066392] </IRQ> [41297.066392] asm_sysvec_kvm_posted_intr_wakeup_ipi+0x12/0x20 ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20	powerpc/pseries: Get entry and uaccess flush required bits from ↵	Nicholas Piggin	2	-0/+8
	H_GET_CPU_CHARACTERISTICS commit 65c7d070850e109a8a75a431f5a7f6eb4c007b77 upstream. This allows the hypervisor / firmware to describe these workarounds to the guest. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503130243.891868-2-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20	KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all	Wei Wang	1	-1/+1
	commit 9fb12fe5b93b94b9e607509ba461e17f4cc6a264 upstream. The fixed counter 3 is used for the Topdown metrics, which hasn't been enabled for KVM guests. Userspace accessing to it will fail as it's not included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines, which have this counter. To reproduce it on ICX+ machines, ./state_test reports: ==== Test Assertion Failure ==== lib/x86_64/processor.c:1078: r == nmsrs pid=4564 tid=4564 - Argument list too long 1 0x000000000040b1b9: vcpu_save_state at processor.c:1077 2 0x0000000000402478: main at state_test.c:209 (discriminator 6) 3 0x00007fbe21ed5f92: ?? ??:0 4 0x000000000040264d: _start at ??:? Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c) With this patch, it works well. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: e2ada66ec418 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20	KVM: s390: Clarify SIGP orders versus STOP/RESTART	Eric Farman	4	-2/+43
	commit 812de04661c4daa7ac385c0dfd62594540538034 upstream. With KVM_CAP_S390_USER_SIGP, there are only five Signal Processor orders (CONDITIONAL EMERGENCY SIGNAL, EMERGENCY SIGNAL, EXTERNAL CALL, SENSE, and SENSE RUNNING STATUS) which are intended for frequent use and thus are processed in-kernel. The remainder are sent to userspace with the KVM_CAP_S390_USER_SIGP capability. Of those, three orders (RESTART, STOP, and STOP AND STORE STATUS) have the potential to inject work back into the kernel, and thus are asynchronous. Let's look for those pending IRQs when processing one of the in-kernel SIGP orders, and return BUSY (CC2) if one is in process. This is in agreement with the Principles of Operation, which states that only one order can be "active" on a CPU at a time. Cc: stable@vger.kernel.org Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211213210550.856213-2-farman@linux.ibm.com [borntraeger@linux.ibm.com: add stable tag] Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20	KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest	Sean Christopherson	3	-1/+6
	commit f4b027c5c8199abd4fb6f00d67d380548dbfdfa8 upstream. Override the Processor Trace (PT) interrupt handler for guest mode if and only if PT is configured for host+guest mode, i.e. is being used independently by both host and guest. If PT is configured for system mode, the host fully controls PT and must handle all events. Fixes: 8479e04e7d6b ("KVM: x86: Inject PMI for KVM guest") Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: Artem Kashkanov <artem.kashkanov@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211111020738.2512932-4-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20	perf: Protect perf_guest_cbs with RCU	Sean Christopherson	7	-31/+60
	commit ff083a2d972f56bebfd82409ca62e5dfce950961 upstream. Protect perf_guest_cbs with RCU to fix multiple possible errors. Luckily, all paths that read perf_guest_cbs already require RCU protection, e.g. to protect the callback chains, so only the direct perf_guest_cbs touchpoints need to be modified. Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure perf_guest_cbs isn't reloaded between a !NULL check and a dereference. Fixed via the READ_ONCE() in rcu_dereference(). Bug #2 is that on weakly-ordered architectures, updates to the callbacks themselves are not guaranteed to be visible before the pointer is made visible to readers. Fixed by the smp_store_release() in rcu_assign_pointer() when the new pointer is non-NULL. Bug #3 is that, because the callbacks are global, it's possible for readers to run in parallel with an unregisters, and thus a module implementing the callbacks can be unloaded while readers are in flight, resulting in a use-after-free. Fixed by a synchronize_rcu() call when unregistering callbacks. Bug #1 escaped notice because it's extremely unlikely a compiler will reload perf_guest_cbs in this sequence. perf_guest_cbs does get reloaded for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest() guard all but guarantees the consumer will win the race, e.g. to nullify perf_guest_cbs, KVM has to completely exit the guest and teardown down all VMs before KVM start its module unload / unregister sequence. This also makes it all but impossible to encounter bug #3. Bug #2 has not been a problem because all architectures that register callbacks are strongly ordered and/or have a static set of callbacks. But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming kvm_intel module load/unload leads to: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459 Hardware nam