Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 223e7fb979fa06934f1595b6ad0ae1d4ead1147f ]
Also initialize regs->psw.mask in perf_arch_fetch_caller_regs().
This way user_mode(regs) will return false, like it should.
It looks like all current users initialize regs to zero, so that this
doesn't fix a bug currently. However it is better to not rely on callers
to do this.
Fixes: 914d52e46490 ("s390: implement perf_arch_fetch_caller_regs")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3cd03ea57e8e16cc78cc357d5e9f26078426f236 ]
The Linux implementation of PCI error recovery for s390 was based on the
understanding that firmware error recovery is a two step process with an
optional initial error event to indicate the cause of the error if known
followed by either error event 0x3A (Success) or 0x3B (Failure) to
indicate whether firmware was able to recover. While this has been the
case in testing and the error cases seen in the wild it turns out this
is not correct. Instead firmware only generates 0x3A for some error and
service scenarios and expects the OS to perform recovery for all PCI
events codes except for those indicating permanent error (0x3B, 0x40)
and those indicating errors on the function measurement block (0x2A,
0x2B, 0x2C). Align Linux behavior with these expectations.
Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery")
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit cad4b3d4ab1f062708fff33f44d246853f51e966 upstream.
The parameters for the diag 0x258 are real addresses, not virtual, but
KVM was using them as virtual addresses. This only happened to work, since
the Linux kernel as a guest used to have a 1:1 mapping for physical vs
virtual addresses.
Fix KVM so that it correctly uses the addresses as real addresses.
Cc: stable@vger.kernel.org
Fixes: 8ae04b8f500b ("KVM: s390: Guest's memory access functions get access registers")
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240917151904.74314-3-nrb@linux.ibm.com
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e8061f06185be0a06a73760d6526b8b0feadfe52 upstream.
Previously, access_guest_page() did not check whether the given guest
address is inside of a memslot. This is not a problem, since
kvm_write_guest_page/kvm_read_guest_page return -EFAULT in this case.
However, -EFAULT is also returned when copy_to/from_user fails.
When emulating a guest instruction, the address being outside a memslot
usually means that an addressing exception should be injected into the
guest.
Failure in copy_to/from_user however indicates that something is wrong
in userspace and hence should be handled there.
To be able to distinguish these two cases, return PGM_ADDRESSING in
access_guest_page() when the guest address is outside guest memory. In
access_guest_real(), populate vcpu->arch.pgm.code such that
kvm_s390_inject_prog_cond() can be used in the caller for injecting into
the guest (if applicable).
Since this adds a new return value to access_guest_page(), we need to make
sure that other callers are not confused by the new positive return value.
There are the following users of access_guest_page():
- access_guest_with_key() does the checking itself (in
guest_range_to_gpas()), so this case should never happen. Even if, the
handling is set up properly.
- access_guest_real() just passes the return code to its callers, which
are:
- read_guest_real() - see below
- write_guest_real() - see below
There are the following users of read_guest_real():
- ar_translation() in gaccess.c which already returns PGM_*
- setup_apcb10(), setup_apcb00(), setup_apcb11() in vsie.c which always
return -EFAULT on read_guest_read() nonzero return - no change
- shadow_crycb(), handle_stfle() always present this as validity, this
could be handled better but doesn't change current behaviour - no change
There are the following users of write_guest_real():
- kvm_s390_store_status_unloaded() always returns -EFAULT on
write_guest_real() failure.
Fixes: 2293897805c2 ("KVM: s390: add architecture compliant guest access functions")
Cc: stable@vger.kernel.org
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240917151904.74314-2-nrb@linux.ibm.com
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3d5854d75e3187147613130561b58f0b06166172 upstream.
When /proc/kcore is read an attempt to read the first two pages results in
HW-specific page swap on s390 and another (so called prefix) pages are
accessed instead. That leads to a wrong read.
Allow architecture-specific translation of memory addresses using
kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily
to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks. That
way an architecture can deal with specific physical memory ranges.
Re-use the existing /dev/mem callback implementation on s390, which
handles the described prefix pages swapping correctly.
For other architectures the default callback is basically NOP. It is
expected the condition (vaddr == __va(__pa(vaddr))) always holds true for
KCORE_RAM memory type.
Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3c4d0ae0671827f4b536cc2d26f8b9c54584ccc5 ]
Add missing warning handling to the early program check handler. This
way a warning is printed to the console as soon as the early console
is setup, and the kernel continues to boot.
Before this change a disabled wait psw was loaded instead and the
machine was silently stopped without giving an idea about what
happened.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b495e710157606889f2d8bdc62aebf2aa02f67a7 ]
Remove WARN_ON_ONCE statements. These have not triggered in the
past.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 131b8db78558120f58c5dc745ea9655f6b854162 ]
Adding/removing large amount of pages at once to/from the CMM balloon
can result in rcu_sched stalls or workqueue lockups, because of busy
looping w/o cond_resched().
Prevent this by adding a cond_resched(). cmm_free_pages() holds a
spin_lock while looping, so it cannot be added directly to the existing
loop. Instead, introduce a wrapper function that operates on maximum 256
pages at once, and add it there.
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0147addc4fb72a39448b8873d8acdf3a0f29aa65 ]
Disable compile time optimizations of test_facility() for the
decompressor. The decompressor should not contain any optimized code
depending on the architecture level set the kernel image is compiled
for to avoid unexpected operation exceptions.
Add a __DECOMPRESSOR check to test_facility() to enforce that
facilities are always checked during runtime for the decompressor.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit a84dd0d8ae24bdc6da341187fc4c1a0adfce2ccc upstream.
ftrace_return_address() is called extremely often from
performance-critical code paths when debugging features like
CONFIG_TRACE_IRQFLAGS are enabled. For example, with debug_defconfig,
ftrace selftests on my LPAR currently execute ftrace_return_address()
as follows:
ftrace_return_address(0) - 0 times (common code uses __builtin_return_address(0) instead)
ftrace_return_address(1) - 2,986,805,401 times (with this patch applied)
ftrace_return_address(2) - 140 times
ftrace_return_address(>2) - 0 times
The use of __builtin_return_address(n) was replaced by return_address()
with an unwinder call by commit cae74ba8c295 ("s390/ftrace:
Use unwinder instead of __builtin_return_address()") because
__builtin_return_address(n) simply walks the stack backchain and doesn't
check for reaching the stack top. For shallow stacks with fewer than
"n" frames, this results in reads at low addresses and random
memory accesses.
While calling the fully functional unwinder "works", it is very slow
for this purpose. Moreover, potentially following stack switches and
walking past IRQ context is simply wrong thing to do for
ftrace_return_address().
Reimplement return_address() to essentially be __builtin_return_address(n)
with checks for reaching the stack top. Since the ftrace_return_address(n)
argument is always a constant, keep the implementation in the header,
allowing both GCC and Clang to unroll the loop and optimize it to the
bare minimum.
Fixes: cae74ba8c295 ("s390/ftrace: Use unwinder instead of __builtin_return_address()")
Cc: stable@vger.kernel.org
Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f101b305a7b9513a8042a2cf09018de4ff371af2 ]
Add the missing pieces so the early program check handler also works
with a relocated lowcore. Right now the result of an early program
check in case of a relocated lowcore would be a program check loop.
Fixes: 8f1e70adb1a3 ("s390/boot: Add cmdline option to relocate lowcore")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f2bb5b97b51ce5425e8f59f899643ce4eadba667 ]
Have all program check handlers in one file to make future changes easy.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Stable-dep-of: f101b305a7b9 ("s390/entry: Make early program check handler relocated lowcore aware")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix KASLR base offset to account for symbol offsets in the vmlinux
ELF file, preventing tool breakages like the drgn debugger
- Fix potential memory corruption of physmem_info during kernel
physical address randomization
- Fix potential memory corruption due to overlap between the relocated
lowcore and identity mapping by correctly reserving lowcore memory
- Fix performance regression and avoid randomizing identity mapping
base by default
- Fix unnecessary delay of AP bus binding complete uevent to prevent
startup lag in KVM guests using AP
* tag 's390-6.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/boot: Fix KASLR base offset off by __START_KERNEL bytes
s390/boot: Avoid possible physmem_info segment corruption
s390/ap: Refine AP bus bindings complete processing
s390/mm: Pin identity mapping base to zero
s390/mm: Prevent lowcore vs identity mapping overlap
|
|
Symbol offsets to the KASLR base do not match symbol address in
the vmlinux image. That is the result of setting the KASLR base
to the beginning of .text section as result of an optimization.
Revert that optimization and allocate virtual memory for the
whole kernel image including __START_KERNEL bytes as per the
linker script. That allows keeping the semantics of the KASLR
base offset in sync with other architectures.
Rename __START_KERNEL to TEXT_OFFSET, since it represents the
offset of the .text section within the kernel image, rather than
a virtual address.
Still skip mapping TEXT_OFFSET bytes to save memory on pgtables
and provoke exceptions in case an attempt to access this area is
made, as no kernel symbol may reside there.
In case CONFIG_KASAN is enabled the location counter might exceed
the value of TEXT_OFFSET, while the decompressor linker script
forcefully resets it to TEXT_OFFSET, which leads to a sections
overlap link failure. Use MAX() expression to avoid that.
Reported-by: Omar Sandoval <osandov@osandov.com>
Closes: https://lore.kernel.org/linux-s390/ZnS8dycxhtXBZVky@telecaster.dhcp.thefacebook.com/
Fixes: 56b1069c40c7 ("s390/boot: Rework deployment of the kernel image")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
When physical memory for the kernel image is allocated it does not
consider extra memory required for offsetting the image start to
match it with the lower 20 bits of KASLR virtual base address. That
might lead to kernel access beyond its memory range.
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Fixes: 693d41f7c938 ("s390/mm: Restore mapping of kernel image using large pages")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
SIE instruction performs faster when the virtual address of
SIE block matches the physical one. Pin the identity mapping
base to zero for the benefit of SIE and other instructions
that have similar performance impact. Still, randomize the
base when DEBUG_VM kernel configuration option is enabled.
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The identity mapping position in virtual memory is randomized
together with the kernel mapping. That position can never
overlap with the lowcore even when the lowcore is relocated.
Prevent overlapping with the lowcore to allow independent
positioning of the identity mapping. With the current value
of the alternative lowcore address of 0x70000 the overlap
could happen in case the identity mapping is placed at zero.
This is a prerequisite for uncoupling of randomization base
of kernel image and identity mapping in virtual memory.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The return value uv_set_shared() and uv_remove_shared() (which are
wrappers around the share() function) is not always checked. The system
integrity of a protected guest depends on the Share and Unshare UVCs
being successful. This means that any caller that fails to check the
return value will compromise the security of the protected guest.
No code path that would lead to such violation of the security
guarantees is currently exercised, since all the areas that are shared
never get unshared during the lifetime of the system. This might
change and become an issue in the future.
The Share and Unshare UVCs can only fail in case of hypervisor
misbehaviour (either a bug or malicious behaviour). In such cases there
is no reasonable way forward, and the system needs to panic.
This patch replaces the return at the end of the share() function with
a panic, to guarantee system integrity.
Fixes: 5abb9351dfd9 ("s390/uv: introduce guest side ultravisor code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20240801112548.85303-1-imbrenda@linux.ibm.com
Message-ID: <20240801112548.85303-1-imbrenda@linux.ibm.com>
[frankja@linux.ibm.com: Fixed up patch subject]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
We might run into a SIE validity if gisa has been disabled either via using
kernel parameter "kvm.use_gisa=0" or by setting the related sysfs
attribute to N (echo N >/sys/module/kvm/parameters/use_gisa).
The validity is caused by an invalid value in the SIE control block's
gisa designation. That happens because we pass the uninitialized gisa
origin to virt_to_phys() before writing it to the gisa designation.
To fix this we return 0 in kvm_s390_get_gisa_desc() if the origin is 0.
kvm_s390_get_gisa_desc() is used to determine which gisa designation to
set in the SIE control block. A value of 0 in the gisa designation disables
gisa usage.
The issue surfaces in the host kernel with the following kernel message as
soon a new kvm guest start is attemted.
kvm: unhandled validity intercept 0x1011
WARNING: CPU: 0 PID: 781237 at arch/s390/kvm/intercept.c:101 kvm_handle_sie_intercept+0x42e/0x4d0 [kvm]
Modules linked in: vhost_net tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat x_tables nf_nat_tftp nf_conntrack_tftp vfio_pci_core irqbypass vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables sunrpc mlx5_ib ib_uverbs ib_core mlx5_core uvdevice s390_trng eadm_sch vfio_ccw zcrypt_cex4 mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core loop drm_panel_orientation_quirks configfs nfnetlink lcs ctcm fsm dm_service_time ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common dm_mirror dm_region_hash dm_log zfcp scsi_transport_fc scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey zcrypt dm_multipath rng_core autofs4 [last unloaded: vfio_pci]
CPU: 0 PID: 781237 Comm: CPU 0/KVM Not tainted 6.10.0-08682-gcad9f11498ea #6
Hardware name: IBM 3931 A01 701 (LPAR)
Krnl PSW : 0704c00180000000 000003d93deb0122 (kvm_handle_sie_intercept+0x432/0x4d0 [kvm])
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 000003d900000027 000003d900000023 0000000000000028 000002cd00000000
000002d063a00900 00000359c6daf708 00000000000bebb5 0000000000001eff
000002cfd82e9000 000002cfd80bc000 0000000000001011 000003d93deda412
000003ff8962df98 000003d93de77ce0 000003d93deb011e 00000359c6daf960
Krnl Code: 000003d93deb0112: c020fffe7259 larl %r2,000003d93de7e5c4
000003d93deb0118: c0e53fa8beac brasl %r14,000003d9bd3c7e70
#000003d93deb011e: af000000 mc 0,0
>000003d93deb0122: a728ffea lhi %r2,-22
000003d93deb0126: a7f4fe24 brc 15,000003d93deafd6e
000003d93deb012a: 9101f0b0 tm 176(%r15),1
000003d93deb012e: a774fe48 brc 7,000003d93deafdbe
000003d93deb0132: 40a0f0ae sth %r10,174(%r15)
Call Trace:
[<000003d93deb0122>] kvm_handle_sie_intercept+0x432/0x4d0 [kvm]
([<000003d93deb011e>] kvm_handle_sie_intercept+0x42e/0x4d0 [kvm])
[<000003d93deacc10>] vcpu_post_run+0x1d0/0x3b0 [kvm]
[<000003d93deaceda>] __vcpu_run+0xea/0x2d0 [kvm]
[<000003d93dead9da>] kvm_arch_vcpu_ioctl_run+0x16a/0x430 [kvm]
[<000003d93de93ee0>] kvm_vcpu_ioctl+0x190/0x7c0 [kvm]
[<000003d9bd728b4e>] vfs_ioctl+0x2e/0x70
[<000003d9bd72a092>] __s390x_sys_ioctl+0xc2/0xd0
[<000003d9be0e9222>] __do_syscall+0x1f2/0x2e0
[<000003d9be0f9a90>] system_call+0x70/0x98
Last Breaking-Event-Address:
[<000003d9bd3c7f58>] __warn_printk+0xe8/0xf0
Cc: stable@vger.kernel.org
Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Fixes: fe0ef0030463 ("KVM: s390: sort out physical vs virtual pointers usage")
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20240801123109.2782155-1-mimu@linux.ibm.com
Message-ID: <20240801123109.2782155-1-mimu@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
There is no added security by making the inittext section non-writable,
however it does split part of the kernel mapping into 4K mappings
instead of 1M mappings:
---[ Kernel Image Start ]---
0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X
0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X
0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX
0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX
0x000003ffe1300000-0x000003ffe1353000 332K PTE RO NX
0x000003ffe1353000-0x000003ffe1400000 692K PTE RW NX
0x000003ffe1400000-0x000003ffe1500000 1M PMD RW NX
0x000003ffe1500000-0x000003ffe1700000 2M PTE RW NX <---
0x000003ffe1700000-0x000003ffe1800000 1M PMD RW NX
0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX
---[ Kernel Image End ]---
Keep the inittext writable and enable instruction execution protection
(aka noexec) later to prevent this. This also allows to use the
generic free_initmem() implementation.
---[ Kernel Image Start ]---
0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X
0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X
0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX
0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX
0x000003ffe1300000-0x000003ffe1353000 332K PTE RO NX
0x000003ffe1353000-0x000003ffe1400000 692K PTE RW NX
0x000003ffe1400000-0x000003ffe1800000 4M PMD RW NX <---
0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX
---[ Kernel Image End ]---
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The .data.rel.ro and .got section were added between the rodata and
ro_after_init data section, which adds an RW mapping in between all RO
mapping of the kernel image:
---[ Kernel Image Start ]---
0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X
0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X
0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX
0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX
0x000003ffe1300000-0x000003ffe1331000 196K PTE RO NX
0x000003ffe1331000-0x000003ffe13b3000 520K PTE RW NX <---
0x000003ffe13b3000-0x000003ffe13d5000 136K PTE RO NX
0x000003ffe13d5000-0x000003ffe1400000 172K PTE RW NX
0x000003ffe1400000-0x000003ffe1500000 1M PMD RW NX
0x000003ffe1500000-0x000003ffe1700000 2M PTE RW NX
0x000003ffe1700000-0x000003ffe1800000 1M PMD RW NX
0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX
---[ Kernel Image End ]---
Move the ro_after_init data section again right behind the rodata
section to prevent interleaving RO and RW mappings:
---[ Kernel Image Start ]---
0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X
0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X
0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX
0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX
0x000003ffe1300000-0x000003ffe1353000 332K PTE RO NX
0x000003ffe1353000-0x000003ffe1400000 692K PTE RW NX
0x000003ffe1400000-0x000003ffe1500000 1M PMD RW NX
0x000003ffe1500000-0x000003ffe1700000 2M PTE RW NX
0x000003ffe1700000-0x000003ffe1800000 1M PMD RW NX
0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX
---[ Kernel Image End ]---
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Since __va(0) does not translate to NULL anymore remove RELOC_HIDE()
which was only added to get rid of a compile warning with clang W=1:
arch/s390/mm/vmem.c:666:36: warning: performing pointer arithmetic on
a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
666 | __set_memory_4k(__va(0), __va(0) + ident_map_size);
| ~~~~~~~ ^
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Use the sort() from lib/sort.c to sort markers instead of the private
implementation. The current implementation does not sort markers
properly if they have to be moved downwards:
---[ Real Memory Copy Area Start ]---
0x0000035b903ff000-0x0000035b90400000 4K PTE I
---[ vmalloc Area Start ]---
---[ Real Memory Copy Area End ]---
Add a new member to each marker which indicates if a marker is start
of an area. If addresses of areas are equal consider an address which
defines the start of an area higher than the address which defines the
end of an area. In result the output is sorted as intended:
---[ Real Memory Copy Area Start ]---
0x0000019cedcff000-0x0000019cedd00000 4K PTE I
---[ Real Memory Copy Area End ]---
---[ vmalloc Area Start ]---
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The page table dumper contains a hard coded assumption that the first
mapped area starts at address zero. With a relocated lowcore this is
not true anymore. Subsequently the first entry (lowcore) is printed as
if it would contain everything from address zero until the end of the
location of the lowcore area.
Fix this by adding a single "Kernel Virtual Address Space" entry,
which always starts at address zero. It ends when the lowcore area
starts which is either address zero, or its relocated address.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Since virtual and real addresses are not the same anymore the
assumption that the kernel image is contained within the identity
mapping is also not true anymore.
Fix this by adding two explicit areas and at the correct locations: one
for the 8kb lowcore area, and one for the identity mapping.
Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Remove the unused and empty arch/s390/kernel/alternative.h header
file which was added by mistake.
Fixes: 5ade5be4edf8 ("s390: Add infrastructure to patch lowcore accesses")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
With the recent rewrite of the fpu code exception handling for the
lfpc instruction within load_fpu_state() was erroneously removed.
Add it again to prevent that loading invalid floating point register
values cause an unhandled specification exception.
Fixes: 8c09871a950a ("s390/fpu: limit save and restore to used registers")
Cc: stable@vger.kernel.org
Reported-by: Aristeu Rozanski <aris@redhat.com>
Tested-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Fix KMSAN build breakage caused by the conflict between s390 and
mm-stable trees
- Add KMSAN page markers for ptdump
- Add runtime constant support
- Fix __pa/__va for modules under non-GPL licenses by exporting
necessary vm_layout struct with EXPORT_SYMBOL to prevent linkage
problems
- Fix an endless loop in the CF_DIAG event stop in the CPU Measurement
Counter Facility code when the counter set size is zero
- Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable
its functionality by default
- Support allocation of multiple MSI interrupts per device and improve
logging of architecture-specific limitations
- Add support for lowcore relocation as a debugging feature to catch
all null ptr dereferences in the kernel address space, improving
detection beyond the current implementation's limited write access
protection
- Clean up and rework CPU alternatives to allow for callbacks and early
patching for the lowcore relocation
* tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
s390: Remove protvirt and kvm config guards for uv code
s390/boot: Add cmdline option to relocate lowcore
s390/kdump: Make kdump ready for lowcore relocation
s390/entry: Make system_call() ready for lowcore relocation
s390/entry: Make ret_from_fork() ready for lowcore relocation
s390/entry: Make __switch_to() ready for lowcore relocation
s390/entry: Make restart_int_handler() ready for lowcore relocation
s390/entry: Make mchk_int_handler() ready for lowcore relocation
s390/entry: Make int handlers ready for lowcore relocation
s390/entry: Make pgm_check_handler() ready for lowcore relocation
s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro
s390/entry: Add base register to SIEEXIT macro
s390/entry: Add base register to MBEAR macro
s390/entry: Make __sie64a() ready for lowcore relocation
s390/head64: Make startup code ready for lowcore relocation
s390: Add infrastructure to patch lowcore accesses
s390/atomic_ops: Disable flag outputs constraint for GCC versions below 14.2.0
s390/entry: Move SIE indicator flag to thread info
s390/nmi: Simplify ptregs setup
s390/alternatives: Remove alternative facility list
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl constification from Joel Granados:
"Treewide constification of the ctl_table argument of proc_handlers
using a coccinelle script and some manual code formatting fixups.
This is a prerequisite to moving the static ctl_table structs into
read-only data section which will ensure that proc_handler function
pointers cannot be modified"
* tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: treewide: constify the ctl_table argument of proc_handlers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases
to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions.
It's not all that useful for a "write a whole driver in rust" type
of thing, but the firmware bindings do help out the phy rust
drivers, and the driver core bindings give a solid base on which
others can start their work.
There is still a long way to go here before we have a multitude of
rust drivers being added, but it's a great first step.
- driver core const api changes.
This reached across all bus types, and there are some fix-ups for
some not-common bus types that linux-next and 0-day testing shook
out.
This work is being done to help make the rust bindings more safe,
as well as the C code, moving toward the end-goal of allowing us to
put driver structures into read-only memory. We aren't there yet,
but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
ARM: sa1100: make match function take a const pointer
sysfs/cpu: Make crash_hotplug attribute world-readable
dio: Have dio_bus_match() callback take a const *
zorro: make match function take a const pointer
driver core: module: make module_[add|remove]_driver take a const *
driver core: make driver_find_device() take a const *
driver core: make driver_[create|remove]_file take a const *
firmware_loader: fix soundness issue in `request_internal`
firmware_loader: annotate doctests as `no_run`
devres: Correct code style for functions that return a pointer type
devres: Initialize an uninitialized struct member
devres: Fix memory leakage caused by driver API devm_free_percpu()
devres: Fix devm_krealloc() wasting memory
driver core: platform: Switch to use kmemdup_array()
driver core: have match() callback in struct bus_type take a const *
MAINTAINERS: add Rust device abstractions to DRIVER CORE
device: rust: improve safety comments
MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
firmware: rust: improve safety comments
...
|
|
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch
@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }
@r3@
identifier func;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);
@r4@
identifier func, ctl;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);
@r5@
identifier func, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.
Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Remove tristate choice support from Kconfig
- Stop using the PROVIDE() directive in the linker script
- Reduce the number of links for the combination of CONFIG_KALLSYMS and
CONFIG_DEBUG_INFO_BTF
- Enable the warning for symbol reference to .exit.* sections by
default
- Fix warnings in RPM package builds
- Improve scripts/make_fit.py to generate a FIT image with separate
base DTB and overlays
- Improve choice value calculation in Kconfig
- Fix conditional prompt behavior in choice in Kconfig
- Remove support for the uncommon EMAIL environment variable in Debian
package builds
- Remove support for the uncommon "name <email>" form for the DEBEMAIL
environment variable
- Raise the minimum supported GNU Make version to 4.0
- Remove stale code for the absolute kallsyms
- Move header files commonly used for host programs to scripts/include/
- Introduce the pacman-pkg target to generate a pacman package used in
Arch Linux
- Clean up Kconfig
* tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits)
kbuild: doc: gcc to CC change
kallsyms: change sym_entry::percpu_absolute to bool type
kallsyms: unify seq and start_pos fields of struct sym_entry
kallsyms: add more original symbol type/name in comment lines
kallsyms: use \t instead of a tab in printf()
kallsyms: avoid repeated calculation of array size for markers
kbuild: add script and target to generate pacman package
modpost: use generic macros for hash table implementation
kbuild: move some helper headers from scripts/kconfig/ to scripts/include/
Makefile: add comment to discourage tools/* addition for kernel builds
kbuild: clean up scripts/remove-stale-files
kconfig: recursive checks drop file/lineno
kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec
kallsyms: get rid of code for absolute kallsyms
kbuild: Create INSTALL_PATH directory if it does not exist
kbuild: Abort make on install failures
kconfig: remove 'e1' and 'e2' macros from expression deduplication
kconfig: remove SYMBOL_CHOICEVAL flag
kconfig: add const qualifiers to several function arguments
kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups()
...
|
|
Removing the CONFIG_PROTECTED_VIRTUALIZATION_GUEST ifdefs and config
option as well as CONFIG_KVM ifdefs in uv files.
Having this configurable has been more of a pain than a help.
It's time to remove the ifdefs and the config option.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Now that everything has been converted, add the option
'relocate_lowcore' to enable relocating the lowcore.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in store_status()
and __do_machine_kdump().
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in system_call().
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in ret_from_fork().
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in __switch_to().
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in restart_int_handler().
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in mcck_int_handler().
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in the ext/io interrupt
handlers.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in pgm_check_handler().
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sve |