Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently MTE is permitted in two circumstances (desiring to use MTE
having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is
specified, as checked by arch_calc_vm_flag_bits() and actualised by
setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is
shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap
hook is activated in mmap_region().
The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also
set is the arm64 implementation of arch_validate_flags().
Unfortunately, we intend to refactor mmap_region() to perform this check
earlier, meaning that in the case of a shmem backing we will not have
invoked shmem_mmap() yet, causing the mapping to fail spuriously.
It is inappropriate to set this architecture-specific flag in general mm
code anyway, so a sensible resolution of this issue is to instead move the
check somewhere else.
We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
the arch_calc_vm_flag_bits() call.
This is an appropriate place to do this as we already check for the
MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
the same idea - we permit RAM-backed memory.
This requires a modification to the arch_calc_vm_flag_bits() signature to
pass in a pointer to the struct file associated with the mapping, however
this is not too egregious as this is only used by two architectures anyway
- arm64 and parisc.
So this patch performs this adjustment and removes the unnecessary
assignment of VM_MTE_ALLOWED in shmem_mmap().
[akpm@linux-foundation.org: fix whitespace, per Catalin]
Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
|
|
Declarations local to arch/*/kernel/*.c are better off *not* in a public
header - arch/parisc/kernel/unaligned.h is just fine for those
bits.
With that done parisc asm/unaligned.h is reduced to include
of asm-generic/unaligned.h and can be removed - unaligned.h is in
mandatory-y in include/asm-generic/Kbuild.
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Along with the usual shower of singleton patches, notable patch series
in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes -
mode code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No
functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a
little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
Butt. This is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at
all used 16k. Useful for some system tuning things, but
partivularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel
Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
David Hildenbrand. Improves PTE/PMD splitlock detection, makes
powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand.
Some folio conversions which make arch_make_page_accessible()
unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
Finkel. Cleans up and fixes our handling of the resetting of the
cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo
Stoakes. Rationalizaion and encapsulation of the VMA manipulation
APIs. With a view to better enable testing of the VMA functions,
even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
issues in the zswap global shrinker, resulting in improved
performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand.
Code cleanups and rationalizations (conversion to folio_walk())
resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat
Pham. Some tuning to improve zswap's dynamic shrinker. Significant
reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill
Shutemov. Improvements to the new unaccepted memory feature,
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
DAX PUDs. This was missing, although nobody seems to have notied
yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha
Kumar. Cleanups and modest performance improvements for the maple
tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt.
Adds various warnings telling users that memcg v1 features are
deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from
Chris Li. Greatly improves the success rate of the mTHP swap
allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various
disparate per-arch implementations of numa_memblk code into generic
code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin
Wang. With this series we no longer split shmem large folios into
simgle-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew
Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy
page flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama
Arif. An optimization which permits us to avoid writing/reading
zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
window which occurs when a MAP_FIXED operqtion is occurring during
an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
the vma_merge() functionality, making ot cleaner, more testable and
better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts.
Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song.
Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with
per-context one" from SeongJae Park. DAMON histogram
rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from
SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
improve related doc and warn" from Jason Wang: fixes usage of page
allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped
area" from Mark Brown. Fix up the various arch_get_unmapped_area()
implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()"
from Huang Ying. Fix a bug in region_intersects() for systems with
CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
a couple more code paths to correctly recover from the encountering
of poisoned memry.
- "mm: enable large folios swap-in support" from Barry Song. Support
the swapin of mTHP memory into appropriately-sized folios, rather
than into single-page folios"
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
zram: free secondary algorithms names
uprobes: turn xol_area->pages[2] into xol_area->page
uprobes: introduce the global struct vm_special_mapping xol_mapping
Revert "uprobes: use vm_special_mapping close() functionality"
mm: support large folios swap-in for sync io devices
mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
set_memory: add __must_check to generic stubs
mm/vma: return the exact errno in vms_gather_munmap_vmas()
memcg: cleanup with !CONFIG_MEMCG_V1
mm/show_mem.c: report alloc tags in human readable units
mm: support poison recovery from copy_present_page()
mm: support poison recovery from do_cow_fault()
resource, kunit: add test case for region_intersects()
resource: make alloc_free_mem_region() works for iomem_resource
mm: z3fold: deprecate CONFIG_Z3FOLD
vfio/pci: implement huge_fault support
mm/arm64: support large pfn mappings
mm/x86: support large pfn mappings
...
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- support DMA zones for arm64 systems where memory starts at > 4GB
(Baruch Siach, Catalin Marinas)
- support direct calls into dma-iommu and thus obsolete dma_map_ops for
many common configurations (Leon Romanovsky)
- add DMA-API tracing (Sean Anderson)
- remove the not very useful return value from various dma_set_* APIs
(Christoph Hellwig)
- misc cleanups and minor optimizations (Chen Y, Yosry Ahmed, Christoph
Hellwig)
* tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: reflow dma_supported
dma-mapping: reliably inform about DMA support for IOMMU
dma-mapping: add tracing for dma-mapping API calls
dma-mapping: use IOMMU DMA calls for common alloc/free page calls
dma-direct: optimize page freeing when it is not addressable
dma-mapping: clearly mark DMA ops as an architecture feature
vdpa_sim: don't select DMA_OPS
arm64: mm: keep low RAM dma zone
dma-mapping: don't return errors from dma_set_max_seg_size
dma-mapping: don't return errors from dma_set_seg_boundary
dma-mapping: don't return errors from dma_set_min_align_mask
scsi: check that busses support the DMA API before setting dma parameters
arm64: mm: fix DMA zone when dma-ranges is missing
dma-mapping: direct calls for dma-iommu
dma-mapping: call ->unmap_page and ->unmap_sg unconditionally
arm64: support DMA zone above 4GB
dma-mapping: replace zone_dma_bits by zone_dma_limit
dma-mapping: use bit masking to check VM_DMA_COHERENT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture updates from Helge Deller:
- On parisc we now use the generic clockevent framework for timekeeping
- Although there is no 64-bit glibc/userspace for parisc yet, for
testing purposes one can run statically linked 64-bit binaries. This
patchset contains two patches which fix 64-bit userspace which has
been broken since kernel 4.19
- Fix the userspace stack position and size when the ADDR_NO_RANDOMIZE
personality is enabled
- On other architectures mmap(MAP_GROWSDOWN | MAP_STACK) creates a
downward-growing stack. On parisc mmap(MAP_STACK) is now sufficient
to create an upward-growing stack
* tag 'parisc-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Allow mmap(MAP_STACK) memory to automatically expand upwards
parisc: Use PRIV_USER instead of hardcoded value
parisc: Fix itlb miss handler for 64-bit programs
parisc: Fix 64-bit userspace syscall path
parisc: Fix stack start for ADDR_NO_RANDOMIZE personality
parisc: Convert to generic clockevents
parisc: pdc_stable: Constify struct kobj_type
|
|
When userspace allocates memory with mmap() in order to be used for stack,
allow this memory region to automatically expand upwards up until the
current maximum process stack size.
The fault handler checks if the VM_GROWSUP bit is set in the vm_flags field
of a memory area before it allows it to expand.
This patch modifies the parisc specific code only.
A RFC for a generic patch to modify mmap() for all architectures was sent
to the mailing list but did not get enough Acks.
Reported-by: Camm Maguire <camm@maguirefamily.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v5.10+
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
For an itlb miss when executing code above 4 Gb on ILP64 adjust the
iasq/iaoq in the same way isr/ior was adjusted. This fixes signal
delivery for the 64-bit static test program from
http://ftp.parisc-linux.org/src/64bit.tar.gz. Note that signals are
handled by the signal trampoline code in the 64-bit VDSO which is mapped
into high userspace memory region above 4GB for 64-bit processes.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.19+
|
|
Add an interface for the user to notify the kernel that it is done
reading the devmem dmabuf frags returned as cmsg. The kernel will
drop the reference on the frags to make them available for reuse.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-11-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In tcp_recvmsg_locked(), detect if the skb being received by the user
is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM
flag - pass it to tcp_recvmsg_devmem() for custom handling.
tcp_recvmsg_devmem() copies any data in the skb header to the linear
buffer, and returns a cmsg to the user indicating the number of bytes
returned in the linear buffer.
tcp_recvmsg_devmem() then loops over the unaccessible devmem skb frags,
and returns to the user a cmsg_devmem indicating the location of the
data in the dmabuf device memory. cmsg_devmem contains this information:
1. the offset into the dmabuf where the payload starts. 'frag_offset'.
2. the size of the frag. 'frag_size'.
3. an opaque token 'frag_token' to return to the kernel when the buffer
is to be released.
The pages awaiting freeing are stored in the newly added
sk->sk_user_frags, and each page passed to userspace is get_page()'d.
This reference is dropped once the userspace indicates that it is
done reading this page. All pages are released when the socket is
destroyed.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-10-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Patch series "mm: Care about shadow stack guard gap when getting an
unmapped area", v2.
As covered in the commit log for c44357c2e76b ("x86/mm: care about shadow
stack guard gap during placement") our current mmap() implementation does
not take care to ensure that a new mapping isn't placed with existing
mappings inside it's own guard gaps. This is particularly important for
shadow stacks since if two shadow stacks end up getting placed adjacent to
each other then they can overflow into each other which weakens the
protection offered by the feature.
On x86 there is a custom arch_get_unmapped_area() which was updated by the
above commit to cover this case by specifying a start_gap for allocations
with VM_SHADOW_STACK. Both arm64 and RISC-V have equivalent features and
use the generic implementation of arch_get_unmapped_area() so let's make
the equivalent change there so they also don't get shadow stack pages
placed without guard pages. The arm64 and RISC-V shadow stack
implementations are currently on the list:
https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec94743
https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
Given the addition of the use of vm_flags in the generic implementation we
also simplify the set of possibilities that have to be dealt with in the
core code by making arch_get_unmapped_area() take vm_flags as standard.
This is a bit invasive since the prototype change touches quite a few
architectures but since the parameter is ignored the change is
straightforward, the simplification for the generic code seems worth it.
This patch (of 3):
When we introduced arch_get_unmapped_area_vmflags() in 961148704acd ("mm:
introduce arch_get_unmapped_area_vmflags()") we did so as part of properly
supporting guard pages for shadow stacks on x86_64, which uses a custom
arch_get_unmapped_area(). Equivalent features are also present on both
arm64 and RISC-V, both of which use the generic implementation of
arch_get_unmapped_area() and will require equivalent modification there.
Rather than continue to deal with having two versions of the functions
let's bite the bullet and have all implementations of
arch_get_unmapped_area() take vm_flags as a parameter.
The new parameter is currently ignored by all implementations other than
x86. The only caller that doesn't have a vm_flags available is
mm_get_unmapped_area(), as for the x86 implementation and the wrapper used
on other architectures this is modified to supply no flags.
No functional changes.
Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-0-a46b8b6dc0ed@kernel.org
Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-1-a46b8b6dc0ed@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently the glibc isn't yet ported to 64-bit for hppa, so
there is no usable userspace available yet.
But it's possible to manually build a static 64-bit binary
and run that for testing. One such 64-bit test program is
available at http://ftp.parisc-linux.org/src/64bit.tar.gz
and it shows various issues with the existing 64-bit syscall
path in the kernel.
This patch fixes those issues.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.19+
|
|
Convert parisc timer code to generic clockevents framework.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
DMA ops are a helper for architectures and not for drivers to override
the DMA implementation.
Unfortunately driver authors keep ignoring this. Make the fact more
clear by renaming the symbol to ARCH_HAS_DMA_OPS and having the two drivers
overriding their dma_ops depend on that. These drivers should probably be
marked broken, but we can give them a bit of a grace period for that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # for IPU6
Acked-by: Robin Murphy <robin.murphy@arm.com>
|
|
Do not write-protect the kernel read-only and __ro_after_init sections
earlier than before mark_rodata_ro() is called. This fixes a boot issue on
parisc which is triggered by commit 91a1d97ef482 ("jump_label,module: Don't
alloc static_key_mod for __ro_after_init keys"). That commit may modify
static key contents in the __ro_after_init section at bootup, so this
section needs to be writable at least until mark_rodata_ro() is called.
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Link: https://lore.kernel.org/linux-parisc/096cad5aada514255cd7b0b9dbafc768@matoro.tk/#r
Fixes: 91a1d97ef482 ("jump_label,module: Don't alloc static_key_mod for __ro_after_init keys")
Cc: stable@vger.kernel.org # v6.10+
|
|
ARCH_DMA_MINALIGN was defined as 16 - this is too small - it may be
possible that two unrelated 16-byte allocations share a cache line. If
one of these allocations is written using DMA and the other is written
using cached write, the value that was written with DMA may be
corrupted.
This commit changes ARCH_DMA_MINALIGN to be 128 on PA20 and 32 on PA1.1 -
that's the largest possible cache line size.
As different parisc microarchitectures have different cache line size, we
define arch_slab_minalign(), cache_line_size() and
dma_get_cache_alignment() so that the kernel may tune slab cache
parameters dynamically, based on the detected cache line size.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
There were spurious unaligned access warnings when calling BPF code.
Sometimes, the warnings were triggered with any incoming packet, making
the machine hard to use.
The reason for the warnings is this: on parisc64, pointers to functions
are not really pointers to functions, they are pointers to 16-byte
descriptor. The first 8 bytes of the descriptor is a pointer to the
function and the next 8 bytes of the descriptor is the content of the
"dp" register. This descriptor is generated in the function
bpf_jit_build_prologue.
The problem is that the function bpf_int_jit_compile advertises 4-byte
alignment when calling bpf_jit_binary_alloc, bpf_jit_binary_alloc
randomizes the returned array and if the array happens to be not aligned
on 8-byte boundary, the descriptor generated in bpf_jit_build_prologue is
also not aligned and this triggers the unaligned access warning.
Fix this by advertising 8-byte alignment on parisc64 when calling
bpf_jit_binary_alloc.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"The gettimeofday() and clock_gettime() syscalls are now available as
vDSO functions, and Dave added a patch which allows to use NVMe cards
in the PCI slots as fast and easy alternative to SCSI discs.
Summary:
- add gettimeofday() and clock_gettime() vDSO functions
- enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor
with PCIe NVME card to function in parisc machines
- allow users to reduce kernel unaligned runtime warnings
- minor code cleanups"
* tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
parisc: Use max() to calculate parisc_tlb_flush_threshold
parisc: Fix warning at drivers/pci/msi/msi.h:121
parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions
parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions
parisc: Clean up unistd.h file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases
to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions.
It's not all that useful for a "write a whole driver in rust" type
of thing, but the firmware bindings do help out the phy rust
drivers, and the driver core bindings give a solid base on which
others can start their work.
There is still a long way to go here before we have a multitude of
rust drivers being added, but it's a great first step.
- driver core const api changes.
This reached across all bus types, and there are some fix-ups for
some not-common bus types that linux-next and 0-day testing shook
out.
This work is being done to help make the rust bindings more safe,
as well as the C code, moving toward the end-goal of allowing us to
put driver structures into read-only memory. We aren't there yet,
but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
ARM: sa1100: make match function take a const pointer
sysfs/cpu: Make crash_hotplug attribute world-readable
dio: Have dio_bus_match() callback take a const *
zorro: make match function take a const pointer
driver core: module: make module_[add|remove]_driver take a const *
driver core: make driver_find_device() take a const *
driver core: make driver_[create|remove]_file take a const *
firmware_loader: fix soundness issue in `request_internal`
firmware_loader: annotate doctests as `no_run`
devres: Correct code style for functions that return a pointer type
devres: Initialize an uninitialized struct member
devres: Fix memory leakage caused by driver API devm_free_percpu()
devres: Fix devm_krealloc() wasting memory
driver core: platform: Switch to use kmemdup_array()
driver core: have match() callback in struct bus_type take a const *
MAINTAINERS: add Rust device abstractions to DRIVER CORE
device: rust: improve safety comments
MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
firmware: rust: improve safety comments
...
|
|
Allow users to disable kernel warnings for unaligned memory
accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap
procfs entry.
That way users can disable those warnings in case they happen too
often.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Remove tristate choice support from Kconfig
- Stop using the PROVIDE() directive in the linker script
- Reduce the number of links for the combination of CONFIG_KALLSYMS and
CONFIG_DEBUG_INFO_BTF
- Enable the warning for symbol reference to .exit.* sections by
default
- Fix warnings in RPM package builds
- Improve scripts/make_fit.py to generate a FIT image with separate
base DTB and overlays
- Improve choice value calculation in Kconfig
- Fix conditional prompt behavior in choice in Kconfig
- Remove support for the uncommon EMAIL environment variable in Debian
package builds
- Remove support for the uncommon "name <email>" form for the DEBEMAIL
environment variable
- Raise the minimum supported GNU Make version to 4.0
- Remove stale code for the absolute kallsyms
- Move header files commonly used for host programs to scripts/include/
- Introduce the pacman-pkg target to generate a pacman package used in
Arch Linux
- Clean up Kconfig
* tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits)
kbuild: doc: gcc to CC change
kallsyms: change sym_entry::percpu_absolute to bool type
kallsyms: unify seq and start_pos fields of struct sym_entry
kallsyms: add more original symbol type/name in comment lines
kallsyms: use \t instead of a tab in printf()
kallsyms: avoid repeated calculation of array size for markers
kbuild: add script and target to generate pacman package
modpost: use generic macros for hash table implementation
kbuild: move some helper headers from scripts/kconfig/ to scripts/include/
Makefile: add comment to discourage tools/* addition for kernel builds
kbuild: clean up scripts/remove-stale-files
kconfig: recursive checks drop file/lineno
kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec
kallsyms: get rid of code for absolute kallsyms
kbuild: Create INSTALL_PATH directory if it does not exist
kbuild: Abort make on install failures
kconfig: remove 'e1' and 'e2' macros from expression deduplication
kconfig: remove SYMBOL_CHOICEVAL flag
kconfig: add const qualifiers to several function arguments
kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups()
...
|
|
Setting '-e' flag tells shells to exit with error exit code immediately
after any of commands fails, and causes make(1) to regard recipes as
failed.
Before this, make will still continue to succeed even after the
installation failed, for example, for insufficient permission or
directory does not exist.
Signed-off-by: Zhang Bingwu <xtexchooser@duck.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When clone3() was introduced, it was not obvious how each architecture
deals with setting up the stack and keeping the register contents in
a fork()-like system call, so this was left for the architecture
maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those
that already implement it.
Five years later, we still have a few architectures left that are missing
clone3(), and the macro keeps getting in the way as it's fundamentally
different from all the other __ARCH_WANT_SYS_* macros that are meant
to provide backwards-compatibility with applications using older
syscalls that are no longer provided by default.
Address this by reversing the polarity of the macro, adding an
__ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't
already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3
from all the other ones.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Use max() to reduce 4 lines of code to a single line and improve its
readability.
Fixes the following Coccinelle/coccicheck warning reported by
minmax.cocci:
WARNING opportunity for max()
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Fix warning at drivers/pci/msi/msi.h:121.
Recently, I added a PCI to PCIe bridge adaptor and a PCIe NVME card
to my rp3440. Then, I noticed this warning at boot:
WARNING: CPU: 0 PID: 10 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x68/0x90
CPU: 0 PID: 10 Comm: kworker/u32:0 Not tainted 6.9.7-parisc64 #1 Debian 6.9.7-1
Hardware name: 9000/800/rp3440
Workqueue: async async_run_entry_fn
We need to select PCI_MSI_ARCH_FALLBACKS when PCI_MSI is selected.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
In the match() callback, the struct device_driver * should not be
changed, so change the function callback to be a const *. This is one
step of many towards making the driver core safe to have struct
device_driver in read-only memory.
Because the match() callback is in all busses, all busses are modified
to handle this properly. This does entail switching some container_of()
calls to container_of_const() to properly handle the constant *.
For some busses, like PCI and USB and HV, the const * is cast away in
the match callback as those busses do want to modify those structures at
this point in time (they have a local lock in the driver structure.)
That will have to be changed in the future if they wish to have their
struct device * in read-only-memory.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alex Elder <elder@kernel.org>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The sys_fanotify_mark() syscall on parisc uses the reverse word order
for the two halves of the 64-bit argument compared to all syscalls on
all 32-bit architectures. As far as I can tell, the problem is that
the function arguments on parisc are sorted backwards (26, 25, 24, 23,
...) compared to everyone else, so the calling conventions of using an
even/odd register pair in native word order result in the lower word
coming first in function arguments, matching the expected behavior
on little-endian architectures. The system call conventions however
ended up matching what the other 32-bit architectures do.
A glibc cleanup in 2020 changed the userspace behavior in a way that
handles all architectures consistently, but this inadvertently broke
parisc32 by changing to the same method as everyone else.
The change made it into glibc-2.35 and subsequently into debian 12
(bookworm), which is the latest stable release. This means we
need to choose between reverting the glibc change or changing the
kernel to match it again, but either hange will leave some systems
broken.
Pick the option that is more likely to help current and future
users and change the kernel to match current glibc. This also
means the behavior is now consistent across architectures, but
it breaks running new kernels with old glibc builds before 2.35.
Link: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=d150181d73d9
Link: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/arch/parisc/kernel/sys_parisc.c?h=57b1dfbd5b4a39d
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I found this through code inspection, please double-check to make
sure I got the bug and the fix right.
The alternative is to fix this by reverting glibc back to the
unusual behavior.
|
|
Johannes missed parisc back when he introduced the compat version
of these syscalls, so receiving cmsg messages that require a compat
conversion is still broken.
Use the correct calls like the other architectures do.
Fixes: 1dacc76d0014 ("net/compat/wext: send different messages to compat tasks")
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Add 64-bit vDSO implementations for gettimeofday() and clock_gettime().
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Add vDSO implementations for gettimeofday(), clock_gettime() and
clock_gettime64() kernel syscalls.
Currently those functions are implemented as pure syscall wrappers.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Clean up the internal unistd.h file, so that syscallX() can be used
internally to call syscalls from userspace. This is used later by the
vDSO C-code.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
PA-RISC systems with PA8800 and PA8900 processors have had problems
with random segmentation faults for many years. Systems with earlier
processors are much more stable.
Systems with PA8800 and PA8900 processors have a large L2 cache which
needs per page flushing for decent performance when a large range is
flushed. The combined cache in these systems is also more sensitive to
non-equivalent aliases than the caches in earlier systems.
The majority of random segmentation faults that I have looked at
appear to be memory corruption in memory allocated using mmap and
malloc.
My first attempt at fixing the random faults didn't work. On
reviewing the cache code, I realized that there were two issues
which the existing code didn't handle correctly. Both relate
to cache move-in. Another issue is that the present bit in PTEs
is racy.
1) PA-RISC caches have a mind of their own and they can speculatively
load data and instructions for a page as long as there is a entry in
the TLB for the page which allows move-in. TLBs are local to each
CPU. Thus, the TLB entry for a page must be purged before flushing
the page. This is particularly important on SMP systems.
In some of the flush routines, the flush routine would be called
and then the TLB entry would be purged. This was because the flush
routine needed the TLB entry to do the flush.
2) My initial approach to trying the fix the random faults was to
try and use flush_cache_page_if_present for all flush operations.
This actually made things worse and led to a couple of hardware
lockups. It finally dawned on me that some lines weren't being
flushed because the pte check code was racy. This resulted in
random inequivalent mappings to physical pages.
The __flush_cache_page tmpalias flush sets up its own TLB entry
and it doesn't need the existing TLB entry. As long as we can find
the pte pointer for the vm page, we can get the pfn and physical
address of the page. We can also purge the TLB entry for the page
before doing the flush. Further, __flush_cache_page uses a special
TLB entry that inhibits cache move-in.
When switching page mappings, we need to ensure that lines are
removed from the cache. It is not sufficient to just flush the
lines to memory as they may come back.
This made it clear that we needed to implement all the required
flush operations using tmpalias routines. This includes flushes
for user and kernel pages.
After modifying the code to use tmpalias flushes, it became clear
that the random segmentation faults were not fully resolved. The
frequency of faults was worse on systems with a 64 MB L2 (PA8900)
and systems with more CPUs (rp4440).
The warning that I added to flush_cache_page_if_present to detect
pages that couldn't be flushed triggered frequently on some systems.
Helge and I looked at the pages that couldn't be flushed and found
that the PTE was either cleared or for a swap page. Ignoring pages
that were swapped out seemed okay but pages with cleared PTEs seemed
problematic.
I looked at routines related to pte_clear and noticed ptep_clear_flush.
The default implementation just flushes the TLB entry. However, it was
obvious that on parisc we need to flush the cache page as well. If
we don't flush the cache page, stale lines will be left in the cache
and cause random corruption. Once a PTE is cleared, there is no way
to find the physical address associated with the PTE and flush the
associated page at a later time.
I implemented an updated change with a parisc specific version of
ptep_clear_flush. It fixed the random data corruption on Helge's rp4440
and rp3440, as well as on my c8000.
At this point, I realized that I could restore the code where we only
flush in flush_cache_page_if_present if the page has been accessed.
However, for this, we also need to flush the cache when the accessed
bit is cleared in ptep_clear_flush_young to keep things synchronized.
The default implementation only flushes the TLB entry.
Other changes in this version are:
1) Implement parisc specific version of ptep_get. It's identical to
default but needed in arch/parisc/include/asm/pgtable.h.
2) Revise parisc implementation of ptep_test_and_clear_young to use
ptep_get (READ_ONCE).
3) Drop parisc implementation of ptep_get_and_clear. We can use default.
4) Revise flush_kernel_vmap_range and invalidate_kernel_vmap_range to
use full data cache flush.
5) Move flush_cache_vmap and flush_cache_vunmap to cache.c. Handle
VM_IOREMAP case in flush_cache_vmap.
At this time, I don't know whether it is better to always flush when
the PTE present bit is set or when both the accessed and present bits
are set. The later saves flushing pages that haven't been accessed,
but we need to flush in ptep_clear_flush_young. It also needs a page
table lookup to find the PTE pointer. The lpa instruction only needs
a page table lookup when the PTE entry isn't in the TLB.
We don't atomically handle setting and clearing the _PAGE_ACCESSED bit.
If we miss an update, we may miss a flush and the cache may get corrupted.
Whether the current code is effectively atomic depends on process control.
When CONFIG_FLUSH_PAGE_ACCESSED is set to zero, the page will eventually
be flushed when the PTE is cleared or in flush_cache_page_if_present. The
_PAGE_ACCESSED bit is not used, so the problem is avoided.
The flush method can be selected using the CONFIG_FLUSH_PAGE_ACCESSED
define in cache.c. The defaul |