linux.git - Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

Age	Commit message (Collapse)	Author	Files	Lines
2024-01-31	binder: signal epoll threads of self-work	Carlos Llamas	1	-0/+10
	In (e)poll mode, threads often depend on I/O events to determine when data is ready for consumption. Within binder, a thread may initiate a command via BINDER_WRITE_READ without a read buffer and then make use of epoll_wait() or similar to consume any responses afterwards. It is then crucial that epoll threads are signaled via wakeup when they queue their own work. Otherwise, they risk waiting indefinitely for an event leaving their work unhandled. What is worse, subsequent commands won't trigger a wakeup either as the thread has pending work. Fixes: 457b9a6f09f0 ("Staging: android: add binder driver") Cc: Arve Hjønnevåg <arve@android.com> Cc: Martijn Coenen <maco@android.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Steven Moreland <smoreland@google.com> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20240131215347.1808751-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-17	Merge tag 'char-misc-6.8-rc1' of ↵	Linus Torvalds	6	-475/+494
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc and other driver updates from Greg KH: "Here is the big set of char/misc and other driver subsystem changes for 6.8-rc1. Other than lots of binder driver changes (as you can see by the merge conflicts) included in here are: - lots of iio driver updates and additions - spmi driver updates - eeprom driver updates - firmware driver updates - ocxl driver updates - mhi driver updates - w1 driver updates - nvmem driver updates - coresight driver updates - platform driver remove callback api changes - tags.sh script updates - bus_type constant marking cleanups - lots of other small driver updates All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (341 commits) android: removed duplicate linux/errno uio: Fix use-after-free in uio_open drivers: soc: xilinx: add check for platform firmware: xilinx: Export function to use in other module scripts/tags.sh: remove find_sources scripts/tags.sh: use -n to test archinclude scripts/tags.sh: add local annotation scripts/tags.sh: use more portable -path instead of -wholename scripts/tags.sh: Update comment (addition of gtags) firmware: zynqmp: Convert to platform remove callback returning void firmware: turris-mox-rwtm: Convert to platform remove callback returning void firmware: stratix10-svc: Convert to platform remove callback returning void firmware: stratix10-rsu: Convert to platform remove callback returning void firmware: raspberrypi: Convert to platform remove callback returning void firmware: qemu_fw_cfg: Convert to platform remove callback returning void firmware: mtk-adsp-ipc: Convert to platform remove callback returning void firmware: imx-dsp: Convert to platform remove callback returning void firmware: coreboot_table: Convert to platform remove callback returning void firmware: arm_scpi: Convert to platform remove callback returning void firmware: arm_scmi: Convert to platform remove callback returning void ...
2024-01-09	Merge tag 'mm-stable-2024-01-08-15-31' of ↵	Linus Torvalds	1	-4/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2024-01-07	android: removed duplicate linux/errno	Tanzir Hasan	1	-1/+0
	There are two linux/errno.h inclusions in this file. The second one has been removed and the file builds correctly. Fixes: 54ffdab82080 ("android: binder: binderfs.c: removed asm-generic/errno-base.h") Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Tanzir Hasan <tanzirh@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20240104-removeduperror-v1-1-d170d4b3675a@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-04	android: binder: binderfs.c: removed asm-generic/errno-base.h	Tanzir Hasan	1	-1/+1
	asm-generic/errno-base.h can be replaced by linux/errno.h and the file will still build correctly. It is an asm-generic file which should be avoided if possible. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tanzir Hasan <tanzirh@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231226-binderfs-v1-1-66829e92b523@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-12	list_lru: allow explicit memcg and NUMA node selection	Nhat Pham	1	-4/+3
	Patch series "workload-specific and memory pressure-driven zswap writeback", v8. There are currently several issues with zswap writeback: 1. There is only a single global LRU for zswap, making it impossible to perform worload-specific shrinking - an memcg under memory pressure cannot determine which pages in the pool it owns, and often ends up writing pages from other memcgs. This issue has been previously observed in practice and mitigated by simply disabling memcg-initiated shrinking: https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u But this solution leaves a lot to be desired, as we still do not have an avenue for an memcg to free up its own memory locked up in the zswap pool. 2. We only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages). This patch series solves these issues by separating the global zswap LRU into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e memcg- and NUMA-aware) zswap writeback under memory pressure. The new shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis. As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improved the overall performance. Depending on the amount of cold data generated, we observe from 14% to 35% reduction in kernel CPU time used in the kernel builds. This patch (of 6): The interface of list_lru is based on the assumption that the list node and the data it represents belong to the same allocated on the correct node/memcg. While this assumption is valid for existing slab objects LRU such as dentries and inodes, it is undocumented, and rather inflexible for certain potential list_lru users (such as the upcoming zswap shrinker and the THP shrinker). It has caused us a lot of issues during our development. This patch changes list_lru interface so that the caller must explicitly specify numa node and memcg when adding and removing objects. The old list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and list_lru_del_obj(), respectively. It also extends the list_lru API with a new function, list_lru_putback, which undoes a previous list_lru_isolate call. Unlike list_lru_add, it does not increment the LRU node count (as list_lru_isolate does not decrement the node count). list_lru_putback also allows for explicit memcg and NUMA node selection. Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12	file: s/close_fd_get_file()/file_close_fd()/g	Christian Brauner	1	-1/+1
	That really shouldn't have "get" in there as that implies we're bumping the reference count which we don't do at all. We used to but not anmore. Now we're just closing the fd and pick that file from the fdtable without bumping the reference count. Update the wrong documentation while at it. Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-1-e73ca6f4ea83@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-06	android: binder: fix a kernel-doc enum warning	Randy Dunlap	1	-0/+4
	Add kernel-doc notation for @LOOP_END to prevent a kernel-doc warning. binder_alloc_selftest.c:76: warning: Enum value 'LOOP_END' not described in enum 'buf_end_align_type' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arve Hjønnevåg <arve@android.com> Cc: Todd Kjos <tkjos@android.com> Cc: Martijn Coenen <maco@android.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Christian Brauner <christian@brauner.io> Cc: Carlos Llamas <cmllamas@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231205225324.32362-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: switch alloc->mutex to spinlock_t	Carlos Llamas	2	-28/+28
	The alloc->mutex is a highly contended lock that causes performance issues on Android devices. When a low-priority task is given this lock and it sleeps, it becomes difficult for the task to wake up and complete its work. This delays other tasks that are also waiting on the mutex. The problem gets worse when there is memory pressure in the system, because this increases the contention on the alloc->mutex while the shrinker reclaims binder pages. Switching to a spinlock helps to keep the waiters running and avoids the overhead of waking up tasks. This significantly improves the transaction latency when the problematic scenario occurs. The performance impact of this patchset was measured by stress-testing the binder alloc contention. In this test, several clients of different priorities send thousands of transactions of different sizes to a single server. In parallel, pages get reclaimed using the shinker's debugfs. The test was run on a Pixel 8, Pixel 6 and qemu machine. The results were similar on all three devices: after: \| sched \| prio \| average \| max \| min \| \|--------+------+---------+-----------+---------\| \| fifo \| 99 \| 0.135ms \| 1.197ms \| 0.022ms \| \| fifo \| 01 \| 0.136ms \| 5.232ms \| 0.018ms \| \| other \| -20 \| 0.180ms \| 7.403ms \| 0.019ms \| \| other \| 19 \| 0.241ms \| 58.094ms \| 0.018ms \| before: \| sched \| prio \| average \| max \| min \| \|--------+------+---------+-----------+---------\| \| fifo \| 99 \| 0.350ms \| 248.730ms \| 0.020ms \| \| fifo \| 01 \| 0.357ms \| 248.817ms \| 0.024ms \| \| other \| -20 \| 0.399ms \| 249.906ms \| 0.020ms \| \| other \| 19 \| 0.477ms \| 297.756ms \| 0.022ms \| The key metrics above are the average and max latencies (wall time). These improvements should roughly translate to p95-p99 latencies on real workloads. The response time is up to 200x faster in these scenarios and there is no penalty in the regular path. Note that it is only possible to convert this lock after a series of changes made by previous patches. These mainly include refactoring the sections that might_sleep() and changing the locking order with the mmap_lock amongst others. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-29-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: reverse locking order in shrinker callback	Carlos Llamas	1	-24/+22
	The locking order currently requires the alloc->mutex to be acquired first followed by the mmap lock. However, the alloc->mutex is converted into a spinlock in subsequent commits so the order needs to be reversed to avoid nesting the sleeping mmap lock under the spinlock. The shrinker's callback binder_alloc_free_page() is the only place that needs to be reordered since other functions have been refactored and no longer nest these locks. Some minor cosmetic changes are also included in this patch. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-28-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: avoid user addresses in debug logs	Carlos Llamas	2	-11/+8
	Prefer logging vma offsets instead of addresses or simply drop the debug log altogether if not useful. Note this covers the instances affected by the switch to store addresses as unsigned long. However, there are other sections in the driver that could do the same. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-27-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: refactor binder_delete_free_buffer()	Carlos Llamas	1	-33/+11
	Skip the freelist call immediately as needed, instead of continuing the pointless checks. Also, drop the debug logs that we don't really need. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-26-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: collapse print_binder_buffer() into caller	Carlos Llamas	1	-13/+9
	The code in print_binder_buffer() is quite small so it can be collapsed into its single caller binder_alloc_print_allocated(). No functional change in this patch. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-25-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: document the final page calculation	Carlos Llamas	1	-7/+11
	The code to determine the page range for binder_lru_freelist_del() is quite obscure. It leverages the buffer_size calculated before doing an oversized buffer split. This is used to figure out if the last page is being shared with another active buffer. If so, the page gets trimmed out of the range as it has been previously removed from the freelist. This would be equivalent to getting the start page of the next in-use buffer explicitly. However, the code for this is much larger as we can see in binder_free_buf_locked() routine. Instead, lets settle on documenting the tricky step and using better names for now. I believe an ideal solution would be to count the binder_page->users to determine when a page should be added or removed from the freelist. However, this is a much bigger change than what I'm willing to risk at this time. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-24-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: rename lru shrinker utilities	Carlos Llamas	3	-25/+25
	Now that the page allocation step is done separately we should rename the binder_free_page_range() and binder_allocate_page_range() functions to provide a more accurate description of what they do. Lets borrow the freelist concept used in other parts of the kernel for this. No functional change here. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-23-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: make oversized buffer code more readable	Carlos Llamas	1	-11/+10
	The sections in binder_alloc_new_buf_locked() dealing with oversized buffers are scattered which makes them difficult to read. Instead, consolidate this code into a single block to improve readability. No functional change here. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-22-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: remove redundant debug log	Carlos Llamas	1	-3/+0
	The debug information in this statement is already logged earlier in the same function. We can get rid of this duplicate log. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-21-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: perform page installation outside of locks	Carlos Llamas	1	-28/+73
	Split out the insertion of pages to be outside of the alloc->mutex in a separate binder_install_buffer_pages() routine. Since this is no longer serialized, we must look at the full range of pages used by the buffers. The installation is protected with mmap_sem in write mode since multiple tasks might race to install the same page. Besides avoiding unnecessary nested locking this helps in preparation of switching the alloc->mutex into a spinlock_t in subsequent patches. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-20-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: initialize lru pages in mmap callback	Carlos Llamas	1	-5/+7
	Rather than repeatedly initializing some of the binder_lru_page members during binder_alloc_new_buf(), perform this initialization just once in binder_alloc_mmap_handler(), after the pages have been created. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-19-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: malloc new_buffer outside of locks	Carlos Llamas	1	-21/+23
	Preallocate new_buffer before acquiring the alloc->mutex and hand it down to binder_alloc_new_buf_locked(). The new buffer will be used in the vast majority of requests (measured at 98.2% in field data). The buffer is discarded otherwise. This change is required in preparation for transitioning alloc->mutex into a spinlock in subsequent commits. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-18-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: refactor page range allocation	Carlos Llamas	1	-60/+47
	Instead of looping through the page range twice to first determine if the mmap lock is required, simply do it per-page as needed. Split out all this logic into a separate binder_install_single_page() function. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-17-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: relocate binder_alloc_clear_buf()	Carlos Llamas	1	-63/+61
	Move this function up along with binder_alloc_get_page() so that their prototypes aren't necessary. No functional change in this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-16-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: relocate low space calculation	Carlos Llamas	1	-10/+11
	Move the low async space calculation to debug_low_async_space_locked(). This logic not only fits better here but also offloads some of the many tasks currently done in binder_alloc_new_buf_locked(). No functional change in this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-15-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: separate the no-space debugging logic	Carlos Llamas	1	-31/+40
	Move the no-space debugging logic into a separate function. Lets also mark this branch as unlikely in binder_alloc_new_buf_locked() as most requests will fit without issue. Also add a few cosmetic changes and suggestions from checkpatch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-14-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: remove pid param in binder_alloc_new_buf()	Carlos Llamas	4	-17/+12
	Binder attributes the buffer allocation to the current->tgid everytime. There is no need to pass this as a parameter so drop it. Also add a few touchups to follow the coding guidelines. No functional changes are introduced in this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-13-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: do unlocked work in binder_alloc_new_buf()	Carlos Llamas	1	-39/+56
	Extract non-critical sections from binder_alloc_new_buf_locked() that don't require holding the alloc->mutex. While we are here, consolidate the checks for size overflow and zero-sized padding into a separate sanitized_size() helper function. Also add a few touchups to follow the coding guidelines. Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-12-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: split up binder_update_page_range()	Carlos Llamas	1	-39/+40
	The binder_update_page_range() function performs both allocation and freeing of binder pages. However, these two operations are unrelated and have no common logic. In fact, when a free operation is requested, the allocation logic is skipped entirely. This behavior makes the error path unnecessarily complex. To improve readability of the code, this patch splits the allocation and freeing operations into separate functions. No functional changes are introduced by this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-11-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: keep vma addresses type as unsigned long	Carlos Llamas	5	-69/+57
	The vma addresses in binder are currently stored as void __user *. This requires casting back and forth between the mm/ api which uses unsigned long. Since we also do internal arithmetic on these addresses we end up having to cast them _again_ to an integer type. Lets stop all the unnecessary casting which kills code readability and store the virtual addresses as the native unsigned long from mm/. Note that this approach is preferred over uintptr_t as Linus explains in [1]. Opportunistically add a few cosmetic touchups. Link: https://lore.kernel.org/all/CAHk-=wj2OHy-5e+srG1fy+ZU00TmZ1NFp6kFLbVLMXHe7A1d-g@mail.gmail.com/ [1] Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-10-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: remove extern from function prototypes	Carlos Llamas	1	-19/+19
	The kernel coding style does not require 'extern' in function prototypes in .h files, so remove them from drivers/android/binder_alloc.h as they are not needed. No functional changes in this patch. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-9-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: fix comment on binder_alloc_new_buf() return value	Carlos Llamas	1	-1/+1
	Update the comments of binder_alloc_new_buf() to reflect that the return value of the function is now ERR_PTR(-errno) on failure. No functional changes in this patch. Cc: stable@vger.kernel.org Fixes: 57ada2fb2250 ("binder: add log information for binder transaction failures") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-8-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: fix trivial typo of binder_free_buf_locked()	Carlos Llamas	1	-1/+1
	Fix minor misspelling of the function in the comment section. No functional changes in this patch. Cc: stable@vger.kernel.org Fixes: 0f966cba95c7 ("binder: add flag to clear buffer on txn complete") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-7-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: fix unused alloc->free_async_space	Carlos Llamas	1	-7/+4
	Each transaction is associated with a 'struct binder_buffer' that stores the metadata about its buffer area. Since commit 74310e06be4d ("android: binder: Move buffer out of area shared with user space") this struct is no longer embedded within the buffer itself but is instead allocated on the heap to prevent userspace access to this driver-exclusive info. Unfortunately, the space of this struct is still being accounted for in the total buffer size calculation, specifically for async transactions. This results in an additional 104 bytes added to every async buffer request, and this area is never used. This wasted space can be substantial. If we consider the maximum mmap buffer space of SZ_4M, the driver will reserve half of it for async transactions, or 0x200000. This area should, in theory, accommodate up to 262,144 buffers of the minimum 8-byte size. However, after adding the extra 'sizeof(struct binder_buffer)', the total number of buffers drops to only 18,724, which is a sad 7.14% of the actual capacity. This patch fixes the buffer size calculation to enable the utilization of the entire async buffer space. This is expected to reduce the number of -ENOSPC errors that are seen on the field. Fixes: 74310e06be4d ("android: binder: Move buffer out of area shared with user space") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-6-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: fix async space check for 0-sized buffers	Carlos Llamas	1	-3/+4
	Move the padding of 0-sized buffers to an earlier stage to account for this round up during the alloc->free_async_space check. Fixes: 74310e06be4d ("android: binder: Move buffer out of area shared with user space") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-5-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: fix race between mmput() and do_exit()	Carlos Llamas	1	-2/+2
	Task A calls binder_update_page_range() to allocate and insert pages on a remote address space from Task B. For this, Task A pins the remote mm via mmget_not_zero() first. This can race with Task B do_exit() and the final mmput() refcount decrement will come from Task A. Task A \| Task B ------------------+------------------ mmget_not_zero() \| \| do_exit() \| exit_mm() \| mmput() mmput() \| exit_mmap() \| remove_vma() \| fput() \| In this case, the work of ____fput() from Task B is queued up in Task A as TWA_RESUME. So in theory, Task A returns to userspace and the cleanup work gets executed. However, Task A instead sleep, waiting for a reply from Task B that never comes (it's dead). This means the binder_deferred_release() is blocked until an unrelated binder event forces Task A to go back to userspace. All the associated death notifications will also be delayed until then. In order to fix this use mmput_async() that will schedule the work in the corresponding mm->async_put_work WQ instead of Task A. Fixes: 457b9a6f09f0 ("Staging: android: add binder driver") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-4-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: fix use-after-free in shinker's callback	Carlos Llamas	1	-1/+5
	The mmap read lock is used during the shrinker's callback, which means that using alloc->vma pointer isn't safe as it can race with munmap(). As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") the mmap lock is downgraded after the vma has been isolated. I was able to reproduce this issue by manually adding some delays and triggering page reclaiming through the shrinker's debug sysfs. The following KASAN report confirms the UAF: ================================================================== BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8 Read of size 8 at addr ffff356ed50e50f0 by task bash/478 CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70 Hardware name: linux,dummy-virt (DT) Call trace: zap_page_range_single+0x470/0x4b8 binder_alloc_free_page+0x608/0xadc __list_lru_walk_one+0x130/0x3b0 list_lru_walk_node+0xc4/0x22c binder_shrink_scan+0x108/0x1dc shrinker_debugfs_scan_write+0x2b4/0x500 full_proxy_write+0xd4/0x140 vfs_write+0x1ac/0x758 ksys_write+0xf0/0x1dc __arm64_sys_write+0x6c/0x9c Allocated by task 492: kmem_cache_alloc+0x130/0x368 vm_area_alloc+0x2c/0x190 mmap_region+0x258/0x18bc do_mmap+0x694/0xa60 vm_mmap_pgoff+0x170/0x29c ksys_mmap_pgoff+0x290/0x3a0 __arm64_sys_mmap+0xcc/0x144 Freed by task 491: kmem_cache_free+0x17c/0x3c8 vm_area_free_rcu_cb+0x74/0x98 rcu_core+0xa38/0x26d4 rcu_core_si+0x10/0x1c __do_softirq+0x2fc/0xd24 Last potentially related work creation: __call_rcu_common.constprop.0+0x6c/0xba0 call_rcu+0x10/0x1c vm_area_free+0x18/0x24 remove_vma+0xe4/0x118 do_vmi_align_munmap.isra.0+0x718/0xb5c do_vmi_munmap+0xdc/0x1fc __vm_munmap+0x10c/0x278 __arm64_sys_munmap+0x58/0x7c Fix this issue by performing instead a vma_lookup() which will fail to find the vma that was isolated before the mmap lock downgrade. Note that this option has better performance than upgrading to a mmap write lock which would increase contention. Plus, mmap_write_trylock() has been recently removed anyway. Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Cc: stable@vger.kernel.org Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05	binder: use EPOLLERR from eventpoll.h	Carlos Llamas	1	-1/+1
	Use EPOLLERR instead of POLLERR to make sure it is cast to the correct __poll_t type. This fixes the following sparse issue: drivers/android/binder.c:5030:24: warning: incorrect type in return expression (different base types) drivers/android/binder.c:5030:24: expected restricted __poll_t drivers/android/binder.c:5030:24: got int Fixes: f88982679f54 ("binder: check for binder_thread allocation failure in binder_poll()") Cc: stable@vger.kernel.org Cc: Eric Biggers <ebiggers@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20231201172212.1813387-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-03	Merge tag 'char-misc-6.7-rc1' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc updates from Greg KH: "Here is the big set of char/misc and other small driver subsystem changes for 6.7-rc1. Included in here are: - IIO subsystem driver updates and additions (largest part of this pull request) - FPGA subsystem driver updates - Counter subsystem driver updates - ICC subsystem driver updates - extcon subsystem driver updates - mei driver updates and additions - nvmem subsystem driver updates and additions - comedi subsystem dependency fixes - parport driver fixups - cdx subsystem driver and core updates - splice support for /dev/zero and /dev/full - other smaller driver cleanups All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (326 commits) cdx: add sysfs for subsystem, class and revision cdx: add sysfs for bus reset cdx: add support for bus enable and disable cdx: Register cdx bus as a device on cdx subsystem cdx: Create symbol namespaces for cdx subsystem cdx: Introduce lock to protect controller ops cdx: Remove cdx controller list from cdx bus system dts: ti: k3-am625-beagleplay: Add beaglecc1352 greybus: Add BeaglePlay Linux Driver dt-bindings: net: Add ti,cc1352p7 dt-bindings: eeprom: at24: allow NVMEM cells based on old syntax dt-bindings: nvmem: SID: allow NVMEM cells based on old syntax Revert "nvmem: add new config option" MAINTAINERS: coresight: Add missing Coresight files misc: pci_endpoint_test: Add deviceID for J721S2 PCIe EP device support firmware: xilinx: Move EXPORT_SYMBOL_GPL next to zynqmp_pm_feature definition uacce: make uacce_class constant ocxl: make ocxl_class constant cxl: make cxl_class constant misc: phantom: make phantom_class constant ...
2023-11-02	Merge tag 'mm-stable-2023-11-01-14-33' of ↵	Linus Torvalds	1	-12/+18
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA loc