diff options
| author | Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> | 2025-10-06 15:21:22 +0530 |
|---|---|---|
| committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2026-01-08 10:17:16 +0100 |
| commit | e983164f179e22f095db0842bb2a6d6495ea44df (patch) | |
| tree | cc6615af3adb11de5be259d0bdf91f5595bcca66 /include | |
| parent | 2c9ba2fbcd97c02c1cb23ec80290494a808f1690 (diff) | |
| download | linux-e983164f179e22f095db0842bb2a6d6495ea44df.tar.gz linux-e983164f179e22f095db0842bb2a6d6495ea44df.tar.bz2 linux-e983164f179e22f095db0842bb2a6d6495ea44df.zip | |
drm/buddy: Optimize free block management with RB tree
commit c178e534fff1d5a74da80ea03b20e2b948a00113 upstream.
Replace the freelist (O(n)) used for free block management with a
red-black tree, providing more efficient O(log n) search, insert,
and delete operations. This improves scalability and performance
when managing large numbers of free blocks per order (e.g., hundreds
or thousands).
In the VK-CTS memory stress subtest, the buddy manager merges
fragmented memory and inserts freed blocks into the freelist. Since
freelist insertion is O(n), this becomes a bottleneck as fragmentation
increases. Benchmarking shows list_insert_sorted() consumes ~52.69% CPU
with the freelist, compared to just 0.03% with the RB tree
(rbtree_insert.isra.0), despite performing the same sorted insert.
This also improves performance in heavily fragmented workloads,
such as games or graphics tests that stress memory.
As the buddy allocator evolves with new features such as clear-page
tracking, the resulting fragmentation and complexity have grown.
These RB-tree based design changes are introduced to address that
growth and ensure the allocator continues to perform efficiently
under fragmented conditions.
The RB tree implementation with separate clear/dirty trees provides:
- O(n log n) aggregate complexity for all operations instead of O(n^2)
- Elimination of soft lockups and system instability
- Improved code maintainability and clarity
- Better scalability for large memory systems
- Predictable performance under fragmentation
v3(Matthew):
- Remove RB_EMPTY_NODE check in force_merge function.
- Rename rb for loop macros to have less generic names and move to
.c file.
- Make the rb node rb and link field as union.
v4(Jani Nikula):
- The kernel-doc comment should be "/**"
- Move all the rbtree macros to rbtree.h and add parens to ensure
correct precedence.
v5:
- Remove the inline in a .c file (Jani Nikula).
v6(Peter Zijlstra):
- Add rb_add() function replacing the existing rbtree_insert() code.
v7:
- A full walk iteration in rbtree is slower than the list (Peter Zijlstra).
- The existing rbtree_postorder_for_each_entry_safe macro should be used
in scenarios where traversal order is not a critical factor (Christian).
v8(Matthew):
- Remove the rbtree_is_empty() check in this patch as well.
Cc: stable@vger.kernel.org
Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20251006095124.1663-1-Arunpravin.PaneerSelvam@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'include')
| -rw-r--r-- | include/drm/drm_buddy.h | 11 |
1 files changed, 8 insertions, 3 deletions
diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h index 04afd7c21a82..f22d292443db 100644 --- a/include/drm/drm_buddy.h +++ b/include/drm/drm_buddy.h @@ -10,6 +10,7 @@ #include <linux/list.h> #include <linux/slab.h> #include <linux/sched.h> +#include <linux/rbtree.h> #include <drm/drm_print.h> @@ -44,7 +45,11 @@ struct drm_buddy_block { * a list, if so desired. As soon as the block is freed with * drm_buddy_free* ownership is given back to the mm. */ - struct list_head link; + union { + struct rb_node rb; + struct list_head link; + }; + struct list_head tmp_link; }; @@ -59,7 +64,7 @@ struct drm_buddy_block { */ struct drm_buddy { /* Maintain a free list for each order. */ - struct list_head *free_list; + struct rb_root *free_tree; /* * Maintain explicit binary tree(s) to track the allocation of the @@ -85,7 +90,7 @@ struct drm_buddy { }; static inline u64 -drm_buddy_block_offset(struct drm_buddy_block *block) +drm_buddy_block_offset(const struct drm_buddy_block *block) { return block->header & DRM_BUDDY_HEADER_OFFSET; } |
