summaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorArunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>2025-10-06 15:21:22 +0530
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2026-01-08 10:17:16 +0100
commite983164f179e22f095db0842bb2a6d6495ea44df (patch)
treecc6615af3adb11de5be259d0bdf91f5595bcca66 /include
parent2c9ba2fbcd97c02c1cb23ec80290494a808f1690 (diff)
downloadlinux-e983164f179e22f095db0842bb2a6d6495ea44df.tar.gz
linux-e983164f179e22f095db0842bb2a6d6495ea44df.tar.bz2
linux-e983164f179e22f095db0842bb2a6d6495ea44df.zip
drm/buddy: Optimize free block management with RB tree
commit c178e534fff1d5a74da80ea03b20e2b948a00113 upstream. Replace the freelist (O(n)) used for free block management with a red-black tree, providing more efficient O(log n) search, insert, and delete operations. This improves scalability and performance when managing large numbers of free blocks per order (e.g., hundreds or thousands). In the VK-CTS memory stress subtest, the buddy manager merges fragmented memory and inserts freed blocks into the freelist. Since freelist insertion is O(n), this becomes a bottleneck as fragmentation increases. Benchmarking shows list_insert_sorted() consumes ~52.69% CPU with the freelist, compared to just 0.03% with the RB tree (rbtree_insert.isra.0), despite performing the same sorted insert. This also improves performance in heavily fragmented workloads, such as games or graphics tests that stress memory. As the buddy allocator evolves with new features such as clear-page tracking, the resulting fragmentation and complexity have grown. These RB-tree based design changes are introduced to address that growth and ensure the allocator continues to perform efficiently under fragmented conditions. The RB tree implementation with separate clear/dirty trees provides: - O(n log n) aggregate complexity for all operations instead of O(n^2) - Elimination of soft lockups and system instability - Improved code maintainability and clarity - Better scalability for large memory systems - Predictable performance under fragmentation v3(Matthew): - Remove RB_EMPTY_NODE check in force_merge function. - Rename rb for loop macros to have less generic names and move to .c file. - Make the rb node rb and link field as union. v4(Jani Nikula): - The kernel-doc comment should be "/**" - Move all the rbtree macros to rbtree.h and add parens to ensure correct precedence. v5: - Remove the inline in a .c file (Jani Nikula). v6(Peter Zijlstra): - Add rb_add() function replacing the existing rbtree_insert() code. v7: - A full walk iteration in rbtree is slower than the list (Peter Zijlstra). - The existing rbtree_postorder_for_each_entry_safe macro should be used in scenarios where traversal order is not a critical factor (Christian). v8(Matthew): - Remove the rbtree_is_empty() check in this patch as well. Cc: stable@vger.kernel.org Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20251006095124.1663-1-Arunpravin.PaneerSelvam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'include')
-rw-r--r--include/drm/drm_buddy.h11
1 files changed, 8 insertions, 3 deletions
diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
index 04afd7c21a82..f22d292443db 100644
--- a/include/drm/drm_buddy.h
+++ b/include/drm/drm_buddy.h
@@ -10,6 +10,7 @@
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/sched.h>
+#include <linux/rbtree.h>
#include <drm/drm_print.h>
@@ -44,7 +45,11 @@ struct drm_buddy_block {
* a list, if so desired. As soon as the block is freed with
* drm_buddy_free* ownership is given back to the mm.
*/
- struct list_head link;
+ union {
+ struct rb_node rb;
+ struct list_head link;
+ };
+
struct list_head tmp_link;
};
@@ -59,7 +64,7 @@ struct drm_buddy_block {
*/
struct drm_buddy {
/* Maintain a free list for each order. */
- struct list_head *free_list;
+ struct rb_root *free_tree;
/*
* Maintain explicit binary tree(s) to track the allocation of the
@@ -85,7 +90,7 @@ struct drm_buddy {
};
static inline u64
-drm_buddy_block_offset(struct drm_buddy_block *block)
+drm_buddy_block_offset(const struct drm_buddy_block *block)
{
return block->header & DRM_BUDDY_HEADER_OFFSET;
}