diff options
93 files changed, 7052 insertions, 3821 deletions
diff --git a/.clang-format b/.clang-format index 1d5da22e0ba5..e6080f5834a3 100644 --- a/.clang-format +++ b/.clang-format @@ -323,7 +323,6 @@ ForEachMacros: - 'protocol_for_each_card' - 'protocol_for_each_dev' - 'queue_for_each_hw_ctx' - - 'radix_tree_for_each_contig' - 'radix_tree_for_each_slot' - 'radix_tree_for_each_tagged' - 'rbtree_postorder_for_each_entry_safe' @@ -119,6 +119,13 @@ Mark Brown <broonie@sirena.org.uk> Mark Yao <markyao0591@gmail.com> <mark.yao@rock-chips.com> Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com> Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com> +Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com> +Matthew Wilcox <willy@infradead.org> <matthew@wil.cx> +Matthew Wilcox <willy@infradead.org> <mawilcox@linuxonhyperv.com> +Matthew Wilcox <willy@infradead.org> <mawilcox@microsoft.com> +Matthew Wilcox <willy@infradead.org> <willy@debian.org> +Matthew Wilcox <willy@infradead.org> <willy@linux.intel.com> +Matthew Wilcox <willy@infradead.org> <willy@parisc-linux.org> Matthieu CASTET <castet.matthieu@free.fr> Mauro Carvalho Chehab <mchehab@kernel.org> <mchehab@brturbo.com.br> Mauro Carvalho Chehab <mchehab@kernel.org> <maurochehab@gmail.com> diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 29c790f571a5..3adee82be311 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -21,6 +21,7 @@ Core utilities local_ops workqueue genericirq + xarray flexible-arrays librs genalloc diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst new file mode 100644 index 000000000000..a4e705108f42 --- /dev/null +++ b/Documentation/core-api/xarray.rst @@ -0,0 +1,435 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +====== +XArray +====== + +:Author: Matthew Wilcox + +Overview +======== + +The XArray is an abstract data type which behaves like a very large array +of pointers. It meets many of the same needs as a hash or a conventional +resizable array. Unlike a hash, it allows you to sensibly go to the +next or previous entry in a cache-efficient manner. In contrast to a +resizable array, there is no need to copy data or change MMU mappings in +order to grow the array. It is more memory-efficient, parallelisable +and cache friendly than a doubly-linked list. It takes advantage of +RCU to perform lookups without locking. + +The XArray implementation is efficient when the indices used are densely +clustered; hashing the object and using the hash as the index will not +perform well. The XArray is optimised for small indices, but still has +good performance with large indices. If your index can be larger than +``ULONG_MAX`` then the XArray is not the data type for you. The most +important user of the XArray is the page cache. + +Each non-``NULL`` entry in the array has three bits associated with +it called marks. Each mark may be set or cleared independently of +the others. You can iterate over entries which are marked. + +Normal pointers may be stored in the XArray directly. They must be 4-byte +aligned, which is true for any pointer returned from :c:func:`kmalloc` and +:c:func:`alloc_page`. It isn't true for arbitrary user-space pointers, +nor for function pointers. You can store pointers to statically allocated +objects, as long as those objects have an alignment of at least 4. + +You can also store integers between 0 and ``LONG_MAX`` in the XArray. +You must first convert it into an entry using :c:func:`xa_mk_value`. +When you retrieve an entry from the XArray, you can check whether it is +a value entry by calling :c:func:`xa_is_value`, and convert it back to +an integer by calling :c:func:`xa_to_value`. + +Some users want to store tagged pointers instead of using the marks +described above. They can call :c:func:`xa_tag_pointer` to create an +entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry +back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve +the tag of an entry. Tagged pointers use the same bits that are used +to distinguish value entries from normal pointers, so each user must +decide whether they want to store value entries or tagged pointers in +any particular XArray. + +The XArray does not support storing :c:func:`IS_ERR` pointers as some +conflict with value entries or internal entries. + +An unusual feature of the XArray is the ability to create entries which +occupy a range of indices. Once stored to, looking up any index in +the range will return the same entry as looking up any other index in +the range. Setting a mark on one index will set it on all of them. +Storing to any index will store to all of them. Multi-index entries can +be explicitly split into smaller entries, or storing ``NULL`` into any +entry will cause the XArray to forget about the range. + +Normal API +========== + +Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY` +for statically allocated XArrays or :c:func:`xa_init` for dynamically +allocated ones. A freshly-initialised XArray contains a ``NULL`` +pointer at every index. + +You can then set entries using :c:func:`xa_store` and get entries +using :c:func:`xa_load`. xa_store will overwrite any entry with the +new entry and return the previous entry stored at that index. You can +use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a +``NULL`` entry. There is no difference between an entry that has never +been stored to and one that has most recently had ``NULL`` stored to it. + +You can conditionally replace an entry at an index by using +:c:func:`xa_cmpxchg`. Like :c:func:`cmpxchg`, it will only succeed if +the entry at that index has the 'old' value. It also returns the entry +which was at that index; if it returns the same entry which was passed as +'old', then :c:func:`xa_cmpxchg` succeeded. + +If you want to only store a new entry to an index if the current entry +at that index is ``NULL``, you can use :c:func:`xa_insert` which +returns ``-EEXIST`` if the entry is not empty. + +You can enquire whether a mark is set on an entry by using +:c:func:`xa_get_mark`. If the entry is not ``NULL``, you can set a mark +on it by using :c:func:`xa_set_mark` and remove the mark from an entry by +calling :c:func:`xa_clear_mark`. You can ask whether any entry in the +XArray has a particular mark set by calling :c:func:`xa_marked`. + +You can copy entries out of the XArray into a plain array by calling +:c:func:`xa_extract`. Or you can iterate over the present entries in +the XArray by calling :c:func:`xa_for_each`. You may prefer to use +:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present +entry in the XArray. + +Calling :c:func:`xa_store_range` stores the same entry in a range +of indices. If you do this, some of the other operations will behave +in a slightly odd way. For example, marking the entry at one index +may result in the entry being marked at some, but not all of the other +indices. Storing into one index may result in the entry retrieved by +some, but not all of the other indices changing. + +Finally, you can remove all entries from an XArray by calling +:c:func:`xa_destroy`. If the XArray entries are pointers, you may wish +to free the entries first. You can do this by iterating over all present +entries in the XArray using the :c:func:`xa_for_each` iterator. + +ID assignment +------------- + +You can call :c:func:`xa_alloc` to store the entry at any unused index +in the XArray. If you need to modify the array from interrupt context, +you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable +interrupts while allocating the ID. Unlike :c:func:`xa_store`, allocating +a ``NULL`` pointer does not delete an entry. Instead it reserves an +entry like :c:func:`xa_reserve` and you can release it using either +:c:func:`xa_erase` or :c:func:`xa_release`. To use ID assignment, the +XArray must be defined with :c:func:`DEFINE_XARRAY_ALLOC`, or initialised +by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`, + +Memory allocation +----------------- + +The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`, +:c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t +parameter in case the XArray needs to allocate memory to store this entry. +If the entry is being deleted, no memory allocation needs to be performed, +and the GFP flags specified will be ignored. + +It is possible for no memory to be allocatable, particularly if you pass +a restrictive set of GFP flags. In that case, the functions return a +special value which can be turned into an errno using :c:func:`xa_err`. +If you don't need to know exactly which error occurred, using +:c:func:`xa_is_err` is slightly more efficient. + +Locking +------- + +When using the Normal API, you do not have to worry about locking. +The XArray uses RCU and an internal spinlock to synchronise access: + +No lock needed: + * :c:func:`xa_empty` + * :c:func:`xa_marked` + +Takes RCU read lock: + * :c:func:`xa_load` + * :c:func:`xa_for_each` + * :c:func:`xa_find` + * :c:func:`xa_find_after` + * :c:func:`xa_extract` + * :c:func:`xa_get_mark` + +Takes xa_lock internally: + * :c:func:`xa_store` + * :c:func:`xa_insert` + * :c:func:`xa_erase` + * :c:func:`xa_erase_bh` + * :c:func:`xa_erase_irq` + * :c:func:`xa_cmpxchg` + * :c:func:`xa_store_range` + * :c:func:`xa_alloc` + * :c:func:`xa_alloc_bh` + * :c:func:`xa_alloc_irq` + * :c:func:`xa_destroy` + * :c:func:`xa_set_mark` + * :c:func:`xa_clear_mark` + +Assumes xa_lock held on entry: + * :c:func:`__xa_store` + * :c:func:`__xa_insert` + * :c:func:`__xa_erase` + * :c:func:`__xa_cmpxchg` + * :c:func:`__xa_alloc` + * :c:func:`__xa_set_mark` + * :c:func:`__xa_clear_mark` + +If you want to take advantage of the lock to protect the data structures +that you are storing in the XArray, you can call :c:func:`xa_lock` +before calling :c:func:`xa_load`, then take a reference count on the +object you have found before calling :c:func:`xa_unlock`. This will +prevent stores from removing the object from the array between looking +up the object and incrementing the refcount. You can also use RCU to +avoid dereferencing freed memory, but an explanation of that is beyond +the scope of this document. + +The XArray does not disable interrupts or softirqs while modifying +the array. It is safe to read the XArray from interrupt or softirq +context as the RCU lock provides enough protection. + +If, for example, you want to store entries in the XArray in process +context and then erase them in softirq context, you can do that this way:: + + void foo_init(struct foo *foo) + { + xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH); + } + + int foo_store(struct foo *foo, unsigned long index, void *entry) + { + int err; + + xa_lock_bh(&foo->array); + err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL)); + if (!err) + foo->count++; + xa_unlock_bh(&foo->array); + return err; + } + + /* foo_erase() is only called from softirq context */ + void foo_erase(struct foo *foo, unsigned long index) + { + xa_lock(&foo->array); + __xa_erase(&foo->array, index); + foo->count--; + xa_unlock(&foo->array); + } + +If you are going to modify the XArray from interrupt or softirq context, +you need to initialise the array using :c:func:`xa_init_flags`, passing +``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``. + +The above example also shows a common pattern of wanting to extend the +coverage of the xa_lock on the store side to protect some statistics +associated with the array. + +Sharing the XArray with interrupt context is also possible, either +using :c:func:`xa_lock_irqsave` in both the interrupt handler and process +context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock` +in the interrupt handler. Some of the more common patterns have helper +functions such as :c:func:`xa_erase_bh` and :c:func:`xa_erase_irq`. + +Sometimes you need to protect access to the XArray with a mutex because +that lock sits above another mutex in the locking hierarchy. That does +not entitle you to use functions like :c:func:`__xa_erase` without taking +the xa_lock; the xa_lock is used for lockdep validation and will be used +for other purposes in the future. + +The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also +available for situations where you look up an entry and want to atomically +set or clear a mark. It may be more efficient to use the advanced API +in this case, as it will save you from walking the tree twice. + +Advanced API +============ + +The advanced API offers more flexibility and better performance at the +cost of an interface which can be harder to use and has fewer safeguards. +No locking is done for you by the advanced API, and you are required +to use the xa_lock while modifying the array. You can choose whether +to use the xa_lock or the RCU lock while doing read-only operations on +the array. You can mix advanced and normal operations on the same array; +indeed the normal API is implemented in terms of the advanced API. The +advanced API is only available to modules with a GPL-compatible license. + +The advanced API is based around the xa_state. This is an opaque data +structure which you declare on the stack using the :c:func:`XA_STATE` +macro. This macro initialises the xa_state ready to start walking +around the XArray. It is used as a cursor to maintain the position +in the XArray and let you compose various operations together without +having to restart from the top every time. + +The xa_state is also used to store errors. You can call +:c:func:`xas_error` to retrieve the error. All operations check whether +the xa_state is in an error state before proceeding, so there's no need +for you to check for an error after each call; you can make multiple +calls in succession and only check at a convenient point. The only +errors currently generated by the XArray code itself are ``ENOMEM`` and +``EINVAL``, but it supports arbitrary errors in case you want to call +:c:func:`xas_set_err` yourself. + +If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem` +will attempt to allocate more memory using the specified gfp flags and +cache it i |