// SPDX-License-Identifier: GPL-2.0
#include "misc.h"
#include "ctree.h"
#include "space-info.h"
#include "sysfs.h"
#include "volumes.h"
#include "free-space-cache.h"
#include "ordered-data.h"
#include "transaction.h"
#include "block-group.h"
/*
* HOW DOES SPACE RESERVATION WORK
*
* If you want to know about delalloc specifically, there is a separate comment
* for that with the delalloc code. This comment is about how the whole system
* works generally.
*
* BASIC CONCEPTS
*
* 1) space_info. This is the ultimate arbiter of how much space we can use.
* There's a description of the bytes_ fields with the struct declaration,
* refer to that for specifics on each field. Suffice it to say that for
* reservations we care about total_bytes - SUM(space_info->bytes_) when
* determining if there is space to make an allocation. There is a space_info
* for METADATA, SYSTEM, and DATA areas.
*
* 2) block_rsv's. These are basically buckets for every different type of
* metadata reservation we have. You can see the comment in the block_rsv
* code on the rules for each type, but generally block_rsv->reserved is how
* much space is accounted for in space_info->bytes_may_use.
*
* 3) btrfs_calc*_size. These are the worst case calculations we used based
* on the number of items we will want to modify. We have one for changing
* items, and one for inserting new items. Generally we use these helpers to
* determine the size of the block reserves, and then use the actual bytes
* values to adjust the space_info counters.
*
* MAKING RESERVATIONS, THE NORMAL CASE
*
* We call into either btrfs_reserve_data_bytes() or
* btrfs_reserve_metadata_bytes(), depending on which we're looking for, with
* num_bytes we want to reserve.
*
* ->reserve
* space_info->bytes_may_reserve += num_bytes
*
* ->extent allocation
* Call btrfs_add_reserved_bytes() which does
* space_info->bytes_may_reserve -= num_bytes
* space_info->bytes_reserved += extent_bytes
*
* ->insert reference
* Call btrfs_update_block_group() which does
* space_info->bytes_reserved -= extent_bytes
* space_info->bytes_used += extent_bytes
*
* MAKING RESERVATIONS, FLUSHING NORMALLY (non-priority)
*
* Assume we are unable to simply make the reservation because we do not have
* enough space
*
* -> __reserve_bytes
* create a reserve_ticket with ->bytes set to our reservation, add it to
* the tail of space_info->tickets, kick async flush thread
*
* ->handle_reserve_ticket
* wait on ticket->wait for ->bytes to be reduced to 0, or ->error to be set
* on the ticket.
*
* -> btrfs_async_reclaim_metadata_space/btrfs_async_reclaim_data_space
* Flushes various things attempting to free up space.
*
* -> btrfs_try_granting_tickets()
* This is called by anything that either subtracts space from
* space_info->bytes_may_use, ->bytes_pinned, etc, or adds to the
* space_info->total_bytes. This loops through the ->priority_tickets and
* then the ->tickets list checking to see if the reservation can be
* completed. If it can the space is added to space_info->bytes_may_use and
* the ticket is woken up.
*
* -> ticket wakeup
* Check if ->bytes == 0, if it does we got our reservation and we can carry
* on, if not return the appropriate error (ENOSPC, but can be EINTR if we
* were interrupted.)
*
* MAKING RESERVATIONS, FLUSHING HIGH PRIORITY
*
* Same as the above, except we add ourselves to the
* space_info->priority_tickets, and we do not use ticket->wait, we simply
* call flush_space() ourselves for the states that are safe for us to call
* without deadlocking and hope for the best.
*
* THE FLUSHING STATES
*
* Generally speaking we will have two cases for each state, a "nice" state
* and a "ALL THE THINGS" state. In btrfs we delay a lot of work in order to
* reduce the locking over head on the various trees, and even to keep from
* doing any work at all in the case of delayed refs. Each of these delayed
* things however hold reservations, and so letting them run allows us to
* reclaim space so we can make new reservations.
*
* FLUSH_DELAYED_ITEMS
* Every inode has a delayed item to update the inode. Take a simple write
* for example, we would update the inode item at write time to update the
* mtime, and then again at finish_ordered_io() time in order to update the
* isize or bytes. We keep these delayed items to coalesce these operations
* into a single operation done on demand. These are an easy way to reclaim
* metadata space.
*
* FLUSH_DELALLOC
* Look at the delalloc comment to get an idea of how much space is reserved
* for delayed allocation. We can reclaim some of this space simply by
* running delalloc, but usually we need to wait for ordered extents to
* reclaim the bulk of this space.
*
* FLUSH_DELAYED_REFS
* We have a block reserve for the outstanding delayed refs space, and every
* delayed ref operation holds a reservation. Running these is a quick way
* to reclaim space, but we want to hold this until the end because COW can
* churn a lot and we can avoid making some extent tree modifications if we
* are able to delay for as long as possible.
*
* ALLOC_CHUNK
* We will skip this the first time through space reservation, because of
* overcommit and we don't want to have a lot of useless metadata space when
* our worst case reservations will likely never come true.
*
* RUN_DELAYED_IPUTS
* If we're freeing inodes we're likely freeing checksums, file extent
* items, and extent tree items. Loads of space could be freed up by these
* operations, however they won't be usable until the transaction commits.
*
* COMMIT_TRANS
* may_commit_transaction() is the ultimate arbiter on whether we commit the
*