diff options
Diffstat (limited to 'Documentation')
82 files changed, 5425 insertions, 673 deletions
diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst index a210b8a4df00..cec2371173d7 100644 --- a/Documentation/bpf/bpf_design_QA.rst +++ b/Documentation/bpf/bpf_design_QA.rst @@ -298,3 +298,48 @@ A: NO. The BTF_ID macro does not cause a function to become part of the ABI any more than does the EXPORT_SYMBOL_GPL macro. + +Q: What is the compatibility story for special BPF types in map values? +----------------------------------------------------------------------- +Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map +values (when using BTF support for BPF maps). This allows to use helpers for +such objects on these fields inside map values. Users are also allowed to embed +pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the +kernel preserve backwards compatibility for these features? + +A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else: +NO, but see below. + +For struct types that have been added already, like bpf_spin_lock and bpf_timer, +the kernel will preserve backwards compatibility, as they are part of UAPI. + +For kptrs, they are also part of UAPI, but only with respect to the kptr +mechanism. The types that you can use with a __kptr and __kptr_ref tagged +pointer in your struct are NOT part of the UAPI contract. The supported types can +and will change across kernel releases. However, operations like accessing kptr +fields and bpf_kptr_xchg() helper will continue to be supported across kernel +releases for the supported types. + +For any other supported struct type, unless explicitly stated in this document +and added to bpf.h UAPI header, such types can and will arbitrarily change their +size, type, and alignment, or any other user visible API or ABI detail across +kernel releases. The users must adapt their BPF programs to the new changes and +update them to make sure their programs continue to work correctly. + +NOTE: BPF subsystem specially reserves the 'bpf\_' prefix for type names, in +order to introduce more special fields in the future. Hence, user programs must +avoid defining types with 'bpf\_' prefix to not be broken in future releases. +In other words, no backwards compatibility is guaranteed if one using a type +in BTF with 'bpf\_' prefix. + +Q: What is the compatibility story for special BPF types in allocated objects? +------------------------------------------------------------------------------ +Q: Same as above, but for allocated objects (i.e. objects allocated using +bpf_obj_new for user defined types). Will the kernel preserve backwards +compatibility for these features? + +A: NO. + +Unlike map value types, there are no stability guarantees for this case. The +whole API to work with allocated objects and any support for special fields +inside them is unstable (since it is exposed through kfuncs). diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst index 761474bd7fe6..03d4993eda6f 100644 --- a/Documentation/bpf/bpf_devel_QA.rst +++ b/Documentation/bpf/bpf_devel_QA.rst @@ -44,6 +44,33 @@ is a guarantee that the reported issue will be overlooked.** Submitting patches ================== +Q: How do I run BPF CI on my changes before sending them out for review? +------------------------------------------------------------------------ +A: BPF CI is GitHub based and hosted at https://github.com/kernel-patches/bpf. +While GitHub also provides a CLI that can be used to accomplish the same +results, here we focus on the UI based workflow. + +The following steps lay out how to start a CI run for your patches: + +- Create a fork of the aforementioned repository in your own account (one time + action) + +- Clone the fork locally, check out a new branch tracking either the bpf-next + or bpf branch, and apply your to-be-tested patches on top of it + +- Push the local branch to your fork and create a pull request against + kernel-patches/bpf's bpf-next_base or bpf_base branch, respectively + +Shortly after the pull request has been created, the CI workflow will run. Note +that capacity is shared with patches submitted upstream being checked and so +depending on utilization the run can take a while to finish. + +Note furthermore that both base branches (bpf-next_base and bpf_base) will be +updated as patches are pushed to the respective upstream branches they track. As +such, your patch set will automatically (be attempted to) be rebased as well. +This behavior can result in a CI run being aborted and restarted with the new +base line. + Q: To which mailing list do I need to submit my BPF patches? ------------------------------------------------------------ A: Please submit your BPF patches to the bpf kernel mailing list: diff --git a/Documentation/bpf/bpf_iterators.rst b/Documentation/bpf/bpf_iterators.rst new file mode 100644 index 000000000000..6d7770793fab --- /dev/null +++ b/Documentation/bpf/bpf_iterators.rst @@ -0,0 +1,485 @@ +============= +BPF Iterators +============= + + +---------- +Motivation +---------- + +There are a few existing ways to dump kernel data into user space. The most +popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps +all tcp6 sockets in the system, and ``cat /proc/net/netlink`` dumps all netlink +sockets in the system. However, their output format tends to be fixed, and if +users want more information about these sockets, they have to patch the kernel, +which often takes time to publish upstream and release. The same is true for popular +tools like `ss <https://man7.org/linux/man-pages/man8/ss.8.html>`_ where any +additional information needs a kernel patch. + +To solve this problem, the `drgn +<https://www.kernel.org/doc/html/latest/bpf/drgn.html>`_ tool is often used to +dig out the kernel data with no kernel change. However, the main drawback for +drgn is performance, as it cannot do pointer tracing inside the kernel. In +addition, drgn cannot validate a pointer value and may read invalid data if the +pointer becomes invalid inside the kernel. + +The BPF iterator solves the above problem by providing flexibility on what data +(e.g., tasks, bpf_maps, etc.) to collect by calling BPF programs for each kernel +data object. + +---------------------- +How BPF Iterators Work +---------------------- + +A BPF iterator is a type of BPF program that allows users to iterate over +specific types of kernel objects. Unlike traditional BPF tracing programs that +allow users to define callbacks that are invoked at particular points of +execution in the kernel, BPF iterators allow users to define callbacks that +should be executed for every entry in a variety of kernel data structures. + +For example, users can define a BPF iterator that iterates over every task on +the system and dumps the total amount of CPU runtime currently used by each of +them. Another BPF task iterator may instead dump the cgroup information for each +task. Such flexibility is the core value of BPF iterators. + +A BPF program is always loaded into the kernel at the behest of a user space +process. A user space process loads a BPF program by opening and initializing +the program skeleton as required and then invoking a syscall to have the BPF +program verified and loaded by the kernel. + +In traditional tracing programs, a program is activated by having user space +obtain a ``bpf_link`` to the program with ``bpf_program__attach()``. Once +activated, the program callback will be invoked whenever the tracepoint is +triggered in the main kernel. For BPF iterator programs, a ``bpf_link`` to the +program is obtained using ``bpf_link_create()``, and the program callback is +invoked by issuing system calls from user space. + +Next, let us see how you can use the iterators to iterate on kernel objects and +read data. + +------------------------ +How to Use BPF iterators +------------------------ + +BPF selftests are a great resource to illustrate how to use the iterators. In +this section, we’ll walk through a BPF selftest which shows how to load and use +a BPF iterator program. To begin, we’ll look at `bpf_iter.c +<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/prog_tests/bpf_iter.c>`_, +which illustrates how to load and trigger BPF iterators on the user space side. +Later, we’ll look at a BPF program that runs in kernel space. + +Loading a BPF iterator in the kernel from user space typically involves the +following steps: + +* The BPF program is loaded into the kernel through ``libbpf``. Once the kernel + has verified and loaded the program, it returns a file descriptor (fd) to user + space. +* Obtain a ``link_fd`` to the BPF program by calling the ``bpf_link_create()`` + specified with the BPF program file descriptor received from the kernel. +* Next, obtain a BPF iterator file descriptor (``bpf_iter_fd``) by calling the + ``bpf_iter_create()`` specified with the ``bpf_link`` received from Step 2. +* Trigger the iteration by calling ``read(bpf_iter_fd)`` until no data is + available. +* Close the iterator fd using ``close(bpf_iter_fd)``. +* If needed to reread the data, get a new ``bpf_iter_fd`` and do the read again. + +The following are a few examples of selftest BPF iterator programs: + +* `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_ +* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_ +* `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_ + +Let us look at ``bpf_iter_task_file.c``, which runs in kernel space: + +Here is the definition of ``bpf_iter__task_file`` in `vmlinux.h +<https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#btf>`_. +Any struct name in ``vmlinux.h`` in the format ``bpf_iter__<iter_name>`` +represents a BPF iterator. The suffix ``<iter_name>`` represents the type of +iterator. + +:: + + struct bpf_iter__task_file { + union { + struct bpf_iter_meta *meta; + }; + union { + struct task_struct *task; + }; + u32 fd; + union { + struct file *file; + }; + }; + +In the above code, the field 'meta' contains the metadata, which is the same for +all BPF iterator programs. The rest of the fields are specific to different +iterators. For example, for task_file iterators, the kernel layer provides the +'task', 'fd' and 'file' field values. The 'task' and 'file' are `reference +counted +<https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#file-descriptors-and-reference-counters>`_, +so they won't go away when the BPF program runs. + +Here is a snippet from the ``bpf_iter_task_file.c`` file: + +:: + + SEC("iter/task_file") + int dump_task_file(struct bpf_iter__task_file *ctx) + { + struct seq_file *seq = ctx->meta->seq; + struct task_struct *task = ctx->task; + struct file *file = ctx->file; + __u32 fd = ctx->fd; + + if (task == NULL || file == NULL) + return 0; + + if (ctx->meta->seq_num == 0) { + count = 0; + BPF_SEQ_PRINTF(seq, " tgid gid fd file\n"); + } + + if (tgid == task->tgid && task->tgid != task->pid) + count++; + + if (last_tgid != task->tgid) { + last_tgid = task->tgid; + unique_tgid_count++; + } + + BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd, + (long)file->f_op); + return 0; + } + +In the above example, the section name ``SEC(iter/task_file)``, indicates that +the program is a BPF iterator program to iterate all files from all tasks. The +context of the program is ``bpf_iter__task_file`` struct. + +The user space program invokes the BPF iterator program running in the kernel +by issuing a ``read()`` syscall. Once invoked, the BPF +program can export data to user space using a variety of BPF helper functions. +You can use either ``bpf_seq_printf()`` (and BPF_SEQ_PRINTF helper macro) or +``bpf_seq_write()`` function based on whether you need formatted output or just +binary data, respectively. For binary-encoded data, the user space applications +can process the data from ``bpf_seq_write()`` as needed. For the formatted data, +you can use ``cat <path>`` to print the results similar to ``cat +/proc/net/netlink`` after pinning the BPF iterator to the bpffs mount. Later, +use ``rm -f <path>`` to remove the pinned iterator. + +For example, you can use the following command to create a BPF iterator from the +``bpf_iter_ipv6_route.o`` object file and pin it to the ``/sys/fs/bpf/my_route`` +path: + +:: + + $ bpftool iter pin ./bpf_iter_ipv6_route.o /sys/fs/bpf/my_route + +And then print out the results using the following command: + +:: + + $ cat /sys/fs/bpf/my_route + + +------------------------------------------------------- +Implement Kernel Support for BPF Iterator Program Types +------------------------------------------------------- + +To implement a BPF iterator in the kernel, the developer must make a one-time +change to the following key data structure defined in the `bpf.h +<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/bpf.h>`_ +file. + +:: + + struct bpf_iter_reg { + const char *target; + bpf_iter_attach_target_t attach_target; + bpf_iter_detach_target_t detach_target; + bpf_iter_show_fdinfo_t show_fdinfo; + bpf_iter_fill_link_info_t fill_link_info; + bpf_iter_get_func_proto_t get_func_proto; + u32 ctx_arg_info_size; + u32 feature; + struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX]; + const struct bpf_iter_seq_info *seq_info; + }; + +After filling the data structur |
