| 1 | +.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) |
| 2 | + |
| 3 | +=========================== |
| 4 | +BPF_PROG_TYPE_CGROUP_SYSCTL |
| 5 | +=========================== |
| 6 | + |
| 7 | +This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type, |
| 8 | +which provides a cgroup-bpf hook for sysctl. |
| 9 | + |
| 10 | +The hook has to be attached to a cgroup and will be called every time a |
| 11 | +process inside that cgroup tries to read from or write to a sysctl knob in proc. |
| 12 | + |
| 13 | +1. Attach type |
| 14 | +************** |
| 15 | + |
| 16 | +The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a |
| 17 | +``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup. |
| 18 | + |
| 19 | +2. Context |
| 20 | +********** |
| 21 | + |
| 22 | +``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from |
| 23 | +a BPF program:: |
| 24 | + |
| 25 | +    struct bpf_sysctl { |
| 26 | +        __u32 write; |
| 27 | +        __u32 file_pos; |
| 28 | +    }; |
| 29 | + |
| 30 | +* ``write`` indicates whether the sysctl value is being read (``0``) or written |
| 31 | + (``1``). This field is read-only. |
| 32 | + |
| 33 | +* ``file_pos`` indicates the file position at which the sysctl is being read |
| 34 | + or written. This field is read-write. Writing to the field sets the starting |
| 35 | + position in the sysctl proc file that ``read(2)`` will read from or ``write(2)`` |
| 36 | + will write to. Writing zero to the field can be used e.g. to override the |
| 37 | + whole sysctl value with ``bpf_sysctl_set_new_value()`` on ``write(2)`` even |
| 38 | + when it's called by user space with ``file_pos > 0``. Writing a non-zero |
| 39 | + value to the field can be used to access part of the sysctl value starting from |
| 40 | + the specified ``file_pos``. Not all sysctls support access with ``file_pos != |
| 41 | + 0``; e.g. writes to numeric sysctl entries must always be at file position |
| 42 | + ``0``. See also the ``kernel.sysctl_writes_strict`` sysctl. |
| 43 | + |
| 44 | +See `linux/bpf.h`_ for more details on how context fields can be accessed. |
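
As a rough illustration of how the two context fields might drive a decision, the sketch below models the policy logic in plain user-space C. ``struct sysctl_ctx_model`` and ``sysctl_policy()`` are hypothetical stand-ins and not kernel API: a real program would receive a ``struct bpf_sysctl *`` as defined in `linux/bpf.h` and be compiled for the BPF target. The policy itself (reads always allowed, writes only at position zero) is an assumption chosen for illustration. Return values follow the convention documented in the "Return code" section: ``0`` rejects, ``1`` proceeds.

```c
#include <stdint.h>

/* Hypothetical user-space mirror of the two fields of struct bpf_sysctl. */
struct sysctl_ctx_model {
	uint32_t write;    /* 0 = value is being read, 1 = value is being written */
	uint32_t file_pos; /* file position the access starts at */
};

/*
 * Example policy (an assumption for illustration): allow every read, but
 * allow a write only when it starts at file position 0.
 * Returns 1 to proceed with the access and 0 to reject it.
 */
static int sysctl_policy(const struct sysctl_ctx_model *ctx)
{
	if (!ctx->write)
		return 1;
	return ctx->file_pos == 0;
}
```

Keeping the decision in a small pure function like this makes it easy to exercise outside the kernel before moving the same logic into the BPF program body.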
| 45 | + |
| 46 | +3. Return code |
| 47 | +************** |
| 48 | + |
| 49 | +``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following |
| 50 | +return codes: |
| 51 | + |
| 52 | +* ``0`` means "reject access to sysctl"; |
| 53 | +* ``1`` means "proceed with access". |
| 54 | + |
| 55 | +If the program returns ``0``, user space will get ``-1`` from ``read(2)`` or |
| 56 | +``write(2)`` and ``errno`` will be set to ``EPERM``. |
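
From the user-space side, a rejected access looks like any other failed ``write(2)``. The sketch below is a minimal helper that writes a string to a sysctl file and reports the failure as a negative ``errno`` value. The name ``sysctl_write()`` is hypothetical, and treating a short write as ``-EIO`` is a simplification made for this sketch. Note that ``EPERM`` can have other causes as well, so it does not prove on its own that a BPF program rejected the access.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a string to a sysctl file, e.g. one under /proc/sys.
 * Returns 0 on success or a negative errno value on failure; -EPERM is what
 * the caller would see when an attached program rejects the write.
 */
static int sysctl_write(const char *path, const char *value)
{
	size_t len = strlen(value);
	int fd = open(path, O_WRONLY);
	ssize_t n;
	int err = 0;

	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO; /* short write: treated as an error in this sketch */

	close(fd);
	return err;
}
```

A caller can then compare the result against ``-EPERM`` to tell a rejected access apart from, say, a missing file (``-ENOENT``).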
| 57 | + |
| 58 | +4. Helpers |
| 59 | +********** |
| 60 | + |
| 61 | +Since a sysctl knob is represented by a name and a value, sysctl-specific BPF |
| 62 | +helpers focus on providing access to these properties: |
| 63 | + |
| 64 | +* ``bpf_sysctl_get_name()`` to get the sysctl name as it is visible in |
| 65 | + ``/proc/sys`` into a buffer provided by the BPF program; |
| 66 | + |
| 67 | +* ``bpf_sysctl_get_current_value()`` to get the string value currently held by |
| 68 | + the sysctl into a buffer provided by the BPF program. This helper is available |
| 69 | + on both ``read(2)`` from and ``write(2)`` to the sysctl; |
| 70 | + |
| 71 | +* ``bpf_sysctl_get_new_value()`` to get the new string value currently being |
| 72 | + written to the sysctl before the actual write happens. This helper can be used |
| 73 | + only when ``ctx->write == 1``; |
| 74 | + |
| 75 | +* ``bpf_sysctl_set_new_value()`` to override the new string value currently being |
| 76 | + written to the sysctl before the actual write happens. The sysctl value will be |
| 77 | + overridden starting from the current ``ctx->file_pos``. If the whole value |
| 78 | + has to be overridden, the BPF program can set ``file_pos`` to zero before |
| 79 | + calling the helper. This helper can be used only when ``ctx->write == 1``. The |
| 80 | + new string value set by the helper is treated and verified by the kernel the |
| 81 | + same way as an equivalent string passed by user space. |
| 82 | + |
| 83 | +A BPF program sees a sysctl value the same way as user space does in the proc |
| 84 | +filesystem, i.e. as a string. Since many sysctl values represent an integer or a |
| 85 | +vector of integers, the following helpers can be used to get a numeric value |
| 86 | +from the string: |
| 87 | + |
| 88 | +* ``bpf_strtol()`` to convert the initial part of the string to a long integer, |
| 89 | + similar to user space `strtol(3)`_; |
| 90 | +* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned long |
| 91 | + integer, similar to user space `strtoul(3)`_. |
| 92 | + |
| 93 | +See `linux/bpf.h`_ for more details on helpers described here. |
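
To make the "vector of integers" case concrete, here is a user-space analogue built on `strtol(3)`_ that splits a string value into integers. It is only a sketch: ``parse_long_vec()`` is a hypothetical name, the input is assumed to be whitespace-separated decimal numbers, and the BPF helpers have their own calling convention, described in `linux/bpf.h`_, so the code would need adapting before use inside a BPF program.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse up to max whitespace-separated decimal integers from s into out.
 * Returns the number of integers parsed, or -1 on malformed or
 * out-of-range input.
 */
static int parse_long_vec(const char *s, long *out, int max)
{
	int n = 0;

	while (n < max) {
		char *end;

		/* Skip separators so that end-of-string can be detected. */
		while (isspace((unsigned char)*s))
			s++;
		if (*s == '\0')
			break;

		errno = 0;
		out[n] = strtol(s, &end, 10);
		if (end == s || errno == ERANGE)
			return -1; /* no digits consumed, or value out of range */
		n++;
		s = end; /* continue right after the parsed number */
	}
	return n;
}
```

Each call consumes only the initial part of the remaining string, and the loop continues from where the previous conversion stopped, which is the same pattern a program would follow when walking a multi-value sysctl.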
| 94 | + |
| 95 | +5. Examples |
| 96 | +*********** |
| 97 | + |
| 98 | +See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses the |
| 99 | +sysctl name and value, parses the string value to get a vector of integers and |
| 100 | +uses the result to decide whether to allow or deny access to the sysctl. |
| 101 | + |
| 102 | +6. Notes |
| 103 | +******** |
| 104 | + |
| 105 | +``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root |
| 106 | +environment, for example to monitor sysctl usage or catch unreasonable values |
| 107 | +that an application, running as root in a separate cgroup, is trying to set. |
| 108 | + |
| 109 | +Since ``task_dfl_cgroup(current)`` is called at ``sys_read`` / ``sys_write`` time, it |
| 110 | +may return results different from those at ``sys_open`` time, i.e. the process that |
| 111 | +opened the sysctl file in the proc filesystem may differ from the process that is |
| 112 | +trying to read from / write to it, and two such processes may run in different |
| 113 | +cgroups. This means ``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not be used as a |
| 114 | +security mechanism to limit sysctl usage. |
| 115 | + |
| 116 | +As with any cgroup-bpf program, additional care should be taken if an |
| 117 | +application running as root in a cgroup should not be allowed to |
| 118 | +detach/replace a BPF program attached by the administrator. |
| 119 | + |
| 120 | +.. Links |
| 121 | +.. _linux/bpf.h: ../../include/uapi/linux/bpf.h |
| 122 | +.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html |
| 123 | +.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html |
| 124 | +.. _test_sysctl_prog.c: |
| 125 | + ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c |