|
| 1 | +.. SPDX-License-Identifier: GPL-2.0 |
| 2 | +
|
| 3 | +================== |
| 4 | +BPF Flow Dissector |
| 5 | +================== |
| 6 | + |
| 7 | +Overview |
| 8 | +======== |
| 9 | + |
| 10 | +The flow dissector is a routine that parses metadata out of packets. It is |
| 11 | +used in various places in the networking subsystem (RFS, flow hash, etc). |
| 12 | + |
| 13 | +The BPF flow dissector is an attempt to reimplement the C-based flow dissector |
| 14 | +logic in BPF to gain all the benefits of the BPF verifier (namely, limits on |
| 15 | +the number of instructions and tail calls). |
| 16 | + |
| 17 | +API |
| 18 | +=== |
| 19 | + |
| 20 | +BPF flow dissector programs operate on an ``__sk_buff``. However, only a |
| 21 | +limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``. |
| 22 | +``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input |
| 23 | +and output arguments. |
| 24 | + |
| 25 | +The inputs are: |
| 26 | + * ``nhoff`` - initial offset of the networking header |
| 27 | + * ``thoff`` - initial offset of the transport header, initialized to nhoff |
| 28 | + * ``n_proto`` - L3 protocol type, parsed out of L2 header |
| 29 | + |
| 30 | +A BPF flow dissector program should fill out the rest of the ``struct |
| 31 | +bpf_flow_keys`` fields. The input arguments ``nhoff/thoff/n_proto`` should |
| 32 | +also be adjusted accordingly. |
| 33 | + |
| 34 | +The return code of the BPF program is either ``BPF_OK`` to indicate successful |
| 35 | +dissection, or ``BPF_DROP`` to indicate a parsing error. |
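
The input/output contract described above can be illustrated with a small userspace sketch. It is not a loadable BPF program: ``struct sketch_flow_keys``, ``sketch_dissect_ipv4`` and the ``SKETCH_*`` constants are hypothetical names that mimic only a few fields of ``struct bpf_flow_keys``, and ``n_proto`` is kept in host byte order for simplicity (an assumption of this sketch).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for BPF_OK and BPF_DROP. */
enum { SKETCH_OK = 0, SKETCH_DROP = 2 };

#define SKETCH_ETH_P_IP 0x0800 /* EtherType for IPv4 */

/* Hypothetical subset of struct bpf_flow_keys, for illustration only. */
struct sketch_flow_keys {
	uint16_t nhoff;    /* in/out: offset of the network header */
	uint16_t thoff;    /* in/out: offset of the transport header */
	uint16_t n_proto;  /* in/out: L3 protocol (host byte order here) */
	uint8_t  ip_proto; /* out: L4 protocol number */
	uint32_t ipv4_src; /* out: source address, bytes as on the wire */
	uint32_t ipv4_dst; /* out: destination address, bytes as on the wire */
};

/* Parse an IPv4 header located at data + keys->nhoff. */
static int sketch_dissect_ipv4(const uint8_t *data, const uint8_t *data_end,
			       struct sketch_flow_keys *keys)
{
	size_t len = (size_t)(data_end - data);
	size_t off = keys->nhoff;
	const uint8_t *iph;
	size_t ihl;

	if (keys->n_proto != SKETCH_ETH_P_IP)
		return SKETCH_DROP;
	/* The minimal 20-byte IPv4 header must fit in the packet. */
	if (off > len || len - off < 20)
		return SKETCH_DROP;

	iph = data + off;
	if ((iph[0] >> 4) != 4)
		return SKETCH_DROP;

	/* Header length is given in 32-bit words. */
	ihl = (size_t)(iph[0] & 0x0f) * 4;
	if (ihl < 20 || len - off < ihl)
		return SKETCH_DROP;

	/* Fill in the outputs and adjust thoff past the IPv4 header. */
	keys->ip_proto = iph[9];
	memcpy(&keys->ipv4_src, iph + 12, 4);
	memcpy(&keys->ipv4_dst, iph + 16, 4);
	keys->thoff = (uint16_t)(off + ihl);
	return SKETCH_OK;
}
```

Every read is preceded by a bounds check against ``data_end``, which is the same discipline the verifier enforces on a real program.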
| 36 | + |
| 37 | +__sk_buff->data |
| 38 | +=============== |
| 39 | + |
| 40 | +In the VLAN-less case, this is what the initial state of the BPF flow |
| 41 | +dissector looks like:: |
| 42 | + |
| 43 | + +------+------+------------+-----------+ |
| 44 | + | DMAC | SMAC | ETHER_TYPE | L3_HEADER | |
| 45 | + +------+------+------------+-----------+ |
| 46 | +                             ^ |
| 47 | +                             | |
| 48 | +                             +-- flow dissector starts here |
| 49 | + |
| 50 | + |
| 51 | +.. code:: c |
| 52 | +
|
| 53 | + skb->data + flow_keys->nhoff point to the first byte of L3_HEADER |
| 54 | + flow_keys->thoff = nhoff |
| 55 | + flow_keys->n_proto = ETHER_TYPE |
| 56 | +
|
| 57 | +In the VLAN case, the flow dissector can be called in two different states. |
| 58 | + |
| 59 | +Pre-VLAN parsing:: |
| 60 | + |
| 61 | + +------+------+------+-----+-----------+-----------+ |
| 62 | + | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER | |
| 63 | + +------+------+------+-----+-----------+-----------+ |
| 64 | +                       ^ |
| 65 | +                       | |
| 66 | +                       +-- flow dissector starts here |
| 67 | + |
| 68 | +.. code:: c |
| 69 | +
|
| 70 | + skb->data + flow_keys->nhoff point to the first byte of TCI |
| 71 | + flow_keys->thoff = nhoff |
| 72 | + flow_keys->n_proto = TPID |
| 73 | +
|
| 74 | +Please note that TPID can be 802.1AD and, hence, the BPF program would |
| 75 | +have to parse VLAN information twice for double-tagged packets. |
| 76 | + |
| 77 | + |
| 78 | +Post-VLAN parsing:: |
| 79 | + |
| 80 | + +------+------+------+-----+-----------+-----------+ |
| 81 | + | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER | |
| 82 | + +------+------+------+-----+-----------+-----------+ |
| 83 | +                                         ^ |
| 84 | +                                         | |
| 85 | +                                         +-- flow dissector starts here |
| 86 | + |
| 87 | +.. code:: c |
| 88 | +
|
| 89 | + skb->data + flow_keys->nhoff point to the first byte of L3_HEADER |
| 90 | + flow_keys->thoff = nhoff |
| 91 | + flow_keys->n_proto = ETHER_TYPE |
| 92 | +
|
| 93 | +In this case, VLAN information has been processed before the flow dissector |
| 94 | +is called, and the BPF flow dissector is not required to handle it. |
| 95 | + |
| 96 | + |
| 97 | +The takeaway here is as follows: a BPF flow dissector program can be called |
| 98 | +with or without a VLAN header in front of the L3 header. The same program is |
| 99 | +used in both states, so it has to be written carefully to handle a single |
| 100 | +VLAN tag, a double VLAN tag, and no VLAN tag at all, using the input |
| 101 | +``n_proto`` to tell them apart. |
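
A minimal userspace sketch of that VLAN handling is shown below. The function name ``sketch_skip_vlan`` and the ``SKETCH_*`` constants are hypothetical, ``n_proto`` is assumed to be in host byte order, and the bounded loop is only a convenience of plain C: a real BPF program would have to express the same steps within whatever the verifier accepts.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_P_8021Q   0x8100 /* single VLAN tag TPID */
#define SKETCH_ETH_P_8021AD  0x88A8 /* outer tag TPID for double tagging */
#define SKETCH_VLAN_HLEN     4      /* TCI (2 bytes) + next EtherType (2 bytes) */
#define SKETCH_MAX_VLAN_TAGS 2

static int sketch_is_vlan(uint16_t proto)
{
	return proto == SKETCH_ETH_P_8021Q || proto == SKETCH_ETH_P_8021AD;
}

/*
 * Skip up to two VLAN tags.
 *
 * On entry, *nhoff points at the TCI if *n_proto is a VLAN TPID (pre-VLAN
 * state), or at the L3 header otherwise (VLAN-less or post-VLAN state).
 * On success, *nhoff points at the L3 header and *n_proto holds its
 * EtherType. Returns 0 on success, -1 on a truncated packet or when more
 * than SKETCH_MAX_VLAN_TAGS tags are present.
 */
static int sketch_skip_vlan(const uint8_t *data, size_t len,
			    uint16_t *nhoff, uint16_t *n_proto)
{
	int i;

	for (i = 0; i < SKETCH_MAX_VLAN_TAGS; i++) {
		const uint8_t *vlan;

		if (!sketch_is_vlan(*n_proto))
			return 0; /* nothing (more) to skip */

		/* The TCI and the encapsulated EtherType must fit. */
		if (*nhoff > len || len - *nhoff < SKETCH_VLAN_HLEN)
			return -1;

		vlan = data + *nhoff;
		*n_proto = (uint16_t)((vlan[2] << 8) | vlan[3]);
		*nhoff = (uint16_t)(*nhoff + SKETCH_VLAN_HLEN);
	}

	/* Still a VLAN TPID after two tags: give up. */
	return sketch_is_vlan(*n_proto) ? -1 : 0;
}
```

With an untagged or already-stripped packet the function returns immediately and leaves both arguments untouched, so the same entry point serves all three states.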
| 102 | + |
| 103 | + |
| 104 | +Reference Implementation |
| 105 | +======================== |
| 106 | + |
| 107 | +See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference |
| 108 | +implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]`` |
| 109 | +for the loader. bpftool can be used to load a BPF flow dissector program as well. |
| 110 | + |
| 111 | +The reference implementation is organized as follows: |
| 112 | + * ``jmp_table`` map that contains sub-programs for each supported L3 protocol |
| 113 | + * ``_dissect`` routine - entry point; it parses the input ``n_proto`` and |
| 114 | + does a ``bpf_tail_call`` to the appropriate L3 handler |
| 115 | + |
| 116 | +Since BPF at this point doesn't support looping (or any jumping back), |
| 117 | +``jmp_table`` is used instead to handle multiple levels of encapsulation (and |
| 118 | +IPv6 options). |
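
The dispatch structure can be sketched in plain userspace C with a table of function pointers. This is only an analogy: all names here (``sketch_ctx``, ``sketch_jmp_table``, ``sketch_dissect``, the slot constants) are hypothetical, and unlike a function call, a successful ``bpf_tail_call`` does not return to its caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table slots, one per supported L3 protocol. */
enum {
	SKETCH_SLOT_IPV4,
	SKETCH_SLOT_IPV6,
	SKETCH_NUM_SLOTS,
};

/* Hypothetical dissection context; handled_by records which slot ran. */
struct sketch_ctx {
	uint16_t n_proto; /* L3 protocol, host byte order in this sketch */
	int handled_by;
};

static int sketch_parse_ipv4(struct sketch_ctx *ctx)
{
	ctx->handled_by = SKETCH_SLOT_IPV4;
	return 0;
}

static int sketch_parse_ipv6(struct sketch_ctx *ctx)
{
	ctx->handled_by = SKETCH_SLOT_IPV6;
	return 0;
}

/* Userspace stand-in for the jmp_table map of sub-programs. */
static int (*const sketch_jmp_table[SKETCH_NUM_SLOTS])(struct sketch_ctx *) = {
	[SKETCH_SLOT_IPV4] = sketch_parse_ipv4,
	[SKETCH_SLOT_IPV6] = sketch_parse_ipv6,
};

/* Entry point: map n_proto to a slot and hand over to that handler. */
static int sketch_dissect(struct sketch_ctx *ctx)
{
	int slot;

	switch (ctx->n_proto) {
	case 0x0800: /* IPv4 */
		slot = SKETCH_SLOT_IPV4;
		break;
	case 0x86DD: /* IPv6 */
		slot = SKETCH_SLOT_IPV6;
		break;
	default:
		return -1; /* unsupported protocol: parsing error */
	}

	return sketch_jmp_table[slot](ctx);
}
```

A handler for an encapsulating protocol could update ``n_proto`` and go through the table again, which is how the table stands in for a loop over nested headers.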
| 119 | + |
| 120 | + |
| 121 | +Current Limitations |
| 122 | +=================== |
| 123 | +The BPF flow dissector doesn't support exporting all the metadata that the |
| 124 | +in-kernel C-based implementation can export. A notable example is single VLAN |
| 125 | +(802.1Q) and double VLAN (802.1AD) tags. Please refer to ``struct bpf_flow_keys`` |
| 126 | +for the set of information that can currently be exported from the BPF context. |