@@ -1001,6 +1001,99 @@ instruction that loads 64-bit immediate value into a dst_reg.
1001 1001 Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
1002 1002 32-bit immediate value into a register.
1003 1003
1004+ eBPF verifier
1005+ -------------
1006+ The safety of the eBPF program is determined in two steps.
1007+
1008+ The first step does a DAG check to disallow loops and other CFG validation.
1009+ In particular, it will detect programs that have unreachable instructions
1010+ (though the classic BPF checker allows them).
1011+
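As a rough illustration of the first step, the sketch below runs a depth-first search over a toy instruction array, rejecting back-edges (loops) and then reporting any instruction that was never reached. It is not the kernel's implementation: `struct toy_insn`, the `TOY_*` names, the 64-instruction limit and the return codes are all hypothetical. The jump target is taken as pc + off + 1, which matches the "from 5 to 8" line the verifier log prints for "goto pc+2".

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction kinds; not the real eBPF encoding. */
enum toy_kind { TOY_ALU, TOY_JA, TOY_JCOND, TOY_EXIT };

struct toy_insn {
	enum toy_kind kind;
	int off;                /* jump offset, target is pc + off + 1 */
};

enum { TOY_MAX_INSNS = 64 };                    /* assumed limit */
enum { TOY_WHITE, TOY_GREY, TOY_BLACK };        /* DFS colours */

/* Depth-first walk; returns false on a loop or an out-of-range edge. */
static bool toy_visit(const struct toy_insn *p, int n, int i, int *color)
{
	bool ok = true;

	if (i < 0 || i >= n)
		return false;           /* falls or jumps outside the program */
	if (color[i] == TOY_GREY)
		return false;           /* back-edge, so the CFG has a loop */
	if (color[i] == TOY_BLACK)
		return true;            /* already fully explored */

	color[i] = TOY_GREY;
	switch (p[i].kind) {
	case TOY_EXIT:
		break;
	case TOY_ALU:
		ok = toy_visit(p, n, i + 1, color);
		break;
	case TOY_JA:
		ok = toy_visit(p, n, i + 1 + p[i].off, color);
		break;
	case TOY_JCOND:
		ok = toy_visit(p, n, i + 1, color) &&
		     toy_visit(p, n, i + 1 + p[i].off, color);
		break;
	}
	color[i] = TOY_BLACK;
	return ok;
}

/* Returns 0 if accepted, -1 on loop or bad edge, -2 on unreachable insn. */
static int toy_check_cfg(const struct toy_insn *p, int n)
{
	int color[TOY_MAX_INSNS] = { 0 };
	int i;

	if (n <= 0 || n > TOY_MAX_INSNS)
		return -1;
	if (!toy_visit(p, n, 0, color))
		return -1;
	for (i = 0; i < n; i++)
		if (color[i] == TOY_WHITE)
			return -2;      /* never reached from insn 0 */
	return 0;
}
```

In this model a program made of two consecutive exit instructions is rejected because the second one is never reached, which mirrors the "unreachable insn 1" case.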
1012+ The second step starts from the first insn and descends all possible paths.
1013+ It simulates execution of every insn and observes the state change of
1014+ registers and stack.
1015+
1016+ At the start of the program the register R1 contains a pointer to context
1017+ and has type PTR_TO_CTX.
1018+ If the verifier sees an insn that does R2=R1, then R2 now has type
1019+ PTR_TO_CTX as well and can be used on the right-hand side of an expression.
1020+ If R1=PTR_TO_CTX and the insn is R2=R1+R1, then R2=UNKNOWN_VALUE,
1021+ since the addition of two valid pointers makes an invalid pointer.
1022+ (In 'secure' mode the verifier will reject any type of pointer arithmetic to
1023+ make sure that kernel addresses don't leak to unprivileged users.)
1024+
1025+ If a register was never written to, it's not readable:
1026+ bpf_mov R0 = R2
1027+ bpf_exit
1028+ will be rejected, since R2 is unreadable at the start of the program.
1029+
1030+ After a kernel function call, R1-R5 are reset to unreadable and
1031+ R0 has the return type of the function.
1032+
1033+ Since R6-R9 are callee saved, their state is preserved across the call.
1034+ bpf_mov R6 = 1
1035+ bpf_call foo
1036+ bpf_mov R0 = R6
1037+ bpf_exit
1038+ is a correct program. If R1 had been used instead of R6, it would have
1039+ been rejected.
1040+
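The readability rules above can be modelled with one flag per register. The sketch below is only a toy under stated assumptions: `struct toy_regs` and the `toy_*` helpers are hypothetical names, pointer types and argument checking are left out, and R1 and R10 are treated as the only registers that hold a value at program entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not the kernel's data structures. */
enum { TOY_NREGS = 11 };        /* R0..R10 */

struct toy_regs {
	bool readable[TOY_NREGS];
};

/* Program entry: only R1 (context) and R10 (frame pointer) hold values. */
static void toy_init(struct toy_regs *s)
{
	int i;

	for (i = 0; i < TOY_NREGS; i++)
		s->readable[i] = false;
	s->readable[1] = true;
	s->readable[10] = true;
}

/* dst = imm: always accepted, makes dst readable. */
static bool toy_mov_imm(struct toy_regs *s, int dst)
{
	assert(dst >= 0 && dst < 10);   /* R10 is read-only */
	s->readable[dst] = true;
	return true;
}

/* dst = src: rejected if src was never written. */
static bool toy_mov_reg(struct toy_regs *s, int dst, int src)
{
	assert(dst >= 0 && dst < 10 && src >= 0 && src < TOY_NREGS);
	if (!s->readable[src])
		return false;
	s->readable[dst] = true;
	return true;
}

/* Function call: R1-R5 become unreadable, R0 holds the return value. */
static void toy_call(struct toy_regs *s)
{
	int i;

	for (i = 1; i <= 5; i++)
		s->readable[i] = false;
	s->readable[0] = true;
}

/* exit: accepted only if R0 is readable. */
static bool toy_exit(const struct toy_regs *s)
{
	return s->readable[0];
}
```

Replaying the two snippets above against this model gives the same verdicts: copying R2 at program start is refused, a value parked in R6 survives `toy_call()`, and the same value parked in R1 does not.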
1041+ Load/store instructions are allowed only with registers of valid types, which
1042+ are PTR_TO_CTX, PTR_TO_MAP and FRAME_PTR. They are bounds and alignment checked.
1043+ For example:
1044+ bpf_mov R1 = 1
1045+ bpf_mov R2 = 2
1046+ bpf_xadd *(u32 *)(R1 + 3) += R2
1047+ bpf_exit
1048+ will be rejected, since R1 doesn't have a valid pointer type at the time of
1049+ execution of instruction bpf_xadd.
1050+
1051+ At the start, the R1 type is PTR_TO_CTX (a pointer to generic 'struct bpf_context').
1052+ A callback is used to customize the verifier to restrict eBPF program access to
1053+ only certain fields within the ctx structure with specified size and alignment.
1054+
1055+ For example, the following insn:
1056+ bpf_ld R0 = *(u32 *)(R6 + 8)
1057+ intends to load a word from address R6 + 8 and store it into R0.
1058+ If R6=PTR_TO_CTX, then via the is_valid_access() callback the verifier will know
1059+ whether offset 8 of size 4 bytes can be accessed for reading; if it cannot,
1060+ the verifier will reject the program.
1061+ If R6=FRAME_PTR, then the access should be aligned and be within
1062+ stack bounds, which are [-MAX_BPF_STACK, 0). In this example the offset is 8,
1063+ so it will fail verification, since it's out of bounds.
1064+
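The FRAME_PTR case comes down to a small predicate on the offset and the access size. The sketch below is a minimal model of that check, not the verifier's code: `toy_stack_access_ok()` is a hypothetical helper, the value 512 for the stack limit is an assumption (the text above only names MAX_BPF_STACK), and requiring the whole access to end at or below offset 0 is a simplification.

```c
#include <stdbool.h>

/* Assumed value; the text above only names MAX_BPF_STACK. */
#define TOY_MAX_BPF_STACK 512

/*
 * Returns true if an access of 'size' bytes at frame pointer offset 'off'
 * lies within [-TOY_MAX_BPF_STACK, 0) and is naturally aligned.
 */
static bool toy_stack_access_ok(int off, int size)
{
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false;           /* only byte, half, word, double word */
	if (off < -TOY_MAX_BPF_STACK || off + size > 0)
		return false;           /* outside the stack bounds */
	if (off % size != 0)
		return false;           /* misaligned for this access size */
	return true;
}
```

With this model, the 4-byte load at offset 8 from the example is refused as out of bounds, while a 4-byte access at offset -4 passes the bounds and alignment check.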
1065+ The verifier will allow an eBPF program to read data from the stack only after
1066+ the program has written to it.
1067+ The classic BPF verifier does a similar check with M[0-15] memory slots.
1068+ For example:
1069+ bpf_ld R0 = *(u32 *)(R10 - 4)
1070+ bpf_exit
1071+ is an invalid program.
1072+ Though R10 is a correct read-only register and has type FRAME_PTR,
1073+ and R10 - 4 is within stack bounds, there were no stores into that location.
1074+
1075+ Pointer register spill/fill is tracked as well, since four (R6-R9)
1076+ callee saved registers may not be enough for some programs.
1077+
1078+ Allowed function calls are customized with bpf_verifier_ops->get_func_proto().
1079+ The eBPF verifier will check that registers match argument constraints.
1080+ After the call, register R0 will be set to the return type of the function.
1081+
1082+ Function calls are the main mechanism to extend the functionality of eBPF programs.
1083+ Socket filters may let programs call one set of functions, whereas tracing
1084+ filters may allow a completely different set.
1085+
1086+ If a function is made accessible to eBPF programs, it needs to be thought through
1087+ from a safety point of view. The verifier will guarantee that the function is
1088+ called with valid arguments.
1089+
1090+ Seccomp and socket filters have different security restrictions for classic BPF.
1091+ Seccomp solves this with a two-stage verifier: the classic BPF verifier is followed
1092+ by the seccomp verifier. In the case of eBPF, one configurable verifier is shared
1093+ for all use cases.
1094+
1095+ See details of the eBPF verifier in kernel/bpf/verifier.c
1096+
1004 1097 eBPF maps
1005 1098 ---------
1006 1099 'maps' is a generic storage of different types for sharing data between kernel
@@ -1040,6 +1133,137 @@ The map is defined by:
1040 1133 . key size in bytes
1041 1134 . value size in bytes
1042 1135
1136+ Understanding eBPF verifier messages
1137+ ------------------------------------
1138+
1139+ The following are a few examples of invalid eBPF programs and verifier error
1140+ messages as seen in the log:
1141+
1142+ Program with unreachable instructions:
1143+ static struct bpf_insn prog[] = {
1144+ BPF_EXIT_INSN(),
1145+ BPF_EXIT_INSN(),
1146+ };
1147+ Error:
1148+ unreachable insn 1
1149+
1150+ Program that reads uninitialized register:
1151+ BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
1152+ BPF_EXIT_INSN(),
1153+ Error:
1154+ 0: (bf) r0 = r2
1155+ R2 !read_ok
1156+
1157+ Program that doesn't initialize R0 before exiting:
1158+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
1159+ BPF_EXIT_INSN(),
1160+ Error:
1161+ 0: (bf) r2 = r1
1162+ 1: (95) exit
1163+ R0 !read_ok
1164+
1165+ Program that accesses stack out of bounds:
1166+ BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
1167+ BPF_EXIT_INSN(),
1168+ Error:
1169+ 0: (7a) *(u64 *)(r10 +8) = 0
1170+ invalid stack off=8 size=8
1171+
1172+ Program that doesn't initialize stack before passing its address into function:
1173+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1174+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1175+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1176+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1177+ BPF_EXIT_INSN(),
1178+ Error:
1179+ 0: (bf) r2 = r10
1180+ 1: (07) r2 += -8
1181+ 2: (b7) r1 = 0x0
1182+ 3: (85) call 1
1183+ invalid indirect read from stack off -8+0 size 8
1184+
1185+ Program that uses invalid map_fd=0 while calling the map_lookup_elem() function:
1186+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
1187+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1188+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1189+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1190+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1191+ BPF_EXIT_INSN(),
1192+ Error:
1193+ 0: (7a) *(u64 *)(r10 -8) = 0
1194+ 1: (bf) r2 = r10
1195+ 2: (07) r2 += -8
1196+ 3: (b7) r1 = 0x0
1197+ 4: (85) call 1
1198+ fd 0 is not pointing to valid bpf_map
1199+
1200+ Program that doesn't check return value of map_lookup_elem() before accessing
1201+ map element:
1202+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
1203+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1204+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1205+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1206+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1207+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
1208+ BPF_EXIT_INSN(),
1209+ Error:
1210+ 0: (7a) *(u64 *)(r10 -8) = 0
1211+ 1: (bf) r2 = r10
1212+ 2: (07) r2 += -8
1213+ 3: (b7) r1 = 0x0
1214+ 4: (85) call 1
1215+ 5: (7a) *(u64 *)(r0 +0) = 0
1216+ R0 invalid mem access 'map_value_or_null'
1217+
1218+ Program that correctly checks map_lookup_elem() returned value for NULL, but
1219+ accesses the memory with incorrect alignment:
1220+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
1221+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1222+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1223+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1224+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1225+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
1226+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
1227+ BPF_EXIT_INSN(),
1228+ Error:
1229+ 0: (7a) *(u64 *)(r10 -8) = 0
1230+ 1: (bf) r2 = r10
1231+ 2: (07) r2 += -8
1232+ 3: (b7) r1 = 1
1233+ 4: (85) call 1
1234+ 5: (15) if r0 == 0x0 goto pc+1
1235+ R0=map_ptr R10=fp
1236+ 6: (7a) *(u64 *)(r0 +4) = 0
1237+ misaligned access off 4 size 8
1238+
1239+ Program that correctly checks map_lookup_elem() returned value for NULL and
1240+ accesses memory with correct alignment in one side of 'if' branch, but fails
1241+ to do so in the other side of 'if' branch:
1242+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
1243+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1244+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1245+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1246+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1247+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
1248+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
1249+ BPF_EXIT_INSN(),
1250+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
1251+ BPF_EXIT_INSN(),
1252+ Error:
1253+ 0: (7a) *(u64 *)(r10 -8) = 0
1254+ 1: (bf) r2 = r10
1255+ 2: (07) r2 += -8
1256+ 3: (b7) r1 = 1
1257+ 4: (85) call 1
1258+ 5: (15) if r0 == 0x0 goto pc+2
1259+ R0=map_ptr R10=fp
1260+ 6: (7a) *(u64 *)(r0 +0) = 0
1261+ 7: (95) exit
1262+
1263+ from 5 to 8: R0=imm0 R10=fp
1264+ 8: (7a) *(u64 *)(r0 +0) = 1
1265+ R0 invalid mem access 'imm'
1266+
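The last three examples depend on how the state of R0 is refined by the NULL check. The sketch below is a toy rendering of that idea, assuming just three hypothetical register types whose names loosely follow the log messages ('map_value_or_null', 'imm'). It is not the verifier's real type set or code.

```c
#include <stdbool.h>

/* Hypothetical register types, loosely named after the log messages. */
enum toy_type { TOY_MAP_VALUE_OR_NULL, TOY_MAP_VALUE, TOY_IMM };

/* Register type on each side of an 'if reg == 0 goto ...' instruction. */
struct toy_branch {
	enum toy_type taken;            /* jump taken: reg == 0 */
	enum toy_type fallthrough;      /* jump not taken: reg != 0 */
};

static struct toy_branch toy_jeq_zero(enum toy_type t)
{
	struct toy_branch b = { t, t };

	if (t == TOY_MAP_VALUE_OR_NULL) {
		b.taken = TOY_IMM;              /* known to be the constant 0 */
		b.fallthrough = TOY_MAP_VALUE;  /* known to be non-NULL */
	}
	return b;
}

/* Only a checked map value pointer may be used for loads and stores. */
static bool toy_can_deref(enum toy_type t)
{
	return t == TOY_MAP_VALUE;
}
```

In this model a store through R0 is accepted only on the fall-through side of the check. On the taken side R0 is an immediate, which corresponds to the "R0 invalid mem access 'imm'" message above, and without any check it is still 'map_value_or_null'.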
1043 1267 Testing
1044 1268 -------
1045 1269