diffblue · NlightNFotis · Apr 24, 2020 · Mar 10, 2020 · Oct 25, 2019 · Oct 25, 2019
@@ -9,6 +9,7 @@
 [A Short Tutorial](cbmc/tutorial/),
 [Loop Unwinding](cbmc/unwinding/),
 [Assertion Checking](cbmc/assertions/),
+[Restricting function pointers](cbmc/restrict-function-pointer/),
 [Memory Analyzer](cbmc/memory-analyzer/),
 [Program Harness](cbmc/goto-harness/)
 

@@ -0,0 +1,180 @@
+[CPROVER Manual TOC](../../)
+
+## Restricting function pointers
+
+In this document, we describe the `goto-instrument` feature to replace calls
+through function pointers by case distinctions over calls to given sets of
+functions.
+
+### Motivation
+
+The CPROVER framework includes a goto program transformation pass
+`remove_function_pointers()` to resolve calls to function pointers to direct
+function calls. The pass is needed by `cbmc`, as symbolic execution itself can't
+handle calls to function pointers. In practice, the transformation pass works as
+follows:
+
+Given that there are functions with these signatures available in the program:
+
+```
+int f(int x);
+int g(int x);
+int h(int x);
+```
+
+And we have a call site like this:
+
+```
+typedef int(*fptr_t)(int x);
+void call(fptr_t fptr) {
+  int r = fptr(10);
+  assert(r > 0);
+}
+```
+
+Function pointer removal will turn this into code similar to this:
+
+```
+void call(fptr_t fptr) {
+  int r;
+  if(fptr == &f) {
+    r = f(10);
+  } else if(fptr == &g) {
+    r = g(10);
+  } else if(fptr == &h) {
+    r = h(10);
+  } else {
+    // sanity check
+    assert(false);
+    assume(false);
+  }
+  return r;
+}
+```
+
+This works well enough for simple cases. However, this is a very simple
+replacement only based on the signature of the function (and whether or not they
+have their address taken somewhere in the program), so if there are many
+functions matching a particular signature, or if some of these functions are
+expensive in symex (e.g. functions with lots of loops or recursion), then this
+can be a bit cumbersome - especially if we, as a user, already know that a
+particular function pointer will only resolve to a single function or a small
+set of functions. The `goto-instrument` option `--restrict-function-pointer`
+allows to manually specify this set of functions.
+
+### Example
+
+Take the motivating example above. Let us assume that we know for a fact that
+`call` will always receive pointers to either `f` or `g` during actual
+executions of the program, and symbolic execution for `h` is too expensive to
+simply ignore the cost of its branch. For this, we will label the places in each
+function where function pointers are being called, to this pattern:
+
+```
+<function-name>.function_pointer_call.<N>
+```
+
+where `N` is referring to which function call it is - so the first call to a
+function pointer in a function will have `N=1`, the 5th `N=5` etc.
+
+We can call `goto-instrument --restrict-function-pointer
+call.function_pointer_call.1/f,g in.gb out.gb`. This can be read as
+
+> For the first call to a function pointer in the function `call`, assume that
+> it can only be a call to `f` or `g`
+
+The resulting output (written to goto binary `out.gb`) looks similar to the
+original example, except now there will not be a call to `h`:
+
+```
+void call(fptr_t fptr) {
+  int r;
+  if(fptr == &f) {
+    r = f(10);
+  } else if(fptr == &g) {
+    r = g(10);
+  } else {
+    // sanity check
+    assert(false);
+    assume(false);
+  }
+  return r;
+}
+```
+
+Another example: Imagine we have a simple virtual filesystem API and implementation
+like this:
+
+```
+typedef struct filesystem_t filesystem_t;
+struct filesystem_t {
+  int (*open)(filesystem_t *filesystem, const char* file_name);
+};
+
+int fs_open(filesystem_t *filesystem, const char* file_name) {
+  filesystem->open(filesystem, file_name);
+}
+
+int nullfs_open(filesystem_t *filesystem, const char* file_name) {
+  return -1;
+}
+
+filesystem_t nullfs_val = {.open = nullfs_open};
+filesystem *const nullfs = &nullfs_val;
+
+filesystem_t *get_fs_impl() {
+  // some fancy logic to determine
+  // which filesystem we're getting -
+  // in-memory, backed by a database, OS file system
+  // - but in our case, we know that
+  // it always ends up being nullfs
+  // for the cases we care about
+  return nullfs;
+}
+int main(void) {
+  filesystem_t *fs = get_fs_impl();
+  assert(fs_open(fs, "hello.txt") != -1);
+}
+```
+
+In this case, the assumption is that *we* know that in our `main`, `fs` can be
+nothing other than `nullfs`; But perhaps due to the logic being too complicated
+symex ends up being unable to figure this out, so in the call to `fs_open()` we
+end up branching on all functions matching the signature of
+`filesystem_t::open`, which could be quite a few functions within the program.
+Worst of all, if its address is ever taken in the program, as far as the "dumb"
+function pointer removal is concerned it could be `fs_open()` itself due to it
+having a matching signature, leading to symex being forced to follow a
+potentially infinite recursion until its unwind limit.
+
+In this case we can again restrict the function pointer to the value which we
+know it will have:
+
+```
+--restrict-function-pointer fs_open.function_pointer_call.1/nullfs_open
+```
+
+### Loading from file
+
+If you have many places where you want to restrict function pointers, it'd be a
+nuisance to have to specify them all on the command line. In these cases, you
+can specify a file to load the restrictions from instead, via the
+`--function-pointer-restrictions-file` option, which you can give the name of a
+JSON file with this format:
+
+```
+{
+  "function_call_site_name": ["function1", "function2", ...],
+   ...
+}
+```
+
+**Note:** If you pass in multiple files, or a mix of files and command line
+restrictions, the final restrictions will be a set union of all specified
+restrictions.
+
+**Note:** as of now, if something goes wrong during type checking (i.e. making
+sure that all function pointer replacements refer to functions in the symbol
+table that have the correct type), the error message will refer to the command
+line option `--restrict-function-pointer` regardless of whether the restriction
+in question came from the command line or a file.
diff --git a/regression/goto-instrument/restrict-function-pointer-by-name-global/test.c b/regression/goto-instrument/restrict-function-pointer-by-name-global/test.c
@@ -0,0 +1,28 @@
+int f(void)
+{
+  return 1;
+}
+
+int g(void)
+{
+  return 2;
+}
+
+int h(void)
+{
+  return 3;
+}
+
+typedef int (*fp_t)(void);
+
+fp_t fp;
+
+void main()
+{
+  int cond;
+  fp = cond ? f : g;
+  int res = fp();
+  __CPROVER_assert(res == 1, "");
+  __CPROVER_assert(res == 2, "");
+  __CPROVER_assert(res == 1 || res == 2, "");
+}
diff --git a/regression/goto-instrument/restrict-function-pointer-by-name-global/test.desc b/regression/goto-instrument/restrict-function-pointer-by-name-global/test.desc
@@ -0,0 +1,15 @@
+CORE
+test.c
+--restrict-function-pointer-by-name fp/f,g
+\[main\.assertion\.1\] line \d+ dereferenced function pointer at main\.function_pointer_call\.1 must be one of \[(f, g)|(g, f)\]: SUCCESS
+\[main.assertion.2\] line \d+ assertion: FAILURE
+\[main.assertion.3\] line \d+ assertion: FAILURE
+\[main.assertion.4\] line \d+ assertion: SUCCESS
+f\(\)
+g\(\)
+^EXIT=10$
+^SIGNAL=0$
+--
+h\(\)
+--
+Check that a call to a global function pointer is correctly restricted
diff --git a/regression/goto-instrument/restrict-function-pointer-by-name-local/test.c b/regression/goto-instrument/restrict-function-pointer-by-name-local/test.c
@@ -0,0 +1,19 @@
+#include <assert.h>
+
+int f(void)
+{
+  return 1;
+}
+
+int g(void)
+{
+  return 2;
+}
+
+typedef int (*fp_t)(void);
+
+void main()
+{
+  fp_t fp = f;
+  assert(fp() == 1);
+}
diff --git a/regression/goto-instrument/restrict-function-pointer-by-name-local/test.desc b/regression/goto-instrument/restrict-function-pointer-by-name-local/test.desc
@@ -0,0 +1,12 @@
+CORE
+test.c
+--restrict-function-pointer-by-name main::1::fp/f
+\[main\.assertion\.1\] line \d+ dereferenced function pointer at main\.function_pointer_call\.1 must be f: SUCCESS
+\[main\.assertion\.2\] line \d+ assertion fp\(\) == 1: SUCCESS
+f\(\)
+^EXIT=0$
+^SIGNAL=0$
+--
+g\(\)
+--
+Check that a call to a local function pointer is correctly restricted
diff --git a/regression/goto-instrument/restrict-function-pointer-by-name-parameter/test.c b/regression/goto-instrument/restrict-function-pointer-by-name-parameter/test.c
@@ -0,0 +1,23 @@
+#include <assert.h>
+
+int f(void)
+{
+  return 1;
+}
+
+int g(void)
+{
+  return 2;
+}
+
+typedef int (*fp_t)(void);
+
+void use_fp(fp_t fp)
+{
+  assert(fp() == 1);
+}
+
+void main()
+{
+  use_fp(f);
+}
diff --git a/regression/goto-instrument/restrict-function-pointer-by-name-parameter/test.desc b/regression/goto-instrument/restrict-function-pointer-by-name-parameter/test.desc
@@ -0,0 +1,12 @@
+CORE
+test.c
+--restrict-function-pointer-by-name use_fp::fp/f
+\[use_fp\.assertion\.1\] line \d+ dereferenced function pointer at use_fp\.function_pointer_call\.1 must be f: SUCCESS
+\[use_fp\.assertion\.2\] line \d+ assertion fp\(\) == 1: SUCCESS
+f\(\)
+^EXIT=0$
+^SIGNAL=0$
+--
+g\(\)
+--
+Check that a call to a function pointer parameter is correctly restricted
diff --git a/regression/goto-instrument/restrict-function-pointer-to-complex-expression/test.c b/regression/goto-instrument/restrict-function-pointer-to-complex-expression/test.c
@@ -0,0 +1,61 @@
+#include <assert.h>
+
+typedef int (*fptr_t)(int);
+
+fptr_t get_f(void);
+fptr_t get_g(void);
+
+void use_fg(int choice, fptr_t fptr, fptr_t gptr)
+{
+  assert((choice ? fptr : gptr)(10) == 10 + choice);
+}
+
+int main(void)
+{
+  use_fg(0, get_f(), get_g());
+  use_fg(1, get_f(), get_g());
+}
+
+int f(int x)
+{
+  return x + 1;
+}
+
+int g(int x)
+{
+  return x;
+}
+
+int h(int x)
+{
+  return x / 2;
+}
+
+// Below we take the address of f, g, and h. Thus remove_function_pointers()
+// would create a case distinction involving f, g, and h in the function
+// use_fg() above.
+int always_false_cond = 0;
+
+fptr_t get_f(void)
+{
+  if(!always_false_cond)
+  {
+    return f;
+  }
+  else
+  {
+    return h;
+  }
+}
+
+fptr_t get_g(void)
+{
+  if(!always_false_cond)
+  {
+    return g;
+  }
+  else
+  {
+    return h;
+  }
+}
diff --git a/regression/goto-instrument/restrict-function-pointer-to-complex-expression/test.desc b/regression/goto-instrument/restrict-function-pointer-to-complex-expression/test.desc
@@ -0,0 +1,11 @@
+CORE
+test.c
+--restrict-function-pointer "use_fg.function_pointer_call.1/f,g"
+\[use_fg.assertion.2\] line \d+ assertion \(choice \? fptr : gptr\)\(10\) == 10 \+ choice: SUCCESS
+\[use_fg.assertion.1\] line \d+ dereferenced function pointer at use_fg.function_pointer_call.1 must be one of \[(f|g), (f|g)\]: SUCCESS
+^EXIT=0$
+^SIGNAL=0$
+--
+--
+This test checks that the selected function pointer is replaced by a case
+distinction over both functions f and g.
diff --git a/regression/goto-instrument/restrict-function-pointer-to-complex-expression/test.json b/regression/goto-instrument/restrict-function-pointer-to-complex-expression/test.json
@@ -0,0 +1,3 @@
+{
+  "use_fg.function_pointer_call.1": ["f"]
+}
+4 −44		src/main/java/java/lang/Integer.java
+5 −41		src/main/java/java/lang/Long.java
+36 −0		src/main/java/org/cprover/CProverString.java