# **"Functional"** **cond design doc**

| Status        | Approved                                              |
|:--------------|:------------------------------------------------------|
| **Author(s)** | Skye Wanderman-Milne ([email protected])              |
| **Created**   | 2018-05-07                                            |
| **Updated**   | 2018-08-22                                            |

## Objective

**Switch tf.cond to emit a single If op.**

We can do tf.while_loop next.

This would make mapping to XLA's control flow constructs easier/possible. In particular, just switching to the If op would be a big win (more work is needed to get cond working with XLA than while_loop, which has already had a lot of work done), and it is easier than while_loop. It will also make debugging and analysis of cond constructs much simpler, e.g. to implement higher-order derivatives.

Note that cond will still support side-effecting ops (e.g. variable updates).

## Background material

tf.cond API: https://www.tensorflow.org/api_docs/python/tf/cond

If op: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/functional_ops.cc#L104

Overview of the current control flow implementation: [Implementation of Control Flow in TensorFlow](http://download.tensorflow.org/paper/white_paper_tf_control_flow_implementation_2017_11_1.pdf)

## Design overview

### Functional tf.cond

The signature of `tf.cond` will stay the same: a boolean predicate Tensor, and Python callables for the two branches. The two callables each take no arguments (they instead close over any input tensors), and are required to return the same number and types of tensors.

We need to convert this to the If op signature, which is a boolean predicate and FunctionDefs for the two branches. The FunctionDefs are required to have the same number and types of inputs and outputs. Luckily, tfe.defun already gives us the machinery to convert the Python callables into FunctionDefs, including converting closures to inputs and adding extra inputs to make the branch signatures match. This is done via an overloaded Graph subclass, [FuncGraph](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/function.py#L191), which gives us the full flexibility of graphs while creating the branch functions.

This conversion results in a single If op representing the `tf.cond`.
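
For concreteness, here is a minimal `tf.cond` call and how it maps onto a single If op (the branch-function names below are illustrative):

```python
import tensorflow as tf

x = tf.constant(2.0)
y = tf.constant(3.0)
pred = tf.less(x, y)

# Both callables take no arguments, close over x and y, and return the same
# number and types of tensors.
out = tf.cond(pred, lambda: x * y, lambda: x + y)
```

Conceptually, each callable is traced into a FuncGraph and converted to a FunctionDef (say `cond_true` and `cond_false`), the captured tensors `x` and `y` become function inputs, and the whole construct becomes one node: `If(pred, x, y, then_branch=cond_true, else_branch=cond_false)`.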

### Gradients

The gradient of an If op is another If op. The predicate is the same as the forward op's, and each branch function is the gradient function of the corresponding forward branch.

This requires the gradient branch functions to access intermediate tensors of the forward branch functions. Internal tensors in a function can't be directly accessed, so we need to add the necessary intermediates as outputs to the forward If op (how to do this is discussed in the "Implementation challenges" section).
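
A rough sketch of this structure, expressed with `tf.cond` itself rather than by constructing the If op directly (function and parameter names are illustrative, not the actual gradient registration):

```python
import tensorflow as tf

def if_grad(pred, grads, fwd_intermediates, grad_true_fn, grad_false_fn):
  # The gradient is itself a conditional: same predicate, and each branch runs
  # the gradient function of the corresponding forward branch. The gradient
  # branches consume the upstream gradients plus any forward intermediates
  # that were exposed as extra outputs of the forward If op.
  return tf.cond(pred,
                 lambda: grad_true_fn(grads, fwd_intermediates),
                 lambda: grad_false_fn(grads, fwd_intermediates))
```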

### Execution

There are two choices for running the resulting If ops:

1. Use the `IfOp` kernel as-is, which runs the functions using `FunctionLibraryRuntime`.
2. "Lower" the If ops to the current `tf.cond` implementation (i.e. `Switch` and `Merge` nodes).

(1) is simpler at a high level, but (2) will avoid some of the implementation challenges below.

The lowering can be implemented as an early (pre-placement) optimization pass, so that the lowered control flow is placed, pruned, partitioned, etc. as usual. There are already a few examples of similar passes: ParallelConcatRemovePass, AccumulateNV2RemovePass.

**Update**: this is done: [LowerIfOpPass](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/lower_if_op.h)

We don't want to lower If ops that will eventually be consumed by the [XLA encapsulation pass](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/jit/jit_compilation_pass_registration.cc#L35), so the TF-XLA bridge can take advantage of the easy-to-convert functional representation. This can be achieved by setting an attribute on the If op indicating whether it should be lowered, determined by e.g. whether the If op is in an `XLAContext`. This may prove useful for other future use cases as well, such as transitioning to using the functional representation in the main TF runtime.
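
A minimal sketch of the gating idea, using TensorFlow's internal `_set_attr` API (the attribute name here is illustrative, not necessarily the one ultimately used):

```python
from tensorflow.core.framework import attr_value_pb2

def mark_for_lowering(if_op, in_xla_context):
  # If ops built inside an XLAContext keep the functional form for the TF-XLA
  # bridge; all others are rewritten to Switch/Merge by the lowering pass.
  if_op._set_attr("_lower_using_switch_merge",
                  attr_value_pb2.AttrValue(b=not in_xla_context))
```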

## Implementation challenges

### Exposing intermediate tensors to gradient functions

See the "Gradients" section. We somehow need to add intermediate tensors as outputs to the already-created forward-pass If op and its branch functions. Options:

1. Create a new If op with the required outputs. To prevent running both the original and new ops, we need to rewire the outputs of the original op to use the new op (and ideally modify any existing Tensor objects as well).
2. Modify the existing If op in-place. This involves either modifying or replacing the branch functions, and changing the outputs of the op (tricky, but probably doable).

Note that both of these options require mutating existing graph elements. If the graph has already been run, **this will invalidate any existing Sessions!** Other options:

1. Use placeholders for intermediates during construction, then use a C++ rewrite (Grappler or GraphOptimizationPass) to rewire the graph.
2. Output every possible intermediate.
3. It might already work as-is.
    1. Except for ExtendGraph -- a solution could be to make the C API and Session share the Graph*.

**Update**: we went with option (2), outputting every possible intermediate (sketched below).
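
A minimal sketch of that choice, assuming each branch is built in a FuncGraph-like graph with a mutable `outputs` list (the helper name is illustrative):

```python
def expose_all_intermediates(branch_graph):
  # After tracing a branch function, append every internal tensor to the
  # function's outputs so a later gradient If op can consume it. Tensors that
  # are already outputs are skipped.
  existing = set(branch_graph.outputs)
  for op in branch_graph.get_operations():
    for tensor in op.outputs:
      if tensor not in existing:
        branch_graph.outputs.append(tensor)
        existing.add(tensor)
```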

### Making branch function outputs match

After adding the intermediate outputs to the forward If op's branch functions, it's likely the two functions no longer have the same output signature. For each new output of each branch, we need to add an extra output tensor to the other branch to mirror it (since the If op requires the two output signatures to match).

Note that the "mirror" tensors never need to be read. The original output is only consumed by the corresponding gradient function, which is only executed if the original output's branch is taken. Thus, if the mirror tensor is produced, no consumer of it will be run. However, without pruning and/or non-strict execution, the If op must still produce some value for the mirror tensor.

_Solution:_

Introduce a special op to output mirror tensors. This op's shape inference function will claim to output the same shape and type as the mirrored output, but since the tensor isn't actually needed, the kernel will produce some small value to avoid materializing large unnecessary values. If/when the op doesn't need to produce a value (e.g. via lowering + pruning), the kernel can CHECK-fail or similar.
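
A minimal sketch of the mirroring step, using `tf.zeros` as a stand-in for the proposed special op (a real implementation would use an op whose shape function claims the mirrored shape but whose kernel emits a tiny value):

```python
import tensorflow as tf

def add_mirror_outputs(branch_graph, new_outputs_of_other_branch):
  # For each intermediate newly exposed by the other branch, append a cheap
  # same-dtype value here so both branch output signatures match. These mirror
  # values are never read by any consumer that actually runs.
  with branch_graph.as_default():
    for tensor in new_outputs_of_other_branch:
      branch_graph.outputs.append(tf.zeros([], dtype=tensor.dtype))
```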

### Taking the gradient of deserialized If ops

We need a graph representing the branch function of an If op in order to take its gradient. We already have a graph as part of creating the function, but if the graph was loaded from a GraphDef, we no longer have this graph. Options:

1. FunctionDef → Graph method (sketched below)
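
A sketch of what that conversion could look like; `function_def_to_graph` exists in recent TensorFlow versions, though the exact module path may differ:

```python
from tensorflow.python.framework import function_def_to_graph

def branch_graph_from_graph_def(graph_def, branch_name):
  # Look up the serialized branch function in the GraphDef's function library
  # and rebuild a graph from it so we can differentiate it.
  for fdef in graph_def.library.function:
    if fdef.signature.name == branch_name:
      return function_def_to_graph.function_def_to_graph(fdef)
  raise ValueError("No function named %s" % branch_name)
```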

### Variable initialization

Variables created in the `cond` input callables must be created in the main graph, not in the temporary `FuncGraphs`. Luckily this is already handled by [init_scope](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/ops.py#L5230), which should already be used as necessary to handle creating variables in Defuns, etc.
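
A minimal sketch of the mechanism (real variable-creation code applies `init_scope` internally; the explicit use here is only to illustrate it):

```python
import tensorflow as tf

def true_fn():
  # init_scope lifts us out of the temporary FuncGraph being traced, so the
  # variable and its initializer land in the main graph.
  with tf.init_scope():
    v = tf.get_variable("v", shape=[], initializer=tf.zeros_initializer())
  return v + 1.0
```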

### Collections

We must support reading and writing to collections in the `cond` input callables.

Reading from collections in eager-mode defuns already works by copying the collections into the `FuncGraphs`, which should presumably work here as well.
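
Roughly, that copying looks like the following (a sketch using the public Graph collection API; the actual FuncGraph machinery may differ):

```python
def copy_collections(outer_graph, func_graph):
  # Copy each collection from the outer graph into the temporary FuncGraph so
  # reads inside the branch callables see the same values.
  for name in outer_graph.get_all_collection_keys():
    func_graph.get_collection_ref(name).extend(outer_graph.get_collection(name))
```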

For writing, we'll have to forward or copy the values back to the original collections. This is tricky and poorly-defined for Tensor and Operation values, and possibly intractable for data structures containing graph elements (e.g. `WhileContext`). Options:

1. Collections are supposed to go away in TF 2.0.
2. Somehow turn Tensors into function outputs.
3. Can some tensors/operations be pulled out of the function?
4. Expose "legacy cond" in contrib, eventually deprecate.

**Writing to collections requires more investigation.**

For example, how are people using collections within `cond` branches? How do they avoid dead Tensors?

### Name/device/colocation scope

Similar to reading collections, any graph-wide stacks and other state can be copied into the `FuncGraphs`. New scopes can then be added within the FuncGraph, and the semantics prevent any added state from persisting beyond the input callable.

For colocation, we can possibly use external tensor names as-is, since they'll either be lowered into the main graph or compiled by XLA.

### Control dependencies

If the `tf.cond` call occurs inside a control_dependencies block, the control inputs will be added directly to the resulting If op.

If the `cond` input callables contain control_dependencies blocks referring to external tensors, we can create Identity nodes of the external tensors inside the function definition, and then create internal control edges (functions only have data inputs).
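
A sketch of that rewrite during branch tracing (the `capture` helper, which turns an external tensor into a function input, is illustrative):

```python
import tensorflow as tf

def trace_branch_with_control_deps(branch_fn, external_control_tensors, capture):
  # Each external tensor is captured as a data input to the function, wrapped
  # in an internal Identity, and the control edge is taken from that Identity.
  internal_deps = [tf.identity(capture(t)) for t in external_control_tensors]
  with tf.control_dependencies(internal_deps):
    return branch_fn()
```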

_The following concerns are avoided by lowering If ops before execution (see "Execution" section):_

### Devices

Akshay is working on allowing functions to run across multiple devices. My understanding is that it's mostly working, with a few limitations (e.g. all arguments to the function must go through the caller device, and colocation with external tensors doesn't work).

### Partial evaluation

TF graphs are pruned before execution, meaning only the subgraph needed to compute the requested output tensors is run (this doesn't work completely for ops in a conditional branch, but some pruning still occurs). This is not currently possible with TF functions; the entire function is run regardless of which outputs are needed. This would need to be supported for parity with the current `cond` implementation.

### Non-strict execution

The current `cond` implementation allows each op in the taken branch to run as soon as its inputs are ready, even if other ops in the branch aren't ready yet ("non-strict" execution). However, each TF op kernel will only begin running once its inputs are all ready ("strict" execution), with `Merge` nodes being the only exception. If we replace the current `cond` construct with a single op, this will switch `cond` to strict execution. We would need to support non-strict execution of If ops and their branch functions.

## Future work

**tf.while_loop.** This effort will solve most of the problems with switching to a functional While representation (or a recursive function representation?). The remaining challenges are inserting stacks for the gradients, and supporting parallel iterations.

**C API support.** Ideally other language bindings support conditional execution as well. The C API already includes the primitives for other bindings to implement something similar to `tf.cond` that produces an `If` op, but the C API `TF_AddGradients` method would need to support `If` ops in order for other bindings to (easily) allow autodiff of conditionals.
