RFC: Deprecate Collections #17
Conversation
> 1. **Assets:** The ASSET_FILEPATHS collection tracks all external files ([lookup_ops](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/ops/lookup_ops.py#L550), [tf_transform](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/analyzers.py#L830), [estimator](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/estimator/estimator.py#L941)) that the model needs, for instance vocabulary files. These are used in a few places:
>    1. SavedModel: [builder_impl.py](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/saved_model/builder_impl.py#L354)
>    2. TF Transform: [analyzers.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/analyzers.py#L98), [beam/impl.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/beam/impl.py#L560), [impl_helper.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/impl_helper.py#L405), [saved_io_transform.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/saved/saved_transform_io.py#L213)
And tf-hub. CC @andresusanopinto
Added.
> For the Keras models use case, `Model.variables` already tracks all variables created in a Keras model. Similarly, for trainable variables, `Model.trainable_variables` tracks all trainable variables.
>
> In the case of custom graph code, `variable_creator_scope` can be used to collect variables.
Can you give an example of how this would work? What about for Estimators?
Added some example code. Regarding estimators: I think we can create one of these variable trackers before the model_fn is called, record the variables created, and then feed them to the Scaffold so that initialization happens correctly and the init_op etc. are set correctly.
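For reference, the tracking mechanism discussed above can be sketched roughly as follows. This is an illustrative snippet (assuming `tf.variable_creator_scope`, available since TF 1.10), not the exact code added to the doc:

```python
import tensorflow as tf

# Illustrative sketch: use tf.variable_creator_scope to record every
# variable created inside the scope, instead of reading the
# GLOBAL_VARIABLES collection afterwards.
created = []

def tracking_creator(next_creator, **kwargs):
    var = next_creator(**kwargs)  # delegate the actual creation
    created.append(var)           # ...then record the result
    return var

with tf.variable_creator_scope(tracking_creator):
    w = tf.Variable(tf.zeros([3, 3]), name="w")
    b = tf.Variable(tf.zeros([3]), name="b")

# `created` now holds [w, b]; for Estimators, a tracker like this could
# wrap the model_fn call and feed the result to the Scaffold's init_op.
```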
> ### **Local variables**
> Local variables are primarily used for metrics. With the move to object-based metrics, one can access the variables created for a metric via the `Metric` object.
Are old metrics explicitly deprecated/gone for 2.0? Or will there be remaining users we have to accommodate? CC @brainnoise
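To make the object-based pattern concrete, this is what it looks like with the Keras metrics API (a sketch assuming TF 2.x eager execution):

```python
import tensorflow as tf

# Sketch (assumes TF 2.x eager execution): an object-based metric owns its
# state, so nothing needs to be retrieved from the LOCAL_VARIABLES
# collection.
m = tf.keras.metrics.Accuracy()
m.update_state(y_true=[1, 1, 0, 0], y_pred=[1, 0, 0, 0])

# The metric's state variables (total, count) hang off the object itself.
print([v.name for v in m.variables])
print(float(m.result()))  # 0.75: three of four predictions match
```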
> ### **Asset file paths**
> Asset files currently only include vocabulary files that are used to initialize tables. Tracking the tables should be sufficient to track the vocabulary files.
These also have to be replicated in the SavedModel, although passing these in as a python list would be way clearer than the strange thing we do now with collections, presumably. But it will require updating client code and SavedModel code as well.
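As an illustration of "tracking the table tracks the asset", here is a sketch assuming the TF 2.x `tf.lookup` API; the vocabulary filename and contents are made up for the example:

```python
import tensorflow as tf

# Sketch (assumes the TF 2.x tf.lookup API): the table object references
# its vocabulary file directly, so tracking the table also tracks the
# asset; no ASSET_FILEPATHS collection entry is needed.
with open("vocab.txt", "w") as f:  # toy vocabulary, one token per line
    f.write("hello\nworld\n")

init = tf.lookup.TextFileInitializer(
    "vocab.txt",
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticHashTable(init, default_value=-1)

ids = table.lookup(tf.constant(["world", "unk"]))  # "unk" falls back to -1
print(ids.numpy())
```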
> ### **Updates**
> Batch normalization is the only use case for updates. For Keras models, all updates are tracked via `Model.updates`. For the graph-construction case, the updates are accessible via the `updates` attribute on the `BatchNormalization` layer itself, and they can then be added to the train op. We will deprecate the [functional batch_normalization API](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/layers/normalization.py#L158) so that users deal with objects only.
Can you provide an example of what this would look like for user code?
Done.
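An example along these lines might look as follows in TF 1.x-style graph mode (a sketch, not the exact code added to the doc; in newer TF releases `layer.updates` is deprecated):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Sketch (TF 1.x-style graph mode): take the moving-average update ops
# from the BatchNormalization layer object instead of reading the
# UPDATE_OPS collection, and group them into the train op.
x = tf.placeholder(tf.float32, [None, 8])
bn = tf.keras.layers.BatchNormalization()
y = bn(x, training=True)

loss = tf.reduce_mean(tf.square(y))
optimizer = tf.train.GradientDescentOptimizer(0.1)
minimize = optimizer.minimize(loss)

# bn.updates holds this layer's moving mean/variance update ops.
train_op = tf.group(minimize, *bn.updates)
```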
> 1. Queue runners: removal and replacement with tf.data
> 2. [Variables 2.0](https://github.com/tensorflow/community/pull/11)
> 3. Object-based checkpointing
My understanding was that this was not slated to be ready for 2.0? Has that changed?
My understanding was that it's already live. @allenlavoie for more comments on this.
Yep, tf.train.Checkpoint is in core. Estimators don't have an API for using it yet, and I am working on saving SavedModels using the same dependency system, but for training checkpoints it's all there.
**rohan100jain** left a comment
Thanks Karmel!
> #### **Keras models**
>
> We'll provide a `track_table` method in the backend (similar to `track_variable`), and we'll make sure that when `model.fit` is called, tables are initialized similarly to the way we initialize variables. The changes we'll have to make here are the following:
track_variable and track_table seem suspiciously similar to the global behavior we're trying to remove.
Are these okay because they are special purpose? Is there a rule of thumb for when global == OK?
yeah they are indeed. The thinking is to provide them as a backstop because forcing everyone to migrate to managing their state immediately would be tough. As a principle, we will try not to use them in the tensorflow code base at all.
Tables in particular seem too special case. If we are just capturing initializers, maybe this can be generalized for that? Variables and initializers as special cases; everything else we keep out of global scope? We can handle assets through the EstimatorSpec (and the same for similar entities, perhaps?)
> Initialize tables on first use. Similar to the [OneShotIterator](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/core/ops/dataset_ops.cc#L684), we'll pass in a function argument that can initialize the table. We could even explore having the function return a dataset and then initializing the table from that dataset. Having a function would avoid the overhead of passing in the large tensors that might otherwise be required for initialization on each lookup (`index_table_from_tensor`, for instance, might run into this problem). The Table API would remain the same, since we hide initialization from the user and wouldn't have to run an initializer explicitly.
> The concern with this solution is that, in serving, the first request would take more time than subsequent calls, and that sort of inconsistent performance is not desirable.
Hidden table initializers have been a persistent thorn for us in some projects. Having them be explicitly visible at the construction site would be a nice win for readability.
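The initialize-on-first-use idea can be illustrated with a plain-Python sketch (the `LazyTable` class and its names are made up for illustration; a real implementation would live at the op level):

```python
# Illustrative sketch of "initialize on first use": the table takes an
# init function (analogous to OneShotIterator's dataset factory) and runs
# it lazily on the first lookup, hiding initialization from the user.
class LazyTable:
    def __init__(self, init_fn, default=-1):
        self._init_fn = init_fn  # builds the key -> value mapping
        self._data = None        # not built until the first lookup
        self._default = default

    def lookup(self, key):
        if self._data is None:   # the first call pays the init cost
            self._data = self._init_fn()
        return self._data.get(key, self._default)

table = LazyTable(lambda: {w: i for i, w in enumerate(["a", "b", "c"])})
print(table.lookup("b"))  # -> 1 (initialization happens on this call)
print(table.lookup("z"))  # -> -1 (default for unknown keys)
```

The serving concern above shows up directly here: the first `lookup` call is strictly more expensive than every later one.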
These tests started failing after 4cedc8b: variables.Variable is the 2.0-style class, where the notion of global collections is being done away with (as per tensorflow/community#17), and hence the call to sess.run(tf.global_variables_initializer()) is a no-op. For now, switch the tests to use variable.VariableV1. In a follow-up, these tests need to be updated so that they do not rely on global collections (i.e., do not call tf.global_variables_initializer()). PiperOrigin-RevId: 220558354
I added design review notes. Can someone take a look and approve this request so that I can merge it?

Is there anything else actionable? If not, and everyone is in agreement with this proposal, Rohan, could you please update the status of the doc to "Accepted" and then we can merge it.
The feedback phase will be open for two weeks until 2018-09-20
Deprecate Collections
Summary
TF 2.0 gives us a great opportunity to clean up APIs that are less than desirable. The collections API, although it permits a compact coding style and is widely used, poses a fair number of technical difficulties and challenges, so we'd like to get rid of it.
Eager execution: There isn't a concept of a graph when executing eagerly and therefore no support for collections.
Graph mode: Collections are tied to a graph and assume that we build only one model per graph. As use cases get more complex, we have situations where we might build multiple models in a single graph. Functions cause further issues because functions are graphs of their own: any collections-API usage inside a function creates state that is purely local to that function, and the default graph loses all of that state.
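The graph-scoping problem is easy to demonstrate with `tf.Graph`'s own collection methods (a sketch; these methods exist on the Graph object in both TF 1.x and 2.x):

```python
import tensorflow as tf

# Sketch: collections live on a specific tf.Graph, so state recorded in
# one graph (for example, a function's own FuncGraph) is invisible to any
# other graph, including the default one.
g1, g2 = tf.Graph(), tf.Graph()
with g1.as_default():
    g1.add_to_collection("models", "model_a")
with g2.as_default():
    g2.add_to_collection("models", "model_b")

print(g1.get_collection("models"))  # ['model_a'] -- g2's entry is unseen
print(g2.get_collection("models"))  # ['model_b']
```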