RFC: Deprecate Collections #17
Conversation
> 1. **Assets:** The ASSET_FILEPATHS collection tracks all external files ([lookup_ops](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/ops/lookup_ops.py#L550), [tf_transform](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/analyzers.py#L830), [estimator](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/estimator/estimator.py#L941)) that the model needs, for instance vocabulary files. These are used in a few places:
>    1. SavedModel: [builder_impl.py](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/saved_model/builder_impl.py#L354)
>    2. TF Transform: [analyzers.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/analyzers.py#L98), [beam/impl.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/beam/impl.py#L560), [impl_helper.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/impl_helper.py#L405), [saved_io_transform.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/saved/saved_transform_io.py#L213)
And tf-hub. CC @andresusanopinto
Added.
> For the Keras models use case, `Model.variables` already tracks all variables created in a Keras model. Similarly, for trainable variables, `Model.trainable_variables` tracks all trainable variables.
>
> In the case of custom graph code, `variable_creator_scope` can be used to collect variables.
Can you give an example of how this would work? What about for Estimators?
Added some example code. Regarding estimators: I think we can create one of these variable trackers before the model_fn is called, record the variables created, and then feed them to the Scaffold so that initialization happens correctly and the init_op etc. are set correctly.
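For reference, the tracking mechanism discussed above can be sketched roughly as follows. This is an illustrative snippet (assuming `tf.variable_creator_scope`, available since TF 1.10), not the exact code added to the doc:

```python
import tensorflow as tf

# Illustrative sketch: use tf.variable_creator_scope to record every
# variable created inside the scope, instead of reading the
# GLOBAL_VARIABLES collection afterwards.
created = []

def tracking_creator(next_creator, **kwargs):
    var = next_creator(**kwargs)  # delegate the actual creation
    created.append(var)           # ...then record the result
    return var

with tf.variable_creator_scope(tracking_creator):
    w = tf.Variable(tf.zeros([3, 3]), name="w")
    b = tf.Variable(tf.zeros([3]), name="b")

# `created` now holds [w, b]; for Estimators, a tracker like this could
# wrap the model_fn call and feed the result to the Scaffold's init_op.
```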
> ### **Local variables**
> Local variables are primarily used for metrics. With the move to object-based metrics, one can access the variables created for a metric via the `Metric` object.
Are old metrics explicitly deprecated/gone for 2.0? Or will there be remaining users we have to accommodate? CC @brainnoise
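To make the object-based pattern concrete, this is what it looks like with the Keras metrics API (a sketch assuming TF 2.x eager execution):

```python
import tensorflow as tf

# Sketch (assumes TF 2.x eager execution): an object-based metric owns its
# state, so nothing needs to be retrieved from the LOCAL_VARIABLES
# collection.
m = tf.keras.metrics.Accuracy()
m.update_state(y_true=[1, 1, 0, 0], y_pred=[1, 0, 0, 0])

# The metric's state variables (total, count) hang off the object itself.
print([v.name for v in m.variables])
print(float(m.result()))  # 0.75: three of four predictions match
```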
> ### **Asset file paths**
> Asset files currently only include vocabulary files that are used to initialize tables. Tracking the tables should be sufficient to track the vocabulary files.
These also have to be replicated in the SavedModel, although passing these in as a python list would be way clearer than the strange thing we do now with collections, presumably. But it will require updating client code and SavedModel code as well.
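As an illustration of "tracking the table tracks the asset", here is a sketch assuming the TF 2.x `tf.lookup` API; the vocabulary filename and contents are made up for the example:

```python
import tensorflow as tf

# Sketch (assumes the TF 2.x tf.lookup API): the table object references
# its vocabulary file directly, so tracking the table also tracks the
# asset; no ASSET_FILEPATHS collection entry is needed.
with open("vocab.txt", "w") as f:  # toy vocabulary, one token per line
    f.write("hello\nworld\n")

init = tf.lookup.TextFileInitializer(
    "vocab.txt",
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticHashTable(init, default_value=-1)

ids = table.lookup(tf.constant(["world", "unk"]))  # "unk" falls back to -1
print(ids.numpy())
```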
> ### **Updates**
> Batch normalization is the only use case for updates. For Keras models, all updates are tracked via `Model.updates`. For the graph-construction case, the updates are accessible via the `updates` attribute on the `BatchNormalization` layer itself, and they can then be added to the train op. We will deprecate the [functional batch_normalization API](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/layers/normalization.py#L158) so that users deal with objects only.
Can you provide an example of what this would look like for user code?
Done.
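An example along these lines might look as follows in TF 1.x-style graph mode (a sketch, not the exact code added to the doc; in newer TF releases `layer.updates` is deprecated):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Sketch (TF 1.x-style graph mode): take the moving-average update ops
# from the BatchNormalization layer object instead of reading the
# UPDATE_OPS collection, and group them into the train op.
x = tf.placeholder(tf.float32, [None, 8])
bn = tf.keras.layers.BatchNormalization()
y = bn(x, training=True)

loss = tf.reduce_mean(tf.square(y))
optimizer = tf.train.GradientDescentOptimizer(0.1)
minimize = optimizer.minimize(loss)

# bn.updates holds this layer's moving mean/variance update ops.
train_op = tf.group(minimize, *bn.updates)
```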
> 1. Queue runners: removal and replacement with tf.data
> 2. [Variables 2.0](https://github.com/tensorflow/community/pull/11)
> 3. Object-based checkpointing
My understanding was that this was not slated to be ready for 2.0? Has that changed?
My understanding was that it's already live. @allenlavoie for more comments on this.
Yep, tf.train.Checkpoint is in core. Estimators don't have an API for using it yet, and I am working on saving SavedModels using the same dependency system, but for training checkpoints it's all there.
**rohan100jain** left a comment
Thanks Karmel!
> #### **Keras models**
>
> We'll provide a `track_table` method in the backend (similar to `track_variable`), and we'll make sure that when `model.fit` is called, tables are initialized similarly to the way we initialize variables. The changes we'll have to make here are the following:
track_variable and track_table seem suspiciously similar to the global behavior we're trying to remove.
Are these okay because they are special purpose? Is there a rule of thumb for when global == OK?
yeah they are indeed. The thinking is to provide them as a backstop because forcing everyone to migrate to managing their state immediately would be tough. As a principle, we will try not to use them in the tensorflow code base at all.
Tables in particular seem too special case. If we are just capturing initializers, maybe this can be generalized for that? Variables and initializers as special cases; everything else we keep out of global scope? We can handle assets through the EstimatorSpec (and the same for similar entities, perhaps?)
> Initialize tables on first use. Similar to the [OneShotIterator](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/core/ops/dataset_ops.cc#L684), we'll pass in a function argument that can initialize the table. We could even explore having the function return a dataset and then initializing the table from that dataset. Having a function would avoid the overhead of passing in the large tensors that might otherwise be required for initialization on each lookup (`index_table_from_tensor`, for instance, might run into this problem). The Table API would remain the same, since we hide initialization from the user and wouldn't have to run an initializer explicitly.
> The concern with this solution is that, in serving, the first request would take more time than subsequent calls, and that sort of inconsistent performance is not desirable.
Hidden table initializers have been a persistent thorn for us in some projects. Having them be explicitly visible at the construction site would be a nice win for readability.
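The initialize-on-first-use idea can be illustrated with a plain-Python sketch (the `LazyTable` class and its names are made up for illustration; a real implementation would live at the op level):

```python
# Illustrative sketch of "initialize on first use": the table takes an
# init function (analogous to OneShotIterator's dataset factory) and runs
# it lazily on the first lookup, hiding initialization from the user.
class LazyTable:
    def __init__(self, init_fn, default=-1):
        self._init_fn = init_fn  # builds the key -> value mapping
        self._data = None        # not built until the first lookup
        self._default = default

    def lookup(self, key):
        if self._data is None:   # the first call pays the init cost
            self._data = self._init_fn()
        return self._data.get(key, self._default)

table = LazyTable(lambda: {w: i for i, w in enumerate(["a", "b", "c"])})
print(table.lookup("b"))  # -> 1 (initialization happens on this call)
print(table.lookup("z"))  # -> -1 (default for unknown keys)
```

The serving concern above shows up directly here: the first `lookup` call is strictly more expensive than every later one.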
These tests started failing after 4cedc8b: variables.Variable is the 2.0-style class, where the notion of global collections is being done away with (as per tensorflow/community#17), and hence the call to sess.run(tf.global_variables_initializer()) is a no-op. For now, switch the tests to use variable.VariableV1. In a follow-up, these tests need to be updated so that they do not rely on global collections (i.e., do not call tf.global_variables_initializer()). PiperOrigin-RevId: 220558354
I added design review notes. Can someone take a look and approve this request so that I can merge it?

Is there anything else actionable? If not, and everyone is in agreement with this proposal, Rohan, could you please update the status of the doc to "Accepted" and then we can merge it.
The feedback phase will be open for two weeks until 2018-09-20
Deprecate Collections
Summary
TF 2.0 gives us a great opportunity to clean up APIs that are less than desirable. The collections API, although it permits a compact coding style and is widely used, poses a fair number of technical difficulties and challenges, so we'd like to get rid of it.
Eager execution: There isn't a concept of a graph when executing eagerly and therefore no support for collections.
Graph mode: Collections are tied to a graph and assume that we build only one model per graph. As use cases get more complex, we have situations where we might build multiple models in a single graph. Functions cause further issues because functions are graphs of their own: any collections-API usage inside a function creates state that is purely local to that function, and the default graph loses all of that state.
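The graph-scoping problem is easy to demonstrate with `tf.Graph`'s own collection methods (a sketch; these methods exist on the Graph object in both TF 1.x and 2.x):

```python
import tensorflow as tf

# Sketch: collections live on a specific tf.Graph, so state recorded in
# one graph (for example, a function's own FuncGraph) is invisible to any
# other graph, including the default one.
g1, g2 = tf.Graph(), tf.Graph()
with g1.as_default():
    g1.add_to_collection("models", "model_a")
with g2.as_default():
    g2.add_to_collection("models", "model_b")

print(g1.get_collection("models"))  # ['model_a'] -- g2's entry is unseen
print(g2.get_collection("models"))  # ['model_b']
```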