
Conversation

@rohan100jain (Member) commented Sep 5, 2018

The feedback phase will be open for two weeks, until 2018-09-20.

Deprecate Collections

Status: Proposed
Author(s): Rohan Jain (Google)
Sponsor: Asim Shankar (Google)
Updated: 2018-09-05

Summary

TF 2.0 gives us a great opportunity to clean up less desirable APIs. The collections API, although it permits a compact coding style and is widely used, poses a number of technical difficulties, so we'd like to get rid of it.

  1. Eager execution: There is no concept of a graph when executing eagerly, and therefore no support for collections.

  2. Graph mode: Collections are tied to a graph and assume that we build only one model per graph. As use cases grow more complex, we increasingly build multiple models in a single graph. Functions cause further issues because each function is a graph of its own: any collections API usage inside a function creates state that is purely local to that function, and the default graph loses all of that state (as the sketch below illustrates).
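
For illustration, a minimal sketch of this function-local behaviour, assuming TF 2.0-style `tf.function` together with the `tf.compat.v1` collection shims (the collection name is arbitrary):

```python
import tensorflow as tf

@tf.function
def model_fn(x):
    y = x * 2.0
    # This writes into the function's own FuncGraph, not into the outer
    # default graph, so the recorded state never leaves the function.
    tf.compat.v1.add_to_collection("my_outputs", y)
    return y

model_fn(tf.constant(3.0))

# The default graph never sees what was collected inside the function.
print(tf.compat.v1.get_collection("my_outputs"))  # -> []
```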


@rohan100jain (Member Author)

Tried changing author

@googlebot

CLAs look good, thanks!

@ewilderj added the "2.0" (TensorFlow 2.0 development) label on Sep 5, 2018
@ewilderj changed the title from "Adding a doc to deprecate collections" to "RFC: Deprecate Collections" on Sep 5, 2018
@ewilderj added the "RFC: Proposed" (RFC Design Document) label on Sep 6, 2018
1. **Assets:** The ASSET_FILEPATHS collection tracks all external files ([lookup_ops](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/ops/lookup_ops.py#L550), [tf_transform](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/analyzers.py#L830), [estimator](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/estimator/estimator.py#L941)) that are needed by the model, for instance vocabulary files. These are used in a few places:
   1. SavedModel: [builder_impl.py](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/saved_model/builder_impl.py#L354)
   1. TF Transform: [analyzers.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/analyzers.py#L98), [beam/impl.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/beam/impl.py#L560), [impl_helper.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/impl_helper.py#L405), [saved_io_transform.py](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/saved/saved_transform_io.py#L213)

And tf-hub. CC @andresusanopinto

@rohan100jain (Member Author)

Added.


For the Keras models use case, Model.variables already tracks all variables created in a Keras model. Similarly, Model.trainable_variables tracks all trainable variables.

In the case of custom graph code, variable_creator_scope can be used to collect variables.

Can you give an example of how this would work? What about for Estimators?

@rohan100jain (Member Author)

Added some example code. Regarding Estimators: I think we can create one of these variable trackers before the model_fn is called, record the variables created, and then feed them to the Scaffold so that initialization happens correctly and the init_op etc. are set correctly.
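
For reference, a minimal sketch of what such a variable tracker could look like, built on `tf.variable_creator_scope` (the `VariableTracker` class here is illustrative, not an existing API):

```python
import tensorflow as tf

class VariableTracker(object):
    """Records every variable created inside its creator scope."""

    def __init__(self):
        self.variables = []

    def __call__(self, next_creator, **kwargs):
        # Delegate the actual creation, then remember the result.
        variable = next_creator(**kwargs)
        self.variables.append(variable)
        return variable

tracker = VariableTracker()
with tf.variable_creator_scope(tracker):
    w = tf.Variable(tf.zeros([10, 10]), name="w")
    b = tf.Variable(tf.zeros([10]), name="b")

# tracker.variables now holds [w, b]; no GLOBAL_VARIABLES collection needed.
```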

### **Local variables**

Local variables are primarily used for metrics. With the move to object-based metrics, one can access the variables created for a metric via the Metric object.
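
For illustration, a minimal sketch assuming the object-based Metric API planned for 2.0 (tf.keras.metrics), where the metric's state lives on the object rather than in the LOCAL_VARIABLES collection:

```python
import tensorflow as tf

# Object-based metric: its state variables (total, count) live on the object.
accuracy = tf.keras.metrics.Accuracy()
accuracy.update_state(y_true=[1, 1, 0, 0], y_pred=[1, 0, 0, 0])

print(accuracy.variables)        # the metric's internal state variables
print(float(accuracy.result()))  # 0.75
```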

Are old metrics explicitly deprecated/gone for 2.0? Or will there be remaining users we have to accommodate? CC @brainnoise

### **Asset file paths**

Asset files currently only include vocabulary files that are used to initialize tables. Tracking tables should therefore be sufficient to track vocabulary files.
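
For example, a minimal sketch assuming the TF 2.x tf.lookup API ("vocab.txt" is a placeholder path): the vocabulary file is reachable from the table's initializer, so tracking the table object also tracks the asset.

```python
import tensorflow as tf

# The vocabulary file is referenced by the table's initializer, so whoever
# tracks the table can also find the asset -- no ASSET_FILEPATHS collection.
initializer = tf.lookup.TextFileInitializer(
    filename="vocab.txt",  # placeholder path
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticHashTable(initializer, default_value=-1)

ids = table.lookup(tf.constant(["the", "quick", "fox"]))
```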

These also have to be replicated in the SavedModel, although passing these in as a Python list would presumably be much clearer than the strange thing we do now with collections. But it will require updating client code and SavedModel code as well.

### **Updates**

Batch normalization is the only use case for updates. For Keras models, all updates are tracked via Model.updates. For the graph construction case, the updates are accessible via the updates attribute on the BatchNormalization layer itself and can then be added to the train op. We will deprecate the [functional batch_normalization API](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/layers/normalization.py#L158) so that users deal with objects only.

Can you provide an example of what this would look like for user code?

@rohan100jain (Member Author)

Done.
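
For reference, a minimal sketch of what graph-mode user code could look like without the UPDATE_OPS collection, assuming a TF 1.x-style session workflow (the model, shapes, and loss here are illustrative):

```python
import tensorflow as tf  # TF 1.x-style graph construction assumed

x = tf.placeholder(tf.float32, shape=[None, 10])
labels = tf.placeholder(tf.float32, shape=[None, 1])

bn = tf.keras.layers.BatchNormalization()
hidden = bn(x, training=True)
logits = tf.keras.layers.Dense(1)(hidden)
loss = tf.losses.mean_squared_error(labels, logits)

optimizer = tf.train.GradientDescentOptimizer(0.1)
# Instead of reading tf.GraphKeys.UPDATE_OPS, take the moving-average update
# ops directly from the layer object and run them with the train step.
with tf.control_dependencies(bn.updates):
    train_op = optimizer.minimize(loss)
```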


1. Removal of queue runners and replacement with tf.data
1. [Variables 2.0](https://github.com/tensorflow/community/pull/11)
1. Object based checkpointing

My understanding was that this was not slated to be ready for 2.0? Has that changed?

@rohan100jain (Member Author)

My understanding was that it's already live. @allenlavoie for more comments on this.

Member

Yep, tf.train.Checkpoint is in core. Estimators don't have an API for using it yet, and I am working on saving SavedModels using the same dependency system, but for training checkpoints it's all there.
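
For reference, a minimal sketch of the object-based checkpointing API mentioned here, assuming eager execution is enabled (the 2.0 default); the attribute name and path are illustrative:

```python
import tensorflow as tf

dense = tf.keras.layers.Dense(4)
dense(tf.zeros([1, 3]))  # build the layer so it owns variables

# Dependencies are declared by attaching objects to the checkpoint,
# not by reading a global variables collection.
checkpoint = tf.train.Checkpoint(layer=dense)
save_path = checkpoint.save("/tmp/ckpt")  # illustrative path
checkpoint.restore(save_path)
```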

@rohan100jain (Member Author) left a comment

Thanks Karmel!

#### **Keras models**

We'll provide a `track_table` method in the backend (similar to `track_variable`) and we'll make sure that when model.fit is called, tables are initialized similarly to the way we initialize variables. The changes we'll have to make here are the following

track_variable and track_table seem suspiciously similar to the global behavior we're trying to remove.

Are these okay because they are special purpose? Is there a rule of thumb for when global == OK?

@rohan100jain (Member Author)

Yeah, they are indeed. The thinking is to provide them as a backstop, because forcing everyone to migrate to managing their own state immediately would be tough. As a principle, we will try not to use them in the TensorFlow code base at all.
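
For concreteness, a purely hypothetical sketch of what using such a backstop from a custom layer might look like. `track_table` does not exist yet; its name and intent come from this proposal, and the layer, vocabulary path, and 1.x-era contrib lookup call are illustrative only:

```python
import tensorflow as tf
from tensorflow.python.keras import backend as K  # where track_variable lives today

class VocabLookup(tf.keras.layers.Layer):
    """Illustrative layer that owns a lookup table."""

    def build(self, input_shape):
        self.table = tf.contrib.lookup.index_table_from_file("vocab.txt")
        K.track_table(self.table)  # proposed backstop hook; not an existing API
        super(VocabLookup, self).build(input_shape)

    def call(self, inputs):
        return self.table.lookup(inputs)
```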

Tables in particular seem like too special a case. If we are just capturing initializers, maybe this can be generalized? Variables and initializers as special cases; everything else we keep out of global scope? We can handle assets through the EstimatorSpec (and the same for similar entities, perhaps?)


Initialize tables on first use. Similar to the [OneShotIterator](https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/core/ops/dataset_ops.cc#L684), we'll pass in a function argument that can initialize the table. We can even explore the idea of having the function return a dataset and then initializing the table from that dataset. Having a function would avoid the overhead of passing in large tensors that might be required for initialization on each lookup (index_table_from_tensor, for instance, might run into this problem). The Table API would remain the same, since we hide initialization from the user and we wouldn't have to run an initializer explicitly.

The concern with this solution is that in serving, the first request would take more time than subsequent calls, and that sort of inconsistent performance is not desirable.
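
Purely as an illustration of the idea (none of these names exist in TensorFlow; this is plain Python, not a proposed API):

```python
class LazyInitTable(object):
    """Defers building the real table until the first lookup."""

    def __init__(self, init_fn):
        self._init_fn = init_fn  # e.g. builds the table from a vocabulary file or dataset
        self._table = None

    def lookup(self, keys):
        if self._table is None:
            # The first call pays the initialization cost -- the serving-latency
            # concern raised in the paragraph above.
            self._table = self._init_fn()
        return self._table.lookup(keys)
```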

Hidden table initializers have been a persistent thorn for us in some projects. Having them be explicitly visible at the construction site would be a nice win for readability.

@goldiegadde self-assigned this on Oct 30, 2018
tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request on Nov 8, 2018:
These tests started failing after 4cedc8b.

variables.Variable is the 2.0-style class - where the notion of
global collections is being done away with (as per
tensorflow/community#17) and hence
the call to sess.run(tf.global_variables_initializer())
is a no-op.

For now, switch the tests to use variables.VariableV1.
In a follow-up, these tests need to be updated so that they do not rely on
global collections (i.e., do not call tf.global_variables_initializer()).

PiperOrigin-RevId: 220558354
@rohan100jain (Member Author)

I added design review notes. Can someone take a look and approve this request so that I can merge it?

@goldiegadde (Contributor)

> I added design review notes. Can someone take a look and approve this request so that I can merge it?

Is there anything else actionable? If not, and everyone is in agreement with this proposal, Rohan, could you please update the status of the doc to "Accepted" and then we can merge it.

@goldiegadde added the "RFC: Accepted" (RFC Design Document: Accepted by Review) label and removed the "RFC: Proposed" (RFC Design Document) label on Feb 26, 2019
@goldiegadde merged commit fc9b7f8 into tensorflow:master on Feb 26, 2019