
Conversation

@yongtang (Member) commented Jun 2, 2019

With the upcoming TF 1.14.0 release (rc0 is out, the final could be released in several weeks) and the TF 2.0 beta (expected to be released next week), it is time for us to move forward with the 2.0 version of tf.data.Dataset.

There are lots of changes in tf.data.DatasetV2. The biggest one is that iteration can now be done in eager mode with for ... (instead of the previously awkward tf.Session.run() iteration).
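
For illustration, here is a minimal sketch (not from this PR) contrasting the two iteration styles, with a plain tf.data.Dataset standing in for a tensorflow-io dataset:

  import tensorflow as tf

  dataset = tf.data.Dataset.range(3)

  if tf.executing_eagerly():
    # DatasetV2 / TF 2.0 style: plain Python iteration.
    for element in dataset:
      print(element.numpy())
  else:
    # TF 1.x graph-mode style: explicit iterator plus tf.Session.run().
    iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
    next_element = iterator.get_next()
    with tf.compat.v1.Session() as sess:
      sess.run(iterator.initializer)
      while True:
        try:
          print(sess.run(next_element))
        except tf.errors.OutOfRangeError:
          break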

For tensorflow-io, one immediate question is: with the co-existence of 1.x and 2.0, what should we support? I have also been asked multiple times what exactly the advantage of a tf.data pipeline is over a graph with custom-written ops.

For tf.data, there are several useful aspects:

  1. The most valuable one is the direct input support for tf.keras (and integration with distribution strategies). It gives a nice pipeline that can be easily optimized (a minimal sketch follows this list).
  2. Iteration through a list of tensors. This one is a little controversial, as I don't think many users really like the tf.Session.run(init_ops); tf.Session.run(next_ops); pattern. However, with the upcoming 2.0 it is possible to iterate through a dataset with for ....
  3. Potential integration with @tf.function. As we can see, @tf.function carries a signature of tf.TensorSpec, which could fit into tf.data. This could be something we want to explore.
  4. Future expansion of tf.data usages. It looks like tf.data might fit into pandas or R's datasets, but let's find some concrete usages first.
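
To make items 1 and 3 concrete, here is a minimal TF 2.0 eager-mode sketch; the random data, the tiny model, and the scale signature are made up for the example, and a plain tf.data.Dataset stands in for a tensorflow-io dataset:

  import numpy as np
  import tensorflow as tf

  # A small in-memory dataset stands in for a tensorflow-io dataset here.
  images = np.random.rand(100, 28, 28).astype(np.float32)
  labels = np.random.randint(0, 10, size=(100,)).astype(np.int64)
  dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32)

  # Item 1: the dataset is passed directly to tf.keras.
  model = tf.keras.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(10, activation="softmax"),
  ])
  model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
  model.fit(dataset, epochs=1)

  # Item 3: a @tf.function whose tf.TensorSpec signature matches the
  # batched dataset elements can consume them directly.
  @tf.function(input_signature=[
      tf.TensorSpec(shape=[None, 28, 28], dtype=tf.float32),
      tf.TensorSpec(shape=[None], dtype=tf.int64),
  ])
  def scale(image, label):
    # Trivial preprocessing step, just to exercise the signature.
    return image / 255.0, label

  for image, label in dataset.take(1):
    scaled_image, scaled_label = scale(image, label)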

I think the idea is to clean up anything that does not carry enough value. For example, should we really test the V1 iteration with tf.Session.run(init_ops); tf.Session.run(next_ops);, which most users will never use?

This PR is an effort to clean up and rework MNISTDataset so that it fits into DatasetV2 and allows future expansion:

  1. All unnecessarily exposed methods (e.g., output_types, output_classes, output_shapes) are dropped. We could add them back if we find a concrete usage, e.g., with tf.keras or tf.function.
  2. The tests cover the following cases:
    a. tf.keras support in 1.x with non-eager mode (default mode in 1.x)
    b. tf.keras support in 2.0 with eager mode (default mode in 2.0)
    c. iteration with for ... in 2.0 eager mode (default mode in 2.0)
    d. iteration with for ... in 1.x eager mode (non-default mode in 1.x; see the sketch after this list)
    e. iteration with tf.Session in 1.x non-eager mode has been dropped.
  3. The combination of datasets with tf.function has not been fully tested. I will update in follow-up PRs.
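
As a reference for case (d), eager execution has to be switched on explicitly in 1.x; a rough sketch, with a plain range dataset standing in for the reworked MNISTDataset:

  import tensorflow as tf

  # Case (d): for-loop iteration in TF 1.x eager mode. Eager execution is
  # not the default in 1.x and must be enabled before any other op or
  # graph is created; under TF 2.0 this branch is simply skipped.
  if not tf.executing_eagerly():
    tf.compat.v1.enable_eager_execution()

  # A plain range dataset stands in for the reworked MNISTDataset.
  for element in tf.data.Dataset.range(3):
    print(element.numpy())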

Note: This is a rework of #195.

Signed-off-by: Yong Tang [email protected]

@yongtang (Member, Author) commented Jun 2, 2019

@BryanCutler @terrytangyuan Note that in DatasetV2, only the following two methods are really necessary:

  def _inputs(self):
    return []

  @property
  def _element_structure(self):
    ...

The PR abstracts the above two methods in tensorflow_io.core.python.ops.data_ops.Dataset.
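
For readers not familiar with the DatasetV2 internals of that era, a rough sketch of a subclass providing just these two methods might look like the following. This is an assumption for illustration only, not the actual data_ops.Dataset code; the DatasetV2 constructor argument and the _variant_tensor / _element_structure internals reflect my reading of TF 1.14/2.0 and may differ in detail:

  from tensorflow.python.data.ops import dataset_ops

  class SketchDataset(dataset_ops.DatasetV2):
    """Toy DatasetV2 subclass overriding only the two methods above.

    The wrapped range dataset stands in for a dataset backed by a
    custom tensorflow-io op that produces a variant tensor.
    """

    def __init__(self):
      wrapped = dataset_ops.DatasetV2.range(5)
      self._structure = wrapped._element_structure  # pylint: disable=protected-access
      super(SketchDataset, self).__init__(wrapped._variant_tensor)  # pylint: disable=protected-access

    def _inputs(self):
      return []

    @property
    def _element_structure(self):
      return self._structure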

@terrytangyuan (Member) left a comment


+1 to this. I really like the abstraction of tensorflow_io.core.python.ops.data_ops.Dataset, which greatly eases the migration. Should we add documentation on the advantages of using a tf.data pipeline vs. a graph with custom-written ops, since this is getting asked a lot?

@BryanCutler (Member) left a comment


LGTM and +1 on moving ops towards Dataset v2

@yongtang (Member, Author) commented Jun 4, 2019

Thanks @terrytangyuan @BryanCutler. I will wait for the TF 2.0 beta to be out before rebasing and merging. (I think the TF 2.0 beta is about to be released this week, either today or tomorrow.)

Signed-off-by: Yong Tang <[email protected]>
@yongtang (Member, Author) commented Jun 8, 2019

Rebased against TF 1.14.0rc0 and 2.0.0b0, and all tests passed. Merging.

@yongtang yongtang merged commit 1bd81b5 into tensorflow:master Jun 8, 2019
@yongtang yongtang deleted the 2.0v branch June 8, 2019 22:32
@yongtang yongtang mentioned this pull request Jun 17, 2019
i-ony pushed a commit to i-ony/io that referenced this pull request Feb 8, 2021