-
Notifications
You must be signed in to change notification settings - Fork 307
Update LMDBDataset to support batch at the creation #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Also /cc @captain-pool |
|
This looks really helpful. Thanks @yongtang |
|
@captain-pool To explain further, with the new pattern, the only real method you need to implement to add a new data format is:
The ReadRecord is a stateful operation so you need a place to store your state. Next ReadRecord will start from the last op. In certain situations the state could be simply a 'int offset'. In certain situations the state will be much more complicated such as LMDB. You could define the class of the In case of LMDB, I wrapped the state into a In case of CIFAR (https://github.com/tensorflow/io/blob/master/tensorflow_io/cifar/kernels/cifar_dataset_ops.cc), Let me know if you have any questions. |
|
Will do, Thanks Yong! |
| ``` | ||
| Args: | ||
| filenames: A `tf.string` tensor containing one or more filenames. | ||
| filename: A `tf.string` tensor containing one or more filenames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input is slightly different than before, in that now it is a 1D or 2D tensor (not a list of tensors). Think it makes sense to remove s as we consider tensor as singular in Tensorflow overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would break backward compatibility though. Should we add a deprecation notice or are we planning a non-minor release?
|
@terrytangyuan Added a test case for LMDB with batch. |
|
Thanks. Just another comment on the arg name change. Travis build failed though. |
|
The build is failing as bazel does not have a rule to alias repo, and grpc assume zlib has to be named Will need to update the WORKSPACE and several BUILD files to fix. Feels like more than half of the time I am on fixing bazel issues, not actually implementation. ~ |
|
@terrytangyuan With upcoming TF 2.0, a lot of API will be changed. With v2 version of the tf.data.Dataset many public method will be gone. I was trying to move forward with #195, but noticed that once we switch to v2, the graph mode will not work for most of the cases. It comes to a point when we want to move to TF 2.0. I was thinking after the release of TF 1.14 we could release a version of 1.14, then switch. May still have to wait until 1.14 is released. |
|
@yongtang Thanks for the clarification. Should we rebase here now that the other PR on |
terrytangyuan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR updates LMDBDataset to support batch at the creation, it also switch to the new pattern to reduce unnecessary code duplication. Signed-off-by: Yong Tang <[email protected]> Add test case for LMDB with batch Signed-off-by: Yong Tang <[email protected]>
|
All tests passed now. Also updated |
Update LMDBDataset to support batch at the creation
This PR updates LMDBDataset to support batch at the creation,
it also switch to the new pattern to reduce unnecessary code
duplication.
Signed-off-by: Yong Tang [email protected]