Add batch support for Dataset at creation #191

@yongtang

Description

The current workflow for Dataset is to read one record at a time and then call batch() to aggregate the results. Since a Dataset is processed sequentially, it makes sense to support batching directly when records are read. In other words, data would be copied in large memory chunks (batches) holding multiple records per step, instead of one record per step. I believe this will improve overall performance. The batch size could be passed through output_shapes, so this should be fairly easy to implement. The only remaining issue is the multiple-file case: if a file's record count is not divisible by the batch size, a batch will need to be pieced together from the end of one file and the start of the next.
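The cross-file case is the only tricky part, so here is a minimal pure-Python sketch of it. It assumes each file is modeled as an in-memory list of records, and the function name `read_batches` is hypothetical (it is not part of any existing Dataset API). Whole batches are taken as contiguous slices, and any leftover records at the end of a file are carried into the next file's first batch. The `drop_remainder` flag mirrors the argument of the same name on `tf.data.Dataset.batch`.

```python
from typing import Iterator, List, Sequence


def read_batches(files: Sequence[Sequence[int]], batch_size: int,
                 drop_remainder: bool = False) -> Iterator[List[int]]:
    """Yield fixed-size batches from several files read in order.

    Hypothetical sketch: each "file" is an in-memory sequence of records,
    standing in for a real file reader. Leftover records at the end of one
    file are carried over so a batch can span a file boundary.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pending: List[int] = []  # partial batch carried across files
    for records in files:
        start = 0
        # First, top up the partial batch left over from the previous file.
        if pending:
            need = batch_size - len(pending)
            pending.extend(records[:need])
            start = min(need, len(records))
            if len(pending) < batch_size:
                continue  # this file ran out before the batch was full
            yield pending
            pending = []
        # Copy whole batches as contiguous slices (one chunk per batch).
        n = len(records)
        while start + batch_size <= n:
            yield list(records[start:start + batch_size])
            start += batch_size
        # Whatever is left becomes the start of the next batch.
        pending = list(records[start:])
    # Emit the final short batch unless the caller asked to drop it.
    if pending and not drop_remainder:
        yield pending


if __name__ == "__main__":
    files = [[1, 2], [3, 4, 5], [6, 7]]
    print(list(read_batches(files, batch_size=3)))
    print(list(read_batches(files, batch_size=3, drop_remainder=True)))
```

With three files of 2, 3 and 2 records and a batch size of 3, the first run prints `[[1, 2, 3], [4, 5, 6], [7]]` and the second prints `[[1, 2, 3], [4, 5, 6]]`. Keeping the final short batch means the batch dimension in output_shapes cannot be a fixed size, which is why `tf.data.Dataset.batch` reports an unknown leading dimension unless `drop_remainder=True`.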
