Add batch support for dataset at the creation

The current workflow for Dataset is to read one record at a time and then call batch() to aggregate the results. Since a Dataset is processed sequentially, it makes sense to add batch support at read time. In other words, each read would copy multiple records at once as a single large memory chunk (a batch). I believe this will improve overall performance. The batch size could be passed through output_shapes, so the change should be fairly easy. The one remaining item is the multi-file case: when the record count of a file is not divisible by the batch size, a batch has to be pieced together from the end of one file and the start of the next.
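To make the multi-file case concrete, here is a minimal sketch in plain Python, not the actual Dataset implementation. It assumes each file is already available as an in-memory sequence of records, and the name `batched_records` is hypothetical. It only illustrates how a partial batch left over at the end of one file can be completed with records from the next.

```python
from typing import Iterable, Iterator, List, Sequence


def batched_records(
    files: Iterable[Sequence[bytes]], batch_size: int
) -> Iterator[List[bytes]]:
    """Yield batches of records, letting a batch span two adjacent files.

    `files` stands in for the real file readers (an assumption for this
    sketch): each element is the sequence of records held by one file.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    pending: List[bytes] = []  # partial batch carried across file boundaries
    for records in files:
        offset = 0
        while offset < len(records):
            # Take only as many records as the current batch still needs.
            take = batch_size - len(pending)
            chunk = records[offset:offset + take]
            pending.extend(chunk)
            offset += len(chunk)
            if len(pending) == batch_size:
                yield pending
                pending = []

    # The last batch may be smaller if the total count is not divisible.
    if pending:
        yield pending


# Two files with 3 and 2 records, batch size 2: the second batch
# is pieced together from the end of file 1 and the start of file 2.
example = list(batched_records([[b"a", b"b", b"c"], [b"d", b"e"]], 2))
```

Whether the final short batch is emitted or dropped is a design choice; this sketch emits it, which mirrors what batch() does by default.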

Add batch support for dataset at the creation #191

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add batch support for dataset at the creation #191

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions