Closed
Labels
enhancement (Enhancement request)
Description
The current workflow for Dataset is to read one record at a time and then aggregate the results with a batch() call. Because a Dataset is processed sequentially, it makes sense to support batching at read time: records would be copied in large memory chunks (batches), several records per read, instead of one at a time. I believe this would improve overall performance. The batch size could be passed through output_shapes, so the change should be fairly easy. The one remaining issue is multiple files: when a file's record count is not divisible by the batch size, a batch has to be pieced together from the end of one file and the start of the next.
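To make the multi-file case concrete, here is a minimal pure-Python sketch of the stitching logic. It is not the Dataset implementation: `read_batched` is a hypothetical name, and each "file" is modeled as an in-memory sequence of records rather than a real reader. It only illustrates how a partial batch could be carried over a file boundary.

```python
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def read_batched(files: Iterable[Sequence[T]], batch_size: int) -> Iterator[List[T]]:
    """Yield batches of `batch_size` records, stitching across file boundaries.

    Hypothetical helper: each element of `files` stands in for one file's records.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    pending: List[T] = []  # leftover records from the previous file(s)
    for records in files:
        start = 0
        if pending:
            # Top up the partial batch with records from the start of this file.
            need = batch_size - len(pending)
            pending.extend(records[start:start + need])
            start += need
            if len(pending) < batch_size:
                # This file was too short to complete the batch; keep carrying.
                continue
            yield pending
            pending = []
        # Copy whole batches from this file in one slice each.
        while start + batch_size <= len(records):
            yield list(records[start:start + batch_size])
            start += batch_size
        # Whatever remains is carried into the next file.
        pending = list(records[start:])

    if pending:
        # Final short batch when the total record count is not divisible.
        yield pending


if __name__ == "__main__":
    files = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10, 11]]
    for batch in read_batched(files, batch_size=4):
        print(batch)
```

In this sketch the last batch may be shorter than the batch size; whether to emit or drop it is a design choice the issue leaves open.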
terrytangyuan