split arrow conversion and sync #1138
Pull Request Test Coverage Report for Build 13004371562 (Coveralls)
bc40cf2 to 813ede8
src/storage/object_storage.rs
Outdated
    let streams = STREAM_INFO.list_streams();

    // start the sync loop for a stream
    // parallelize this
Please implement parallelization for each stream in both the sync and conversion flows.
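One possible shape for the per-stream parallelization requested above, as a minimal sketch only. It uses scoped threads from the standard library rather than the tokio tasks this codebase relies on, so that the example stays self-contained. `sync_stream` and `run_for_each_stream` are hypothetical names, not functions from this PR, and the stream list is passed in as plain strings instead of coming from `STREAM_INFO.list_streams()`.

```rust
use std::thread;

/// Hypothetical stand-in for the per-stream work (sync or conversion).
/// Here it just reports the length of the stream name as "files processed".
fn sync_stream(stream: &str) -> Result<usize, String> {
    if stream.is_empty() {
        return Err("empty stream name".to_string());
    }
    Ok(stream.len())
}

/// Runs `job` once per stream, each on its own scoped thread, and returns
/// the outcomes in the same order as the input slice.
fn run_for_each_stream<F>(streams: &[String], job: F) -> Vec<(String, Result<usize, String>)>
where
    F: Fn(&str) -> Result<usize, String> + Sync,
{
    let job_ref = &job;
    thread::scope(|scope| {
        // Spawn every stream's job first so they all run concurrently.
        let handles: Vec<_> = streams
            .iter()
            .map(|name| {
                let handle = scope.spawn(move || job_ref(name.as_str()));
                (name.clone(), handle)
            })
            .collect();

        // Then join them; a panic in one stream becomes an error for that
        // stream only and does not abort the others.
        handles
            .into_iter()
            .map(|(name, handle)| {
                let outcome = handle
                    .join()
                    .unwrap_or_else(|_| Err("stream task panicked".to_string()));
                (name, outcome)
            })
            .collect()
    })
}
```

The same helper could be called once from the sync flow and once from the conversion flow with a different `job`. In an async codebase the equivalent would likely be one spawned task per stream, joined at the end of each cycle.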
src/storage/object_storage.rs
Outdated
    // read arrow files on disk
    // convert them to parquet
    let schema = convert_disk_files_to_parquet(
Why are we still converting arrow to parquet in the sync flow?
Returning None based on length of vec for proper message printing
Upload and conversion tasks are watched by a monitor task, which checks whether either of them has crossed a pre-defined threshold. If one has, a warning is printed.
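A rough sketch of the threshold check such a monitor might perform, assuming the threshold is an elapsed-time limit (the note above does not say what is measured). `Job`, `RunningJob`, `check_threshold`, and `collect_warnings` are illustrative names, not identifiers from this PR.

```rust
use std::time::{Duration, Instant};

/// The two kinds of task the monitor watches (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Job {
    Conversion,
    Upload,
}

/// A task that is currently running, with the moment it started.
struct RunningJob {
    job: Job,
    started: Instant,
}

/// Returns a warning message if the job has run longer than `threshold`
/// as of `now`, and `None` otherwise.
fn check_threshold(running: &RunningJob, now: Instant, threshold: Duration) -> Option<String> {
    // Saturates to zero if `started` is later than `now`.
    let elapsed = now.saturating_duration_since(running.started);
    if elapsed > threshold {
        Some(format!(
            "{:?} task has been running for {}s, over the {}s threshold",
            running.job,
            elapsed.as_secs(),
            threshold.as_secs()
        ))
    } else {
        None
    }
}

/// Checks every running job and collects the warnings to print.
fn collect_warnings(jobs: &[RunningJob], now: Instant, threshold: Duration) -> Vec<String> {
    jobs.iter()
        .filter_map(|running| check_threshold(running, now, threshold))
        .collect()
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the check, keeps the logic testable without sleeping; a real monitor loop would call `collect_warnings` periodically and log each returned message.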
37aa68d to cdd863a
perf: consume less CPU cycles, by understanding tokio
This PR separates the arrow conversion job from the sync job because we need more visibility into both of them. Arrow to Parquet conversion takes place at the 5th second of every minute, and sync/upload happens every 30 seconds. Nothing else has changed in terms of functionality.
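To make the two cadences concrete, here is a small sketch of the timing arithmetic only, not of the scheduler used in this PR. The function names are hypothetical, and aligning the 30-second sync to wall-clock boundaries is an assumption; the description only says it runs every 30 seconds.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Current Unix time in whole seconds (0 if the clock is before the epoch).
fn now_epoch_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

/// Seconds to wait until the clock's seconds field next equals
/// `second_of_minute`. If the clock is exactly on that second now, this
/// returns 60, i.e. the next occurrence rather than the current one.
fn secs_until_second_of_minute(now_epoch_secs: u64, second_of_minute: u64) -> u64 {
    let current = now_epoch_secs % 60;
    let target = second_of_minute % 60;
    if current < target {
        target - current
    } else {
        60 - current + target
    }
}

/// Seconds to wait until the next multiple of `interval_secs`
/// (assumed wall-clock aligned; `interval_secs` must be non-zero).
fn secs_until_next_interval(now_epoch_secs: u64, interval_secs: u64) -> u64 {
    assert!(interval_secs > 0, "interval must be non-zero");
    interval_secs - now_epoch_secs % interval_secs
}
```

With these helpers, a conversion loop would sleep for `secs_until_second_of_minute(now_epoch_secs(), 5)` before each run, and a sync loop for `secs_until_next_interval(now_epoch_secs(), 30)`. Under that assumed alignment the sync runs at seconds 0 and 30, so the two jobs never start on the same second.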
Description
This PR has: