Skip to content

Conversation

@trueleo
Copy link
Contributor

@trueleo trueleo commented Aug 28, 2022

Fixes #53 .

Description

When server is restarted it should try to sync data on the local directory that is not yet synced to s3. Otherwise the data will get lost when new event arrives.

Make use of last modified system time to generate a prefix and move data.parquet to tmp directory, It will be synced to s3 on next s3_sync run.


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

@trueleo trueleo requested a review from nitisht August 28, 2022 14:29
if !dir.parquet_path_exists() {
continue;
}
if let Err(e) = dir.create_temp_dir() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the behaviour here if the tmp directory exists already? I mean we need to ensure we don't overwrite, because it is highly probably that the directory exists and has some data that is still not sent to S3.

Copy link
Contributor Author

@trueleo trueleo Aug 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_dir_all recursively builds all the missing component of path. If there is already an existing tmp, it will not overwrite anything inside

Copy link
Member

@nitisht nitisht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not working on macOS and mostly windows as well. For platforms where we can't fetch the last modified time, we still need to discard data.parquet if that exists in the ./data directory. Otherwise we'd be adding new data to stale data from when the server crashed

@trueleo
Copy link
Contributor Author

trueleo commented Aug 29, 2022

@nitisht Test for this again after #77 is merged. Issue might be that the logstream was not getting loaded so there was no try on startup sync

@nitisht
Copy link
Member

nitisht commented Aug 29, 2022

seems to be working now.

@nitisht nitisht merged commit 6f3f961 into parseablehq:main Aug 29, 2022
@trueleo trueleo deleted the startup branch August 30, 2022 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Contents in data.parquet need to be preserved properly in case of restart

2 participants