@parmesant
Contributor

This PR separates the Arrow conversion job from the sync job because we need more visibility into each of them. Arrow-to-Parquet conversion runs at the 5th second of every minute, and sync/upload runs every 30 seconds. Nothing else has changed in terms of functionality.
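
As a rough illustration of the two cadences described above, the sketch below computes how long each job would wait before its next run. The function names are hypothetical and not taken from the PR, and aligning sync ticks to multiples of 30 seconds since the Unix epoch is an assumption made only for this example.

```rust
use std::time::Duration;

/// Second within each minute at which conversion fires (from the PR description).
const CONVERSION_SECOND: u64 = 5;
/// Interval between sync/upload runs (from the PR description).
const SYNC_INTERVAL: Duration = Duration::from_secs(30);

/// Hypothetical helper: time to wait from `now_unix_secs` until the next
/// moment whose second-of-minute equals `CONVERSION_SECOND`.
fn delay_until_next_conversion(now_unix_secs: u64) -> Duration {
    let second_of_minute = now_unix_secs % 60;
    let wait = if second_of_minute < CONVERSION_SECOND {
        CONVERSION_SECOND - second_of_minute
    } else {
        // Already at or past :05, so wait for :05 of the next minute.
        60 - second_of_minute + CONVERSION_SECOND
    };
    Duration::from_secs(wait)
}

/// Hypothetical helper: time to wait until the next sync tick, ASSUMING ticks
/// are aligned to multiples of `SYNC_INTERVAL` since the Unix epoch.
fn delay_until_next_sync(now_unix_secs: u64) -> Duration {
    let step = SYNC_INTERVAL.as_secs();
    Duration::from_secs(step - now_unix_secs % step)
}
```

Keeping the two delays in separate functions mirrors the intent of the PR: each job has its own schedule, so each can be logged and timed on its own.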

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

@coveralls

coveralls commented Jan 28, 2025

Pull Request Test Coverage Report for Build 13004371562

Details

  • 0 of 258 (0.0%) changed or added relevant lines in 6 files are covered.
  • 5 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.2%) to 12.568%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/handlers/http/modal/mod.rs | 0 | 8 | 0.0% |
| src/handlers/http/modal/ingest_server.rs | 0 | 13 | 0.0% |
| src/handlers/http/modal/server.rs | 0 | 13 | 0.0% |
| src/storage/staging.rs | 0 | 41 | 0.0% |
| src/sync.rs | 0 | 48 | 0.0% |
| src/storage/object_storage.rs | 0 | 135 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/handlers/http/modal/mod.rs | 1 | 29.07% |
| src/handlers/http/modal/server.rs | 1 | 0.0% |
| src/handlers/http/modal/ingest_server.rs | 1 | 0.0% |
| src/storage/object_storage.rs | 2 | 0.0% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 12991591562: | -0.2% |
| Covered Lines: | 2443 |
| Relevant Lines: | 19438 |

💛 - Coveralls

@nitisht nitisht requested a review from de-sh January 28, 2025 07:22
@parmesant parmesant marked this pull request as ready for review January 30, 2025 11:15
let streams = STREAM_INFO.list_streams();

// start the sync loop for a stream
// parallelize this
Contributor

Please implement per-stream parallelization in both the sync and conversion flows.
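
One way the per-stream parallelization requested here could look is sketched below. It uses scoped threads from the standard library as a stand-in; the PR itself may well use async tasks instead. `process_stream` and `process_all_streams` are hypothetical names, and the body of `process_stream` is a placeholder for the real conversion or upload work.

```rust
use std::thread;

/// Hypothetical stand-in for the per-stream work (conversion or upload).
/// Returns a short status message on success.
fn process_stream(stream: &str) -> Result<String, String> {
    if stream.is_empty() {
        return Err("empty stream name".to_string());
    }
    Ok(format!("processed {stream}"))
}

/// Runs `process_stream` for every stream on its own scoped thread and
/// collects the results in the same order as the input.
fn process_all_streams(streams: &[String]) -> Vec<Result<String, String>> {
    thread::scope(|scope| {
        // Spawn one thread per stream; all of them start before any is joined.
        let handles: Vec<_> = streams
            .iter()
            .map(|stream| scope.spawn(move || process_stream(stream)))
            .collect();

        // Join in input order; a panic in one stream becomes an error entry
        // instead of taking down the whole batch.
        handles
            .into_iter()
            .map(|handle| {
                handle
                    .join()
                    .unwrap_or_else(|_| Err("stream task panicked".to_string()))
            })
            .collect()
    })
}
```

Returning one result per stream keeps a failure in one stream from hiding the outcome of the others, which fits the PR's goal of better visibility.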


// read arrow files on disk
// convert them to parquet
let schema = convert_disk_files_to_parquet(
Contributor

Why are we still converting Arrow to Parquet in the sync flow?

Return None based on the length of the vec
so that the proper message is printed.
Upload and conversion tasks are handled by a monitor task, which checks whether
either task has crossed a pre-defined threshold.
If either task crosses its threshold, a warning is printed.
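
The monitor behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: it assumes the threshold is a time limit, uses a standard-library thread and channel rather than whatever runtime the project uses, and `run_with_monitor` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `task` on a worker thread while the caller acts as the monitor.
/// If the task is still running once `threshold` has elapsed (ASSUMED here to
/// be a time limit), a warning is printed and the monitor keeps waiting.
/// Returns the task's result and whether the warning fired.
fn run_with_monitor<T, F>(name: &str, threshold: Duration, task: F) -> (T, bool)
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        // The send only fails if the monitor has gone away; ignore that case.
        let _ = tx.send(task());
    });

    let (result, warned) = match rx.recv_timeout(threshold) {
        // Finished within the threshold: no warning.
        Ok(value) => (value, false),
        Err(mpsc::RecvTimeoutError::Timeout) => {
            eprintln!("warning: {name} has run for more than {threshold:?}");
            // Block until the task eventually completes.
            (rx.recv().expect("task thread ended without a result"), true)
        }
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            panic!("{name} task thread ended without a result")
        }
    };
    worker.join().expect("task thread panicked");
    (result, warned)
}
```

For example, `run_with_monitor("upload", Duration::from_secs(15), || 42)` would return `(42, false)` because the closure finishes well inside the limit; the 15-second value is illustrative only. The task is not cancelled when the threshold is crossed, since the description above only mentions printing a warning.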
@nitisht nitisht merged commit 43a793d into parseablehq:main Feb 6, 2025
11 of 13 checks passed

4 participants