@trueleo (Contributor) commented Dec 6, 2022

Description

This PR integrates a basic custom execution plan for DataFusion, built on the existing listing features. When a query runs, it fetches all of the Arrow files and loads them into memory. Ideally this means loading only two or three Arrow files, which is trivial and is handled by a memory execution plan. The other local source is the cached Parquet data, which may or may not have been synced at query time; it is queried through the existing listing table. Finally, the result of the local execution is combined with the result of the execution against object storage. To prevent files from being removed while a query is using them, there is now a basic global file tracker that removes a file only once it has been uploaded and no ongoing query is using it.
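The removal rule for the file tracker described above (delete only after upload, and only when no query holds the file) can be sketched with a reference count per path. This is a minimal illustration using only the standard library, not the code from this PR: the `FileTracker` type and the `acquire`, `release`, and `mark_uploaded` methods are hypothetical names. Each call that could make a file removable returns `true` at most once, so the caller knows when it is safe to delete the file.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Per-file bookkeeping: number of queries reading it and upload status.
#[derive(Default)]
struct FileState {
    active_queries: usize,
    uploaded: bool,
}

/// Hypothetical tracker (illustrative names, not the PR's actual API).
#[derive(Default)]
pub struct FileTracker {
    files: Mutex<HashMap<PathBuf, FileState>>,
}

impl FileTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register that a query has started reading `path`.
    pub fn acquire(&self, path: &Path) {
        let mut files = self.files.lock().unwrap();
        files.entry(path.to_path_buf()).or_default().active_queries += 1;
    }

    /// Register that a query has finished with `path`.
    /// Returns true if the file may now be removed.
    pub fn release(&self, path: &Path) -> bool {
        let mut files = self.files.lock().unwrap();
        if let Some(state) = files.get_mut(path) {
            state.active_queries = state.active_queries.saturating_sub(1);
        }
        Self::take_if_removable(&mut files, path)
    }

    /// Register that `path` has been uploaded to object storage.
    /// Returns true if the file may now be removed.
    pub fn mark_uploaded(&self, path: &Path) -> bool {
        let mut files = self.files.lock().unwrap();
        files.entry(path.to_path_buf()).or_default().uploaded = true;
        Self::take_if_removable(&mut files, path)
    }

    /// Drop the entry and report true only when the file is uploaded
    /// and no query is still using it.
    fn take_if_removable(files: &mut HashMap<PathBuf, FileState>, path: &Path) -> bool {
        let removable = files
            .get(path)
            .map_or(false, |s| s.uploaded && s.active_queries == 0);
        if removable {
            files.remove(path);
        }
        removable
    }
}
```

In this sketch the tracker only decides when removal is safe; the caller would invoke something like `std::fs::remove_file` after a `true` result. Keeping the deletion outside the lock is a design choice made here for simplicity, not something the PR description specifies.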

Todo

  • Greater pruning based on query time, limit and offset
  • Combine with storage execution plan (will be added after further improvements)

This PR has:

  • been tested to ensure log ingestion and log query work.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

@nitisht (Member) left a comment

Tested and works great!

@nitisht merged commit 5e20d5c into parseablehq:main Dec 8, 2022
@nitisht mentioned this pull request Dec 8, 2022
@trueleo deleted the local_query_custom_provider branch December 8, 2022 07:00