Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled #12135
Merged
24 commits:
- 4e29e20 feat: Preemptively filter for pushdown-preventing columns in ListingT… (itsjunetime)
- 762f397 Fix behavior to make all previous tests work and lay groundwork for f… (itsjunetime)
- 30f75e0 fix: Add some more tests and fix small issue with pushdown specificity (itsjunetime)
- d76fea3 test: Revive unneccesarily removed test (itsjunetime)
- 187a121 ci: Fix CI issues with different combinations of exprs (itsjunetime)
- 11c62dc fix: run fmt (itsjunetime)
- fb53778 Fix doc publicity issues (itsjunetime)
- 539b2e8 Add ::new fn for PushdownChecker (itsjunetime)
- e13b89f Remove unnecessary 'pub' qualifier (itsjunetime)
- 5e86c0b Fix naming and doc comment of non_pushdown_columns to reflect what it… (itsjunetime)
- 6106ecd fmt (itsjunetime)
- c7d6211 Extend FileFormat trait to allow library users to define formats whic… (itsjunetime)
- ba463b5 fmt (itsjunetime)
- 89f423a fix: reference real fn in doc to fix CI (itsjunetime)
- 53b7046 Minor: Add tests for using FilterExec when parquet was pushed down (alamb)
- 5c29552 Update datafusion/core/src/datasource/file_format/mod.rs (alamb)
- ef0affe Pipe schema information through to TableScan and ParquetExec to facil… (itsjunetime)
- b1ee813 - Remove collect::<(_, _)> to satisfy msrv (itsjunetime)
- f8adb43 Add more details in comments for `map_partial_batch` (itsjunetime)
- ec5f21b Remove reference to issue #4028 as it will be closed (itsjunetime)
- 19f4310 Convert normal comments to doc-comments (itsjunetime)
- 9cfbb5d Clarify meaning of word `projected` in comment (itsjunetime)
- 832f7e2 Clarify more how `table_schema` is used differently from `projected_t… (itsjunetime)
- ca26f03 Finish partially-written comment about SchemaMapping struct (itsjunetime)
```diff
@@ -18,16 +18,17 @@
 //! The table implementation.
 
 use std::collections::HashMap;
-use std::str::FromStr;
-use std::{any::Any, sync::Arc};
+use std::{any::Any, str::FromStr, sync::Arc};
 
 use super::helpers::{expr_applicable_for_cols, pruned_partition_list, split_files};
-use super::PartitionedFile;
+use super::{ListingTableUrl, PartitionedFile};
 
-use super::ListingTableUrl;
-use crate::datasource::{create_ordering, get_statistics_with_limit};
 use crate::datasource::{
-    file_format::{file_compression_type::FileCompressionType, FileFormat},
+    create_ordering,
+    file_format::{
+        file_compression_type::FileCompressionType, FileFormat, FilePushdownSupport,
+    },
+    get_statistics_with_limit,
     physical_plan::{FileScanConfig, FileSinkConfig},
 };
 use crate::execution::context::SessionState;
```
```diff
@@ -43,8 +44,9 @@ use datafusion_common::{
     config_datafusion_err, internal_err, plan_err, project_schema, Constraints,
     SchemaExt, ToDFSchema,
 };
-use datafusion_execution::cache::cache_manager::FileStatisticsCache;
-use datafusion_execution::cache::cache_unit::DefaultFileStatisticsCache;
+use datafusion_execution::cache::{
+    cache_manager::FileStatisticsCache, cache_unit::DefaultFileStatisticsCache,
+};
 use datafusion_physical_expr::{
     create_physical_expr, LexOrdering, PhysicalSortRequirement,
 };
```
```diff
@@ -789,19 +791,22 @@ impl TableProvider for ListingTable {
             .map(|col| Ok(self.table_schema.field_with_name(&col.0)?.clone()))
             .collect::<Result<Vec<_>>>()?;
 
-        let filters = if let Some(expr) = conjunction(filters.to_vec()) {
-            // NOTE: Use the table schema (NOT file schema) here because `expr` may contain references to partition columns.
-            let table_df_schema = self.table_schema.as_ref().clone().to_dfschema()?;
-            let filters =
-                create_physical_expr(&expr, &table_df_schema, state.execution_props())?;
-            Some(filters)
-        } else {
-            None
-        };
+        let filters = conjunction(filters.to_vec())
+            .map(|expr| -> Result<_> {
+                // NOTE: Use the table schema (NOT file schema) here because `expr` may contain references to partition columns.
+                let table_df_schema = self.table_schema.as_ref().clone().to_dfschema()?;
+                let filters = create_physical_expr(
+                    &expr,
+                    &table_df_schema,
+                    state.execution_props(),
+                )?;
+                Ok(Some(filters))
+            })
+            .unwrap_or(Ok(None))?;
 
-        let object_store_url = if let Some(url) = self.table_paths.first() {
-            url.object_store()
-        } else {
+        let Some(object_store_url) =
+            self.table_paths.first().map(ListingTableUrl::object_store)
+        else {
             return Ok(Arc::new(EmptyExec::new(Arc::new(Schema::empty()))));
         };
 
```
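The `scan` hunk replaces two `if let ... else` blocks with an `Option` combinator chain and a `let ... else` statement. The std-only sketch below shows both idioms in isolation. `parse_filter`, `build_filter`, `build_filter_transpose` and `first_url` are hypothetical stand-ins (for `create_physical_expr` and `ListingTableUrl::object_store`), not DataFusion APIs, and `Option::transpose` is shown only as an equivalent std alternative that this PR does not use.

```rust
/// Hypothetical fallible step standing in for `create_physical_expr`.
fn parse_filter(expr: &str) -> Result<i64, String> {
    expr.trim().parse::<i64>().map_err(|e| e.to_string())
}

/// Same shape as the new code: map the `Option` through a fallible closure,
/// turn the outer `None` into `Ok(None)`, then propagate any error with `?`.
fn build_filter(expr: Option<&str>) -> Result<Option<i64>, String> {
    let filter = expr
        .map(|e| -> Result<_, String> {
            let parsed = parse_filter(e)?;
            Ok(Some(parsed))
        })
        .unwrap_or(Ok(None))?;
    Ok(filter)
}

/// Equivalent formulation with `Option::transpose`:
/// `Option<Result<T, E>>` becomes `Result<Option<T>, E>`.
fn build_filter_transpose(expr: Option<&str>) -> Result<Option<i64>, String> {
    expr.map(parse_filter).transpose()
}

/// `let ... else`: bind the mapped first path or return early, like the new
/// `object_store_url` binding (here returning a placeholder string).
fn first_url(paths: &[&str]) -> String {
    let Some(url) = paths.first().map(|p| p.to_lowercase()) else {
        return String::from("<empty>");
    };
    url
}
```

For example, `build_filter(Some("42"))` returns `Ok(Some(42))`, `build_filter(None)` returns `Ok(None)`, and a non-numeric input returns the parse error through `?` instead of panicking. The `let ... else` form keeps the happy path unindented, which is why the early `EmptyExec` return reads more directly after the change.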
```diff
@@ -826,27 +831,37 @@ impl TableProvider for ListingTable {
         &self,
         filters: &[&Expr],
     ) -> Result<Vec<TableProviderFilterPushDown>> {
-        Ok(filters
+        filters
             .iter()
             .map(|filter| {
                 if expr_applicable_for_cols(
                     &self
                         .options
                         .table_partition_cols
                         .iter()
-                        .map(|x| x.0.as_str())
+                        .map(|col| col.0.as_str())
                         .collect::<Vec<_>>(),
                     filter,
                 ) {
                     // if filter can be handled by partition pruning, it is exact
-                    TableProviderFilterPushDown::Exact
-                } else {
-                    // otherwise, we still might be able to handle the filter with file
-                    // level mechanisms such as Parquet row group pruning.
-                    TableProviderFilterPushDown::Inexact
+                    return Ok(TableProviderFilterPushDown::Exact);
                 }
+
+                // if we can't push it down completely with only the filename-based/path-based
+                // column names, then we should check if we can do parquet predicate pushdown
+                let supports_pushdown = self.options.format.supports_filters_pushdown(
+                    &self.file_schema,
+                    &self.table_schema,
+                    &[filter],
+                )?;
+
+                if supports_pushdown == FilePushdownSupport::Supported {
+                    return Ok(TableProviderFilterPushDown::Exact);
+                }
+
+                Ok(TableProviderFilterPushDown::Inexact)
             })
-            .collect())
+            .collect()
     }
 
     fn get_table_definition(&self) -> Option<&str> {
```
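The core of the change is the per-filter decision in `supports_filters_pushdown`. In DataFusion, answering `Exact` tells the planner that the provider fully applies the filter, so no `FilterExec` is needed above the scan, while `Inexact` keeps that `FilterExec`. The std-only sketch below mirrors the three-step decision from the hunk above under simplifying assumptions: `Filter`, `CsvLike`, `ParquetLike`, the `NotSupported` variant, the `pushdown_enabled` flag, the "nested" column rule and the single-argument trait method are all hypothetical. The real `FileFormat::supports_filters_pushdown` also receives the file and table schemas, and real filters are `Expr`s.

```rust
// Simplified stand-ins; none of these are DataFusion's real definitions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TableProviderFilterPushDown {
    Exact,
    Inexact,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FilePushdownSupport {
    Supported,
    NotSupported, // hypothetical variant name
}

/// Hypothetical stand-in for `Expr`: only the columns a filter references.
struct Filter {
    columns: Vec<&'static str>,
}

/// Stand-in for the extended `FileFormat` trait. In this sketch the default
/// answer is "not supported", so formats that don't override it stay `Inexact`.
trait FileFormat {
    fn supports_filters_pushdown(
        &self,
        _filters: &[&Filter],
    ) -> Result<FilePushdownSupport, String> {
        Ok(FilePushdownSupport::NotSupported)
    }
}

/// A format that keeps the default trait method.
struct CsvLike;
impl FileFormat for CsvLike {}

/// Hypothetical Parquet-like format: supports pushdown when enabled, except
/// for filters touching a column it cannot evaluate (here named "nested").
struct ParquetLike {
    pushdown_enabled: bool,
}

impl FileFormat for ParquetLike {
    fn supports_filters_pushdown(
        &self,
        filters: &[&Filter],
    ) -> Result<FilePushdownSupport, String> {
        let ok = self.pushdown_enabled
            && filters.iter().all(|f| !f.columns.contains(&"nested"));
        Ok(if ok {
            FilePushdownSupport::Supported
        } else {
            FilePushdownSupport::NotSupported
        })
    }
}

/// Per-filter decision mirroring the new method body.
fn supports_filters_pushdown(
    format: &dyn FileFormat,
    partition_cols: &[&str],
    filters: &[&Filter],
) -> Result<Vec<TableProviderFilterPushDown>, String> {
    filters
        .iter()
        .map(|&filter| -> Result<TableProviderFilterPushDown, String> {
            // 1. Filters on partition columns only are answered exactly by path pruning.
            if filter.columns.iter().all(|c| partition_cols.contains(c)) {
                return Ok(TableProviderFilterPushDown::Exact);
            }
            // 2. Otherwise ask the file format; `?` propagates its error.
            if format.supports_filters_pushdown(&[filter])? == FilePushdownSupport::Supported {
                return Ok(TableProviderFilterPushDown::Exact);
            }
            // 3. Fall back to Inexact (in DataFusion the filter is re-applied after the scan).
            Ok(TableProviderFilterPushDown::Inexact)
        })
        // Collecting `Result` items into `Result<Vec<_>, _>` stops at the first `Err`.
        .collect()
}
```

With pushdown enabled, a filter on an ordinary data column such as `price` now reports `Exact`; with pushdown disabled, or for a format that keeps the default trait method, it falls back to `Inexact`, which is what every non-partition filter reported before this change. Dropping the outer `Ok(...)` around the iterator chain is what lets the closure use `?` and early `return`s.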