[SPARK-25559][SQL] Remove the unsupported predicates in Parquet when possible #22574
d49d63b to
8c76e31
Compare
|
Test build #96724 has finished for PR 22574 at commit
|
// If the unsupported predicate is in the top level `And` condition or in the child
// `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
(createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
  createFilterHelper(nameToParquetField, rhs, canRemoveOneSideInAnd = true)) match {
+1
Thanks for catching this. I just fixed it.
Add a nested `And` test case
|
Test build #96725 has finished for PR 22574 at commit
|
|
Test build #96731 has finished for PR 22574 at commit
|
} yield FilterApi.and(lhsFilter, rhsFilter)
// If the unsupported predicate is in the top level `And` condition or in the child
// `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
(createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
a bit about style
val lhs = createFilterHelper...
val rhs = createFilterHelper...
(lhs, rhs) match {
...
}
Addressed. Thanks.
|
Test build #96735 has finished for PR 22574 at commit
|
|
test this again. |
|
LGTM, pending jenkins |
// convert b in ('1'). If we only convert a = 2, we will end up with a filter
// NOT(a = 2), which will generate wrong results.
// Pushing one side of AND down is only safe to do at the top level.
// You can see ParquetRelation's initializeLocalJobFunc method as an example.
This example is good to show the cases we can't remove one side. Can we still keep it?
+1
+1
addressed and added more tests.
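To make the `NOT(a = 2 AND b in ('1'))` example from the code comment concrete, here is a small self-contained sketch. The `Row` type and the helper names are hypothetical and exist only for illustration; they are not Spark or Parquet classes. It compares the full predicate with the filter that would remain if only `a = 2` were converted under the `NOT`.

```scala
// Hypothetical row type, for illustration only (not a Spark class).
case class Row(a: Int, b: String)

object NotCounterexample {
  // The full predicate: NOT(a = 2 AND b in ('1'))
  def full(r: Row): Boolean = !(r.a == 2 && Set("1").contains(r.b))

  // What remains if only `a = 2` is converted under the NOT: NOT(a = 2)
  def partial(r: Row): Boolean = !(r.a == 2)

  // Rows the full predicate accepts but the partial filter would drop.
  def wronglyDropped(rows: Seq[Row]): Seq[Row] =
    rows.filter(r => full(r) && !partial(r))
}
```

A row such as `Row(2, "3")` satisfies the full predicate but is rejected by `NOT(a = 2)`, so the pushed-down filter would lose a valid row. Under a plain top-level `AND`, dropping one side can only let extra rows through, which is harmless as long as the engine re-evaluates the full predicate on the rows it reads back (an assumption of this sketch).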
|
retest this please. |
|
Test build #96749 has finished for PR 22574 at commit
|
|
Can we have a clearer title that matches the description? Currently, the title claims more than the description does. |
|
Also, cc @rdblue |
|
I changed the title, and hopefully, it's much more clear now. |
|
Test build #96775 has finished for PR 22574 at commit
|
dongjoon-hyun
left a comment
+1, LGTM.
|
Merged to master. |
…possible

## What changes were proposed in this pull request?

Currently, in `ParquetFilters`, if one of the children predicates is not supported by Parquet, the entire predicates will be thrown away. In fact, if the unsupported predicate is in the top level `And` condition or in the child before hitting `Not` or `Or` condition, it can be safely removed.

## How was this patch tested?

Tests are added.

Closes apache#22574 from dbtsai/removeUnsupportedPredicatesInParquet.

Lead-authored-by: DB Tsai <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: DB Tsai <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

…ts in Parquet

## What changes were proposed in this pull request?

This is a follow up of #22574. Renamed the parameter and added comments.

## How was this patch tested?

N/A

Closes #22679 from gatorsmile/followupSPARK-25559.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: DB Tsai <[email protected]>
## What changes were proposed in this pull request?

Inspired by apache#22574. We can partially push down top level conjunctive predicates to Orc. This PR improves Orc predicate push down in both SQL and Hive module.

## How was this patch tested?

New unit test.

Closes apache#22684 from gengliangwang/pushOrcFilters.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: DB Tsai <[email protected]>
What changes were proposed in this pull request?
Currently, in `ParquetFilters`, if one of the children predicates is not supported by Parquet, the entire predicates will be thrown away. In fact, if the unsupported predicate is in the top level `And` condition or in the child before hitting `Not` or `Or` condition, it can be safely removed.
How was this patch tested?
Tests are added.
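
The rule in the description can be sketched on a toy predicate tree. This is a minimal illustration under stated assumptions, not the code in this PR: `Pred`, `Leaf`, `And`, `Or`, `Not`, and `PartialPushdown.convert` are hypothetical names, and the `supported` flag stands in for "Parquet can evaluate this predicate". Only the `canRemoveOneSideInAnd` parameter name is taken from the diff.

```scala
// Toy predicate tree; illustrative types, NOT Spark's or Parquet's classes.
sealed trait Pred
case class Leaf(name: String, supported: Boolean) extends Pred
case class And(left: Pred, right: Pred) extends Pred
case class Or(left: Pred, right: Pred) extends Pred
case class Not(child: Pred) extends Pred

object PartialPushdown {
  // Returns the part of `p` that can be pushed down, or None if nothing can.
  // `canRemoveOneSideInAnd` stays true only while every ancestor is an `And`.
  def convert(p: Pred, canRemoveOneSideInAnd: Boolean = true): Option[Pred] = p match {
    case l @ Leaf(_, supported) => if (supported) Some(l) else None
    case And(lhs, rhs) =>
      val l = convert(lhs, canRemoveOneSideInAnd)
      val r = convert(rhs, canRemoveOneSideInAnd)
      (l, r) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        // Dropping an unsupported conjunct only widens the filter.
        case (Some(a), None) if canRemoveOneSideInAnd => Some(a)
        case (None, Some(b)) if canRemoveOneSideInAnd => Some(b)
        case _ => None
      }
    case Or(lhs, rhs) =>
      // Below an `Or`, both sides must convert completely.
      for {
        a <- convert(lhs, canRemoveOneSideInAnd = false)
        b <- convert(rhs, canRemoveOneSideInAnd = false)
      } yield Or(a, b)
    case Not(child) =>
      // Below a `Not`, a widened child would become a narrowed filter.
      convert(child, canRemoveOneSideInAnd = false).map(Not(_))
  }
}
```

With `a` supported and `b` unsupported, `convert(And(a, b))` keeps just `a`, while `convert(Not(And(a, b)))` and `convert(Or(a, And(a, b)))` push nothing down. Passing the flag through nested `And` nodes, rather than resetting it to `true`, is what keeps an `And` under a `Not` or `Or` from being trimmed.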