[SPARK-4493][SQL] Don't pushdown Eq, NotEq, Lt, LtEq, Gt and GtEq predicates with nulls for Parquet #3367
Binary.fromString and Binary.fromByteArray don't accept null.
Maybe add this as a comment.
Test build #23612 has started for PR 3367 at commit
Test build #23612 has finished for PR 3367 at commit
Test FAILed.
Build failure due to syncing issue between GitHub and ASF Git repo.
retest this please
Test build #23654 has started for PR 3367 at commit
Test build #23654 has finished for PR 3367 at commit
Test FAILed.
Test build #530 has started for PR 3367 at commit
Test build #530 has finished for PR 3367 at commit
Why var?
Minor comments, otherwise LGTM.
Addressed all styling issues. Thanks!
12c9d1c to cc41281
Test build #24026 has started for PR 3367 at commit
Test build #24027 has started for PR 3367 at commit
Test build #24026 has finished for PR 3367 at commit
Test PASSed.
Test build #24027 has finished for PR 3367 at commit
Test PASSed.
Thanks! Merged to master.
This is a follow-up of #3367 and #3644. At the time #3644 was written, #3367 hadn't been merged yet, so `IsNull` and `IsNotNull` filters were not covered in the first version of `ParquetFilterSuite`. This PR adds the corresponding test cases.

Author: Cheng Lian

Closes #3748 from liancheng/test-null-filters and squashes the following commits:

1ab943f [Cheng Lian] IsNull and IsNotNull Parquet filter test case for boolean type
bcd616b [Cheng Lian] Adds Parquet filter pushedown tests for IsNull and IsNotNull
Predicates like `a = NULL` and `a < NULL` can't be pushed down since Parquet `Lt`, `LtEq`, `Gt` and `GtEq` don't accept null values. Note that `Eq` and `NotEq` accept `null` only as a way to represent predicates like `a IS NULL` and `a IS NOT NULL`.

However, normally this issue doesn't cause an NPE because any value compared to `NULL` results in `NULL`, and Spark SQL automatically optimizes out `NULL` predicates in the `SimplifyFilters` rule. Only testing code that intentionally disables the optimizer may trigger this issue. (That's why this issue is not marked as a blocker, and I do NOT think we need to backport this to branch-1.1.)

This PR restricts `Lt`, `LtEq`, `Gt` and `GtEq` to non-null values only, and only uses `Eq` with a null value to push down `IsNull` and `IsNotNull`. It also adds support for the Parquet `NotEq` filter for completeness and a (tiny) performance gain; `NotEq` is also used to push down `IsNotNull`.
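
The rule described above can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions: the `Predicate` case classes, the `PushdownSketch` object and the `pushdown` function are hypothetical stand-ins, not Spark's Catalyst expressions or Parquet's filter API, and the returned strings merely describe the filter that would be built.

```scala
// Hypothetical, simplified model of the pushdown rule; none of these
// names are Spark's or Parquet's real classes.
object PushdownSketch {
  sealed trait Predicate
  final case class Eq(column: String, value: Any) extends Predicate
  final case class NotEq(column: String, value: Any) extends Predicate
  final case class Lt(column: String, value: Any) extends Predicate
  final case class LtEq(column: String, value: Any) extends Predicate
  final case class Gt(column: String, value: Any) extends Predicate
  final case class GtEq(column: String, value: Any) extends Predicate
  final case class IsNull(column: String) extends Predicate
  final case class IsNotNull(column: String) extends Predicate

  // Returns a description of the filter to push down, or None when the
  // predicate must be left for the query engine to evaluate.
  def pushdown(p: Predicate): Option[String] = p match {
    // Null checks are the only predicates expressed with a null operand.
    case IsNull(c)    => Some(s"eq($c, null)")
    case IsNotNull(c) => Some(s"notEq($c, null)")

    // Any comparison against a null literal is not pushed down.
    case Eq(_, null) | NotEq(_, null) | Lt(_, null) |
         LtEq(_, null) | Gt(_, null) | GtEq(_, null) => None

    // Comparisons against non-null values are pushed down as-is.
    case Eq(c, v)    => Some(s"eq($c, $v)")
    case NotEq(c, v) => Some(s"notEq($c, $v)")
    case Lt(c, v)    => Some(s"lt($c, $v)")
    case LtEq(c, v)  => Some(s"ltEq($c, $v)")
    case Gt(c, v)    => Some(s"gt($c, $v)")
    case GtEq(c, v)  => Some(s"gtEq($c, $v)")
  }
}
```

In this sketch, `PushdownSketch.pushdown(PushdownSketch.Lt("a", null))` yields `None`, while `PushdownSketch.IsNotNull("a")` maps to a `notEq` filter with a null operand, mirroring the behavior the description outlines.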