[SPARK-9182][SQL]Filters are not passed through to jdbc source #8049
Conversation
/cc @liancheng

Test build #40230 has finished for PR 8049 at commit

Test build #40279 has finished for PR 8049 at commit
I haven't successfully constructed a test case to prove it, but simply removing the casts here looks a little bit dangerous to me. For example, both BinaryOperator and logic in ParquetFilters assume that both branches of a binary expression should have the same data type.
Instead of removing the casts, how about transposing them? Namely, converting:

```scala
expression.transform {
  case LessThan(Cast(a: Attribute, _), value) =>
    LessThan(a, Cast(value, a.dataType).eval())
  case LessThan(value, Cast(a: Attribute, _)) =>
    LessThan(Cast(value, a.dataType).eval(), a)
  ...
}
```
In this way, we still ensure both branches have the same data type.
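The transposition above can be sketched with a toy expression tree in plain Scala. This mini-AST is purely illustrative (the real classes live in `org.apache.spark.sql.catalyst.expressions` and behave differently); it only shows the shape of the rewrite: move the cast off the attribute and fold it into the literal, so both branches end up with the attribute's data type.

```scala
// Minimal illustrative AST; names mirror Catalyst, but this is a standalone toy.
sealed trait Expr
case class Attribute(name: String, dataType: String) extends Expr
case class Literal(value: Any, dataType: String) extends Expr
case class Cast(child: Expr, dataType: String) extends Expr
case class LessThan(left: Expr, right: Expr) extends Expr

// Pretend evaluation of a cast on a literal; only String -> Int is handled here.
def evalCast(lit: Literal, to: String): Literal = (lit, to) match {
  case (Literal(s: String, _), "int") => Literal(s.toInt, "int")
  case _                              => lit
}

// Transpose the cast from the attribute side onto the literal side,
// so both branches of the comparison keep the attribute's data type.
def transpose(e: Expr): Expr = e match {
  case LessThan(Cast(a: Attribute, _), v: Literal) =>
    LessThan(a, evalCast(v, a.dataType))
  case LessThan(v: Literal, Cast(a: Attribute, _)) =>
    LessThan(evalCast(v, a.dataType), a)
  case other => other
}

val before = LessThan(Cast(Attribute("age", "int"), "string"), Literal("30", "string"))
val after  = transpose(before)
// after == LessThan(Attribute("age", "int"), Literal(30, "int"))
```

After the rewrite, the attribute is exposed bare, so the pushdown pattern match on `Attribute` succeeds.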
Thanks, I didn't notice the ParquetFilters assumption until you pointed it out. I'll follow your advice above.
When we are comparing a DateType column with a String literal, we cannot simply cast the String literal to DateType, since the cast would evaluate to the internal representation of DateType, i.e. an int, which means nothing to a JDBC source.
Cast.eval() returns values of Catalyst internal types, which can be converted to values of user-space types via CatalystTypeConverters.convertToScala(). This is not super efficient, but we are not on the critical path, so I think it's OK.
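For illustration only (plain Scala, no Spark involved): Catalyst stores DateType values internally as an Int counting days since the Unix epoch, so evaluating a cast to DateType yields a bare integer; converting back to the user-space type (`java.sql.Date`) is what produces a literal a database can actually understand.

```scala
import java.time.LocalDate

// Catalyst's internal DateType representation: days since 1970-01-01, as an Int.
val internal: Int = LocalDate.of(2015, 8, 11).toEpochDay.toInt

// Pushed into a JDBC WHERE clause as-is, this int is meaningless to the database:
val wrong = s"WHERE birthday < $internal"

// Converting back to the user-space type gives a value the database understands.
val external = java.sql.Date.valueOf(LocalDate.ofEpochDay(internal.toLong))
val right = s"WHERE birthday < '$external'"
// right == "WHERE birthday < '2015-08-11'"
```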
Jenkins, retest this please.

Some problem with Jenkins?

Jenkins, retest this please.

Test build #40393 has finished for PR 8049 at commit

Jenkins, retest this please.

Test build #40403 has finished for PR 8049 at commit

Test build #40458 has finished for PR 8049 at commit
@marmbrus Well, seems that GitHub ate your comment about

We can have separate extractors like @yjshen. This can be done in a separate PR though.

@yjshen Thanks for fixing this! Merging to master and branch-1.5.
This PR fixes filters not being pushed down to the JDBC source, caused by `Cast` appearing during pattern matching. When we compare columns of different types, there is a good chance a cast is added on the column, so the pattern no longer matches directly on `Attribute` and the filter fails to push down.

Author: Yijie Shen <[email protected]>

Closes #8049 from yjshen/jdbc_pushdown.

(cherry picked from commit 9d08224)
Signed-off-by: Cheng Lian <[email protected]>
Is this always safe? Or could you, for example, cast long -> int and truncate?
Yes, you are right: a downcast here just truncates the original literal value and produces a wrong pushed-down filter.
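A standalone illustration of the truncation (plain Scala, no Spark involved): narrowing a long literal to the column's int type silently wraps, so the pushed-down filter compares against the wrong value.

```scala
// A predicate like `col = 4294967297L` cast down to an int column:
val literal: Long = 4294967297L     // 2^32 + 1, does not fit in an Int
val truncated: Int = literal.toInt  // wraps around to 1

// The pushed-down filter would become `col = 1`, matching rows
// that the original predicate `col = 4294967297` never could.
```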
So should we revert this?
/cc @liancheng
Hi @marmbrus, I've thought about this twice, and maybe I could do the following:

For HadoopFsRelations, they all assume the pushed-down column and value are of the same type, so I think the only safe way is to not push these casted filters down at all.

For JDBCRelation, since the value itself is rendered into a constructed WHERE-clause string and pushed to the underlying database, I think it's safe to just push the uncast value down?
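The JDBC path renders filters as SQL text, roughly as in the simplified sketch below (hypothetical helper names; the real compilation logic in Spark's JDBC source handles many more cases and escaping). Because the literal ends up as text, the database applies its own implicit casting rules to it, which is why pushing the uncast value down might be acceptable there.

```scala
// Simplified, hypothetical sketch of compiling a pushed-down filter
// into a WHERE-clause fragment.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class LessThan(attr: String, value: Any) extends Filter

def compileValue(value: Any): String = value match {
  case s: String => s"'$s'" // quote strings; real code must also escape them
  case v         => v.toString
}

def compileFilter(f: Filter): String = f match {
  case EqualTo(a, v)  => s"$a = ${compileValue(v)}"
  case LessThan(a, v) => s"$a < ${compileValue(v)}"
}

val where = "WHERE " + compileFilter(LessThan("birthday", "2015-08-11"))
// where == "WHERE birthday < '2015-08-11'"
```

Whether the resulting string analyzes cleanly then depends entirely on the target database's implicit-cast rules, which is exactly the concern raised below.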
Hmm, that depends on what implicit casting rules the various databases apply. Can you investigate more? I would not want us to generate queries that fail to analyze.

In the meantime, I think we should revert this from the release branch, since pushing down wrong filters is worse than not pushing down filters.
OK, sorry for the wrong fix. Should I make a revert PR?
It's okay. I'm glad we caught it. Please do, and open a blocker JIRA targeted at 1.5 so we don't miss merging it.
It's reverted by #8157. I think we can add casts over literals only when the casts don't introduce truncation?

I reopened SPARK-9182. It's not a regression introduced in 1.5, so do we still need a blocker JIRA for it?
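One way the "only when the casts don't introduce truncation" guard could look, sketched in plain Scala (a hypothetical helper, not an existing Spark API): rewrite the literal only when the round trip through the narrower type is lossless, and skip pushdown otherwise.

```scala
// Hypothetical guard: narrow a Long literal to Int only when no truncation occurs.
def castLongToIntIfLossless(value: Long): Option[Int] = {
  val narrowed = value.toInt
  if (narrowed.toLong == value) Some(narrowed) else None
}

castLongToIntIfLossless(42L)         // Some(42): safe to push the filter down
castLongToIntIfLossless(4294967297L) // None: leave the filter unpushed
```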
No, given the possible trickiness here I think we should bump the fix to 1.6.
I made a mistake in #8049 by casting the literal value to the attribute's data type, which simply truncates the literal value and pushes a wrong filter down. JIRA: https://issues.apache.org/jira/browse/SPARK-9927

Author: Yijie Shen <[email protected]>

Closes #8157 from yjshen/rever8049.

(cherry picked from commit d0b1891)
Signed-off-by: Cheng Lian <[email protected]>