[SPARK-9182][SQL]Filters are not passed through to jdbc source #8049
Conversation
/cc @liancheng

Test build #40230 has finished for PR 8049 at commit

Test build #40279 has finished for PR 8049 at commit
I haven't successfully constructed a test case to prove it, but simply removing the casts here looks a little bit dangerous to me. For example, both BinaryOperator and logic in ParquetFilters assume that both branches of a binary expression should have the same data type.
Instead of removing the casts, how about transposing them? Namely, converting:

```scala
expression.transform {
  case LessThan(Cast(a: Attribute, _), value) =>
    LessThan(a, Cast(value, a.dataType).eval())
  case LessThan(value, Cast(a: Attribute, _)) =>
    LessThan(Cast(value, a.dataType).eval(), a)
  ...
}
```
In this way, we still ensure both branches have the same data type.
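The transposition above can be sketched with a toy expression tree in plain Scala. This mini-AST is purely illustrative (the real classes live in `org.apache.spark.sql.catalyst.expressions` and behave differently); it only shows the shape of the rewrite: move the cast off the attribute and fold it into the literal, so both branches end up with the attribute's data type.

```scala
// Minimal illustrative AST; names mirror Catalyst, but this is a standalone toy.
sealed trait Expr
case class Attribute(name: String, dataType: String) extends Expr
case class Literal(value: Any, dataType: String) extends Expr
case class Cast(child: Expr, dataType: String) extends Expr
case class LessThan(left: Expr, right: Expr) extends Expr

// Pretend evaluation of a cast on a literal; only String -> Int is handled here.
def evalCast(lit: Literal, to: String): Literal = (lit, to) match {
  case (Literal(s: String, _), "int") => Literal(s.toInt, "int")
  case _                              => lit
}

// Transpose the cast from the attribute side onto the literal side,
// so both branches of the comparison keep the attribute's data type.
def transpose(e: Expr): Expr = e match {
  case LessThan(Cast(a: Attribute, _), v: Literal) =>
    LessThan(a, evalCast(v, a.dataType))
  case LessThan(v: Literal, Cast(a: Attribute, _)) =>
    LessThan(evalCast(v, a.dataType), a)
  case other => other
}

val before = LessThan(Cast(Attribute("age", "int"), "string"), Literal("30", "string"))
val after  = transpose(before)
// after == LessThan(Attribute("age", "int"), Literal(30, "int"))
```

After the rewrite, the attribute is exposed bare, so the pushdown pattern match on `Attribute` succeeds.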
Thanks, I didn't notice the ParquetFilters assumption until you pointed it out. I'll follow your advice above.
When we are comparing a DateType column with a String literal, we cannot simply cast the String literal to DateType, since the cast would evaluate to the internal representation of DateType, i.e. an int, which means nothing to a JDBC source.
Cast.eval() returns values of Catalyst internal types, which can be converted to values of user-space types via CatalystTypeConverters.convertToScala(). This is not super efficient, but we are not on the critical path, so I think it's OK.
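For illustration only (plain Scala, no Spark involved): Catalyst stores DateType values internally as an Int counting days since the Unix epoch, so evaluating a cast to DateType yields a bare integer; converting back to the user-space type (`java.sql.Date`) is what produces a literal a database can actually understand.

```scala
import java.time.LocalDate

// Catalyst's internal DateType representation: days since 1970-01-01, as an Int.
val internal: Int = LocalDate.of(2015, 8, 11).toEpochDay.toInt

// Pushed into a JDBC WHERE clause as-is, this int is meaningless to the database:
val wrong = s"WHERE birthday < $internal"

// Converting back to the user-space type gives a value the database understands.
val external = java.sql.Date.valueOf(LocalDate.ofEpochDay(internal.toLong))
val right = s"WHERE birthday < '$external'"
// right == "WHERE birthday < '2015-08-11'"
```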
Jenkins, retest this please.

Some problem with Jenkins?

Jenkins, retest this please.

Test build #40393 has finished for PR 8049 at commit

Jenkins, retest this please.

Test build #40403 has finished for PR 8049 at commit

Test build #40458 has finished for PR 8049 at commit
@marmbrus Well, seems that GitHub ate your comment about

We can have separate extractors like @yjshen. This can be done in a separate PR though.

@yjshen Thanks for fixing this! Merging to master and branch-1.5.
This PR fixes filters not being pushed down to the JDBC source, caused by `Cast` appearing during pattern matching. When we compare columns of different types, there is a good chance a cast is added on the column, so the pattern no longer matches directly on `Attribute` and the filter fails to push down.

Author: Yijie Shen <[email protected]>

Closes #8049 from yjshen/jdbc_pushdown.

(cherry picked from commit 9d08224)
Signed-off-by: Cheng Lian <[email protected]>
Is this always safe? Or could you, for example, cast long -> int and truncate?
Yes, you are right: a downcast here just truncates the original literal value and produces a wrong pushed-down filter.
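A standalone illustration of the truncation (plain Scala, no Spark involved): narrowing a long literal to the column's int type silently wraps, so the pushed-down filter compares against the wrong value.

```scala
// A predicate like `col = 4294967297L` cast down to an int column:
val literal: Long = 4294967297L     // 2^32 + 1, does not fit in an Int
val truncated: Int = literal.toInt  // wraps around to 1

// The pushed-down filter would become `col = 1`, matching rows
// that the original predicate `col = 4294967297` never could.
```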
So should we revert this?
/cc @liancheng
Hi @marmbrus, I've thought about this twice, and maybe I could do the following:

For HadoopFsRelations, they all assume the pushed-down column and value are of the same type, so I think the only safe way is to not push these casted filters down at all.

For JDBCRelation, since the value itself is rendered into a constructed WHERE-clause string and pushed to the underlying database, I think it's safe to just push the uncast value down?
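The JDBC path renders filters as SQL text, roughly as in the simplified sketch below (hypothetical helper names; the real compilation logic in Spark's JDBC source handles many more cases and escaping). Because the literal ends up as text, the database applies its own implicit casting rules to it, which is why pushing the uncast value down might be acceptable there.

```scala
// Simplified, hypothetical sketch of compiling a pushed-down filter
// into a WHERE-clause fragment.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class LessThan(attr: String, value: Any) extends Filter

def compileValue(value: Any): String = value match {
  case s: String => s"'$s'" // quote strings; real code must also escape them
  case v         => v.toString
}

def compileFilter(f: Filter): String = f match {
  case EqualTo(a, v)  => s"$a = ${compileValue(v)}"
  case LessThan(a, v) => s"$a < ${compileValue(v)}"
}

val where = "WHERE " + compileFilter(LessThan("birthday", "2015-08-11"))
// where == "WHERE birthday < '2015-08-11'"
```

Whether the resulting string analyzes cleanly then depends entirely on the target database's implicit-cast rules, which is exactly the concern raised below.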
Hmm, that depends on what implicit casting rules the various databases apply. Can you investigate more? I would not want us to generate queries that fail to analyze.

In the meantime, I think we should revert this from the release branch, since pushing down wrong filters is worse than not pushing down filters.
OK, sorry for the wrong fix. Should I make a revert PR?
It's okay. I'm glad we caught it. Please do, and open a blocker JIRA targeted at 1.5 so we don't miss merging it.
It's reverted by #8157. I think we can add casts over literals only when the casts don't introduce truncation?

I reopened SPARK-9182. It's not a regression introduced in 1.5, so do we still need a blocker JIRA for it?
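One way the "only when the casts don't introduce truncation" guard could look, sketched in plain Scala (a hypothetical helper, not an existing Spark API): rewrite the literal only when the round trip through the narrower type is lossless, and skip pushdown otherwise.

```scala
// Hypothetical guard: narrow a Long literal to Int only when no truncation occurs.
def castLongToIntIfLossless(value: Long): Option[Int] = {
  val narrowed = value.toInt
  if (narrowed.toLong == value) Some(narrowed) else None
}

castLongToIntIfLossless(42L)         // Some(42): safe to push the filter down
castLongToIntIfLossless(4294967297L) // None: leave the filter unpushed
```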
No, given the possible trickiness here I think we should bump the fix to 1.6.
I made a mistake in #8049 by casting the literal value to the attribute's data type, which simply truncates the literal value and pushes a wrong filter down. JIRA: https://issues.apache.org/jira/browse/SPARK-9927

Author: Yijie Shen <[email protected]>

Closes #8157 from yjshen/rever8049.

(cherry picked from commit d0b1891)
Signed-off-by: Cheng Lian <[email protected]>