Member

yjshen commented Aug 8, 2015

This PR fixes filters failing to be pushed down to the JDBC source because of a Cast encountered during pattern matching.

When comparing columns of different types, there is a good chance that a cast is needed on the column. The pattern then no longer matches directly on Attribute, and the filter fails to be pushed down.

Member Author

yjshen commented Aug 8, 2015

/cc @liancheng

SparkQA commented Aug 8, 2015

Test build #40230 has finished for PR 8049 at commit 890e66c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 10, 2015

Test build #40279 has finished for PR 8049 at commit e186ca2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

I haven't successfully constructed a test case to prove it, but simply removing the casts here looks a little bit dangerous to me. For example, both BinaryOperator and logic in ParquetFilters assume that both branches of a binary expression should have the same data type.

Instead of removing the casts, how about transposing them? Namely, converting

expression.transform {
  case LessThan(Cast(a: Attribute, _), value) => 
    LessThan(a, Cast(value, a.dataType).eval())

  case LessThan(value, Cast(a: Attribute, _)) => 
    LessThan(Cast(value, a.dataType).eval(), a)

  ...
}

In this way, we still ensure both branches have the same data type.

Member Author

Thanks, I didn't notice the assumption in ParquetFilters until you pointed it out. I'll follow your advice above.

Member Author

When comparing a DateType column with a String literal, we cannot cast the String literal to DateType, since it would evaluate to the internal representation of DateType, i.e. an int, which means nothing to a JDBC source.

Contributor

Cast.eval() returns values of Catalyst internal types, which can be converted to values of user-space types via CatalystTypeConverters.convertToScala(). This is not super efficient, but we are not on the critical path, so I think it's OK.
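
To make the point about internal versus user-space values concrete, here is a minimal self-contained sketch in plain Scala (it does not use Spark). It assumes the internal date value is a count of days since 1970-01-01. The names `DateLiteralSketch`, `toInternal`, and `toUserSpace` are hypothetical and exist only for illustration; `toUserSpace` merely stands in for the kind of conversion that `convertToScala` would perform.

```scala
import java.time.LocalDate

object DateLiteralSketch {
  // Assumption: a date is stored internally as the number of days since 1970-01-01.
  def toInternal(text: String): Int =
    LocalDate.parse(text).toEpochDay.toInt

  // Hypothetical stand-in for converting an internal value back to a user-space type.
  def toUserSpace(days: Int): java.sql.Date =
    java.sql.Date.valueOf(LocalDate.ofEpochDay(days.toLong))

  def main(args: Array[String]): Unit = {
    val internal = toInternal("2015-08-11")
    // The bare Int means nothing in a SQL WHERE clause; the Date renders as a date string.
    println(s"internal = $internal, user space = ${toUserSpace(internal)}")
  }
}
```

Under these assumptions, only the converted `java.sql.Date` would be suitable for building a filter that a database can interpret.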

Member Author

yjshen commented Aug 11, 2015

Jenkins, retest this please.

Member Author

yjshen commented Aug 11, 2015

some problem with Jenkins?

Member Author

yjshen commented Aug 11, 2015

Jenkins, retest this please.

SparkQA commented Aug 11, 2015

Test build #40393 has finished for PR 8049 at commit 4977fe7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

yjshen commented Aug 11, 2015

Jenkins, retest this please.

SparkQA commented Aug 11, 2015

Test build #40403 has finished for PR 8049 at commit 43380c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 11, 2015

Test build #40458 has finished for PR 8049 at commit cebdd1d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

@marmbrus Well, seems that GitHub ate your comment about Literal extractor, so I'm quoting it here:

It would be nice if we could just build the convertToScala logic into the extractor for Literal.

We can have separate extractors, like InternalLiteral for Catalyst values and a Literal extractor for Scala values.

@yjshen This can be done in a separate PR though.

@liancheng
Contributor

@yjshen Thanks for fixing this! Merging to master and branch-1.5.

asfgit pushed a commit that referenced this pull request Aug 12, 2015
This PR fixes filters failing to be pushed down to the JDBC source because of a `Cast` encountered during pattern matching.

When comparing columns of different types, there is a good chance that a cast is needed on the column. The pattern then no longer matches directly on Attribute, and the filter fails to be pushed down.

Author: Yijie Shen

Closes #8049 from yjshen/jdbc_pushdown.

(cherry picked from commit 9d08224)
Signed-off-by: Cheng Lian

asfgit closed this in 9d08224 Aug 12, 2015
Contributor

Is this always safe? Or could you, for example, cast long -> int and truncate?

Member Author

Yes, you are right: a downcast here just truncates the original literal value and results in a wrong pushed-down filter.
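
The truncation concern can be illustrated with a small self-contained sketch in plain Scala. It assumes that casting a long literal to int behaves like a JVM narrowing conversion (keeping only the low 32 bits); `TruncationSketch` and `narrow` are hypothetical names, not Spark APIs.

```scala
object TruncationSketch {
  // Assumption: the literal cast behaves like a JVM narrowing conversion,
  // which keeps only the low 32 bits of the long value.
  def narrow(value: Long): Int = value.toInt

  def main(args: Array[String]): Unit = {
    val literal = 4294967301L // 2^32 + 5, which does not fit in an Int
    val pushed = narrow(literal)
    // Original predicate: intCol < 4294967301, which holds for every Int value.
    // Transposed predicate: intCol < pushed, which would filter out rows it should keep.
    println(s"original literal = $literal, pushed literal = $pushed")
  }
}
```

In this sketch the rewritten predicate is strictly narrower than the original one, which is why a silently truncated literal produces wrong results rather than just a slower query.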

Contributor

so should we revert this?

/cc @liancheng

Member Author

Hi @marmbrus, I've thought this over again, and maybe I could do the following?

HadoopFsRelations all assume that the pushed-down column and value are of the same type, so I think the only safe way is to not push down these casted filters at all.
For JDBCRelation, since the value itself is converted into a string WHERE clause and pushed to the underlying database, I think it's safe to just push the uncast value down?

Contributor

Hmm, that depends on what implicit casting rules various databases apply. Can you investigate more? I would not want us to generate queries that fail to analyze.

In the meantime, I think we should revert this from the release branch, as pushing down wrong filters is worse than not pushing down filters.

Member Author

OK, sorry for the wrong fix. Should I make a revert PR?

Contributor

It's okay. I'm glad we caught it. Please do, and open a blocker JIRA targeted at 1.5 so we don't miss merging it.

Contributor

It's reverted by #8157. I think we can add casts over literals only when the casts don't introduce truncation?

I reopened SPARK-9182. It's not a regression introduced in 1.5, so do we need a blocker JIRA for it?
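
One way to read the idea of adding casts over literals only when they do not truncate is sketched below. This is a self-contained illustration with a hypothetical miniature expression tree (`IntColumn`, `LongLiteral`, `IntLiteral`, `CastToLong`, `LessThan`); these are not Spark's Catalyst classes, and the sketch covers only an int column compared against a long literal.

```scala
object SafeTransposeSketch {
  // Hypothetical miniature expression tree, for illustration only.
  sealed trait Expr
  final case class IntColumn(name: String) extends Expr
  final case class LongLiteral(value: Long) extends Expr
  final case class IntLiteral(value: Int) extends Expr
  final case class CastToLong(child: Expr) extends Expr
  final case class LessThan(left: Expr, right: Expr) extends Expr

  // Moves the cast from the column onto the literal only when the literal
  // fits in an Int. Returns None when the rewrite would truncate or the
  // shape is not recognized, meaning the filter should not be pushed down.
  def transpose(e: Expr): Option[Expr] = e match {
    case LessThan(CastToLong(col: IntColumn), LongLiteral(v)) if v.isValidInt =>
      Some(LessThan(col, IntLiteral(v.toInt)))
    case LessThan(LongLiteral(v), CastToLong(col: IntColumn)) if v.isValidInt =>
      Some(LessThan(IntLiteral(v.toInt), col))
    case _ => None
  }
}
```

With this shape, `transpose` would rewrite a comparison against `LongLiteral(10L)` but return `None` for `LongLiteral(4294967301L)`, leaving that filter to be evaluated without pushdown. Whether such a range check is sufficient for every type pair is left open in the discussion above.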

Contributor

No, given the possible trickiness here I think we should bump the fix to 1.6.

asfgit pushed a commit that referenced this pull request Aug 13, 2015
I made a mistake in #8049 by casting the literal value to the attribute's data type, which would simply truncate the literal value and push a wrong filter down.

JIRA: https://issues.apache.org/jira/browse/SPARK-9927

Author: Yijie Shen

Closes #8157 from yjshen/rever8049.

(cherry picked from commit d0b1891)
Signed-off-by: Cheng Lian