[SPARK-10829][SQL]Filter combine partition key and attribute doesn't work in DataSource scan #8916

chenghao-intel · 2015-09-25T06:02:42Z

withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
  withTempPath { dir =>
    val path = s"${dir.getCanonicalPath}/part=1"
    (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)

    // If the "part = 1" filter gets pushed down, this query will throw an exception since
    // "part" is not a valid column in the actual Parquet file
    checkAnswer(
      sqlContext.read.parquet(path).filter("a > 0 and (part = 0 or a > 1)"),
      (2 to 3).map(i => Row(i, i.toString, 1)))
  }
}

We expect the result to be:

2,1
3,1

But we got:

1,1
2,1
3,1

…ttributes doesn't work

chenghao-intel · 2015-09-25T06:03:12Z

cc @liancheng @yhuai

SparkQA · 2015-09-25T08:10:10Z

Test build #43012 has finished for PR 8916 at commit 1c0ba50.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

chenghao-intel · 2015-09-28T00:41:27Z

cc @liancheng @yhuai Just in case you missed this PR. :)

yhuai · 2015-09-28T04:43:57Z

What is our current behavior? Do we push down part = 0 to Parquet? Also, what is the plan for your query?

chenghao-intel · 2015-09-28T04:50:42Z

Currently, it simply ignores any filter condition that references both a partition key and another column. In the unit test case, it simply ignores the conjunct (part = 0 or a > 1).
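To make the symptom concrete, here is a minimal sketch in plain Scala collections (no Spark involved) that evaluates the test predicate with and without the mixed conjunct. The names `Rec`, `DroppedConjunctDemo`, `full`, and `withMixedDropped` are hypothetical and exist only for this illustration; they are not part of the patch.

```scala
// Hypothetical model of the test data: columns (a, b) written to the file,
// plus the partition value part = 1 taken from the directory name "part=1".
final case class Rec(a: Int, b: String, part: Int)

object DroppedConjunctDemo {
  val rows: Seq[Rec] = (1 to 3).map(i => Rec(i, i.toString, 1))

  // The full predicate from the test: a > 0 and (part = 0 or a > 1)
  def full(r: Rec): Boolean = r.a > 0 && (r.part == 0 || r.a > 1)

  // What is effectively evaluated if the conjunct mixing "part" and "a" is ignored
  def withMixedDropped(r: Rec): Boolean = r.a > 0

  // Values of column "a" that survive each variant of the filter
  val expected: Seq[Int] = rows.filter(full).map(_.a)
  val observed: Seq[Int] = rows.filter(withMixedDropped).map(_.a)
}
```

With the full predicate only the rows with a = 2 and a = 3 survive; once the mixed conjunct is dropped, the row with a = 1 passes as well, which matches the extra row reported in the description.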

cloud-fan · 2015-10-05T18:15:49Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala

Instead of using groupBy, can we just use filter 3 times to split these 3 kinds of filters? I think performance doesn't matter here.
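A rough sketch of what the three-`filter` split could look like, assuming each predicate exposes the set of column names it references. `Pred` and `FilterSplit` are hypothetical stand-ins for this illustration, not Catalyst classes, and this is not necessarily the code that was merged.

```scala
// Hypothetical, simplified predicate: only the referenced column names matter here.
final case class Pred(sql: String, references: Set[String])

object FilterSplit {
  // Returns (partition-only, data-only, mixed) conjuncts using three filter passes.
  def split(
      preds: Seq[Pred],
      partitionCols: Set[String]): (Seq[Pred], Seq[Pred], Seq[Pred]) = {
    // References only partition columns: usable for partition pruning.
    val partitionOnly = preds.filter { p =>
      p.references.nonEmpty && p.references.subsetOf(partitionCols)
    }
    // References no partition column: safe to push down to the data source.
    val dataOnly = preds.filter(_.references.intersect(partitionCols).isEmpty)
    // References both kinds: must be evaluated after the scan, not dropped.
    val mixed = preds.filter { p =>
      p.references.exists(partitionCols) && !p.references.subsetOf(partitionCols)
    }
    (partitionOnly, dataOnly, mixed)
  }
}
```

For the test query, splitting the conjuncts `a > 0` and `(part = 0 or a > 1)` with `part` as the only partition column puts the first in the data-only group and the second in the mixed group. Three passes over a short list of conjuncts are cheap, so readability can win over a single `groupBy`.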

SparkQA · 2015-10-13T05:17:13Z

Test build #43611 has finished for PR 8916 at commit f5705a5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

chenghao-intel · 2015-10-13T06:53:31Z

Thank you @cloud-fan for the review. I've updated the code, and it passes the unit tests.

cloud-fan · 2015-10-13T07:03:24Z

LGTM, cc @liancheng to take another look.

liancheng · 2015-10-14T23:27:16Z

LGTM, merging to master. Thanks!

…d columns (1.5 backport)

[SPARK-10829](#8916) Filter combine partition key and attribute doesn't work in DataSource scan
[SPARK-11301](#9271) fix case sensitivity for filter on partitioned columns

Author: Wenchen Fan <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Yin Huai <[email protected]>

Closes #9371 from cloud-fan/branch-1.5.

Scan DataSource with predicate expression combine partition key and a…

1c0ba50

…ttributes doesn't work

cloud-fan reviewed Oct 5, 2015

update the code as suggested

f5705a5

asfgit closed this in 1baaf2b Oct 14, 2015

chenghao-intel deleted the partition_filter branch October 16, 2015 00:36

This was referenced Oct 30, 2015

[SPARK-11301][SQL] fix case sensitivity for filter on partitioned columns #9271

Closed

[SPARK-10829][SPARK-11301][SQL] fix 2 bugs for filter on partitioned columns (1.5 backport) #9371

Closed
