
@chenghao-intel
Contributor

withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
  withTempPath { dir =>
    val path = s"${dir.getCanonicalPath}/part=1"
    (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)

    // If the "part = 0" filter gets pushed down, this query will throw an exception since
    // "part" is not a valid column in the actual Parquet file
    checkAnswer(
      sqlContext.read.parquet(path).filter("a > 0 and (part = 0 or a > 1)"),
      (2 to 3).map(i => Row(i, i.toString, 1)))
  }
}

We expect the result to be:

2,1
3,1

But we got:

1,1
2,1
3,1
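To make the discrepancy concrete, here is a plain-Scala model of the query (no Spark involved). The `Rec` case class, the `MixedFilterDemo` object and its two predicate functions are hypothetical stand-ins for the Parquet rows and the filter. Evaluating the whole predicate keeps rows 2 and 3, while ignoring the conjunct that mixes `part` and `a` leaves only `a > 0`, which keeps all three rows, matching the wrong result above.

```scala
// Hypothetical model of one row: data columns a, b plus partition column part.
final case class Rec(a: Int, b: String, part: Int)

object MixedFilterDemo {
  // The three rows written under .../part=1 in the test.
  val rows: Seq[Rec] = (1 to 3).map(i => Rec(i, i.toString, 1))

  // Full predicate: a > 0 and (part = 0 or a > 1)
  def full(r: Rec): Boolean = r.a > 0 && (r.part == 0 || r.a > 1)

  // What remains if the conjunct referencing both part and a is ignored.
  def mixedConjunctDropped(r: Rec): Boolean = r.a > 0

  // Project to (a, part) to match the result listings above.
  def expected: Seq[(Int, Int)] = rows.filter(full).map(r => (r.a, r.part))
  def buggy: Seq[(Int, Int)] = rows.filter(mixedConjunctDropped).map(r => (r.a, r.part))

  def main(args: Array[String]): Unit = {
    println(expected) // Vector((2,1), (3,1))
    println(buggy)    // Vector((1,1), (2,1), (3,1))
  }
}
```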

@chenghao-intel
Contributor Author

cc @liancheng @yhuai

@SparkQA

SparkQA commented Sep 25, 2015

Test build #43012 has finished for PR 8916 at commit 1c0ba50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor Author

cc @liancheng @yhuai Just in case you missed this PR. :)

@yhuai
Contributor

yhuai commented Sep 28, 2015

What is our current behavior? Do we push down part = 0 to Parquet? Also, what is the plan for your query?

@chenghao-intel
Contributor Author

Currently, it simply ignores any filter condition that references both a partition key and a data column. In the unit test case, the (part = 0 or a > 1) conjunct is ignored.
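The classification described here can be sketched with a toy expression tree. Everything below (`Expr`, `Attr`, `Lit`, `Gt`, `Eq`, `And`, `Or`, `splitConjuncts`, `classify`) is hypothetical and only mirrors the idea of splitting a condition into conjuncts and inspecting the columns each one references; it is not Spark's API. A conjunct that references only partition columns can prune partitions, one that references only data columns can be pushed to Parquet, and a mixed one like `(part = 0 or a > 1)` still has to be evaluated after the scan rather than dropped.

```scala
// Hypothetical toy expression tree, loosely modeled on Catalyst expressions.
sealed trait Expr { def references: Set[String] }
final case class Attr(name: String) extends Expr { def references = Set(name) }
final case class Lit(v: Int) extends Expr { def references = Set.empty[String] }
final case class Gt(l: Expr, r: Expr) extends Expr { def references = l.references ++ r.references }
final case class Eq(l: Expr, r: Expr) extends Expr { def references = l.references ++ r.references }
final case class And(l: Expr, r: Expr) extends Expr { def references = l.references ++ r.references }
final case class Or(l: Expr, r: Expr) extends Expr { def references = l.references ++ r.references }

object ConjunctClassifier {
  // Flatten nested ANDs into a flat list of conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Classify one conjunct by the kinds of columns it references.
  def classify(e: Expr, partitionCols: Set[String]): String = {
    val refs = e.references
    if (refs.subsetOf(partitionCols)) "partition"
    else if (refs.intersect(partitionCols).isEmpty) "data"
    else "mixed" // must be kept as a post-scan filter, not ignored
  }

  def main(args: Array[String]): Unit = {
    // a > 0 and (part = 0 or a > 1), the condition from the test
    val cond = And(Gt(Attr("a"), Lit(0)), Or(Eq(Attr("part"), Lit(0)), Gt(Attr("a"), Lit(1))))
    splitConjuncts(cond).foreach(c => println(s"${classify(c, Set("part"))}: $c"))
  }
}
```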

Contributor


Instead of using groupBy, can we just use filter 3 times to split these 3 kinds of filters? I think performance doesn't matter here.
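The suggestion can be illustrated on a simplified representation in which each conjunct is reduced to the set of column names it references. `Conjunct`, `Split`, `splitWithFilters` and `splitWithGroupBy` are hypothetical names for illustration, not the code in this PR. Both versions produce the same three groups; the `filter` version states each group's condition explicitly, at the cost of scanning the (short) list of conjuncts three times.

```scala
// Hypothetical: a conjunct reduced to a display label and the columns it references.
final case class Conjunct(label: String, refs: Set[String])

// The three groups discussed in the review.
final case class Split(partitionOnly: Seq[Conjunct], dataOnly: Seq[Conjunct], mixed: Seq[Conjunct])

object SplitFilters {
  private def partitionOnly(c: Conjunct, partCols: Set[String]): Boolean =
    c.refs.subsetOf(partCols)

  private def dataOnly(c: Conjunct, partCols: Set[String]): Boolean =
    !partitionOnly(c, partCols) && c.refs.intersect(partCols).isEmpty

  // Suggested approach: one filter call per group, each with an explicit condition.
  def splitWithFilters(cs: Seq[Conjunct], partCols: Set[String]): Split = Split(
    cs.filter(partitionOnly(_, partCols)),
    cs.filter(dataOnly(_, partCols)),
    cs.filter(c => !partitionOnly(c, partCols) && !dataOnly(c, partCols)))

  // groupBy-based alternative, for comparison: a single pass, but keyed indirectly.
  def splitWithGroupBy(cs: Seq[Conjunct], partCols: Set[String]): Split = {
    val groups = cs.groupBy { c =>
      if (partitionOnly(c, partCols)) "partition"
      else if (dataOnly(c, partCols)) "data"
      else "mixed"
    }
    def group(key: String): Seq[Conjunct] = groups.getOrElse(key, Seq.empty)
    Split(group("partition"), group("data"), group("mixed"))
  }
}
```

With only a handful of conjuncts per query, the three passes cost little, which is consistent with the point that performance does not matter here.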

@SparkQA

SparkQA commented Oct 13, 2015

Test build #43611 has finished for PR 8916 at commit f5705a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor Author

Thank you @cloud-fan for the review. I've updated the PR, and the unit tests pass.

@cloud-fan
Contributor

LGTM, cc @liancheng to take another look.

@liancheng
Contributor

LGTM, merging to master. Thanks!

@asfgit asfgit closed this in 1baaf2b Oct 14, 2015
@chenghao-intel chenghao-intel deleted the partition_filter branch October 16, 2015 00:36
asfgit pushed a commit that referenced this pull request Oct 30, 2015
…d columns (1.5 backport)

[SPARK-10829](#8916) Filter combine partition key and attribute doesn't work in DataSource scan

[SPARK-11301](#9271) fix case sensitivity for filter on partitioned columns

Author: Wenchen Fan

This patch had conflicts when merged, resolved by
Committer: Yin Huai

Closes #9371 from cloud-fan/branch-1.5.
