
Conversation

@huaxingao
Contributor

What changes were proposed in this pull request?

I just realized that with the changes in #33650, the restriction on pushing down Min/Max/Count when the filter is on a partition column was already removed. This PR adds tests to make sure Min/Max/Count for Parquet are pushed down if the filter is on a partition column.

Why are the changes needed?

To complete the work on Aggregate (Min/Max/Count) push down for Parquet.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Oct 12, 2021
@SparkQA

SparkQA commented Oct 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48587/

@HyukjinKwon HyukjinKwon changed the title [SPARK-36647][SQL] Push down Aggregate (Min/Max/Count) for Parquet if filter is on partition col [SPARK-36647][SQL][TESTS] Push down Aggregate (Min/Max/Count) for Parquet if filter is on partition col Oct 12, 2021
@SparkQA

SparkQA commented Oct 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48587/

@SparkQA

SparkQA commented Oct 12, 2021

Test build #144110 has finished for PR 34248 at commit cd22629.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@c21 c21 left a comment

LGTM, thanks @huaxingao.

Comment on lines 249 to 250
val enableVectorizedReader = Seq("false", "true")
for (testVectorizedReader <- enableVectorizedReader) {
Contributor

nit: we can be more Scala-idiomatic here, but not a big deal:

Seq("false", "true").foreach { enableVectorizedReader =>
  withSQLConf(...) {
    ...
  }
}

Contributor Author

@c21 Thanks for reviewing! I fixed this.

@huaxingao
Contributor Author

cc @viirya Could you please take a look when you have time? Thanks!

Seq("false", "true").foreach { enableVectorizedReader =>
  withSQLConf(SQLConf.PARQUET_AGGREGATE_PUSHDOWN_ENABLED.key -> "true",
    vectorizedReaderEnabledKey -> enableVectorizedReader) {
    val max = sql("SELECT max(id) FROM tmp WHERE p = 0")
Member

Can you add the other two supported aggregate functions? And how about the group-by-on-partition-column case?

Contributor Author

Added.
Group by on a partition column is a little more complicated and needs some code changes: currently, we only have the aggregate values in the returned row. For group by on a partition column, we will need to pass down the partition column value and prepend it to the aggregation row. I will open a separate PR for that work.
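The prepend step described above can be sketched as follows. This is a minimal illustration, not Spark's actual implementation: the row is modeled as a plain `Seq[Any]` (Spark itself would work with `InternalRow`, e.g. via `JoinedRow`), and the helper name is hypothetical.

```scala
// Hypothetical sketch of the idea described above: the Parquet reader
// returns a row holding only the aggregate values; for GROUP BY on a
// partition column, the partition value would be prepended to that row.
object PartitionValuePrepend {
  // Prepend the partition-column value to an aggregation-only row.
  def prependPartitionValue(partitionValue: Any, aggRow: Seq[Any]): Seq[Any] =
    partitionValue +: aggRow

  def main(args: Array[String]): Unit = {
    // Aggregation row for one partition: [max(id), min(id), count(id)]
    val aggRow = Seq(9L, 0L, 10L)
    // Partition column p = 0 for this group
    println(prependPartitionValue(0, aggRow))
  }
}
```

The design question this sketch highlights is that the scan currently has no slot for group-by keys in its output row, which is why the follow-up PR is needed.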

@SparkQA

SparkQA commented Oct 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48655/

@SparkQA

SparkQA commented Oct 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48655/

@SparkQA

SparkQA commented Oct 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48659/

@SparkQA

SparkQA commented Oct 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48658/

Comment on lines 133 to 134
// However, if the filter or group by is on partition column,
// max/min/count can still be pushed down
Member

So group by on partition column is not supported yet. Then this comment is not correct.

@SparkQA

SparkQA commented Oct 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48659/

@SparkQA

SparkQA commented Oct 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48658/

@SparkQA

SparkQA commented Oct 13, 2021

Test build #144177 has finished for PR 34248 at commit 6cc6f7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 13, 2021

Test build #144180 has finished for PR 34248 at commit 1c96138.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48854/

@SparkQA

SparkQA commented Oct 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48854/

@SparkQA

SparkQA commented Oct 19, 2021

Test build #144380 has finished for PR 34248 at commit 1293ae0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Oct 26, 2021

retest this please

@viirya
Member

viirya commented Oct 26, 2021

I'll merge this after CI passes, since the last CI run was a few days ago.

@SparkQA

SparkQA commented Oct 26, 2021

Test build #144626 has started for PR 34248 at commit 1293ae0.

@SparkQA

SparkQA commented Oct 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49096/

@SparkQA

SparkQA commented Oct 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49096/

@viirya
Member

viirya commented Oct 27, 2021

retest this please

@SparkQA

SparkQA commented Oct 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49103/

@SparkQA

SparkQA commented Oct 27, 2021

Test build #144633 has finished for PR 34248 at commit 1293ae0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Contributor Author

retest this please

@SparkQA

SparkQA commented Oct 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49103/

@SparkQA

SparkQA commented Oct 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49106/

@SparkQA

SparkQA commented Oct 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49106/

@SparkQA

SparkQA commented Oct 27, 2021

Test build #144636 has finished for PR 34248 at commit 1293ae0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Oct 27, 2021

Thanks! Merging to master.

@viirya viirya closed this in 4aec9d7 Oct 27, 2021
@huaxingao
Contributor Author

Thanks @c21 @viirya

@huaxingao huaxingao deleted the partitionFilter branch October 27, 2021 07:20