Skip to content

Conversation

@dbtsai
Copy link
Member

@dbtsai dbtsai commented Sep 27, 2018

What changes were proposed in this pull request?

Currently, in ParquetFilters, if one of the children predicates is not supported by Parquet, the entire predicates will be thrown away. In fact, if the unsupported predicate is in the top level And condition or in the child before hitting Not or Or condition, it can be safely removed.

How was this patch tested?

Tests are added.

@dbtsai
Copy link
Member Author

dbtsai commented Sep 27, 2018

@dbtsai dbtsai force-pushed the removeUnsupportedPredicatesInParquet branch from d49d63b to 8c76e31 Compare September 27, 2018 23:10
@dbtsai dbtsai changed the title [SPARK-25556][SQL] Just remove the unsupported predicates in Parquet [SPARK-25559][SQL] Just remove the unsupported predicates in Parquet Sep 27, 2018
@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96724 has finished for PR 22574 at commit d49d63b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// If the unsupported predicate is in the top level `And` condition or in the child
// `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
(createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
createFilterHelper(nameToParquetField, rhs, canRemoveOneSideInAnd = true)) match {
Copy link
Member

@dongjoon-hyun dongjoon-hyun Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pinging me, @dbtsai .

Here, instead of canRemoveOneSideInAnd = true, we need to use the received canRemoveOneSideInAnd, don't we? Otherwise, this will cause a bug in the nested And clauses. So, could you review and merge the test case, dbtsai#5, and fix the PR accordingly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I just fixed it.

Add a nested `And` test case
@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96725 has finished for PR 22574 at commit 8c76e31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96731 has finished for PR 22574 at commit 29a9aa0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} yield FilterApi.and(lhsFilter, rhsFilter)
// If the unsupported predicate is in the top level `And` condition or in the child
// `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
(createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit about style

val lhs = createFilterHelper...
val rhs = createFilterHelper...
(lhs, rhs) match {
  ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Thanks.

@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96735 has finished for PR 22574 at commit d37b2e8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member Author

dbtsai commented Sep 28, 2018

test this again.

@cloud-fan
Copy link
Contributor

LGTM, pending jenkins

// convert b in ('1'). If we only convert a = 2, we will end up with a filter
// NOT(a = 2), which will generate wrong results.
// Pushing one side of AND down is only safe to do at the top level.
// You can see ParquetRelation's initializeLocalJobFunc method as an example.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is good to show the cases we can't remove one side. Can we still keep it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed and added more tests.

@viirya
Copy link
Member

viirya commented Sep 28, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96749 has finished for PR 22574 at commit d37b2e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Can we have a more clear title which is described as in the description? Currently, the claim is too broader than the description.

@dongjoon-hyun
Copy link
Member

Also, cc @rdblue

@dbtsai dbtsai changed the title [SPARK-25559][SQL] Just remove the unsupported predicates in Parquet [SPARK-25559][SQL] Remove the unsupported predicates in Parquet when possible Sep 28, 2018
@dbtsai
Copy link
Member Author

dbtsai commented Sep 28, 2018

I changed the title, and hopefully, it's much more clear now.

@SparkQA
Copy link

SparkQA commented Sep 29, 2018

Test build #96775 has finished for PR 22574 at commit 9a9e47f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun
Copy link
Member

Merged to master.

@asfgit asfgit closed this in 5d726b8 Sep 29, 2018
daspalrahul pushed a commit to daspalrahul/spark that referenced this pull request Sep 29, 2018
…possible

## What changes were proposed in this pull request?

Currently, in `ParquetFilters`, if one of the children predicates is not supported by Parquet, the entire predicates will be thrown away. In fact, if the unsupported predicate is in the top level `And` condition or in the child before hitting `Not` or `Or` condition, it can be safely removed.

## How was this patch tested?

Tests are added.

Closes apache#22574 from dbtsai/removeUnsupportedPredicatesInParquet.

Lead-authored-by: DB Tsai <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: DB Tsai <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dbtsai dbtsai deleted the removeUnsupportedPredicatesInParquet branch October 8, 2018 19:28
asfgit pushed a commit that referenced this pull request Oct 9, 2018
…ts in Parquet

## What changes were proposed in this pull request?
This is a follow up of #22574. Renamed the parameter and added comments.

## How was this patch tested?
N/A

Closes #22679 from gatorsmile/followupSPARK-25559.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: DB Tsai <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…possible

## What changes were proposed in this pull request?

Currently, in `ParquetFilters`, if one of the children predicates is not supported by Parquet, the entire predicates will be thrown away. In fact, if the unsupported predicate is in the top level `And` condition or in the child before hitting `Not` or `Or` condition, it can be safely removed.

## How was this patch tested?

Tests are added.

Closes apache#22574 from dbtsai/removeUnsupportedPredicatesInParquet.

Lead-authored-by: DB Tsai <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: DB Tsai <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ts in Parquet

## What changes were proposed in this pull request?
This is a follow up of apache#22574. Renamed the parameter and added comments.

## How was this patch tested?
N/A

Closes apache#22679 from gatorsmile/followupSPARK-25559.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: DB Tsai <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Inspired by apache#22574 .
We can partially push down top level conjunctive predicates to Orc.
This PR improves Orc predicate push down in both SQL and Hive module.

## How was this patch tested?

New unit test.

Closes apache#22684 from gengliangwang/pushOrcFilters.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: DB Tsai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants