@viirya
Member

@viirya viirya commented Aug 22, 2020

What changes were proposed in this pull request?

This PR proposes to fix ORC predicate pushdown under case-insensitive analysis. When case-insensitive analysis is enabled, the field names in pushed-down predicates no longer need to match the letter case of the physical field names in ORC files.

Why are the changes needed?

Currently, ORC predicate pushdown doesn't work with case-insensitive analysis. For example, the predicate "a < 0" cannot be pushed down to an ORC file whose field is named "A" under case-insensitive analysis.

But Parquet predicate pushdown works with this case. We should make ORC predicate pushdown work with case-insensitive analysis too.
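
To illustrate the "a" versus "A" case above, here is a minimal sketch of case-insensitive field resolution in plain Scala. It is not the code in this patch: `PrimitiveField` is a hypothetical stand-in for `OrcPrimitiveField` (with `fieldType` simplified to a `String` so the sketch runs without Spark), and `CaseInsensitiveLookup.resolver` is an illustrative name. Skipping names that are ambiguous once lowercased is also an assumption of this sketch.

```scala
import java.util.Locale

// Hypothetical stand-in for OrcPrimitiveField; fieldType is a String here
// so that the sketch does not depend on Spark's DataType.
final case class PrimitiveField(fieldName: String, fieldType: String)

object CaseInsensitiveLookup {
  // Returns a function that maps a predicate's attribute name to the
  // physical field it refers to, or None if it cannot be resolved.
  def resolver(
      physical: Seq[PrimitiveField],
      caseSensitive: Boolean): String => Option[PrimitiveField] = {
    if (caseSensitive) {
      // Exact match only: "a" does not resolve to "A".
      val byName = physical.map(f => f.fieldName -> f).toMap
      name => byName.get(name)
    } else {
      // Group by lowercased name and keep only unambiguous entries
      // (assumption: names that collide after lowercasing are skipped).
      val byLowerName = physical
        .groupBy(_.fieldName.toLowerCase(Locale.ROOT))
        .collect { case (lower, Seq(only)) => lower -> only }
      name => byLowerName.get(name.toLowerCase(Locale.ROOT))
    }
  }
}
```

With a single physical field `PrimitiveField("A", "int")`, `CaseInsensitiveLookup.resolver(fields, caseSensitive = false)("a")` resolves to that field, while the `caseSensitive = true` resolver returns `None` for `"a"`. A pushed-down filter would then be built against the resolved physical name rather than the name written in the query.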

Does this PR introduce any user-facing change?

Yes, after this PR, under case-insensitive analysis, ORC predicate pushdown will work.

How was this patch tested?

Unit tests.

@SparkQA

SparkQA commented Aug 22, 2020

Test build #127770 has finished for PR 29513 at commit a19e523.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class OrcPrimitiveField(fieldName: String, fieldType: DataType)

@viirya viirya changed the title [SPARK-32646][SQL][BRANCH-3.0] ORC predicate pushdown should work with case-insensitive analysis [SPARK-32646][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis Aug 22, 2020
@viirya
Member Author

viirya commented Aug 22, 2020

retest this please

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Aug 22, 2020

Test build #127780 has finished for PR 29513 at commit a19e523.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class OrcPrimitiveField(fieldName: String, fieldType: DataType)

@viirya
Member Author

viirya commented Aug 22, 2020

retest this please

@SparkQA

SparkQA commented Aug 22, 2020

Test build #127785 has finished for PR 29513 at commit a19e523.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class OrcPrimitiveField(fieldName: String, fieldType: DataType)

@viirya
Member Author

viirya commented Aug 22, 2020

Not sure if these errors are related.

E.g., for org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite.Read/Write Hive PARQUET serde table, this is the query plan:

== Parsed Logical Plan ==
'UnresolvedRelation [hive_serde]

== Analyzed Logical Plan ==
c1: date
SubqueryAlias spark_catalog.default.hive_serde
+- HiveTableRelation `default`.`hive_serde`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [c1#40752]

== Optimized Logical Plan ==
HiveTableRelation `default`.`hive_serde`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [c1#40752]

== Physical Plan ==
Scan hive default.hive_serde [c1#40752], HiveTableRelation `default`.`hive_serde`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [c1#40752]

This plan is unrelated to ORC and contains no pushed-down predicate.

Btw, I cannot reproduce the errors locally.

@viirya
Member Author

viirya commented Aug 23, 2020

Err.. I think these tests are already failing on the current branch-3.0 and master branches. Please see #29517. I created SPARK-32689 to track it.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32646][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis Aug 24, 2020
@viirya
Member Author

viirya commented Aug 24, 2020

retest this please

@SparkQA

SparkQA commented Aug 24, 2020

Test build #127828 has finished for PR 29513 at commit a19e523.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class OrcPrimitiveField(fieldName: String, fieldType: DataType)

@viirya
Member Author

viirya commented Aug 24, 2020

retest this please

@SparkQA

SparkQA commented Aug 24, 2020

Test build #127832 has finished for PR 29513 at commit a19e523.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class OrcPrimitiveField(fieldName: String, fieldType: DataType)

@viirya
Member Author

viirya commented Aug 24, 2020

Passed hive-2.3 and hive-1.2 tests. This is ready for review now. Basically this is similar to #29457, but 3.0 doesn't have ORC nested predicate pushdown, so #29457 cannot be backported directly.

cc @dongjoon-hyun @cloud-fan @HyukjinKwon

cloud-fan pushed a commit that referenced this pull request Aug 25, 2020
…ushdown should work with case-insensitive analysis

Closes #29513 from viirya/fix-orc-pushdown-3.0.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Contributor

thanks, merging to 3.0!

@cloud-fan cloud-fan closed this Aug 25, 2020
@viirya
Member Author

viirya commented Aug 25, 2020

Thanks! @cloud-fan

@viirya viirya deleted the fix-orc-pushdown-3.0 branch December 27, 2023 18:23