@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jul 2, 2021

What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.6.9.

Why are the changes needed?

This is required to bring in ORC-804, which fixes an ORC encryption masking bug.

Does this PR introduce any user-facing change?

No. This is not released yet.

How was this patch tested?

Pass the newly added test case.

@dongjoon-hyun
Member Author

cc @gengliangwang This bug fix is required for Apache Spark 3.2.0.

@SparkQA

SparkQA commented Jul 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45093/

@SparkQA

SparkQA commented Jul 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45093/

@gengliangwang
Member

@dongjoon-hyun Thanks for the ping.
Do you know how to run the test locally? I always get

org.scalatest.exceptions.TestCanceledException: [] was empty org.apache.orc.impl.HadoopShimsPre2_3$NullKeyProvider@1949309d doesn't has the test keys. ORC shim is created with old Hadoop libraries

for every test case under OrcEncryptionSuite.

@SparkQA

SparkQA commented Jul 2, 2021

Test build #140582 has finished for PR 33189 at commit 7af9274.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Thank you for the review, @gengliangwang. You can run it with SBT as follows.

$ build/sbt "sql/testOnly *.OrcEncryptionSuite"
...
[info] OrcEncryptionSuite:
09:46:50.545 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] - Write and read an encrypted file (2 seconds, 775 milliseconds)
[info] - Write and read an encrypted table (497 milliseconds)
[info] - SPARK-35325: Write and read encrypted nested columns (431 milliseconds)
[info] - SPARK-35992: Write and read fully-encrypted columns with default masking (636 milliseconds)
09:46:56.241 WARN org.apache.spark.sql.execution.datasources.orc.OrcEncryptionSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.execution.datasources.orc.OrcEncryptionSuite, thread names: rpc-boss-3-1, shuffle-boss-6-1 =====
[info] Run completed in 6 seconds, 721 milliseconds.
[info] Total number of tests run: 4
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 169 s (02:49), completed Jul 2, 2021 9:46:56 AM

Did you run it in IntelliJ?
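
The two runs in this thread differ in whether the tests were canceled: the TestCanceledException reported earlier means the suite's precondition (test keys being available) was not met, while the SBT run above reports `canceled 0`. A minimal sketch for checking this from a saved log follows. The `canceled_count` helper is hypothetical and not part of Spark or ScalaTest; it only assumes the summary line has the same shape as the one printed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): read ScalaTest log text on stdin and
# print the number that follows "canceled" on the "Tests:" summary line.
# Prints nothing if no such line is present.
canceled_count() {
  sed -n 's/.*Tests: succeeded [0-9]*, failed [0-9]*, canceled \([0-9]*\),.*/\1/p'
}

# Sample input: the summary line copied verbatim from the run above.
sample='[info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0'
n=$(printf '%s\n' "$sample" | canceled_count)

# Treat a missing count as zero so the numeric comparison is always valid.
if [ "${n:-0}" -gt 0 ]; then
  echo "$n test(s) canceled: the encryption tests did not actually run"
else
  echo "no canceled tests"
fi
```

In practice the sample line would be replaced by the saved output of the `build/sbt` command shown above. If the output contains color escape codes or more than one summary line, the pattern may need adjusting.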

@dongjoon-hyun
Member Author

Merged to master/3.2 for Apache Spark 3.2.0.

dongjoon-hyun added a commit that referenced this pull request Jul 2, 2021
Closes #33189 from dongjoon-hyun/SPARK-35992.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit c55b9fd)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-35992 branch July 2, 2021 16:50