Conversation

@beliefer
Contributor

@beliefer beliefer commented Dec 1, 2021

What changes were proposed in this pull request?

This PR fixes the issue reported in
#33588 (comment)

The root cause is that Orc writes and reads timestamps with the local timezone by default, and the local timezone can differ between the writer and the reader.
If the Orc writer writes a timestamp with one local timezone (e.g. America/Los_Angeles) and the Orc reader reads it with another local timezone (e.g. Europe/Amsterdam), the timestamp value that is read back differs from the value that was written.

If the Orc writer writes timestamps with the UTC timezone and the Orc reader also reads them with the UTC timezone, the timestamp value is read back correctly.
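The effect can be reproduced with `java.time` alone. The sketch below is a simplified model, not Orc's actual encoding: the hypothetical `write` helper turns a wall-clock value into epoch milliseconds using the writer's zone, and the hypothetical `read` helper turns it back using the reader's zone. The class and method names are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class TimestampZoneDemo {
    // Simplified "write": interpret the wall-clock value in the writer's zone
    // and store it as epoch milliseconds. (Illustrative model only.)
    static long write(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant().toEpochMilli();
    }

    // Simplified "read": convert stored epoch milliseconds back to a
    // wall-clock value in the reader's zone.
    static LocalDateTime read(long epochMillis, ZoneId readerZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), readerZone);
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2021, 12, 1, 10, 0, 0);
        ZoneId la = ZoneId.of("America/Los_Angeles");
        ZoneId ams = ZoneId.of("Europe/Amsterdam");
        ZoneId utc = ZoneId.of("UTC");

        // Writer and reader use different local zones: the value shifts.
        System.out.println(read(write(original, la), ams)); // 2021-12-01T19:00

        // Writer and reader both use UTC: the value round-trips unchanged.
        System.out.println(read(write(original, utc), utc)); // 2021-12-01T10:00
    }
}
```

The nine-hour shift in the first case is the offset between Los Angeles (UTC-8) and Amsterdam (UTC+1) on that date.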

This PR makes Orc write and read timestamps with the UTC timezone by calling useUTCTimestamp(true) on the readers and writers.

The related Orc source:
https://github.com/apache/orc/blob/3f1e57cf1cebe58027c1bd48c09eef4e9717a9e3/java/core/src/java/org/apache/orc/impl/WriterImpl.java#L525

https://github.com/apache/orc/blob/1f68ac0c7f2ae804b374500dcf1b4d7abe30ffeb/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java#L1184

Another problem arises when Spark 3.3 or newer reads an Orc file written by Spark 3.2 or earlier. Because the older Spark versions wrote timestamps with the local timezone, these files must not be read with the UTC timezone; otherwise the timestamp values come out incorrect.
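A minimal sketch of this compatibility concern, again using only `java.time` as a simplified model. How the reader detects an older file is not described here, so the `writtenWithUtc` flag and the `chooseReadZone` helper are hypothetical names introduced for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class LegacyOrcReadDemo {
    static final ZoneId UTC = ZoneId.of("UTC");

    // Hypothetical helper: read with UTC only when the file was written with UTC;
    // otherwise fall back to the local zone that the older writer used.
    static ZoneId chooseReadZone(boolean writtenWithUtc, ZoneId localZone) {
        return writtenWithUtc ? UTC : localZone;
    }

    // Simplified model of reading a stored instant back as a wall-clock value.
    static LocalDateTime read(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        ZoneId local = ZoneId.of("America/Los_Angeles");
        LocalDateTime original = LocalDateTime.of(2021, 12, 1, 10, 0, 0);

        // An older writer stored the value using the local zone.
        long legacyStored = original.atZone(local).toInstant().toEpochMilli();

        // Forcing UTC on a legacy file shifts the value.
        System.out.println(read(legacyStored, UTC)); // 2021-12-01T18:00

        // Falling back to the local zone for a legacy file keeps it intact.
        System.out.println(read(legacyStored, chooseReadZone(false, local))); // 2021-12-01T10:00
    }
}
```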

Why are the changes needed?

To fix a bug in how Orc timestamps are written and read.

Does this PR introduce any user-facing change?

No. Orc timestamp_ntz support is a new feature that has not been released yet.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Dec 1, 2021
@beliefer
Contributor Author

beliefer commented Dec 1, 2021

Because I made a mistake and did not rebase correctly, I created this PR to replace #34712.
ping @cloud-fan
@bersprockets This PR should fix all of the problems we know of.

@SparkQA

SparkQA commented Dec 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50289/

@SparkQA

SparkQA commented Dec 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50289/

@SparkQA

SparkQA commented Dec 1, 2021

Test build #145814 has finished for PR 34769 at commit 5347110.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 12, 2022
@beliefer
Contributor Author

#34984 replaces this PR.

@beliefer beliefer closed this Mar 12, 2022