-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-26379][SS][FOLLOWUP] Use dummy TimeZoneId to avoid UnresolvedException in CurrentBatchTimestamp #23660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…xception in CurrentBatchTimestamp
| CurrentBatchTimestamp(offsetSeqMetadata.batchTimestampMs, | ||
| cd.dataType, cd.timeZoneId) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following lines (511 ~ 519) are just a revert of the previous patch.
|
cc @HeartSaVioR , @jose-torres , @tdas , @zsxwing , @gaborgsomogyi , @arunmahadevan |
| // dummy string to prevent UnresolvedException and to prevent to be used in the future. | ||
| CurrentBatchTimestamp(offsetSeqMetadata.batchTimestampMs, | ||
| ct.dataType) | ||
| ct.dataType, Some("Dummy TimeZoneId")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may try to use DateTimeUtils.defaultTimeZone().getID() or conf.sessionLocalTimeZone (as ResolveTimeZone does for TimeZoneAwareExpression).
However, IncrementalExecution doesn't use timezone info in case of CurrentBatchTimestamp(_, TimestampType, _). And, we want to prevent to use TimeZone for this CurrentTimestamp expression because this is originally non-TimeZoneAwareExpression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for providing nice analysis and looks like this patch is simpler and more concise. Nice!
|
Test build #101708 has finished for PR 23660 at commit
|
|
Retest this please. |
|
Test build #101710 has finished for PR 23660 at commit
|
| val df = input.toDS() | ||
| .withColumn("cur_timestamp", lit(current_timestamp())) | ||
| .withColumn("cur_date", lit(current_date())) | ||
| val df = input.toDS().withColumn("cur_timestamp", lit(current_timestamp())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, adding cur_date was the reason why UT is passed even without the patch.
I added only current_timestamp() first (checked UT failed without the patch) and added current_date() afterwards, which looks like making cur_timestamp be resolved without any of patches.
(Though I'm not sure about the mechanism why it happens...)
Nice finding!
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Happy to see cleaner patch with proper explanation of comments.
|
Thank you for review, @HeartSaVioR . |
|
Hi, @gatorsmile . Could you review this PR, too? |
|
Thank you for approval, @srowen . Merged to master/branch-2.4. |
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes #23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1ca6b8b) Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1ca6b8b) Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1ca6b8b) Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1ca6b8b) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Spark replaces
CurrentTimestampwithCurrentBatchTimestamp.However,
CurrentBatchTimestampisTimeZoneAwareExpressionwhileCurrentTimestampisn't.Without TimeZoneId,
CurrentBatchTimestampbecomes unresolved and raisesUnresolvedException.Since
CurrentDateisTimeZoneAwareExpression, there is no problem withCurrentDate.This PR reverts the previous patch on
MicroBatchExecutionand fixes the root cause.How was this patch tested?
Pass the Jenkins with the updated test cases.