-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-29520][SS] Fix checks of negative intervals #26177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
a2d83b5
f880fd3
1f5d684
ab6036c
6afa006
084c8d5
2aa6480
1840a1d
4dab906
b212976
8cb55a7
8c2bcb1
edad25e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,7 +17,7 @@ | |
|
|
||
| package org.apache.spark.sql.catalyst.util | ||
|
|
||
| import java.util.regex.Pattern | ||
| import java.util.concurrent.TimeUnit | ||
|
|
||
| import scala.util.control.NonFatal | ||
|
|
||
|
|
@@ -317,4 +317,42 @@ object IntervalUtils { | |
| "Interval string does not match second-nano format of ss.nnnnnnnnn") | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Gets interval duration | ||
| * | ||
| * @param interval The interval to get duration | ||
| * @param targetUnit Time units of the result | ||
| * @param daysPerMonth The number of days per one month. The default value is 31 days | ||
| * per month. This value was taken as the default because it is used | ||
| * in Structured Streaming for watermark calculations. Having 31 days | ||
| * per month, we can guarantee that events are not dropped before | ||
| * the end of any month (February with 29 days or January with 31 days). | ||
| * @return Duration in the specified time units | ||
| */ | ||
| def getDuration( | ||
| interval: CalendarInterval, | ||
| targetUnit: TimeUnit, | ||
| daysPerMonth: Int = 31): Long = { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Then we can remove the default value? The caller side should explicitly specify that it wants 31 days per month and why.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggested the default as that is the only value used now. I think callers that specify otherwise can explain why (later)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. then at least we need to explain why 31 is picked as the default value |
||
| val monthsDuration = Math.multiplyExact( | ||
| daysPerMonth * DateTimeUtils.MICROS_PER_DAY, | ||
| interval.months) | ||
| val result = Math.addExact(interval.microseconds, monthsDuration) | ||
| targetUnit.convert(result, TimeUnit.MICROSECONDS) | ||
| } | ||
|
|
||
| /** | ||
| * Checks the interval is negative | ||
| * | ||
| * @param interval The checked interval | ||
| * @param daysPerMonth The number of days per one month. The default value is 31 days | ||
| * per month. This value was taken as the default because it is used | ||
| * in Structured Streaming for watermark calculations. Having 31 days | ||
| * per month, we can guarantee that events are not dropped before | ||
| * the end of any month (February with 29 days or January with 31 days). | ||
| * @return true if duration of the given interval is less than 0 otherwise false | ||
| */ | ||
| def isNegative(interval: CalendarInterval, daysPerMonth: Int = 31): Boolean = { | ||
| getDuration(interval, TimeUnit.MICROSECONDS, daysPerMonth) < 0 | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,7 +31,7 @@ private object Triggers { | |
|
|
||
| def convert(interval: String): Long = { | ||
| val cal = IntervalUtils.fromString(interval) | ||
| if (cal.months > 0) { | ||
| if (cal.months != 0) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like another way of converting interval to duration: make sure the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can change def getDuration(
interval: CalendarInterval,
targetUnit: TimeUnit,
daysPerMonth: Option[Int] = Some(31)): Long = {
val monthsDuration = daysPerMonth
.map { days =>
Math.multiplyExact(days * DateTimeUtils.MICROS_PER_DAY, interval.months)
}.getOrElse {
if (interval.months == 0) {
0L
} else {
throw new IllegalArgumentException(s"Doesn't support month or year interval: $interval")
}
}
val result = Math.addExact(interval.microseconds, monthsDuration)
targetUnit.convert(result, TimeUnit.MICROSECONDS)
}and call
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. but I am not sure that this check should be inside of |
||
| throw new IllegalArgumentException(s"Doesn't support month or year interval: $interval") | ||
| } | ||
| TimeUnit.MICROSECONDS.toMillis(cal.microseconds) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we specify
daysPerMonth?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other places we need to get a duration of interval except for stream watermark? If not I think it's simpler to call it
getDurationForWaterwark, and explain why we pick 1 month = 31 days, according to #16304 (comment)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you propose is ad-hoc interface which solves current issue only. I don't like it because it makes interval interface not reusable.
getDurationForAnotherPlace1?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, a watermark has type of
TIMESTAMPsemantically but notINTERVAL. Probably, it can be even negative. If to rename the method, it couldgetDurationOfDelayorgetDurationOfTimeout(or following your proposalgetDurationOfDelayForEventTimeWatermarkorgetDurationOfTimeoutForGroupState)