Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,8 @@

package org.apache.spark.sql.catalyst.plans.logical

import java.util.concurrent.TimeUnit

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.util.DateTimeUtils.MILLIS_PER_MONTH
import org.apache.spark.sql.types.MetadataBuilder
import org.apache.spark.unsafe.types.CalendarInterval

Expand All @@ -28,9 +27,7 @@ object EventTimeWatermark {
val delayKey = "spark.watermarkDelayMs"

def getDelayMs(delay: CalendarInterval): Long = {
// We define month as `31 days` to simplify calculation.
val millisPerMonth = TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31
delay.milliseconds + delay.months * millisPerMonth
delay.milliseconds + delay.months * MILLIS_PER_MONTH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we need here is seconds per month, not seconds per year. I think we should still assume 31 days per month here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point out where we need seconds/days per year in the codebase?

Copy link
Member Author

@MaxGekk MaxGekk Oct 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe any place including this one when we need a duration (in seconds or its fractions). The difference between months_between() and this place is months_between uses month length to calculate fraction of month, and 28 or 31 days per months don't really matter because it impacts on 2nd or 3rd digit in fractions but here we operate on bigger numbers when months form years. And it becomes matter how much days we use per year. Let's say we calculate the duration of 10 years which 120 months. If we use 31 days per months, this duration is 31 * 120 = 10 * 372 = 3720 days but if one year is 365.2425 than 10 years = 3652 days. The difference is 3720 - 3652 = 68 days or the calculation error is more than 2 months. That's matter I believe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

months_between is sort of a special case because "31 days per month" is (it seems) actually how it is supposed to work, correctly.

It's rare that someone would specify "1 month" here, let alone "10 years" right? or am I missing something? these are things like watermark intervals. Not that it means the semantics don't matter, it's just quite a corner case.

I therefore just don't feel strongly either way about it. We don't need to match months_between semantics. More precision is nice, but surely it almost never comes up anyway? I don't mind the change, as a result.

}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,15 @@ object DateTimeUtils {
final val MILLIS_PER_MINUTE: Long = 60 * MILLIS_PER_SECOND
final val MILLIS_PER_HOUR: Long = 60 * MILLIS_PER_MINUTE
final val MILLIS_PER_DAY: Long = SECONDS_PER_DAY * MILLIS_PER_SECOND
// The average year of the Gregorian calendar 365.2425 days long, see
// https://en.wikipedia.org/wiki/Gregorian_calendar
// Leap year occurs every 4 years, except for years that are divisible by 100
// and not divisible by 400. So, the mean length of of the Gregorian calendar year is:
// 1 mean year = (365 + 1/4 - 1/100 + 1/400) days = 365.2425 days
// The mean year length in seconds is:
// 60 * 60 * 24 * 365.2425 = 31556952.0 = 12 * 2629746
final val SECONDS_PER_MONTH: Int = 2629746
final val MILLIS_PER_MONTH: Long = SECONDS_PER_MONTH * MILLIS_PER_SECOND

// number of days between 1.1.1970 and 1.1.2001
final val to2001 = -11323
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,9 @@
package org.apache.spark.sql.execution.streaming

import java.sql.Date
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.catalyst.plans.logical.{EventTimeTimeout, ProcessingTimeTimeout}
import org.apache.spark.sql.catalyst.util.DateTimeUtils.MILLIS_PER_MONTH
import org.apache.spark.sql.execution.streaming.GroupStateImpl._
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}
import org.apache.spark.unsafe.types.CalendarInterval
Expand Down Expand Up @@ -164,8 +164,7 @@ private[sql] class GroupStateImpl[S] private(
throw new IllegalArgumentException(s"Provided duration ($duration) is not positive")
}

val millisPerMonth = TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31
cal.milliseconds + cal.months * millisPerMonth
cal.milliseconds + cal.months * MILLIS_PER_MONTH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting since this change doesn't affect our tests.

}

private def checkTimeoutTimestampAllowed(): Unit = {
Expand Down