Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Sep 13, 2019

What changes were proposed in this pull request?

In the PR, I propose to create an instance of TimestampFormatter only once at the initialization, and reuse it inside of nullSafeEval() and doGenCode() in the case when the fmt parameter is foldable.

Why are the changes needed?

The changes improve performance of the date_format() function.

Before:

format date:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
format date wholestage off                    7180 / 7181          1.4         718.0       1.0X
format date wholestage on                     7051 / 7194          1.4         705.1       1.0X

After:

format date:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
format date wholestage off                    4787 / 4839          2.1         478.7       1.0X
format date wholestage on                     4736 / 4802          2.1         473.6       1.0X

Does this PR introduce any user-facing change?

No.

How was this patch tested?

By existing test suites DateExpressionsSuite and DateFunctionsSuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 13, 2019

@dongjoon-hyun @maropu @srowen Please, take a look at the PR.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite know the logic of foldable enough to comment, but seems plausible.

override protected def nullSafeEval(timestamp: Any, format: Any): Any = {
val df = TimestampFormatter(format.toString, zoneId)
UTF8String.fromString(df.format(timestamp.asInstanceOf[Long]))
val tf = if (formatter.isEmpty) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about .getOrElse?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.getOrElse has some overhead of calling the lambda function. I explicitly avoided its usage in the interpreted mode. For consistency, I could do the same in the codegen function but I don't think it does matter.

@SparkQA
Copy link

SparkQA commented Sep 13, 2019

Test build #110566 has finished for PR 25782 at commit 5ab324f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 13, 2019

Test build #110567 has finished for PR 25782 at commit ee2a7d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @MaxGekk . In general, this PR looks fine. I left a few minor comments. I'll take a look later again.

@SparkQA
Copy link

SparkQA commented Sep 14, 2019

Test build #110586 has finished for PR 25782 at commit 3b5a8c2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Sep 14, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Sep 14, 2019

Test build #110590 has finished for PR 25782 at commit 3b5a8c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

LGTM if tests pass.

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 16, 2019

jenkins, retest this, please

@HyukjinKwon
Copy link
Member

Seems like Jenkins is down.

@SparkQA
Copy link

SparkQA commented Sep 16, 2019

Test build #4870 has finished for PR 25782 at commit a9fd8e4.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Sep 16, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Sep 17, 2019

Test build #110639 has finished for PR 25782 at commit a9fd8e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 17, 2019

Test build #110678 has finished for PR 25782 at commit a9fd8e4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@MaxGekk MaxGekk deleted the date_format-foldable branch October 5, 2019 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants