Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Apr 27, 2020

What changes were proposed in this pull request?

With #28310, the operation of date +/- interval(m, d, 0) has been improved a lot.

According to the benchmark results, about 75% time cost is reduced because of no casting date to timestamp back and forth.

In this PR, we add a benchmark for these operations, and timestamp +/- interval operations as accessories.

Why are the changes needed?

Performance test coverage, since these operations are missing in the DateTimeBenchmark.

Does this PR introduce any user-facing change?

No, just test

How was this patch tested?

regenerated benchmark results

@yaooqinn
Copy link
Member Author

cc @cloud-fan @dongjoon-hyun @maropu @HyukjinKwon many thank you for your valuable time.

@yaooqinn yaooqinn changed the title [SPARK-31527][SQL][TESTS] Add a benchmark test for datetime add/subtract interval operations [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations Apr 27, 2020
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wholestage on or off doesn't matter actually. Could you build one table with default settings (wholestage on) like Conversion from/to external types in this benchmark.

@yaooqinn
Copy link
Member Author

wholestage on or off doesn't matter actually. Could you build one table with default settings (wholestage on) like Conversion from/to external types in this benchmark.

Can you explain the reason why it should follow that specific case but not other common ones?

@MaxGekk
Copy link
Member

MaxGekk commented Apr 27, 2020

Can you explain the reason why it should follow that specific case but not other common ones?

because

  1. You do benchmarking of particular functions, and it doesn't matter from where they are called - from generated code or not
  2. Wholestage on is common, and default case
  3. As your benchmark result show there are no diffs between wholestage on/off
  4. And it is easier to analyze benchmark results when everything related in one table

@SparkQA
Copy link

SparkQA commented Apr 27, 2020

Test build #121902 has finished for PR 28369 at commit f4f8b05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

It's benchmark only so we don't need to wait for jenkins.

Thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 54996be Apr 28, 2020
cloud-fan pushed a commit that referenced this pull request Apr 28, 2020
… add/subtract interval operations

### What changes were proposed in this pull request?
With #28310, the operation of date +/- interval(m, d, 0) has been improved a lot.

According to the benchmark results, about 75% time cost is reduced because of no casting date to timestamp back and forth.

In this PR, we add a benchmark for these operations, and timestamp +/- interval operations as accessories.

### Why are the changes needed?

Performance test coverage, since these operations are missing in the DateTimeBenchmark.

### Does this PR introduce any user-facing change?

No, just test

### How was this patch tested?

regenerated benchmark results

Closes #28369 from yaooqinn/SPARK-31527-F.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 54996be)
Signed-off-by: Wenchen Fan <[email protected]>
@SparkQA
Copy link

SparkQA commented Apr 28, 2020

Test build #122004 has finished for PR 28369 at commit aa76f81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
datetime +/- interval: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
date + interval(m) 919 933 22 0.0 306237514.3 1.0X
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column Per Row(ns) is incorrect obviously. This PR #28440 fixes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants