
Conversation

@gatorsmile
Member

@gatorsmile gatorsmile commented Sep 7, 2018

What changes were proposed in this pull request?

When running TPC-DS benchmarks on the 2.4 release, @npoggi and @winglungngai saw more than 10% performance regression on the following queries: q67, q24a, and q24b. After applying PR #22338, the performance regression still existed. When they reverted the changes in #19222, @npoggi and @winglungngai found that the performance regression was resolved. Thus, this PR reverts the related changes to unblock the 2.4 release.

In a future release, we can continue the investigation and find the root cause of the regression.

How was this patch tested?

The existing test cases

@gatorsmile
Member Author

cc @kiszk @cloud-fan

@gatorsmile changed the title from Revert [SPARK-10399] [SPARK-23879] [SPARK-23762] to Revert [SPARK-10399] [SPARK-23879] [SPARK-23762] [SPARK-25317] on Sep 7, 2018
@SparkQA

SparkQA commented Sep 7, 2018

Test build #95808 has finished for PR 22361 at commit cbe3a51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Sep 8, 2018

@gatorsmile Overall, I agree with the revert since the performance degradation is confirmed.

When I ran TPC-DS for #19222, I did not see a performance regression like this. For future analysis, I am very interested in:

  • execution time of each query
  • environment (machine, os, jvm, cluster or local, ...)
  • scale factor
  • number of repeats per query
  • others

@npoggi and @winglungngai Thanks for your investigation. Would it be possible to share the above information, for example gathered with a per-query timing harness like the sketch below?
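
A minimal sketch of such a harness is shown below, purely as an illustration of how the requested numbers (per-query execution time, number of repeats, scale factor) could be collected. The query names, file paths, and the `queryText` helper are hypothetical and are not part of this PR or of any Spark benchmark tooling.

```scala
import org.apache.spark.sql.SparkSession

object TpcdsTimingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("tpcds-regression-check").getOrCreate()

    val scaleFactor = 1000                      // assumed TPC-DS scale factor (SF 1,000)
    val repeats = 3                             // assumed number of repeats per query
    val queries = Seq("q67", "q24a", "q24b")    // the queries reported as regressed

    for (name <- queries; run <- 1 to repeats) {
      val start = System.nanoTime()
      // Force full execution without collecting rows back to the driver.
      spark.sql(queryText(name)).foreach(_ => ())
      val elapsedSec = (System.nanoTime() - start) / 1e9
      println(f"sf=$scaleFactor%d query=$name%s run=$run%d time=$elapsedSec%.1fs")
    }
    spark.stop()
  }

  // Hypothetical helper: reads the TPC-DS query text from a local file.
  def queryText(name: String): String =
    scala.io.Source.fromFile(s"tpcds-queries/$name.sql").mkString
}
```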

@cloud-fan
Contributor

LGTM, I'm merging it to unblock the 2.4 RC, thanks!

asfgit pushed a commit that referenced this pull request Sep 9, 2018

Closes #22361 from gatorsmile/revertMemoryBlock.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 0b9ccd5)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 0b9ccd5 Sep 9, 2018
@HyukjinKwon
Member

It would be great if the information requested in #22361 (comment) were shared so that other people can check it as well.

@npoggi
Contributor

npoggi commented Sep 25, 2018

Hi @kiszk and @HyukjinKwon, first, sorry for the late reply; I was trying to get some detailed profiling info for this. The regressions in TPC-DS SF 1,000, comparing branch-2.3 to 2.4 on a 10-worker cluster with 4 vCPUs per worker, are as follows:

q67: from 323.2s to 283.6s (18.7%)
q24a: from 139.1s to 153.9s (12.4%)
q24b: from 128.7s to 145.7s (11.9%)

These 3 queries, in that order, are the easiest to root-cause to the new MemoryBlock API, as more time is spent in UTF8String (getBytes, init, UnsafeRow.getUTF8String). Other possibly affected queries are q8, q11, and q54.
We can test future performance improvements to the proposal; just ping @winglungngai, @cloud-fan, and me in the PR. Thanks,
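
As a rough illustration of the suspected hot path (not the profiling methodology actually used above), a standalone loop over `UTF8String.getBytes` such as the sketch below could be run against both branches to compare them in isolation. The iteration count and sample string are arbitrary assumptions.

```scala
import org.apache.spark.unsafe.types.UTF8String

object Utf8StringGetBytesSketch {
  def main(args: Array[String]): Unit = {
    val iterations = 10 * 1000 * 1000
    val sample = UTF8String.fromString("a moderately sized customer name string")

    // Warm up the JIT before timing.
    var i = 0
    while (i < iterations) { sample.getBytes(); i += 1 }

    val start = System.nanoTime()
    var checksum = 0L
    i = 0
    while (i < iterations) {
      val bytes = sample.getBytes()   // copies the string's bytes into a fresh array
      checksum += bytes.length        // keep a side effect so the loop is not eliminated
      i += 1
    }
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(s"getBytes x $iterations: $elapsedMs ms (checksum=$checksum)")
  }
}
```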

@gatorsmile
Member Author

Nico will give a talk at the upcoming Spark Summit: https://databricks.com/session/a-framework-for-evaluating-the-performance-and-the-correctness-of-the-spark-sql-engine

If possible, please join the talk! : )

@kiszk
Member

kiszk commented Sep 27, 2018

It looks like a very interesting talk. Since I unfortunately cannot attend SAIS, I will watch the live stream.
