[SPARK-14092] [SQL] move shouldStop() to end of while loop #11912

davies · 2016-03-23T06:34:30Z

What changes were proposed in this pull request?

This PR rolls back some of the changes in #11274, which introduced a performance regression for a simple aggregation over a Parquet scan with one integer column.

I don't really understand how this change has such a large impact; it may be related to how the JIT compiler inlines functions (I saw very different stats from profiling).

How was this patch tested?

Manually ran the Parquet reader benchmark. Before this change:

Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
Int and String Scan:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
SQL Parquet Vectorized                   2391 / 3107         43.9          22.8       1.0X

After this change:

Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
Int and String Scan:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
SQL Parquet Vectorized                   2032 / 2626         51.6          19.4       1.0X

rxin · 2016-03-23T06:42:31Z

We would need to dump the JITed assembly to understand what's going on.

rxin · 2016-03-23T06:44:37Z

sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala

+      |     while ($idx < numRows) {
      |       int $rowidx = $idx++;
      |       ${consume(ctx, columns1).trim}
+      |       if (shouldStop()) return;


Can we add some comment somewhere to explain why shouldStop needs to be here? It'd be great to reference the JIRA ticket.

It's here in the beginning.

Where is it?

By the way, I'm not sure, but I suspect this has to do with loop unrolling: the JIT stops unrolling the loop when shouldStop() is part of the loop's termination condition.

Can we add a comment around line 248 saying this loop is very perf sensitive and changes to it should be measured carefully?
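
For readers following the thread, here is a minimal Java sketch of the two loop shapes being discussed: the stop check as part of the loop condition versus the stop check at the end of the loop body, as in the diff above. Everything in it is an assumption made for illustration: the class name `ShouldStopPlacement`, the `hasBufferedOutput` flag standing in for whatever `shouldStop()` inspects, and the plain `int[]` standing in for a column batch. It is not the code Spark generates, and it does not by itself reproduce the benchmark numbers in the description.

```java
// Illustrative sketch only; names and structure are hypothetical, not Spark's generated code.
final class ShouldStopPlacement {
    // Stand-in for the state shouldStop() checks. Nothing sets it in this sketch,
    // which mirrors a simple aggregation that emits no output row per input row.
    private boolean hasBufferedOutput = false;

    private boolean shouldStop() {
        return hasBufferedOutput;
    }

    // Shape A: the stop check is part of the loop's termination condition.
    long sumCheckInCondition(int[] column) {
        long sum = 0;
        int idx = 0;
        int numRows = column.length;
        while (!shouldStop() && idx < numRows) {
            int rowIdx = idx++;
            sum += column[rowIdx];
        }
        return sum;
    }

    // Shape B: the loop condition is a plain bounds check and the stop check
    // runs at the end of the body, matching the shape in this PR's diff.
    long sumCheckAtEnd(int[] column) {
        long sum = 0;
        int idx = 0;
        int numRows = column.length;
        while (idx < numRows) {
            int rowIdx = idx++;
            sum += column[rowIdx];
            if (shouldStop()) return sum;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] column = new int[1 << 20];
        for (int i = 0; i < column.length; i++) {
            column[i] = i & 7;
        }
        ShouldStopPlacement s = new ShouldStopPlacement();
        long a = 0;
        long b = 0;
        // Repeat the calls so the JIT has a chance to compile both methods.
        for (int iter = 0; iter < 50; iter++) {
            a = s.sumCheckInCondition(column);
            b = s.sumCheckAtEnd(column);
        }
        System.out.println("same result: " + (a == b)); // prints "same result: true"
    }
}
```

Both shapes compute the same sum when nothing triggers the stop check; any difference would be in the machine code the JIT emits. To see what HotSpot does with each method, one option is to run the class with `-XX:+PrintCompilation`, or with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (the latter needs the hsdis disassembler plugin installed). Whether unrolling actually differs between the two shapes depends on the JVM version and would have to be measured.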

SparkQA · 2016-03-23T08:17:59Z

Test build #53905 has finished for PR 11912 at commit b31115d.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator

nongli · 2016-03-23T17:57:56Z

LGTM

davies · 2016-03-23T18:58:20Z

Added comment, merging this into master.

move shouldStop() to end of while loop

b31115d

rxin reviewed Mar 23, 2016
add comment

53ee657

asfgit closed this in 02d9c35 Mar 23, 2016
