Skip to content

Conversation

@davies
Copy link
Contributor

@davies davies commented Mar 23, 2016

What changes were proposed in this pull request?

This PR rollback some changes in #11274 , which introduced some performance regression when do a simple aggregation on parquet scan with one integer column.

Does not really understand how this change introduce this huge impact, maybe related show JIT compiler inline functions. (saw very different stats from profiling).

How was this patch tested?

Manually run the parquet reader benchmark, before this change:

Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
Int and String Scan:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
SQL Parquet Vectorized                   2391 / 3107         43.9          22.8       1.0X

After this change

Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
Int and String Scan:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
SQL Parquet Vectorized                   2032 / 2626         51.6          19.4       1.0X```

@rxin
Copy link
Contributor

rxin commented Mar 23, 2016

We would need to dump the JITed assembly to understand what's going on.

| while ($idx < numRows) {
| int $rowidx = $idx++;
| ${consume(ctx, columns1).trim}
| if (shouldStop()) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some comment somewhere to explain why shouldStop needs to be here? It'd be great to reference the JIRA ticket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's here in the beginning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw i'm not sure but i suspect this has to do with loop unrolling. jit stops unrolling the loop when shouldStop is part of the terminal condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment around line 248 saying this loop is very perf sensitive and changes to it should be measured carefully?

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53905 has finished for PR 11912 at commit b31115d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator

@nongli
Copy link
Contributor

nongli commented Mar 23, 2016

LGTM

@davies
Copy link
Contributor Author

davies commented Mar 23, 2016

Added comment, merging this into master.

@asfgit asfgit closed this in 02d9c35 Mar 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants