@mgaido91
Contributor

@mgaido91 mgaido91 commented Dec 1, 2017

What changes were proposed in this pull request?

In many parts of the code-generation codebase, we split the generated code into separate methods to avoid exceptions due to the 64KB method size limit. This produces a lot of methods which are called every time, even though sometimes this is not needed. As pointed out here: #19752 (comment), this is a not negligible overhead which can be avoided.

This PR applies the approach used in #19752 to the other places where it is feasible.

How was this patch tested?

Existing unit tests.

@SparkQA

SparkQA commented Dec 1, 2017

Test build #84376 has finished for PR 19860 at commit ce74fb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

mgaido91 commented Dec 2, 2017

@kiszk
Member

kiszk commented Dec 2, 2017

this is a not negligible overhead which can be avoided.

How much can this PR reduce this overhead?

@mgaido91
Contributor Author

mgaido91 commented Dec 2, 2017

@kiszk of course it depends on each specific case. On average, after this PR we make only about 50% of the function calls, so the overhead caused by the many function calls is reduced by about 50% on average.

@viirya
Member

viirya commented Dec 3, 2017 via email

@mgaido91
Contributor Author

mgaido91 commented Dec 3, 2017

@viirya sorry, I don't understand your question.
In Coalesce, we need to find the first non-null element. As soon as we find one, we don't need to evaluate anything else. Previously, the code generated by coalesce would have been:

methodName_1();
methodName_2();
...
methodName_X();

and in each method we were using ${ev.isNull} to skip the computation of the unnecessary expressions once the first non-null value had been found.
In this case, even though we do nothing inside these functions, we still call all of them, and this is not cheap, as pointed out by @gatorsmile here: #19752 (comment).
Thus, the new generated code avoids calling the methods when it is not necessary:

do {
  methodName_1();
  if (!isNull_1234) {
    continue;
  }
  ...
} while (false);
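
To make the difference concrete, here is a minimal, self-contained sketch of the two shapes. It is not the code Spark generates: the class `CoalesceCallSketch`, the methods `evalChild`, `callsOldShape` and `callsNewShape`, and the fixed count of three children are all hypothetical names and assumptions chosen for illustration. It only counts how many split methods get invoked in each shape.

```java
// Illustrative sketch only: all names here are hypothetical,
// not the code Spark actually generates.
class CoalesceCallSketch {
    static int calls;               // how many split methods were invoked
    static boolean isNull = true;   // plays the role of ${ev.isNull}
    static int value;               // first non-null value found, if any

    // One "split method": evaluates its child only if no non-null
    // value has been found yet, but is counted as a call either way.
    static void evalChild(Integer child) {
        calls++;
        if (isNull && child != null) {
            isNull = false;
            value = child;
        }
    }

    static void reset() {
        calls = 0;
        isNull = true;
        value = 0;
    }

    // Old shape: every split method is called unconditionally.
    static int callsOldShape(Integer a, Integer b, Integer c) {
        reset();
        evalChild(a);
        evalChild(b);
        evalChild(c);
        return calls;
    }

    // New shape: `continue` inside do { } while (false) jumps to the loop
    // condition, which is false, so the remaining calls are skipped.
    static int callsNewShape(Integer a, Integer b, Integer c) {
        reset();
        do {
            evalChild(a);
            if (!isNull) continue;
            evalChild(b);
            if (!isNull) continue;
            evalChild(c);
        } while (false);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("old shape calls: " + callsOldShape(null, 7, 9));
        System.out.println("new shape calls: " + callsNewShape(null, 7, 9));
    }
}
```

With the inputs `(null, 7, 9)`, the old shape invokes all three methods while the new shape stops after the second one, since that is where the first non-null value is found.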

@viirya
Member

viirya commented Dec 3, 2017 via email

@kiszk
Member

kiszk commented Dec 3, 2017

I am also interested in how much this PR can improve performance.

@mgaido91
Contributor Author

mgaido91 commented Dec 3, 2017

@kiszk @viirya I made the following performance test:

val a = (1 to 100000).map(x => 1).toDS
val filtered = a.where($"value".isin((1 to 100000): _*))
(1 to 20).map(_ => time(filtered.count)).sum / 20 // `time` is a simple helper which measures the execution time of its argument

Before the PR, the average execution time over the 20 trials was 3.428 s, while after the PR it is 3.121 s (on OSX, 2.8 GHz Intel Core i7). This means an improvement of about 9% in the overall performance in this case.
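
The `time` helper is not shown in this thread. As a rough sketch of what such a helper and the averaging could look like, here is a Java version; the class `TimingSketch` and its methods `time`, `averageSeconds` and `relativeImprovement` are hypothetical names, and the empty workload in `main` is only a placeholder for something like `filtered.count`. The last method just reproduces the arithmetic on the two averages reported above.

```java
// Illustrative sketch only: these helpers are assumptions,
// not the code used for the measurement in this thread.
class TimingSketch {
    // Runs the action once and returns the elapsed wall-clock time in seconds.
    static double time(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1e9;
    }

    // Averages the elapsed time over a number of trials.
    static double averageSeconds(Runnable action, int trials) {
        double total = 0.0;
        for (int i = 0; i < trials; i++) {
            total += time(action);
        }
        return total / trials;
    }

    // Fraction of the original time that was saved: (before - after) / before.
    static double relativeImprovement(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would run the query here.
        double avg = averageSeconds(() -> { }, 20);
        System.out.println("average seconds: " + avg);
        // Averages reported in the comment above: 3.428 s before, 3.121 s after.
        System.out.println("improvement: " + relativeImprovement(3.428, 3.121));
    }
}
```

Plugging in the reported averages gives (3.428 - 3.121) / 3.428, which is roughly 0.09, in line with the figure quoted above.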

@cloud-fan
Contributor

LGTM, merging to master!

@asfgit asfgit closed this in 2c16267 Dec 3, 2017
@gatorsmile
Member

Thanks for your work! A late LGTM
