-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-22669][SQL] Avoid unnecessary function calls in code generation #19860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #84376 has finished for PR 19860 at commit
|
How much can this PR reduce this overhead? |
|
@kiszk of course it depends on each specific case, on average after this PR we use only 50% of the function calls. Thus on average the overhead caused by the many function calls is reduced by 50%. |
|
I'm not sure if I currently follow this. For example, Coalesce, doesn't
guarantee the later functions won't be called by the conditions of
ev.isNull? Why we need to apply this do loop?
…On Dec 3, 2017 2:30 AM, "Marco Gaido" ***@***.***> wrote:
@kiszk <https://github.com/kiszk> of course it depends on each specific
case, on average after this PR we use only 50% of the function calls. Thus
on average the overhead caused by the many function calls is reduced by 50%.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19860 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAEM99VLlpALSalH-aXQ9gma4kSyKhiAks5s8ZdNgaJpZM4QyVuP>
.
|
|
@viirya sorry, I don't understand your question. and in each method we were using |
|
OK. I see the intention here now. I'm not sure if it does considerable
impact, especially smaller functions will be inlined IIUC.
If it has impact not ignoring, it should be worth doing.
On Dec 3, 2017 1:56 PM, "Marco Gaido" <[email protected]> wrote:
@viirya <https://github.com/viirya> sorry, I don't understand your question.
In Coalesce, we need to find the first non-null element. As soon as we find
one, we don't need to evaluate anything else. Previously, the code
generated by coalesce would have been:
methodName_1();
methodName_2();
...
methodName_X();
and in each method we were using ${ev.isNull} to avoid the computation of
the unnecessary expressions, after the first non-null condition was met.
In this case, even though we are doing nothing inside these function we are
still calling all them and this is not cheap, as pointed out by @gatorsmile
<https://github.com/gatorsmile> here: #19752 (comment)
<#19752 (comment)>.
Thus, in the new generated code, we avoid calling the methods when it is
not necessary, since the generated code is:
do {
methodName_1();
if (!isNull_1234) {
continue;
}
...
} while (false);
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19860 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAEM9zsYPO_0A1a19ws-EuQtcTch6_F4ks5s8jf_gaJpZM4QyVuP>
.
|
|
I am also interested in how much this PR can improve performance. |
|
@kiszk @viirya I made the following performance test: before the PR the average execution time over the 20 trials is 3.428 s, while after the PR it is 3.121 s (on OSX 2,8 GHz Intel Core i7). This means about 10% improvement of the overall performance in this case. |
|
LGTM, merging to master! |
|
Thanks for your work! A late LGTM |
What changes were proposed in this pull request?
In many parts of the codebase for code generation, we are splitting the code to avoid exceptions due to the 64KB method size limit. This is generating a lot of methods which are called every time, even though sometime this is not needed. As pointed out here: #19752 (comment), this is a not negligible overhead which can be avoided.
The PR applies the same approach used in #19752 also to the other places where this was feasible.
How was this patch tested?
existing UTs.