[WIP][SPARK-24498][SQL] Add JDK compiler for runtime codegen #21777
Test build #93038 has finished for PR 21777 at commit
Yes, as you said, the JDK compiler generates different bytecode, but I couldn't get any obvious performance gain on TPC-DS compared to Janino. So I couldn't find a strong performance reason to implement this; https://docs.google.com/spreadsheets/d/1Mgdd9dfFaACXOUHqKfaeKrj09hB3X1j9sKTJlJ6UM6w/edit#gid=1236423798 From another viewpoint, I think it might be useful to check whether the Java code that Spark generates can be compiled by the JDK compiler (a JDK 8 code compatibility check). But since compilation with the JDK compiler is too slow (see the performance numbers in the Google spreadsheet above), IMO it is impractical to run this check in Jenkins.... (I found it took 7~8 hours to run the tests of the
@maropu Apart from the TPC-DS queries, can we find some workloads that run faster with the bytecode generated by the JDK compiler? Or does this mean the Janino compiler is always better than the JDK compiler? (That does not sound right to me.)
Also cc @rednaxelafx |
Since I don't have real workloads or non-TPC-DS queries, it is difficult for me to find such a workload.... Can you, @kiszk?
No, I meant average performance. I'm not a Java/JDK expert, so I don't completely understand how the two compilers work. IIUC, these compilers may apply a few simple optimizations, but they mostly just convert Java code into bytecode? And HotSpot is responsible for the optimization? Anyway, we'd better wait for other developers.
btw, it seems this PR exceeds the current timeout... Is there any way to temporarily make the timeout longer? Do we always need to configure the timeout on the Jenkins side, as in #20222 (comment)?
It is a good question. Let me think and investigate for a while.
@maropu Based on my understanding, the bytecode generated by these two compilers is different, which is why the performance should differ. Previously, I expected that the bytecode generated by the JDK compiler could be better optimized by the JIT than the bytecode generated by Janino. Maybe our JVM internals experts @kiszk and @rednaxelafx can give more guidance.
Test build #97552 has finished for PR 21777 at commit
@kiszk Can you close this?
What changes were proposed in this pull request?
This PR allows a user to select the JDK compiler (javac) as the bytecode compiler for the code generated at runtime for a DataFrame/Dataset program.
Details will be filled later.
This PR is based on @maropu's implementation.
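For readers unfamiliar with runtime compilation through the JDK, the following is a minimal sketch of compiling a Java source string in memory with the standard `javax.tools` API and loading the resulting class. It is not the code in this PR: the names `JdkCompilerSketch`, `StringSource`, `ByteCodeSink`, and `GeneratedAdder` are hypothetical and exist only for illustration, and it assumes the program runs on a full JDK, since `ToolProvider.getSystemJavaCompiler()` returns `null` when no compiler is available.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.tools.DiagnosticCollector;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical illustration only; not the implementation in this PR.
public class JdkCompilerSketch {

    /** Java source held in a String instead of a file on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.SOURCE.extension), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Collects the bytecode that the compiler emits for one class. */
    static final class ByteCodeSink extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        ByteCodeSink(String className) {
            super(URI.create("mem:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.CLASS.extension), JavaFileObject.Kind.CLASS);
        }

        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
    }

    /** Compiles `source` in memory and returns the loaded class named `className`. */
    static Class<?> compile(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        final Map<String, ByteCodeSink> sinks = new HashMap<>();
        StandardJavaFileManager standard = compiler.getStandardFileManager(null, null, null);
        // Redirect class-file output into memory instead of the file system.
        JavaFileManager manager = new ForwardingJavaFileManager<StandardJavaFileManager>(standard) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                    String name, JavaFileObject.Kind kind, FileObject sibling) {
                ByteCodeSink sink = new ByteCodeSink(name);
                sinks.put(name, sink);
                return sink;
            }
        };
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        boolean ok = compiler.getTask(null, manager, diagnostics, null, null,
                Collections.singletonList(new StringSource(className, source))).call();
        if (!ok) {
            throw new IllegalArgumentException("Compilation failed: " + diagnostics.getDiagnostics());
        }
        // Define the compiled classes from the in-memory bytecode.
        ClassLoader loader = new ClassLoader(JdkCompilerSketch.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                ByteCodeSink sink = sinks.get(name);
                if (sink == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] b = sink.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        String src = "public class GeneratedAdder {"
                + " public static int add(int a, int b) { return a + b; } }";
        Class<?> cls = compile("GeneratedAdder", src);
        System.out.println(cls.getMethod("add", int.class, int.class).invoke(null, 2, 3)); // prints 5
    }
}
```

Each call goes through the full javac pipeline, so timing `compile` on larger sources is one simple way to observe the compilation cost discussed in the conversation.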
How was this patch tested?
Added CompilerSuite.