Skip to content

Conversation

@kiszk
Copy link
Member

@kiszk kiszk commented Jul 16, 2018

What changes were proposed in this pull request?

This PR allow a user to select Javac bytecode compiler to compile a DataFrame/Dataset program.
Details will be filled later.

This PR is based on @maropu's implementation.

How was this patch tested?

Added CompilerSuite

@SparkQA
Copy link

SparkQA commented Jul 16, 2018

Test build #93038 has finished for PR 21777 at commit 523bf3d.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

@maropu @kiszk Thank you for taking this effort!

Based on my initial understanding, the code generated by the JDK compiler can be better optimized by JIT in many cases. Is my understanding right?

@maropu
Copy link
Member

maropu commented Jul 18, 2018

Yea, as you said, the JDK compiler generates different bytecode though, I couldn't get obvious performance gains for TPCDS as compared to the janino one. So, I couldn't find the strong reason to implement this in terms of performance; https://docs.google.com/spreadsheets/d/1Mgdd9dfFaACXOUHqKfaeKrj09hB3X1j9sKTJlJ6UM6w/edit#gid=1236423798

As another viewpoint, I think it might be useful to check if the generated Java code of Spark could be compiled by the JDK compiler (JDK8 code compatibility checks). But, since the compilation of the JDK compiler is too slow (see the performance values in the google spread sheet above), IMO it is impractical to check this in Jenkins.... (I found it took 7~8 hours to run the tests of the spark/sql only in aws instances). WDYT?

@gatorsmile
Copy link
Member

@maropu Except the TPC-DS queries, are we able to find some workloads that could perform faster using the bytecode generated by the JDK compiler? Or, does that mean Janino compiler is always better than JDK compiler? (that does not sound true to me)

@gatorsmile
Copy link
Member

Also cc @rednaxelafx

@maropu
Copy link
Member

maropu commented Jul 19, 2018

Except the TPC-DS queries, are we able to find some workloads that could perform faster using the bytecode generated by the JDK compiler?

Since I don't have real workloads or non-TPCDS queries, it is difficult for me to find the workload.... can you? @kiszk

Or, does that mean Janino compiler is always better than JDK compiler? (that does not sound true to me)

No, I meant averaged performance.

I'm not a Java/JDK expert, so I don't completely understand how the two compilers work. IIUC, these compilers might apply simple optimization a little though, they simply convert Java code into bytecode? The hotspot has a responsibility for optimization? Anyway, we'd be better to wait for other developers.

@maropu
Copy link
Member

maropu commented Jul 19, 2018

btw, it seems this pr exceeds the current timeout... Any way to temporarily make the timeout longer? We always need to configure timeout in the Jenkins-side like #20222 (comment)?

@kiszk
Copy link
Member Author

kiszk commented Jul 20, 2018

It is a good question. Let me think and investigate for a while.
I appreciate comment from @rednaxelafx

@gatorsmile
Copy link
Member

@maropu Based on my understanding, the bytecodes generated by these two compilers are different. That is why the performance should be different. Previously, I expected the bytecodes generated by JDK compiler can be better optimized by JIT, compared with the one generated by Janino. Maybe our JVM internal experts @kiszk and @rednaxelafx can give more guidance.

@SparkQA
Copy link

SparkQA commented Oct 19, 2018

Test build #97552 has finished for PR 21777 at commit a7ebdfd.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Dec 6, 2018

@kiszk Can you close this?

@kiszk kiszk closed this Dec 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants