[SPARK-14593][SQL] Make currentVars work with splitExpressions to enable whole stage codegen for large input columns #12351
Test build #55695 has finished for PR 12351 at commit
Test build #55698 has finished for PR 12351 at commit
cc @cloud-fan @davies, please take a look. Thanks.
A high-level question: given a lot of expression code, we group it into several blocks, and each block is put into a method. For the method, previously we only used a single
@cloud-fan I noticed that problem while testing this PR. For each block, I only pass the parameters needed by the expressions in that block; in other words, the parameters are also partitioned across the blocks (methods). Even so, the number of parameters for a method can still reach the Java limit. So I set a limit for splitting expressions with
How do you group the parameters? In the worst case, every expression may reference all parameters.
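The per-block grouping described above can be sketched in Java as follows. This is a minimal illustration, not the PR's code or Spark's CodegenContext API: the Expr class, the paramsFor and methodFor helpers, the apply_0 method name and the valueN/isNullN variable names are all hypothetical, and every column is assumed to be an int passed as a value plus a boolean isNull flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SplitBlocksSketch {
    // Hypothetical stand-in for one generated expression: its Java code plus the
    // ordinals of the input columns (currentVars) it reads.
    public static final class Expr {
        final String code;
        final List<Integer> inputs;

        public Expr(String code, List<Integer> inputs) {
            this.code = code;
            this.inputs = inputs;
        }
    }

    // Parameter list for one split method: only the columns referenced by the
    // block, each passed as a value plus its isNull flag (int columns assumed).
    public static List<String> paramsFor(List<Expr> block) {
        Set<Integer> used = new LinkedHashSet<>(); // first-seen order, no duplicates
        for (Expr e : block) {
            used.addAll(e.inputs);
        }
        List<String> params = new ArrayList<>();
        for (int i : used) {
            params.add("int value" + i);
            params.add("boolean isNull" + i);
        }
        return params;
    }

    // Emits the source text of one split method wrapping the block's code.
    public static String methodFor(String name, List<Expr> block) {
        StringBuilder sb = new StringBuilder();
        sb.append("private void ").append(name).append('(')
          .append(String.join(", ", paramsFor(block))).append(") {\n");
        for (Expr e : block) {
            sb.append("  ").append(e.code).append('\n');
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        List<Expr> block = List.of(
            new Expr("out0 = value0 + value2;", List.of(0, 2)),
            new Expr("out1 = value2 * 2;", List.of(2)));
        System.out.print(methodFor("apply_0", block));
    }
}
```

Running main prints a method header taking only value0, isNull0, value2 and isNull2, since column 1 is never referenced by the block. In the worst case raised in the question, where every expression in a block references every input column, paramsFor returns all 2 × N parameters, so grouping per block alone does not keep a method under the JVM limit.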
For the current usage of
I don't think that's the semantics of
Maybe. Actually
Current usage of As during the whole-stage pipeline, we mostly process
cc @davies, do you have time to look at this?
Since we can easily fall back, I'd rather not make this even more complicated just to support all the corner cases. Each column requires two arguments, and a Java method supports at most 255 arguments; to make this really work (without hitting that limit), it would need to be more complicated than its current shape. Without a very good reason, I'd rather not do this for now.
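The arithmetic behind this limit can be sketched as follows. The JVM specification (section 4.3.3) caps a method descriptor at 255 parameter slots, counting the implicit this of an instance method, with long and double parameters taking two slots each. The sketch assumes, as the comment says, that each column is passed as a value plus a boolean isNull flag, and that the split methods are instance methods; the class and method names are hypothetical.

```java
public class ParamSlotBudget {
    // JVMS 4.3.3: at most 255 parameter slots per method descriptor,
    // including one slot for `this` in an instance method.
    static final int MAX_SLOTS = 255;

    // Slots used by one input column passed as (value, isNull).
    public static int slotsPerColumn(boolean wideType) {
        int valueSlots = wideType ? 2 : 1; // long/double values take two slots
        return valueSlots + 1;             // plus one slot for the boolean isNull flag
    }

    // Largest number of columns a single method can receive this way.
    public static int maxColumns(boolean instanceMethod, boolean wideType) {
        int available = MAX_SLOTS - (instanceMethod ? 1 : 0); // reserve a slot for `this`
        return available / slotsPerColumn(wideType);
    }

    public static void main(String[] args) {
        System.out.println(maxColumns(true, false)); // int-like columns, instance method
        System.out.println(maxColumns(true, true));  // long/double columns, instance method
    }
}
```

With int-like columns an instance method tops out at 127 columns (254 / 2), and with long or double columns at 84 (254 / 3), so a block that references many input columns can exceed the limit even when parameters are passed per block.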
OK, then let me close this now.
What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-14593
We currently disable whole-stage codegen if the number of input columns is too large. If we can make CodeGenContext.currentVars work with CodeGenContext.splitExpressions, we can enable whole-stage codegen for large input columns.
How was this patch tested?
Existing tests.