-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-22500][SQL][followup] cast for struct can split code even with whole stage codegen #19891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #84467 has finished for PR 19891 at commit
|
|
@cloud-fan sorry but I cannot express a feedback on this, because I don't understand the reason/logic behind your change, I am missing some knowledge. But I'd be very happy if you can explain me. |
| // three function arguments are: child.primitive, result.primitive and result.isNull | ||
| // it returns the code snippets to be put in null safe evaluation region | ||
| // The function arguments are: `input`, `result` and `resultIsNull`. We don't need `inputIsNull` | ||
| // in parameter list, because the returned code will be put in null safe evaluation region. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some renaming to make it more readable.
| boolean $resultIsNull = $inputIsNull; | ||
| ${ctx.javaType(resultType)} $result = ${ctx.defaultValue(resultType)}; | ||
| if (!$inputIsNull) { | ||
| ${cast(input, result, resultIsNull)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some renaming to make it more readable.
| final $rowClass $result = new $rowClass(${fieldsCasts.length}); | ||
| final InternalRow $tmpRow = $c; | ||
| final $rowClass $tmpResult = new $rowClass(${fieldsCasts.length}); | ||
| final InternalRow $tmpInput = $input; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tmpInput and tmpResult are the only inputs we need for the generated code to cast struct, and we don't depend on ctx.INPUT_ROW and ctx.currentVars here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in another word, the code to cast a struct is always row-based, the input is a variable of type InternalRow. We don't care about ctx.INPUT_ROW and ctx.currentVars here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, now I see! Thanks for the kind explanation.
|
LGTM, if we want to nit, we can also switch to the new multiline string style in the places we are changing. |
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| private[this] def castCode(ctx: CodegenContext, childPrim: String, childNull: String, | ||
| resultPrim: String, resultNull: String, resultType: DataType, cast: CastFunction): String = { | ||
| private[this] def castCode(ctx: CodegenContext, input: String, inputIsNull: String, | ||
| result: String, resultIsNull: String, resultType: DataType, cast: CastFunction): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indents.
|
Thanks! Merged to master. |
What changes were proposed in this pull request?
A followup of #19730, we can split the code for casting struct even with whole stage codegen.
This PR also has some renaming to make the code easier to read.
How was this patch tested?
existing test