-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-24635][SQL] Remove Blocks class from JavaCode class hierarchy #21619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #92251 has finished for PR 21619 at commit
|
| def + (other: Block): Block | ||
| def + (other: Block): Block = other match { | ||
| case EmptyBlock => this | ||
| case _ => code"$this\n$other" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need \n here? It may be a single space as well in many cases or even nothing, IIUC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concatenation of two blocks needs a newline between them like following:
[block1]
[block2]
In embedding case, like code"$block1 ... $block2", no extra newlines are added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for your explanation
| // Concatenates this block with other block. | ||
| def + (other: Block): Block | ||
| def + (other: Block): Block = other match { | ||
| case EmptyBlock => this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if this is an EmptyBlock? shall we a case also for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty block + other empty block is an empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, probably the wrong line for the comment. I mean the case: EmptyBlock + non-empty block. Shall we add a check and return other in that case? Or we can avoid to remove the overriden + in EmptyBlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a idea to early expand EmptyBlock before. Now I commit it into. Then both concatenation and embedding cases, EmptyBlock won't be kept in arguments to code block.
|
Test build #92259 has finished for PR 21619 at commit
|
|
|
||
| // Concatenates this block with other block. | ||
| def + (other: Block): Block | ||
| def + (other: Block): Block = other match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general question about +.
Previously we generate a giant string for an expression tree, which is hard to tune. To keep more information in the generated code, we introduce this JavaCode/CodeBlock framework to keep a tree of strings instead of a giant string.
For an expression a op b, we should generate a tree of strings for a and b, then op creates a new tree node and keeps a and b as children. That means, if we refer to a CodeBlock inside a code"...", the CodeBlock should become a child of the new CodeBlock. However, + usually happens within the same operator, I'm not sure if we should create a new level of tree node here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tree structure itself is an important information in the generated code, we should think carefully about what a tree node means. For example, when we want to split the code into methods, how shall we deal with the tree node? Shall we split an entire tree node into one method? What's our assumption to the java code inside one tree node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ideal usage is that we all put a semantically integral java code into a CodeBlock. If the CodeBlock is produced by an expression in codegen, in order to split it, we should split the entire tree of the CodeBlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good answer, let's make sure that when we call +, the 2 blocks are 2 individual semantically integral java code.
|
@cloud-fan Any more thing I should do for this to make this in? Thanks. |
| * method splitting. | ||
| */ | ||
| case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[JavaCode]) extends Block { | ||
| override lazy val exprValues: Set[ExprValue] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this PR, but we should think about it in the future. If we treat CodeBlock as a tree of generated code, then this method doesn't make a lot of sense: it collects all references from its children and put them into a set, which means every time we transform a CodeBlock and create a new copy, we need to build this set again.
It's unclear how exprValues would be used, but I'd image we can provide a contains method which recursively check the children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is a lazy one, so we may only build it when we use it. It was invented originally for manipulating expressions in a code block.
But I realized that we may not actually need exprValues if we treat CodeBlock as tree. In the PR #21405, the manipulating API doesn't use exprValues when transforming a CodeBlock.
Thus I agree with you that we can get rid of exprValues in the PR. Then we may have a method to return ExprValue contained in a Block.
|
also cc @rednaxelafx |
|
thanks, merging to master! |
What changes were proposed in this pull request?
The
Blocksclass inJavaCodeclass hierarchy is not necessary. Its function can be taken byCodeBlock. We should remove it to make simpler class hierarchy.How was this patch tested?
Existing tests.