@viirya
Member

@viirya viirya commented Jun 23, 2018

What changes were proposed in this pull request?

The Blocks class in the JavaCode class hierarchy is unnecessary: CodeBlock can take over its function. We should remove it to simplify the class hierarchy.

How was this patch tested?

Existing tests.

@viirya
Member Author

viirya commented Jun 23, 2018

cc @cloud-fan @kiszk @mgaido91

@SparkQA

SparkQA commented Jun 23, 2018

Test build #92251 has finished for PR 21619 at commit b035bbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def + (other: Block): Block
def + (other: Block): Block = other match {
  case EmptyBlock => this
  case _ => code"$this\n$other"
Contributor

Why do we need \n here? IIUC, in many cases a single space, or even nothing, would work as well.

Member Author

Concatenating two blocks needs a newline between them, like the following:

[block1]
[block2]

In the embedding case, such as code"$block1 ... $block2", no extra newlines are added.
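To make the distinction concrete, here is a minimal sketch that contrasts concatenation with embedding. The types `ConcatSketch`, `TextBlock`, and the `code` member are hypothetical stand-ins, not Spark's actual classes, and plain `s"..."` interpolation stands in for the `code"..."` interpolator.

```scala
// Simplified, hypothetical stand-ins for the Block types discussed here;
// this is NOT Spark's implementation.
object ConcatSketch {
  sealed trait Block {
    def code: String

    // Concatenation: the two blocks are separate units, so a newline is
    // placed between them (mirrors the shape of the snippet under review).
    def +(other: Block): Block = other match {
      case EmptyBlock => this
      case _ => TextBlock(s"$code\n${other.code}")
    }
  }

  case object EmptyBlock extends Block { val code = "" }
  final case class TextBlock(code: String) extends Block

  val block1: Block = TextBlock("int a = 1;")
  val block2: Block = TextBlock("int b = 2;")

  // Concatenation inserts the newline for us.
  val concatenated: String = (block1 + block2).code

  // Embedding: the surrounding template alone decides the separators,
  // so nothing extra is added.
  val embedded: String = s"${block1.code} ${block2.code}"
}
```

In this sketch `concatenated` spans two lines while `embedded` stays on one, because only `+` contributes a separator of its own.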

Contributor

I see, thanks for your explanation

// Concatenates this block with other block.
def + (other: Block): Block
def + (other: Block): Block = other match {
  case EmptyBlock => this
Contributor

What if this is an EmptyBlock? Shall we add a case for it as well?

Member Author

An empty block + another empty block is an empty block?

Contributor

@mgaido91 mgaido91 Jun 23, 2018

Sorry, probably the wrong line for the comment. I mean the case EmptyBlock + non-empty block. Shall we add a check and return other in that case? Or we could avoid removing the overridden + in EmptyBlock.

Member Author

I had an idea earlier to expand EmptyBlock early. I have now committed it. With that change, in both the concatenation and embedding cases, EmptyBlock won't be kept in the arguments to a code block.
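One way to picture early expansion is a builder that drops EmptyBlock arguments while assembling a code block and merges the literal parts on either side. The sketch below is an illustration under assumptions: `EmptyBlockSketch`, `Leaf`, `build`, and `render` are hypothetical names, and the real interpolator in the patch may be structured differently.

```scala
// Hypothetical sketch of "early expansion" of EmptyBlock; names and
// structure are assumptions, not the code in this PR.
object EmptyBlockSketch {
  sealed trait Block { def render: String }
  case object EmptyBlock extends Block { val render = "" }
  final case class Leaf(render: String) extends Block

  // parts always has exactly one more element than inputs.
  final case class CodeBlock(parts: Seq[String], inputs: Seq[Block]) extends Block {
    def render: String =
      parts.head + inputs.zip(parts.tail).map { case (in, p) => in.render + p }.mkString
  }

  // Builds a CodeBlock from literal parts and arguments, dropping every
  // EmptyBlock argument and merging the literal parts around it.
  def build(parts: Seq[String], args: Seq[Block]): CodeBlock = {
    require(parts.length == args.length + 1, "need one more part than args")
    val outParts = Seq.newBuilder[String]
    val outInputs = Seq.newBuilder[Block]
    val buf = new StringBuilder(parts.head)
    args.zip(parts.tail).foreach { case (arg, next) =>
      arg match {
        case EmptyBlock => // dropped: buf simply keeps accumulating text
        case other =>
          outParts += buf.toString
          buf.clear()
          outInputs += other
      }
      buf.append(next)
    }
    outParts += buf.toString
    CodeBlock(outParts.result(), outInputs.result())
  }
}
```

With this shape, `build(Seq("a ", " b ", " c"), Seq(EmptyBlock, Leaf("X")))` keeps only `Leaf("X")` as an input, so later passes over the inputs never see an EmptyBlock.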

@SparkQA

SparkQA commented Jun 23, 2018

Test build #92259 has finished for PR 21619 at commit 9e14397.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// Concatenates this block with other block.
def + (other: Block): Block
def + (other: Block): Block = other match {
Contributor

A general question about +.

Previously we generated a giant string for an expression tree, which was hard to tune. To keep more information in the generated code, we introduced this JavaCode/CodeBlock framework to keep a tree of strings instead of a giant string.

For an expression a op b, we should generate a tree of strings for a and b; then op creates a new tree node and keeps a and b as children. That means that if we refer to a CodeBlock inside a code"...", the CodeBlock should become a child of the new CodeBlock. However, + usually happens within the same operator, so I'm not sure whether we should create a new level of tree node here.
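The concern can be sketched with a tiny tree of strings. Everything here (`CodeTreeSketch`, `Node`, `Leaf`, `Branch`, `depth`) is a hypothetical model for illustration and not the JavaCode/CodeBlock API itself.

```scala
// Hypothetical model of a "tree of strings"; not Spark's API.
object CodeTreeSketch {
  sealed trait Node {
    def render: String
    def children: Seq[Node]
  }

  final case class Leaf(render: String) extends Node {
    val children: Seq[Node] = Nil
  }

  // parts has exactly one more element than children.
  final case class Branch(parts: Seq[String], children: Seq[Node]) extends Node {
    def render: String =
      parts.head + children.zip(parts.tail).map { case (c, p) => c.render + p }.mkString
  }

  // Number of levels in the tree rooted at n.
  def depth(n: Node): Int =
    if (n.children.isEmpty) 1 else 1 + n.children.map(depth).max

  val a: Node = Leaf("x + 1")
  val b: Node = Leaf("y * 2")

  // `a op b`: a new node that keeps the subtrees for a and b as children.
  val op: Node = Branch(Seq("(", ") > (", ")"), Seq(a, b))

  // Concatenation expressed as embedding wraps both operands in yet
  // another node, which is the extra level questioned above.
  val concatenated: Node =
    Branch(Seq("", "\n", ""), Seq(op, Leaf("boolean done = true;")))
}
```

In this model `depth(op)` is 2 and `depth(concatenated)` is 3, which shows how each `+` implemented through embedding adds a level even when both operands belong to the same operator.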

Contributor

I think the tree structure itself is important information in the generated code, so we should think carefully about what a tree node means. For example, when we want to split the code into methods, how shall we deal with a tree node? Shall we split an entire tree node into one method? What is our assumption about the Java code inside one tree node?

Member Author

Ideally, we all put a semantically integral piece of Java code into a CodeBlock. If the CodeBlock is produced by an expression in codegen, then in order to split it, we should split the entire tree of the CodeBlock.

Contributor

That's a good answer. Let's make sure that when we call +, the two blocks are two individual, semantically integral pieces of Java code.

@viirya
Member Author

viirya commented Jul 4, 2018

@cloud-fan Is there anything more I should do to get this in? Thanks.

 * method splitting.
 */
case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[JavaCode]) extends Block {
  override lazy val exprValues: Set[ExprValue] = {
Contributor

Not related to this PR, but we should think about it in the future. If we treat CodeBlock as a tree of generated code, then this method doesn't make a lot of sense: it collects all references from its children and puts them into a set, which means that every time we transform a CodeBlock and create a new copy, we need to build this set again.

It's unclear how exprValues would be used, but I'd imagine we can provide a contains method which recursively checks the children.
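A recursive lookup of that kind could look roughly like the sketch below. The types are simplified assumptions: here `ExprValue` is a plain case class and `contains` is a hypothetical method, neither of which is taken from the Spark code base.

```scala
// Hypothetical sketch of a recursive `contains`; the types are simplified
// stand-ins and do not match Spark's definitions.
object ContainsSketch {
  sealed trait JavaCode
  final case class ExprValue(name: String) extends JavaCode

  final case class CodeBlock(parts: Seq[String], inputs: Seq[JavaCode]) extends JavaCode {
    // Walks the children on demand instead of materialising a Set of all
    // referenced values whenever a block is copied or transformed.
    def contains(v: ExprValue): Boolean = inputs.exists {
      case e: ExprValue => e == v
      case b: CodeBlock => b.contains(v)
    }
  }
}
```

The trade-off in this sketch is that each query costs a traversal, while creating or transforming a block costs nothing extra.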

Member Author

@viirya viirya Jul 5, 2018

Currently it is a lazy val, so we only build it when we use it. It was originally introduced for manipulating expressions in a code block.

But I realized that we may not actually need exprValues if we treat CodeBlock as a tree. In PR #21405, the manipulation API doesn't use exprValues when transforming a CodeBlock.

Thus I agree with you that we can get rid of exprValues in the PR. Then we may have a method that returns the ExprValues contained in a Block.

@cloud-fan
Contributor

also cc @rednaxelafx

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 1a2655a Jul 4, 2018
@viirya viirya deleted the SPARK-24635 branch December 27, 2023 18:21