Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jun 9, 2018

What changes were proposed in this pull request?

The code block interpolator currently accepts strings. Based on previous discussion, we should forbid string interpolation to prevent losing references to expressions (ExprValue) in code blocks.

How was this patch tested?

Existing tests.

@SparkQA
Copy link

SparkQA commented Jun 9, 2018

Test build #91619 has finished for PR 21520 at commit a85ffe6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 9, 2018

Test build #91621 has finished for PR 21520 at commit 349d63e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jun 9, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Jun 10, 2018

Test build #91624 has finished for PR 21520 at commit 349d63e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [SPARK-24505][SQL][WIP] Forbidding string interpolation in CodeBlock [SPARK-24505][SQL] Forbidding string interpolation in CodeBlock Jun 10, 2018
@viirya
Copy link
Member Author

viirya commented Jun 10, 2018

cc @cloud-fan @kiszk @hvanhovell

@viirya
Copy link
Member Author

viirya commented Jun 10, 2018

Sorry this change is quite large, but it can't split into smaller pieces because it must be changed as a whole to pass compilation.

@SparkQA
Copy link

SparkQA commented Jun 10, 2018

Test build #91631 has finished for PR 21520 at commit 2a85388.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Jun 10, 2018

Thank you for a lot of works to update many places. It is very hard to split it into several pieces.

Now, we are seeing several typical patterns in the all of changes, in paticular by wrapping the original code. Would it be possible to make changes simpler by introducing several APIs?

  1. We are seeing many inline prefix with a few typical patterns.
inline"${classOf[...].getName}"
inline"${CodeGenerator.javaType(...)}"
inline"${CodeGenerator.boxedType(...)}"

Can we introduce new APIs to avoid repetations of adding inline, for example JavaCode.className(Class[_]): JavaCode for the first call.

  1. We are seeing many JavaCode.global() or JavaCode.variable() when we create a new variable.
    Would it be possible to make them simpler?

For example, we may introduce these APIs.

ctx.addMutableState(..., Class[_])
ctx.freshName(..., DataType)
ctx.freshNameIsNull(...)

The first one calls JavaCode.global() in the method. The second one calls JavaCode.variable().

WDYT?

@mgaido91
Copy link
Contributor

mgaido91 commented Jun 10, 2018

yes, this is a great work @viirya ! It would be great if we can split it into smaller updates. What do you think about starting doing the needed changes in smaller PRs which focus only on specific part and forbidding the string interpolation after those have made the needed changes smaller?

I also like and agree with @kiszk about introducing new APIs. In particular, in the last days I checked the part related to the usage of freshName . IMO, we can introduce a new method addLocalVariable (and maybe renaming addMutableState to addGlobalVariable) and make freshName private (or if it not feasible as we are using it also for methods' names we can restrict its usage only to that particular case).

Another change I was thinking about order to avoid code duplication, IMHO, is to add a common class to Global and Local variable with a declare(initialValue: Option[ExprValue] = None). But probably this is quite unrelated to this PR, even though might help avoiding some code duplication.

What do you think?

@viirya
Copy link
Member Author

viirya commented Jun 11, 2018

@kiszk @mgaido91 Thanks for your comment!

What do you think about starting doing the needed changes in smaller PRs which focus only on specific part and forbidding the string interpolation after those have made the needed changes smaller?

By disallowing string interpolation in code blocks, any strings passed into a code block won't pass the compilation. It is also more guaranteed that we don't miss any strings. It is why this change is quite big and not in many smaller pieces. Most important is, I need to have all the changes together to see if we can pass all the tests once we completely forbid string interpolation.

But it doesn't mean we need to review and merge this big change. It is still possible to break this into smaller PRs. It may work like this:

  1. Split a part of change into a smaller PR, review it and finally merge it.
  2. Incorporate the merged change back to this PR. Make test passed. Go back to step 1.

WDYT?

@viirya
Copy link
Member Author

viirya commented Jun 11, 2018

  1. We are seeing many inline prefix with a few typical patterns.
    Can we introduce new APIs to avoid repetations of adding inline, for example JavaCode.className(Class[_]): JavaCode for the first call.

@kiszk I initially took a similar approach but found soon that I'd create too many APIs. I'm not pretty sure if that is what we want to have distinguish them in API level because they are all actually a simple piece of inline string in code, so I turned to a inline to treat them as same.

  1. We are seeing many JavaCode.global() or JavaCode.variable() when we create a new variable.
    Would it be possible to make them simpler?

Yes, I noticed that too. I was planning to change existing API such as ctx.freshName. But I leave it as it and set the first goal to pass all tests after forbidding string interpolation. Since the tests are passed now, I think we can incrementally make the changes more simpler and clear. I've proposed to do this part in some smaller PRs (ref: #21520 (comment)). WDYT?

@kiszk
Copy link
Member

kiszk commented Jun 11, 2018

For 1, I agree with you that it is not good to introduce many APIs at first. On the other hand, it would be good to prepare only a few APIs that are frequently used, not to prepare many APIs. It make code simpler and cleaner.
I think that inline"${classOf[...].getName}" and inline"${CodeGenerator.javaType(...)}" are frequently used. inline"${CodeGenerator.boxedType(...)}" may be prepared for consistency with inline"${CodeGenerator.javaType(...)}".

For 2, new APIs in ctx can co-exist with old APIs. Thus, to introduce new APIs can co-exist with your new proposal.

WDYT?

@viirya
Copy link
Member Author

viirya commented Jun 11, 2018

@kiszk Seems good to me. I will go to create and use those APIs in pieces of PRs.

@viirya
Copy link
Member Author

viirya commented Jun 12, 2018

As I will incrementally split this into smaller PRs, I will first close this.

@viirya viirya closed this Jun 12, 2018
@HyukjinKwon
Copy link
Member

@viirya ~ I was just trying to read the PRs. Would you please mind if I ask where is the "Based on previous discussion" ?

@viirya
Copy link
Member Author

viirya commented Aug 8, 2018

@HyukjinKwon Thanks for looking into this. It is based on the comment and discussion here #21193 (comment).

asfgit pushed a commit that referenced this pull request Aug 15, 2018
…ndAttribute

## What changes were proposed in this pull request?

This is split from #21520. This includes changes of `BoundAttribute` and `Cast`.
This patch also adds few convenient APIs:

```scala
CodeGenerator.freshVariable(name: String, dt: DataType): VariableValue
CodeGenerator.freshVariable(name: String, javaClass: Class[_]): VariableValue

JavaCode.javaType(javaClass: Class[_]): Inline
JavaCode.javaType(dataType: DataType): Inline
JavaCode.boxedType(dataType: DataType): Inline
```

## How was this patch tested?

Existing tests.

Closes #21537 from viirya/SPARK-24505-1.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
@viirya viirya deleted the SPARK-24505 branch December 27, 2023 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants