Skip to content

Conversation

@falaki
Copy link
Contributor

@falaki falaki commented Nov 7, 2019

What changes were proposed in this pull request?

The implementation for walking through the user function AST and picking referenced variables and functions, had an optimization to skip a branch if it had already seen it. This runs into an interesting problem in the following example

df <- createDataFrame(data.frame(x=1))
f1 <- function(x) x + 1
f2 <- function(x) f1(x) + 2
dapplyCollect(df, function(x) { f1(x); f2(x) })

Results in error:

org.apache.spark.SparkException: R computation failed with
 Error in f1(x) : could not find function "f1"
Calls: compute -> computeFunc -> f2

Why are the changes needed?

Bug fix

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests in test_utils.R

@mengxr mengxr self-requested a review November 7, 2019 23:04
@falaki
Copy link
Contributor Author

falaki commented Nov 7, 2019

@mengxr can we cherry-pick this to 2.4.x?

@SparkQA
Copy link

SparkQA commented Nov 8, 2019

Test build #113409 has finished for PR 26429 at commit 51481fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

found <- sapply(funcList, function(func) {
ifelse(identical(func, obj), TRUE, FALSE)
})
if (sum(found) > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. The change would lead to dead loop. We should make sure newEnv contains the (cleaned) node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the dead loop case when the same function with the same environments is recursively called in the closure?

@SparkQA
Copy link

SparkQA commented Nov 9, 2019

Test build #113493 has finished for PR 26429 at commit 10925bf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Nov 10, 2019

Test build #113519 has finished for PR 26429 at commit 10925bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@falaki
Copy link
Contributor Author

falaki commented Nov 11, 2019

@shivaram and @felixcheung do you guys have any input on this?

@falaki
Copy link
Contributor Author

falaki commented Nov 13, 2019

@HyukjinKwon would you please take a look?

@HyukjinKwon
Copy link
Member

retest this please

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. one question and one nit. Wait, I take it back. Let me take another look.

@SparkQA
Copy link

SparkQA commented Nov 14, 2019

Test build #113738 has finished for PR 26429 at commit 10925bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

f1 <- function(x) x + 1
f2 <- function(x) f1(x) + 2
user_func <- function(x) { f1(x); f2(x) }
c_user_func_env <- environment(cleanClosure(user_func))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think naming c_user_func_env is not preferred (it was discussed here #17590 (comment)) before.
I think it should rather be cUserFuncEnv.

Google guide seems updated rapidly (https://google.github.io/styleguide/Rguide.html).. We will have to update the guide ...

@HyukjinKwon
Copy link
Member

Looks fine except two nits: #26429 (comment) and #26429 (comment)

@falaki
Copy link
Contributor Author

falaki commented Nov 14, 2019

Thanks for reviewing @HyukjinKwon I addressed your comments.

@SparkQA
Copy link

SparkQA commented Nov 15, 2019

Test build #113814 has finished for PR 26429 at commit 6ad35c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 15, 2019

Test build #113827 has finished for PR 26429 at commit 6ad35c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Nov 15, 2019

The test cases should be fixed as of 65a189c

@srowen
Copy link
Member

srowen commented Nov 16, 2019

Just needs a rebase?

@felixcheung
Copy link
Member

felixcheung commented Nov 17, 2019 via email

@falaki
Copy link
Contributor Author

falaki commented Nov 18, 2019

Rebased this. Sorry for the delay.

@SparkQA
Copy link

SparkQA commented Nov 18, 2019

Test build #114031 has finished for PR 26429 at commit cc9c416.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Let's give a try.
Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants