-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-10903] [SPARKR] R - Simplify SQLContext method signatures and use a singleton #9192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @davies |
|
@felixcheung This looks great overall, could you also update all the examples? |
|
Rebased to master. Updated to fix the new @davies Thanks! Appreciated. I'd like to leave R doc update separated if that's ok by you - too many files to change and too many possible conflicts. It would be easier as a doc only PR. |
|
Test build #44047 has finished for PR 9192 at commit
|
|
Test build #44042 has finished for PR 9192 at commit
|
R/pkg/R/SQLContext.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wording: Temporary -> Dispatching ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"reroute" was the term corresponding to "dispatch"
"temporary" was referring to the fact that we intend this to go away - please see my other answer regarding your question on this.
|
Test build #44080 has finished for PR 9192 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print.jobj can be updated to use this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you address this comment?
|
Test build #44139 has finished for PR 9192 at commit
|
|
Any more thought on this and #9185 ? It looks like we are cutting the 1.6.0 release very soon, and it will be good if API changes can go into a minor release change |
|
Actually I'd love to have this...! |
|
Would you like API without SQLContext (or SparkSession) parameter or, as what is in this PR, API that can be backward compatible with or without it? |
|
I think maintaining backward compatibility would be useful. Do you have time to bring this up to date this week so we get it in for 2.0? It's pretty late for a change of this size, but given this is a huge usability improvement and R is experimental, it might be ok. |
Squashed commits: [2c16ca8] add getClassName from feedback [b0348d7] fix the new as.DataFrame method [d8e91f3] fix test [8b3141a] Change to method dispatch update more tests and add tests for back compat [fd3a835] update tests [fa50f78] Improve route logic [efedce5] Method dispatch to support omission of 'sqlContext' argument [612f7f3] Refractor SQLContext and DataFrame functions to lookup sqlContext instance in the env [9382244] fix test [943889f] Change to method dispatch update more tests and add tests for back compat [c5c41c2] update tests [a8e1ea6] Improve route logic [c779c9d] Method dispatch to support omission of 'sqlContext' argument
|
Sorry I was busy last week and missed this -- but +1 to keeping backwards compatibility. BTW on that note will this also change the entry point in SparkR to be SparkSession (instead of SQL/SparkContext that is) ? |
|
SparkSession definitely makes more sense, given DataFrame is the main API ... |
|
Test build #59164 has finished for PR 9192 at commit
|
|
Took me a while to rebase, and caught up with new changes. |
|
Test build #59269 has finished for PR 9192 at commit
|
|
Test build #59272 has finished for PR 9192 at commit
|
|
|
||
| #' Temporary function to reroute old S3 Method call to new | ||
| #' We need to check the class of x to ensure it is SQLContext before dispatching | ||
| dispatchFunc <- function(newFuncSig, x, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this to utils.R. Also some function level comments on what the arguments mean would be useful (for example numFuncSig is only used to print the deprecation warning from what i see)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is very specific to this file - this helper is implemented to specifically check & remove sqlContext parameter; I'll add more documentation on this.
|
Thanks @felixcheung for the update. I left some minor comments inline. It seems unfortunate that we need to do some amount of code duplication to get this to work (i.e. define |
| #' new_df <- tableToDF(sqlContext, "table") | ||
| #' new_df <- tableToDF("table") | ||
| #' } | ||
| #' @note since 2.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we consistently added "since" for all new API methods in 2.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not yet, I don't want to keep checking which method is added in 2.0, so I'm adding it in this file for now, that would be the next PR.
|
LGTM except some minor comments. |
|
Test build #59331 has finished for PR 9192 at commit
|
|
@shivaram it's true there's some scaffolding we need to add (though I'm pretty we could codegen them on the fly instead). I think the idea is this is temporary and in the next release (2.1.0?) we could remove these very easily (just a few lines before and after the method, plus renaming |
|
Test build #59335 has finished for PR 9192 at commit
|
|
Test build #59337 has finished for PR 9192 at commit
|
|
Thanks for the update. LGTM. Will merge after Jenkins passes. |
|
Test build #59341 has finished for PR 9192 at commit
|
|
Merging this to master and branch-2.0 |
…se a singleton Eliminate the need to pass sqlContext to method since it is a singleton - and we don't want to support multiple contexts in a R session. Changes are done in a back compat way with deprecation warning added. Method signature for S3 methods are added in a concise, clean approach such that in the next release the deprecated signature can be taken out easily/cleanly (just delete a few lines per method). Custom method dispatch is implemented to allow for multiple JVM reference types that are all 'jobj' in R and to avoid having to add 30 new exports. Author: felixcheung <[email protected]> Closes #9192 from felixcheung/rsqlcontext. (cherry picked from commit c76457c) Signed-off-by: Shivaram Venkataraman <[email protected]>
### What changes were proposed in this pull request? Add back the deprecated R APIs removed by #22843 and #22815. These APIs are - `sparkR.init` - `sparkRSQL.init` - `sparkRHive.init` - `registerTempTable` - `createExternalTable` - `dropTempTable` No need to port the function such as ```r createExternalTable <- function(x, ...) { dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...) } ``` because this was for the backward compatibility when SQLContext exists before assuming from #9192, but seems we don't need it anymore since SparkR replaced SQLContext with Spark Session at #13635. ### Why are the changes needed? Amend Spark's Semantic Versioning Policy ### Does this PR introduce any user-facing change? Yes The removed R APIs are put back. ### How was this patch tested? Add back the removed tests Closes #28058 from huaxingao/r. Authored-by: Huaxin Gao <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
### What changes were proposed in this pull request? Add back the deprecated R APIs removed by #22843 and #22815. These APIs are - `sparkR.init` - `sparkRSQL.init` - `sparkRHive.init` - `registerTempTable` - `createExternalTable` - `dropTempTable` No need to port the function such as ```r createExternalTable <- function(x, ...) { dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...) } ``` because this was for the backward compatibility when SQLContext exists before assuming from #9192, but seems we don't need it anymore since SparkR replaced SQLContext with Spark Session at #13635. ### Why are the changes needed? Amend Spark's Semantic Versioning Policy ### Does this PR introduce any user-facing change? Yes The removed R APIs are put back. ### How was this patch tested? Add back the removed tests Closes #28058 from huaxingao/r. Authored-by: Huaxin Gao <[email protected]> Signed-off-by: HyukjinKwon <[email protected]> (cherry picked from commit fd0b228) Signed-off-by: HyukjinKwon <[email protected]>
### What changes were proposed in this pull request? Add back the deprecated R APIs removed by apache#22843 and apache#22815. These APIs are - `sparkR.init` - `sparkRSQL.init` - `sparkRHive.init` - `registerTempTable` - `createExternalTable` - `dropTempTable` No need to port the function such as ```r createExternalTable <- function(x, ...) { dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...) } ``` because this was for the backward compatibility when SQLContext exists before assuming from apache#9192, but seems we don't need it anymore since SparkR replaced SQLContext with Spark Session at apache#13635. ### Why are the changes needed? Amend Spark's Semantic Versioning Policy ### Does this PR introduce any user-facing change? Yes The removed R APIs are put back. ### How was this patch tested? Add back the removed tests Closes apache#28058 from huaxingao/r. Authored-by: Huaxin Gao <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
Eliminate the need to pass sqlContext to method since it is a singleton - and we don't want to support multiple contexts in a R session.
Changes are done in a back compat way with deprecation warning added. Method signature for S3 methods are added in a concise, clean approach such that in the next release the deprecated signature can be taken out easily/cleanly (just delete a few lines per method).
Custom method dispatch is implemented to allow for multiple JVM reference types that are all 'jobj' in R and to avoid having to add 30 new exports.