Skip to content

Conversation

@felixcheung
Copy link
Member

Eliminate the need to pass sqlContext to method since it is a singleton - and we don't want to support multiple contexts in a R session.

Changes are done in a back compat way with deprecation warning added. Method signature for S3 methods are added in a concise, clean approach such that in the next release the deprecated signature can be taken out easily/cleanly (just delete a few lines per method).

Custom method dispatch is implemented to allow for multiple JVM reference types that are all 'jobj' in R and to avoid having to add 30 new exports.

@shivaram
Copy link
Contributor

cc @davies

@davies
Copy link
Contributor

davies commented Oct 21, 2015

@felixcheung This looks great overall, could you also update all the examples?

@felixcheung
Copy link
Member Author

Rebased to master. Updated to fix the new as.DataFrame method.

@davies Thanks! Appreciated. I'd like to leave R doc update separated if that's ok by you - too many files to change and too many possible conflicts. It would be easier as a doc only PR.

@SparkQA
Copy link

SparkQA commented Oct 21, 2015

Test build #44047 has finished for PR 9192 at commit b0348d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 21, 2015

Test build #44042 has finished for PR 9192 at commit d8e91f3.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wording: Temporary -> Dispatching ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"reroute" was the term corresponding to "dispatch"
"temporary" was referring to the fact that we intend this to go away - please see my other answer regarding your question on this.

@SparkQA
Copy link

SparkQA commented Oct 21, 2015

Test build #44080 has finished for PR 9192 at commit 2c16ca8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print.jobj can be updated to use this method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you address this comment?

@SparkQA
Copy link

SparkQA commented Oct 22, 2015

Test build #44139 has finished for PR 9192 at commit 2f9e50d.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member Author

Any more thought on this and #9185 ? It looks like we are cutting the 1.6.0 release very soon, and it will be good if API changes can go into a minor release change

@rxin
Copy link
Contributor

rxin commented May 20, 2016

Actually I'd love to have this...!

@felixcheung
Copy link
Member Author

Would you like API without SQLContext (or SparkSession) parameter or, as what is in this PR, API that can be backward compatible with or without it?

@rxin
Copy link
Contributor

rxin commented May 22, 2016

I think maintaining backward compatibility would be useful. Do you have time to bring this up to date this week so we get it in for 2.0? It's pretty late for a change of this size, but given this is a huge usability improvement and R is experimental, it might be ok.

Squashed commits:
[2c16ca8] add getClassName from feedback
[b0348d7] fix the new as.DataFrame method
[d8e91f3] fix test
[8b3141a] Change to method dispatch
update more tests and add tests for back compat
[fd3a835] update tests
[fa50f78] Improve route logic
[efedce5] Method dispatch to support omission of 'sqlContext' argument
[612f7f3] Refractor SQLContext and DataFrame functions to lookup sqlContext instance in the env
[9382244] fix test
[943889f] Change to method dispatch
update more tests and add tests for back compat
[c5c41c2] update tests
[a8e1ea6] Improve route logic
[c779c9d] Method dispatch to support omission of 'sqlContext' argument
@shivaram
Copy link
Contributor

Sorry I was busy last week and missed this -- but +1 to keeping backwards compatibility. BTW on that note will this also change the entry point in SparkR to be SparkSession (instead of SQL/SparkContext that is) ?

@rxin
Copy link
Contributor

rxin commented May 23, 2016

SparkSession definitely makes more sense, given DataFrame is the main API ...

@SparkQA
Copy link

SparkQA commented May 23, 2016

Test build #59164 has finished for PR 9192 at commit 4884e56.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member Author

Took me a while to rebase, and caught up with new changes.
This is WIP - I will need to check on roxygen2 doc, and look into SparkSession

@felixcheung felixcheung changed the title [SPARK-10903] [SPARKR] R - Simplify SQLContext method signatures and use a singleton [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLContext method signatures and use a singleton May 23, 2016
@SparkQA
Copy link

SparkQA commented May 25, 2016

Test build #59269 has finished for PR 9192 at commit 153d5e7.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 25, 2016

Test build #59272 has finished for PR 9192 at commit f53b148.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung felixcheung changed the title [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLContext method signatures and use a singleton [SPARK-10903] [SPARKR] R - Simplify SQLContext method signatures and use a singleton May 25, 2016

#' Temporary function to reroute old S3 Method call to new
#' We need to check the class of x to ensure it is SQLContext before dispatching
dispatchFunc <- function(newFuncSig, x, ...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this to utils.R. Also some function level comments on what the arguments mean would be useful (for example numFuncSig is only used to print the deprecation warning from what i see)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very specific to this file - this helper is implemented to specifically check & remove sqlContext parameter; I'll add more documentation on this.

@shivaram
Copy link
Contributor

Thanks @felixcheung for the update. I left some minor comments inline.

It seems unfortunate that we need to do some amount of code duplication to get this to work (i.e. define read.df and read.df.default etc.) But I think thats fine for two reasons (a) this is an internal code issue and we can continue to clean it up (b) i dont think we are adding a lot of methods there -- in fact we should remove some of the unused ones.

#' new_df <- tableToDF(sqlContext, "table")
#' new_df <- tableToDF("table")
#' }
#' @note since 2.0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor

@sun-rui sun-rui May 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we consistently added "since" for all new API methods in 2.0?

Copy link
Member Author

@felixcheung felixcheung May 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet, I don't want to keep checking which method is added in 2.0, so I'm adding it in this file for now, that would be the next PR.

@sun-rui
Copy link
Contributor

sun-rui commented May 26, 2016

LGTM except some minor comments.

@SparkQA
Copy link

SparkQA commented May 26, 2016

Test build #59331 has finished for PR 9192 at commit 98e7ab9.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member Author

felixcheung commented May 26, 2016

@shivaram it's true there's some scaffolding we need to add (though I'm pretty we could codegen them on the fly instead). I think the idea is this is temporary and in the next release (2.1.0?) we could remove these very easily (just a few lines before and after the method, plus renaming x.default back to x) to stop supporting deprecated calls.

@SparkQA
Copy link

SparkQA commented May 26, 2016

Test build #59335 has finished for PR 9192 at commit 640ffca.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 26, 2016

Test build #59337 has finished for PR 9192 at commit 90641a7.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

Thanks for the update. LGTM. Will merge after Jenkins passes.

@SparkQA
Copy link

SparkQA commented May 26, 2016

Test build #59341 has finished for PR 9192 at commit f67095e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

Merging this to master and branch-2.0

@asfgit asfgit closed this in c76457c May 26, 2016
asfgit pushed a commit that referenced this pull request May 26, 2016
…se a singleton

Eliminate the need to pass sqlContext to method since it is a singleton - and we don't want to support multiple contexts in a R session.

Changes are done in a back compat way with deprecation warning added. Method signature for S3 methods are added in a concise, clean approach such that in the next release the deprecated signature can be taken out easily/cleanly (just delete a few lines per method).

Custom method dispatch is implemented to allow for multiple JVM reference types that are all 'jobj' in R and to avoid having to add 30 new exports.

Author: felixcheung <[email protected]>

Closes #9192 from felixcheung/rsqlcontext.

(cherry picked from commit c76457c)
Signed-off-by: Shivaram Venkataraman <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Apr 1, 2020
### What changes were proposed in this pull request?
Add back the deprecated R APIs removed by #22843 and #22815.

These APIs are

- `sparkR.init`
- `sparkRSQL.init`
- `sparkRHive.init`
- `registerTempTable`
- `createExternalTable`
- `dropTempTable`

No need to port the function such as
```r
createExternalTable <- function(x, ...) {
  dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...)
}
```
because this was for the backward compatibility when SQLContext exists before assuming from #9192,  but seems we don't need it anymore since SparkR replaced SQLContext with Spark Session at #13635.

### Why are the changes needed?
Amend Spark's Semantic Versioning Policy

### Does this PR introduce any user-facing change?
Yes
The removed R APIs are put back.

### How was this patch tested?
Add back the removed tests

Closes #28058 from huaxingao/r.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Apr 1, 2020
### What changes were proposed in this pull request?
Add back the deprecated R APIs removed by #22843 and #22815.

These APIs are

- `sparkR.init`
- `sparkRSQL.init`
- `sparkRHive.init`
- `registerTempTable`
- `createExternalTable`
- `dropTempTable`

No need to port the function such as
```r
createExternalTable <- function(x, ...) {
  dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...)
}
```
because this was for the backward compatibility when SQLContext exists before assuming from #9192,  but seems we don't need it anymore since SparkR replaced SQLContext with Spark Session at #13635.

### Why are the changes needed?
Amend Spark's Semantic Versioning Policy

### Does this PR introduce any user-facing change?
Yes
The removed R APIs are put back.

### How was this patch tested?
Add back the removed tests

Closes #28058 from huaxingao/r.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit fd0b228)
Signed-off-by: HyukjinKwon <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?
Add back the deprecated R APIs removed by apache#22843 and apache#22815.

These APIs are

- `sparkR.init`
- `sparkRSQL.init`
- `sparkRHive.init`
- `registerTempTable`
- `createExternalTable`
- `dropTempTable`

No need to port the function such as
```r
createExternalTable <- function(x, ...) {
  dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...)
}
```
because this was for the backward compatibility when SQLContext exists before assuming from apache#9192,  but seems we don't need it anymore since SparkR replaced SQLContext with Spark Session at apache#13635.

### Why are the changes needed?
Amend Spark's Semantic Versioning Policy

### Does this PR introduce any user-facing change?
Yes
The removed R APIs are put back.

### How was this patch tested?
Add back the removed tests

Closes apache#28058 from huaxingao/r.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants