[SPARK-25117][R] Add EXCEPT ALL and INTERSECT ALL support in R #22107
Test build #94760 has finished for PR 22107 at commit
cc @felixcheung
Test build #94764 has finished for PR 22107 at commit
R/pkg/R/DataFrame.R
Outdated
#' intersectAllDF <- intersectAll(df1, df2)
#' }
#' @rdname intersectAll
#' @note intersectAll since 2.4
please put 2.4.0
@felixcheung OK.
Test build #94777 has finished for PR 22107 at commit
list("a", 1),
list("a", 1),
list("b", 3),
list("c", 4)),
nit:
list(list("a", 1), list("a", 1), list("a", 1),
     list("a", 1), list("b", 3), list("c", 4)),

schema = c("a", "b"))
df2 <- createDataFrame(
  list(list("a", 1), list("a", 1), list("b", 3)),
  schema = c("a", "b"))
nit:
df2 <- createDataFrame(list(list("a", 1), list("a", 1), list("b", 3)), schema = c("a", "b"))

stringsAsFactors = FALSE)
except_all_expected <- data.frame("a" = c("a", "a", "c"), "b" = c(1, 1, 4),
                                  stringsAsFactors = FALSE)
intersect_all_df <- arrange(intersectAll(df1, df2), df1$a)
Strictly, the naming rule is intersectAllDf or intersect.all.df (see #17590 (comment))
Seems fine.
Test build #94789 has finished for PR 22107 at commit
retest this please
Test build #94791 has finished for PR 22107 at commit
R/pkg/R/DataFrame.R
Outdated
#' df2 <- read.json(path2)
#' exceptAllDF <- exceptAll(df1, df2)
#' }
#' @rdname exceptAll
this is a bug in except; there should only be one @rdname for each
@felixcheung Thanks. Did you want the original function except fixed as part of this?
R/pkg/R/DataFrame.R
Outdated
#' df2 <- read.json(path2)
#' intersectAllDF <- intersectAll(df1, df2)
#' }
#' @rdname intersectAll
ditto here
function(x, y) {
  intersected <- callJMethod(x@sdf, "intersectAll", y@sdf)
  dataFrame(intersected)
})
add extra empty line after code
@felixcheung OK.
  excepted <- callJMethod(x@sdf, "exceptAll", y@sdf)
  dataFrame(excepted)
})
nit: remove one of the two empty lines
@felixcheung Sure.
Test build #94844 has finished for PR 22107 at commit
@felixcheung I have incorporated the comments.
felixcheung
left a comment
LGTM
HyukjinKwon
left a comment
LGTM too
merged to master
Thank you very much @HyukjinKwon @felixcheung
What changes were proposed in this pull request?
SPARK-21274 added support for EXCEPT ALL and INTERSECT ALL. This PR adds the support in R.
How was this patch tested?
Added test in test_sparkSQL.R
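
For readers unfamiliar with the ALL variants, here is a minimal sketch of the multiset semantics that the test data in the review snippets above exercises. It is plain Python rather than SparkR, and the helper names `intersect_all` and `except_all` are hypothetical, chosen only for illustration; it does not reproduce how Spark implements these operators. It assumes INTERSECT ALL keeps each row the minimum number of times it occurs in the two inputs, and EXCEPT ALL keeps each row the number of times it occurs on the left minus the number on the right.

```python
from collections import Counter

# Rows mirror the (a, b) pairs used for df1 and df2 in the review snippets.
df1 = [("a", 1), ("a", 1), ("a", 1), ("a", 1), ("b", 3), ("c", 4)]
df2 = [("a", 1), ("a", 1), ("b", 3)]


def intersect_all(left, right):
    """Multiset intersection: keep each row min(count_left, count_right) times."""
    return sorted((Counter(left) & Counter(right)).elements())


def except_all(left, right):
    """Multiset difference: keep each row max(count_left - count_right, 0) times."""
    return sorted((Counter(left) - Counter(right)).elements())


print(intersect_all(df1, df2))  # [('a', 1), ('a', 1), ('b', 3)]
print(except_all(df1, df2))     # [('a', 1), ('a', 1), ('c', 4)]
```

The second result lines up with `except_all_expected` in the test snippet (column a = "a", "a", "c"; column b = 1, 1, 4): duplicates are preserved rather than collapsed, which is what distinguishes the ALL variants from plain `except` and `intersect`.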