[SPARK-11210][SPARKR] Add window functions into SparkR [step 2]. #9196
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2038,6 +2038,28 @@ setMethod("cumeDist", | |
| column(jc) | ||
| }) | ||
|
|
||
| #' denseRank | ||
| #' | ||
| #' Window function: returns the rank of rows within a window partition, without any gaps. | ||
| #' The difference between rank and denseRank is that denseRank leaves no gaps in ranking | ||
| #' sequence when there are ties. That is, if you were ranking a competition using denseRank | ||
| #' and had three people tie for second place, you would say that all three were in second | ||
| #' place and that the next person came in third. | ||
| #' | ||
| #' This is equivalent to the DENSE_RANK function in SQL. | ||
| #' | ||
| #' @rdname denseRank | ||
| #' @name denseRank | ||
| #' @family window_funcs | ||
| #' @export | ||
| #' @examples \dontrun{denseRank()} | ||
| setMethod("denseRank", | ||
| signature(x = "missing"), | ||
| function() { | ||
| jc <- callJStatic("org.apache.spark.sql.functions", "denseRank") | ||
| column(jc) | ||
| }) | ||
|
|
||
| #' lag | ||
| #' | ||
| #' Window function: returns the value that is `offset` rows before the current row, and | ||
|
|
@@ -2111,3 +2133,73 @@ setMethod("ntile", | |
| jc <- callJStatic("org.apache.spark.sql.functions", "ntile", as.integer(x)) | ||
| column(jc) | ||
| }) | ||
|
|
||
| #' percentRank | ||
| #' | ||
| #' Window function: returns the relative rank (i.e. percentile) of rows within a window partition. | ||
| #' | ||
| #' This is computed by: | ||
| #' | ||
| #' (rank of row in its partition - 1) / (number of rows in the partition - 1) | ||
| #' | ||
| #' This is equivalent to the PERCENT_RANK function in SQL. | ||
| #' | ||
| #' @rdname percentRank | ||
| #' @name percentRank | ||
| #' @family window_funcs | ||
| #' @export | ||
| #' @examples \dontrun{percentRank()} | ||
| setMethod("percentRank", | ||
| signature(x = "missing"), | ||
| function() { | ||
| jc <- callJStatic("org.apache.spark.sql.functions", "percentRank") | ||
| column(jc) | ||
| }) | ||
|
|
||
| #' rank | ||
| #' | ||
| #' Window function: returns the rank of rows within a window partition. | ||
| #' | ||
| #' The difference between rank and denseRank is that denseRank leaves no gaps in ranking | ||
| #' sequence when there are ties. That is, if you were ranking a competition using rank | ||
| #' and had three people tie for second place, you would say that all three were in second | ||
| #' place and that the next person came in fifth. | ||
| #' | ||
| #' This is equivalent to the RANK function in SQL. | ||
| #' | ||
| #' @rdname rank | ||
| #' @name rank | ||
| #' @family window_funcs | ||
| #' @export | ||
| #' @examples \dontrun{rank()} | ||
| setMethod("rank", | ||
| signature(x = "missing"), | ||
| function() { | ||
| jc <- callJStatic("org.apache.spark.sql.functions", "rank") | ||
| column(jc) | ||
| }) | ||
|
|
||
| # Delegate to base::rank() when an argument is supplied, so base R's rank() keeps working | ||
| setMethod("rank", | ||
| signature(x = "ANY"), | ||
| function(x, ...) { | ||
| base::rank(x, ...) | ||
| }) | ||
|
|
||
| #' rowNumber | ||
| #' | ||
| #' Window function: returns a sequential number starting at 1 within a window partition. | ||
| #' | ||
| #' This is equivalent to the ROW_NUMBER function in SQL. | ||
| #' | ||
| #' @rdname rowNumber | ||
| #' @name rowNumber | ||
| #' @family window_funcs | ||
| #' @export | ||
| #' @examples \dontrun{rowNumber()} | ||
|
Contributor
Also, could we have some examples for these functions? They could be pretty simple ones.
Contributor, Author
These functions should work with an over clause and a window specification, which have not been enabled in SparkR yet. I will submit a JIRA issue for it, and we can add examples in the PR for that JIRA.
|
||
| setMethod("rowNumber", | ||
| signature(x = "missing"), | ||
| function() { | ||
| jc <- callJStatic("org.apache.spark.sql.functions", "rowNumber") | ||
| column(jc) | ||
| }) | ||
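The tie handling described in the roxygen comments of this diff can be checked on a small vector in plain R, without Spark. This is only a sketch of the semantics: the `scores` vector is made up for illustration, and the rankings are emulated with `base::rank()` and `match()` rather than with the SparkR functions added here, which need a window specification to be useful.

```r
# Hypothetical scores for one partition, ordered from best to worst.
scores <- c(100, 90, 90, 90, 80)
n <- length(scores)

# RANK semantics: ties share a rank and the ranks that follow are skipped.
rank_vals <- base::rank(-scores, ties.method = "min")

# DENSE_RANK semantics: ties share a rank and no ranks are skipped.
dense_vals <- match(scores, sort(unique(scores), decreasing = TRUE))

# PERCENT_RANK semantics: (rank - 1) / (number of rows - 1).
percent_vals <- (rank_vals - 1) / (n - 1)

# ROW_NUMBER semantics: sequential numbers starting at 1 (ties broken by position).
row_vals <- seq_along(scores)

print(data.frame(scores, rank_vals, dense_vals, percent_vals, row_vals))
```

With three rows tied for second place, the row after the tie gets 5 under the rank semantics and 3 under the dense-rank semantics.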
`rank` seems like a more common R function. Are there any alternative ideas for names here?
Since base::rank() has a different signature from this rank(), it is possible to expose both of them under the same name, rank().
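A minimal sketch of how that coexistence can work with S4 dispatch, runnable in plain R without Spark. The generic's shape `function(x, ...)` is an assumption (the generic definition is not part of this diff), and the string returned by the no-argument method is a placeholder for the Column that the real method builds.

```r
library(methods)

# Assumed generic shape; the actual SparkR generic is not shown in this diff.
setGeneric("rank", function(x, ...) standardGeneric("rank"))

# No argument supplied: stand-in for the window function
# (the method in the diff returns a Column instead of a string).
setMethod("rank", signature(x = "missing"),
          function(x, ...) "window rank placeholder")

# Any argument supplied: fall through to base R's rank().
setMethod("rank", signature(x = "ANY"),
          function(x, ...) base::rank(x, ...))

no_arg_result <- rank()
vector_result <- rank(c(10, 30, 20))
```

Dispatch is decided by whether `x` is missing, so existing calls such as `rank(v)` keep their base R behavior while `rank()` reaches the window function.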