Closed
4 changes: 4 additions & 0 deletions R/pkg/NAMESPACE
@@ -126,6 +126,7 @@ exportMethods("%in%",
              "datediff",
              "dayofmonth",
              "dayofyear",
              "denseRank",
              "desc",
              "endsWith",
              "exp",
@@ -182,16 +183,19 @@ exportMethods("%in%",
              "next_day",
              "ntile",
              "otherwise",
              "percentRank",
              "pmod",
              "quarter",
              "rand",
              "randn",
              "rank",
              "regexp_extract",
              "regexp_replace",
              "reverse",
              "rint",
              "rlike",
              "round",
              "rowNumber",
              "rpad",
              "rtrim",
              "second",
92 changes: 92 additions & 0 deletions R/pkg/R/functions.R
@@ -2038,6 +2038,28 @@ setMethod("cumeDist",
            column(jc)
          })

#' denseRank
#'
#' Window function: returns the rank of rows within a window partition, without any gaps.
#' The difference between rank and denseRank is that denseRank leaves no gaps in ranking
#' sequence when there are ties. That is, if you were ranking a competition using denseRank
#' and had three people tie for second place, you would say that all three were in second
#' place and that the next person came in third.
#'
#' This is equivalent to the DENSE_RANK function in SQL.
#'
#' @rdname denseRank
#' @name denseRank
#' @family window_funcs
#' @export
#' @examples \dontrun{denseRank()}
setMethod("denseRank",
          signature(x = "missing"),
          function() {
            jc <- callJStatic("org.apache.spark.sql.functions", "denseRank")
            column(jc)
          })

#' lag
#'
#' Window function: returns the value that is `offset` rows before the current row, and
@@ -2111,3 +2133,73 @@ setMethod("ntile",
            jc <- callJStatic("org.apache.spark.sql.functions", "ntile", as.integer(x))
            column(jc)
          })

#' percentRank
#'
#' Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
#'
#' This is computed by:
#'
#' (rank of row in its partition - 1) / (number of rows in the partition - 1)
#'
#' This is equivalent to the PERCENT_RANK function in SQL.
#'
#' @rdname percentRank
#' @name percentRank
#' @family window_funcs
#' @export
#' @examples \dontrun{percentRank()}
setMethod("percentRank",
          signature(x = "missing"),
          function() {
            jc <- callJStatic("org.apache.spark.sql.functions", "percentRank")
            column(jc)
          })

#' rank
#'
#' Window function: returns the rank of rows within a window partition.
#'
#' The difference between rank and denseRank is that denseRank leaves no gaps in ranking
#' sequence when there are ties. That is, if you were ranking a competition using denseRank
#' and had three people tie for second place, you would say that all three were in second
#' place and that the next person came in third. With rank, the next person would instead
#' come in fifth.
#'
#' This is equivalent to the RANK function in SQL.
#'
#' @rdname rank
#' @name rank
#' @family window_funcs
#' @export
#' @examples \dontrun{rank()}
setMethod("rank",
Contributor:

`rank` is also a common base R function. Are there any alternative ideas for names here?

Contributor Author:

Since base::rank() has a different signature from this rank(), both can be exposed under the same name, rank().

          signature(x = "missing"),
          function() {
            jc <- callJStatic("org.apache.spark.sql.functions", "rank")
            column(jc)
          })

# Expose base::rank() so that rank(x, ...) on ordinary R objects still works
setMethod("rank",
          signature(x = "ANY"),
          function(x, ...) {
            base::rank(x, ...)
          })
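
The two `rank` methods above rely on S4 dispatch on whether `x` is supplied at all. The following is a minimal, standalone sketch of that idea, not the SparkR implementation: `DemoColumn` is a hypothetical stand-in for SparkR's `Column` class, and no JVM call is made.

```r
library(methods)

# Hypothetical stand-in for SparkR's Column class (illustration only).
setClass("DemoColumn", representation(expr = "character"))

# Generic with the same formals as the one added to generics.R in this PR.
setGeneric("rank", function(x, ...) { standardGeneric("rank") })

# No argument supplied: act as the window function and return a column object.
setMethod("rank",
          signature(x = "missing"),
          function(x, ...) {
            new("DemoColumn", expr = "RANK()")
          })

# Any supplied argument: delegate to base R so existing code keeps working.
setMethod("rank",
          signature(x = "ANY"),
          function(x, ...) {
            base::rank(x, ...)
          })

col <- rank()                        # dispatches on the "missing" signature
base_result <- rank(c(10, 30, 20))   # dispatches on "ANY", same as base::rank()
```

Because the `ANY` method forwards `...`, arguments such as `ties.method` still reach `base::rank()` unchanged.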

#' rowNumber
#'
#' Window function: returns a sequential number starting at 1 within a window partition.
#'
#' This is equivalent to the ROW_NUMBER function in SQL.
#'
#' @rdname rowNumber
#' @name rowNumber
#' @family window_funcs
#' @export
#' @examples \dontrun{rowNumber()}
Contributor:

Also, could we have some examples for these functions? They could be pretty simple.

Contributor Author:

These functions need an over clause and a window specification, which have not been enabled in SparkR yet. I will file a JIRA issue for that, and we can add examples in the PR for that JIRA.

setMethod("rowNumber",
          signature(x = "missing"),
          function() {
            jc <- callJStatic("org.apache.spark.sql.functions", "rowNumber")
            column(jc)
          })
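
Since the functions above cannot yet be exercised through a window specification in SparkR, their documented semantics can be illustrated on a plain R vector. This is a base R sketch under stated assumptions: `scores` is hypothetical data standing in for one window partition, and none of the SparkR functions are called.

```r
# Hypothetical scores for a single window partition, ordered ascending.
# Assumes the partition has more than one row (percent rank divides by n - 1).
scores <- c(50, 60, 60, 60, 70)

# ROW_NUMBER: sequential numbers starting at 1; ties keep their input order.
row_number <- base::rank(scores, ties.method = "first")

# RANK: tied rows share the lowest position, leaving gaps afterwards.
sql_rank <- base::rank(scores, ties.method = "min")

# DENSE_RANK: tied rows share a rank; the next distinct value gets the next integer.
dense_rank <- match(scores, sort(unique(scores)))

# PERCENT_RANK: (rank of row - 1) / (number of rows in the partition - 1).
percent_rank <- (sql_rank - 1) / (length(scores) - 1)

print(data.frame(scores, row_number, sql_rank, dense_rank, percent_rank))
```

The three tied scores all receive rank 2. The following score is ranked 5 under RANK and 3 under DENSE_RANK, which matches the competition example in the roxygen comments.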
16 changes: 16 additions & 0 deletions R/pkg/R/generics.R
@@ -742,6 +742,10 @@ setGeneric("dayofmonth", function(x) { standardGeneric("dayofmonth") })
#' @export
setGeneric("dayofyear", function(x) { standardGeneric("dayofyear") })

#' @rdname denseRank
#' @export
setGeneric("denseRank", function(x) { standardGeneric("denseRank") })

#' @rdname explode
#' @export
setGeneric("explode", function(x) { standardGeneric("explode") })
@@ -878,6 +882,10 @@ setGeneric("ntile", function(x) { standardGeneric("ntile") })
#' @export
setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") })

#' @rdname percentRank
#' @export
setGeneric("percentRank", function(x) { standardGeneric("percentRank") })

#' @rdname pmod
#' @export
setGeneric("pmod", function(y, x) { standardGeneric("pmod") })
@@ -894,6 +902,10 @@ setGeneric("rand", function(seed) { standardGeneric("rand") })
#' @export
setGeneric("randn", function(seed) { standardGeneric("randn") })

#' @rdname rank
#' @export
setGeneric("rank", function(x, ...) { standardGeneric("rank") })

#' @rdname regexp_extract
#' @export
setGeneric("regexp_extract", function(x, pattern, idx) { standardGeneric("regexp_extract") })
@@ -911,6 +923,10 @@ setGeneric("reverse", function(x) { standardGeneric("reverse") })
#' @export
setGeneric("rint", function(x, ...) { standardGeneric("rint") })

#' @rdname rowNumber
#' @export
setGeneric("rowNumber", function(x) { standardGeneric("rowNumber") })

#' @rdname rpad
#' @export
setGeneric("rpad", function(x, len, pad) { standardGeneric("rpad") })
5 changes: 5 additions & 0 deletions R/pkg/inst/tests/test_sparkSQL.R
@@ -831,6 +831,11 @@ test_that("column functions", {
  c11 <- to_date(c) + trim(c) + unbase64(c) + unhex(c) + upper(c)
  c12 <- lead("col", 1) + lead(c, 1) + lag("col", 1) + lag(c, 1)
  c13 <- cumeDist() + ntile(1)
  c14 <- denseRank() + percentRank() + rank() + rowNumber()

  # Test if base::rank() is exposed
  expect_equal(class(rank())[[1]], "Column")
  expect_equal(rank(1:3), as.numeric(c(1:3)))

  df <- jsonFile(sqlContext, jsonPath)
  df2 <- select(df, between(df$age, c(20, 30)), between(df$age, c(10, 20)))