R/pkg/R/column.R (2 changes: 1 addition & 1 deletion)
@@ -56,7 +56,7 @@ operators <- list(
"&" = "and", "|" = "or", #, "!" = "unary_$bang"
"^" = "pow"
)
-column_functions1 <- c("asc", "desc", "isNull", "isNotNull")
+column_functions1 <- c("asc", "desc", "isNaN", "isNull", "isNotNull")
column_functions2 <- c("like", "rlike", "startsWith", "endsWith", "getField", "getItem", "contains")

createOperator <- function(op) {
R/pkg/R/functions.R (26 changes: 19 additions & 7 deletions)
@@ -488,19 +488,31 @@ setMethod("initcap",
column(jc)
})

-#' isNaN
+#' is.nan
#'
-#' Return true iff the column is NaN.
+#' Return true if the column is NaN, alias for \link{isnan}
#'
-#' @rdname isNaN
-#' @name isNaN
+#' @rdname is.nan
+#' @name is.nan
#' @family normal_funcs
#' @export
-#' @examples \dontrun{isNaN(df$c)}
-setMethod("isNaN",
+#' @examples
+#' \dontrun{
+#' is.nan(df$c)
+#' isnan(df$c)
+#' }
+setMethod("is.nan",
signature(x = "Column"),
function(x) {
+isnan(x)
+})
+
+#' @rdname is.nan
+#' @name isnan
+setMethod("isnan",
+signature(x = "Column"),
+function(x) {
-jc <- callJStatic("org.apache.spark.sql.functions", "isNaN", x@jc)
+jc <- callJStatic("org.apache.spark.sql.functions", "isnan", x@jc)
column(jc)
})

R/pkg/R/generics.R (8 changes: 6 additions & 2 deletions)
@@ -621,6 +621,10 @@ setGeneric("getField", function(x, ...) { standardGeneric("getField") })
#' @export
setGeneric("getItem", function(x, ...) { standardGeneric("getItem") })

+#' @rdname column
+#' @export
+setGeneric("isNaN", function(x) { standardGeneric("isNaN") })
Member

If isNaN is being removed, we shouldn't set this generic, right?

Member

Also, isNaN has been around since Spark 1.5 - if we are taking this out, we would need to note this breaking change in the release doc with a JIRA.

Member

Sorry, rereading the description. It sounds like we have isNaN for Column and isnan for DataFrame? That seems a bit confusing.

Contributor Author

Yes, indeed. I have sent #10056 to push the SQL side to change to a uniform interface.

Contributor Author

It won't unify the isNaN and isnan interfaces on the SQL side; please refer to the comments at #10056. Furthermore, there are different semantics between Scala and R for isnull. So let's narrow this PR to providing the same functions as Scala on the SparkR side, which was my original motivation. I will add test cases for Column as @sun-rui suggested. Then I think this PR can be merged first, and we can start to discuss the corresponding aliases or explanations of the NULL/NA difference at SPARK-12071. @sun-rui @felixcheung @shivaram

Contributor

I don't understand the resolution here. isNull is still exported after this PR? How is @felixcheung's example going to change after this PR?

Contributor Author

isNull and isNotNull are still exported after this PR because they have existed since Spark 1.5. isNull and isNotNull (with upper-case names) are functions of Column; I consider them Spark-specific functions, so I did not remove them. If we want to remove them, I can do it in a follow-up PR; they are breaking changes that need to be explained in the release notes. @shivaram

Contributor

Ok, I think I get it. Let me summarize the situation below and let me know if I am getting it right.

  1. We have isNaN, isNull, isNotNull for Column as defined in column.R. These mirror the Scala functions.
  2. We have added isnan and is.nan for Column in this PR. These call isnan in Scala, and I presume their behavior is the same as isNaN?
  3. In addition to this, we have some DataFrame operators called isNaN? I can't find that call in our unit test file, so I guess it doesn't exist in SparkR? Does this exist in Scala?
  4. We convert NA in R to null on the SparkSQL side.

I think the change looks fine to me, but I just want to understand the different things going on here.
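
To make that summary concrete, here is a small editorial sketch, not code from the PR; it assumes a SparkR DataFrame `df` with a numeric column `c`:

```r
# NaN checks -- each call returns a Column object:
isnan(df$c)     # wraps org.apache.spark.sql.functions.isnan
is.nan(df$c)    # alias of isnan added in this PR
isNaN(df$c)     # pre-existing Column method, available since Spark 1.5

# Null checks are separate Column methods; null and NaN are distinct
# values on the SQL side, and an R NA arrives there as null:
isNull(df$c)
isNotNull(df$c)
```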

Member

I believe that's correct. We could scope this PR to only isnan on Column and track the others with JIRAs.

Contributor

@shivaram, to be clear: IsNaN, isnan, and is.nan are functions (in org.apache.spark.sql.functions) applied to Column; they are not DataFrame operators. The IsNaN function is deprecated in favor of the isnan function. We add is.nan as an alias of isnan.
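
A minimal sketch of how the alias dispatches (not code from the PR), assuming `df` is a DataFrame with a numeric column `c`: because base::is.nan is a primitive with an implicit S4 generic, registering a method only for the "Column" signature leaves the base behavior on plain vectors intact, which is what the new unit test checks.

```r
# Plain R vectors still use base::is.nan (note: NA is not NaN in R):
is.nan(c(1, NaN, NA))    # FALSE  TRUE  FALSE
is.nan(c("a", "b"))      # FALSE  FALSE, as asserted in test_sparkSQL.R

# A Column argument dispatches to the new S4 method, which delegates to isnan():
is.nan(df$c)             # returns a Column
```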


#' @rdname column
#' @export
setGeneric("isNull", function(x) { standardGeneric("isNull") })
@@ -796,9 +800,9 @@ setGeneric("initcap", function(x) { standardGeneric("initcap") })
#' @export
setGeneric("instr", function(y, x) { standardGeneric("instr") })

-#' @rdname isNaN
+#' @rdname is.nan
#' @export
setGeneric("isNaN", function(x) { standardGeneric("isNaN") })
setGeneric("isnan", function(x) { standardGeneric("isnan") })

#' @rdname kurtosis
#' @export
R/pkg/inst/tests/test_sparkSQL.R (6 changes: 5 additions & 1 deletion)
@@ -878,7 +878,7 @@ test_that("column functions", {
c2 <- avg(c) + base64(c) + bin(c) + bitwiseNOT(c) + cbrt(c) + ceil(c) + cos(c)
c3 <- cosh(c) + count(c) + crc32(c) + exp(c)
c4 <- explode(c) + expm1(c) + factorial(c) + first(c) + floor(c) + hex(c)
-c5 <- hour(c) + initcap(c) + isNaN(c) + last(c) + last_day(c) + length(c)
+c5 <- hour(c) + initcap(c) + last(c) + last_day(c) + length(c)
c6 <- log(c) + (c) + log1p(c) + log2(c) + lower(c) + ltrim(c) + max(c) + md5(c)
c7 <- mean(c) + min(c) + month(c) + negate(c) + quarter(c)
c8 <- reverse(c) + rint(c) + round(c) + rtrim(c) + sha1(c)
@@ ... @@
c13 <- lead("col", 1) + lead(c, 1) + lag("col", 1) + lag(c, 1)
c14 <- cume_dist() + ntile(1)
c15 <- dense_rank() + percent_rank() + rank() + row_number()
+c16 <- is.nan(c) + isnan(c) + isNaN(c)

+# Test if base::is.nan() is exposed
+expect_equal(is.nan(c("a", "b")), c(FALSE, FALSE))

# Test if base::rank() is exposed
expect_equal(class(rank())[[1]], "Column")
Expand Down