[SPARK-20438][R] SparkR wrappers for split and repeat #17729

zero323 · 2017-04-22T22:23:12Z

What changes were proposed in this pull request?

Add wrappers for o.a.s.sql.functions:

- split as split_string
- repeat as repeat_string

How was this patch tested?

Existing tests, additional unit tests, check-cran.sh

SparkQA · 2017-04-22T23:00:58Z

Test build #76071 has finished for PR 17729 at commit 255863a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zero323 · 2017-04-22T23:04:40Z

cc @felixcheung

felixcheung

Cool, thanks!

felixcheung · 2017-04-23T00:25:47Z

R/pkg/NAMESPACE

              "rank",
              "regexp_extract",
              "regexp_replace",
+              "repeat_string",


good call on these names!

felixcheung · 2017-04-23T00:34:39Z

R/pkg/R/functions.R

+#' head(select(split_string(df$value, "\\s+")))
+#' }
+#' @note split_string 2.3.0
+#' @note equivalent to \code{split} SQL function


The note is somewhat hard to discover on the generated doc page. If you want this, you could put it as the second content paragraph, like below, and it will show up in the details section, like here: http://spark.apache.org/docs/latest/api/R/read.jdbc.html

#' split_string
#'
#' Splits string on regular expression.
#'
#' This is equivalent to \code{split} SQL function

(yes, through the magic of roxygen2)

Also, instead of just \code{split}, you might want to link to the Spark Scala doc too.

That's cool :) I am not convinced about the linking, though. The Scala docs are not very useful.

I considered adding an expr or selectExpr version to the examples:

selectExpr(df, "split(value, '@')")

I think that's good to have, but I want to caution that we might forget to update it if it changes.
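For readers unfamiliar with the roxygen2 layout suggested earlier in this thread, here is a minimal sketch. roxygen2 treats the first paragraph of a block as the title, the second as the description, and any further paragraphs as details. The function `split_string_sketch` is a hypothetical base R stand-in used only to make the block self-contained; it is not the SparkR wrapper from this patch.

```r
#' split_string_sketch
#'
#' Splits string on regular expression.
#'
#' This third paragraph is rendered in the details section of the
#' generated help page, which is where the SQL equivalence remark
#' could go instead of a @note tag.
#'
#' @param x a character vector (hypothetical parameter for this sketch)
#' @param pattern a regular expression
split_string_sketch <- function(x, pattern) {
  # Base R analogue; the real wrapper operates on a Spark Column.
  strsplit(x, pattern)
}

parts <- split_string_sketch("hello  big world", "\\s+")
```

Only the paragraph order matters for where the text ends up; no extra tag is needed for the details section.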

felixcheung · 2017-04-23T00:35:52Z

R/pkg/R/functions.R

+#' head(select(repeat_string(df$text, 3)))
+#' }
+#' @note repeat_string 2.3.0
+#' @note equivalent to \code{repeat} SQL function


ditto above

felixcheung · 2017-04-23T00:37:15Z

R/pkg/R/functions.R

+#' @examples \dontrun{
+#' df <- createDataFrame(data.frame(
+#'   text = c("foo", "bar")
+#' ))


I'm OK with this, though would it be better to use the read.text example than a fake 1 row like this?

I thought about this, but it is hard to find a good source at hand. We could use data/streaming/AFINN-111.txt, which has nice and short lines, or README.md and just take head(., 1) (the rest is empty or longish).

felixcheung · 2017-04-23T00:38:24Z

R/pkg/inst/tests/testthat/test_sparkSQL.R

+    "abcabcabc"
+  )
+  expect_equal(
+    collect(select(df5, repeat_string(df5$a, -1)))[1, 1],


:) ahh, -1 works?!

Right? I think we should keep it this way to avoid any confusion when users switch between SQL and DSL. If anything changes, it will cause a test failure, and then we can add R-side checks.

felixcheung · 2017-04-23T00:39:28Z

R/pkg/R/functions.R

+setMethod("repeat_string",
+          signature(x = "Column", n = "numeric"),
+          function(x, n) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "repeat", x@jc, as.integer(n))


This is good actually; may I introduce you to numToInt, an internal util?

That's useful.
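To illustrate why a dedicated helper can be preferable to a bare `as.integer` call, here is a rough base R sketch. It assumes the helper's purpose is to convert a number to an integer while flagging lossy coercion. `to_int_with_warning` is a hypothetical name and body written for this example; it is not SparkR's internal `numToInt` source.

```r
# Hypothetical stand-in for illustration only; not SparkR's numToInt.
# Assumes a single, non-missing numeric value.
to_int_with_warning <- function(num) {
  int_value <- as.integer(num)
  # Warn when the conversion changed the value (e.g. 2.7 became 2).
  if (int_value != num) {
    warning("Coercing ", deparse(substitute(num)), " to integer.")
  }
  int_value
}

whole <- to_int_with_warning(3)   # whole number, converted silently
truncated <- as.integer(2.7)      # base R truncates without any message
```

With plain `as.integer`, a call such as `repeat_string(df$text, 2.7)` would quietly behave like a repeat count of 2; a helper along these lines at least surfaces the truncation to the caller.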

SparkQA · 2017-04-24T15:01:32Z

Test build #76109 has finished for PR 17729 at commit ce0c4b6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung

LGTM

felixcheung · 2017-04-24T17:57:12Z

merged to master

zero323 · 2017-04-24T18:18:17Z

Thanks @felixcheung

## What changes were proposed in this pull request?

Add wrappers for `o.a.s.sql.functions`:

- `split` as `split_string`
- `repeat` as `repeat_string`

## How was this patch tested?

Existing tests, additional unit tests, `check-cran.sh`

Author: zero323 <[email protected]>

Closes apache#17729 from zero323/SPARK-20438.

Add split_string and repeat_string

255863a

felixcheung reviewed Apr 23, 2017

zero323 added 4 commits April 24, 2017 03:22

Replace as.integer with numToInt

d6dd987

Fix split_string example

afa8642

Move SQL notes to details

97c6a81

Add selectExpr equivalents

ce0c4b6

felixcheung approved these changes Apr 24, 2017

asfgit closed this in 8a272dd Apr 24, 2017

zero323 deleted the SPARK-20438 branch April 26, 2017 13:17
