Conversation

@yanboliang
Contributor

What changes were proposed in this pull request?

#15140 exposed JavaSparkContext.addFile(path: String, recursive: Boolean) to Python/R, so we can update SparkR spark.addFile to support adding directories recursively.

How was this patch tested?

Added unit test.
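For context, a minimal sketch of how the new API is meant to be used from SparkR. This assumes a local SparkR session; the directory layout ("data"/"sub"/"greeting.txt") is hypothetical and only for illustration:

```r
library(SparkR)
sparkR.session()

# Hypothetical directory tree with a nested sub-directory.
dir <- file.path(tempdir(), "data")
dir.create(file.path(dir, "sub"), recursive = TRUE)
writeLines("hello", file.path(dir, "sub", "greeting.txt"))

# recursive = TRUE distributes the whole directory tree to executors.
spark.addFile(dir, recursive = TRUE)

# Resolve the download location of the directory on a node.
spark.getSparkFiles("data")

sparkR.session.stop()
```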

@HyukjinKwon
Member

HyukjinKwon commented Sep 23, 2016

It seems the failure is spurious? I manually ran another test (on this PR): AppVeyor build status

Second try (manually adding the plyr package on this PR): AppVeyor build status

Third try (tests on master branch): AppVeyor build status

Fourth try (do not unlink the whole tempdir but use another dir on this PR): AppVeyor build status

I see - it seems installed.packages() refers to some .rds files in tempdir(), so we can't simply remove that directory. I tested locally: tempdir() returns the same directory per session. We can wait and see the "Fourth try".

The changes I made in test_context.r are as below:

   # Test add directory recursively.
 -  path <- tempdir()
 +  path <- paste0(tempdir(), "/", "recursivedir")
 +  dir.create(path)
    dir_name <- basename(path)
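
The reason for this change: tempdir() is stable within an R session, so unlinking the whole directory can break later calls that expect it (installed.packages() caches .rds files there). Creating a dedicated sub-directory gives the test something it can remove safely. A plain-R sketch of the pattern:

```r
# tempdir() returns the same per-session directory every time,
# so it must not be unlinked wholesale.
stopifnot(identical(tempdir(), tempdir()))

# Create (and later remove) a dedicated sub-directory instead.
path <- file.path(tempdir(), "recursivedir")
dir.create(path)
dir_name <- basename(path)      # "recursivedir"
unlink(path, recursive = TRUE)  # safe: tempdir() itself is left intact
```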

#' use spark.getSparkFiles(fileName) to find its download location.
#'
#' A directory can be given if the recursive option is set to true.
#' Currently directories are only supported for Hadoop-supported filesystems.
Member

this might be a bit confusing - do we have links to what this means?

Contributor Author

@yanboliang yanboliang Sep 25, 2016

The annotation here is consistent with Scala/Python, and a Hadoop-supported filesystem is a filesystem that Hadoop supports. I think it's easy for users to understand. Or should we add a link to Hadoop-supported filesystems?

Member

It depends. Recently someone on the user list was asking why SparkR uses Hadoop filesystem classes to read NFS, local files, etc. - it might not be obvious to users.

Contributor Author

Makes sense; added a link to Hadoop-supported filesystems. Thanks!

#' filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
#' use spark.getSparkFiles(fileName) to find its download location.
#'
#' A directory can be given if the recursive option is set to true.
Member

I'd merge this into @param path below?

Member

Or omit this since it's described in @param recursive?

#'
#' @rdname spark.addFile
#' @param path The path of the file to be added
#' @param recursive Recursive or not if the path is directory. Default is FALSE.
Member

Shouldn't this say "whether to add files recursively from the path" or similar?
I mean, the directory could have nested multiple-level sub-directories and recursive will add all of them? Doesn't seem like that is called out here.

Contributor Author

Agree, updated.

@SparkQA

SparkQA commented Sep 23, 2016

Test build #65826 has finished for PR 15216 at commit f9a3a66.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 25, 2016

Test build #65879 has finished for PR 15216 at commit b2f3a59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 26, 2016

Test build #65895 has finished for PR 15216 at commit 2b6e2af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

LGTM

@yanboliang
Contributor Author

Merged into master, thanks for the review.

@asfgit asfgit closed this in 93c743f Sep 26, 2016
@yanboliang yanboliang deleted the spark-17577-2 branch September 26, 2016 23:50