Conversation

@yanboliang
Contributor

What changes were proposed in this pull request?

#15140 exposed JavaSparkContext.addFile(path: String, recursive: Boolean) to Python/R, so we can update SparkR spark.addFile to support adding directories recursively.

How was this patch tested?

Added unit test.
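For context, a minimal sketch of how the new API is meant to be used from SparkR. This assumes a local SparkR session; the directory layout ("data"/"sub"/"greeting.txt") is hypothetical and only for illustration:

```r
library(SparkR)
sparkR.session()

# Hypothetical directory tree with a nested sub-directory.
dir <- file.path(tempdir(), "data")
dir.create(file.path(dir, "sub"), recursive = TRUE)
writeLines("hello", file.path(dir, "sub", "greeting.txt"))

# recursive = TRUE distributes the whole directory tree to executors.
spark.addFile(dir, recursive = TRUE)

# Resolve the download location of the directory on a node.
spark.getSparkFiles("data")

sparkR.session.stop()
```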

@HyukjinKwon
Member

HyukjinKwon commented Sep 23, 2016

It seems the failure is spurious? I manually ran another test (on this PR): AppVeyor build status

Second try (manually adding the plyr package on this PR): AppVeyor build status

Third try (tests on master branch): AppVeyor build status

Fourth try (do not unlink the whole tempdir but use another dir on this PR): AppVeyor build status

I see - it seems installed.packages() refers to some .rds files in tempdir(), so we can't simply remove that directory. I tested locally: tempdir() returns the same directory per session. We can wait and see the "Fourth try".

The changes I made in test_context.r are as below:

   # Test add directory recursively.
 -  path <- tempdir()
 +  path <- paste0(tempdir(), "/", "recursivedir")
 +  dir.create(path)
    dir_name <- basename(path)
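
The reason for this change: tempdir() is stable within an R session, so unlinking the whole directory can break later calls that expect it (installed.packages() caches .rds files there). Creating a dedicated sub-directory gives the test something it can remove safely. A plain-R sketch of the pattern:

```r
# tempdir() returns the same per-session directory every time,
# so it must not be unlinked wholesale.
stopifnot(identical(tempdir(), tempdir()))

# Create (and later remove) a dedicated sub-directory instead.
path <- file.path(tempdir(), "recursivedir")
dir.create(path)
dir_name <- basename(path)      # "recursivedir"
unlink(path, recursive = TRUE)  # safe: tempdir() itself is left intact
```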

#' use spark.getSparkFiles(fileName) to find its download location.
#'
#' A directory can be given if the recursive option is set to true.
#' Currently directories are only supported for Hadoop-supported filesystems.
Member

this might be a bit confusing - do we have links to what this means?

Contributor Author

@yanboliang yanboliang Sep 25, 2016

The annotation here is consistent with Scala/Python, and a Hadoop-supported filesystem is a filesystem that Hadoop supports. I think it's easy for users to understand. Or should we add a link to Hadoop-supported filesystems?

Member

It depends. Recently someone on the user list was asking why SparkR uses Hadoop filesystem classes to read NFS, local files, etc. - it might not be obvious to users.

Contributor Author

Makes sense; added a link to Hadoop-supported filesystems. Thanks!

#' filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
#' use spark.getSparkFiles(fileName) to find its download location.
#'
#' A directory can be given if the recursive option is set to true.
Member

I'd merge this into @param path below?

Member

Or omit this since it's described in @param recursive?

#'
#' @rdname spark.addFile
#' @param path The path of the file to be added
#' @param recursive Recursive or not if the path is directory. Default is FALSE.
Member

Shouldn't this say "whether to add files recursively from the path" or similar?
I mean, the directory could have nested multiple-level sub-directories and recursive will add all of them? Doesn't seem like that is called out here.

Contributor Author

Agree, updated.

@SparkQA

SparkQA commented Sep 23, 2016

Test build #65826 has finished for PR 15216 at commit f9a3a66.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 25, 2016

Test build #65879 has finished for PR 15216 at commit b2f3a59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 26, 2016

Test build #65895 has finished for PR 15216 at commit 2b6e2af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

LGTM

@yanboliang
Contributor Author

Merged into master, thanks for the review.

@asfgit asfgit closed this in 93c743f Sep 26, 2016
@yanboliang yanboliang deleted the spark-17577-2 branch September 26, 2016 23:50