[SPARK-17577][Follow-up][SparkR] SparkR spark.addFile supports adding directory recursively #15216
Conversation
#' use spark.getSparkFiles(fileName) to find its download location.
#'
#' A directory can be given if the recursive option is set to true.
#' Currently directories are only supported for Hadoop-supported filesystems.
This might be a bit confusing - do we have links to what this means?
The annotation here is consistent with Scala/Python, and a Hadoop-supported filesystem is a file system that Hadoop supports. I think it's easy for users to understand. Or should we add a link to Hadoop-supported filesystems?
It depends. Recently someone on the user list was asking why SparkR was using Hadoop file system classes to read NFS, local files, etc. - it might not be obvious to users.
Makes sense, added a link to Hadoop-supported filesystems. Thanks!
#' filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
#' use spark.getSparkFiles(fileName) to find its download location.
#'
#' A directory can be given if the recursive option is set to true.
I'd merge this into @param path below?
Or omit this since it's described in @param recursive?
R/pkg/R/context.R
Outdated
#'
#' @rdname spark.addFile
#' @param path The path of the file to be added
#' @param recursive Recursive or not if the path is directory. Default is FALSE.
Shouldn't this say "whether to add files recursively from the path" or similar?
I mean, the directory could have nested multi-level sub-directories, and recursive will add all of them? Doesn't seem like that is called out here.
Agreed, updated.
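As the review notes, `recursive = TRUE` distributes the entire directory tree, including nested sub-directories. A minimal sketch of the behavior being documented (assuming a running SparkR session; the directory layout and paths here are purely illustrative):

```r
library(SparkR)
sparkR.session()

# Illustrative layout: /tmp/mydir contains a.txt and sub/b.txt (hypothetical paths)
spark.addFile("/tmp/mydir", recursive = TRUE)

# Files in nested sub-directories are added as well; locate the downloaded
# copies with spark.getSparkFiles
spark.getSparkFiles("mydir/a.txt")
spark.getSparkFiles("mydir/sub/b.txt")
```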
Test build #65826 has finished for PR 15216 at commit

Test build #65879 has finished for PR 15216 at commit

Test build #65895 has finished for PR 15216 at commit

LGTM

Merged into master, thanks for the review.
What changes were proposed in this pull request?

#15140 exposed JavaSparkContext.addFile(path: String, recursive: Boolean) to Python/R, so we can update SparkR spark.addFile to support adding a directory recursively.

How was this patch tested?

Added unit test.
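The added unit test presumably exercises the round trip of adding a directory and reading it back. A hedged sketch of what such a test could look like (assuming a running SparkR session; this is not the actual test code from the PR):

```r
library(SparkR)
sparkR.session()

# Create a hypothetical temp directory with one file inside
dir <- tempfile()
dir.create(dir)
writeLines("hello", file.path(dir, "data.txt"))

# Add the whole directory recursively
spark.addFile(dir, recursive = TRUE)

# The downloaded copy should contain the same content
download_path <- spark.getSparkFiles(basename(dir))
stopifnot(readLines(file.path(download_path, "data.txt")) == "hello")
```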