[SPARK-17287] [PYSPARK] Add recursive kwarg to Python SparkContext.addFile #14861
ok to test
Test build #64591 has finished for PR 14861 at commit
cabcca3 to 03d29b8
I fixed the Python style issue and amended the last commit; tests should be good now.
Test build #64606 has finished for PR 14861 at commit
Is anyone available to review this small change? What's the proper process here, @MLnick? It seems @JoshRosen committed the original
@@ -0,0 +1 @@
Hello World!
Please remove this file
Sorry, I didn't notice this is for a test. But why not use the existing folder test_folder directly?
I wanted to ensure that the recursiveness was working, and it seemed a bit heavy-handed to distribute the entire /test_support/sql/ folder using addFile. However, I'm happy to just use that if you think it's better practice.
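A minimal sketch of why a small nested fixture exercises recursion, using only the Python standard library rather than Spark: a copy that looks only at top-level entries misses a file two levels down, while a recursive copy keeps it. The layout hello/world/hello.txt and the helper names are hypothetical; only the "Hello World!" content comes from the diff above.

```python
import os
import shutil
import tempfile


def make_nested_fixture(root):
    """Create a hypothetical tree <root>/hello/world/hello.txt and return <root>/hello."""
    nested = os.path.join(root, "hello", "world")
    os.makedirs(nested)
    with open(os.path.join(nested, "hello.txt"), "w") as f:
        f.write("Hello World!\n")
    return os.path.join(root, "hello")


def flat_copy(src, dst):
    """Copy only the regular files directly inside src; subdirectories are skipped."""
    os.makedirs(dst)
    for name in os.listdir(src):
        full = os.path.join(src, name)
        if os.path.isfile(full):
            shutil.copy(full, dst)


def nested_file_survives(copy_fn):
    """Return True if copy_fn(src, dst) carries the nested test file across."""
    with tempfile.TemporaryDirectory() as tmp:
        src = make_nested_fixture(os.path.join(tmp, "src"))
        dst = os.path.join(tmp, "dst")
        copy_fn(src, dst)
        return os.path.isfile(os.path.join(dst, "world", "hello.txt"))


print(nested_file_survives(shutil.copytree))  # True: recursive copy keeps the nested file
print(nested_file_survives(flat_copy))        # False: top-level-only copy misses it
```

Because the only file sits below a subdirectory, a non-recursive implementation cannot pass this check by accident, which is the property a dedicated nested fixture is meant to guarantee.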
FTP URI.
A directory can be given if the recursive option is set to true.
Currently directories are onlysupported for Hadoop-supported filesystems.
Minor nit: typo (onlysupported needs a space)
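The docstring quoted above says a directory is accepted only when the recursive option is set to true. The sketch below expresses that rule as a local pre-check in plain Python. check_add_file_path is a hypothetical helper that inspects only the local filesystem; it is not Spark's actual validation, which happens in the JVM SparkContext that PySpark delegates to.

```python
import os
import tempfile


def check_add_file_path(path, recursive=False):
    """Hypothetical pre-check: reject a local directory unless recursive=True.

    Only local paths are inspected; a remote URI such as hdfs://... is not a
    local directory, so it is passed through unchanged.
    """
    if os.path.isdir(path) and not recursive:
        raise ValueError(f"{path} is a directory and recursive is not set to True")
    return path


with tempfile.TemporaryDirectory() as d:
    print(check_add_file_path(d, recursive=True) == d)  # True
    try:
        check_add_file_path(d)  # recursive defaults to False
    except ValueError as err:
        print("rejected:", err)
```

Defaulting recursive to False mirrors the conservative behavior the docstring describes: distributing a whole directory has to be requested explicitly.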
@jpiper, are you still working on this PR? If so, could you merge in the latest version of master so we can continue the review? (If you're not interested, that's OK; just let us know :))
@holdenk Sorry, I've been on vacation! I'll fix the typo and merge in the latest master for you later today or tomorrow. Cheers!
03d29b8 to 190c63b
Looks like this was actually added in #15140, so we can close this :)
Test build #66593 has finished for PR 14861 at commit
What changes were proposed in this pull request?
Add the ability to add entire directories using the PySpark interface:
SparkContext.addFile(dir, recursive=True)
How was this patch tested?
I've added a test file in nested folders in python/test_support. I use addFile to distribute this folder and then read the file back using the directory structure.
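The test described above distributes a folder and reads a file back through its relative path. Below is a local, Spark-free sketch of that read-back step. distributed_root stands in for the local directory a task would see for the added folder; read_distributed_file and the world/hello.txt layout are hypothetical.

```python
import os
import tempfile


def read_distributed_file(distributed_root, *relative_parts):
    """Read a file inside a distributed directory by joining its relative path.

    distributed_root is assumed to be the local copy of the added directory.
    """
    path = os.path.join(distributed_root, *relative_parts)
    with open(path) as f:
        return f.read()


# Build a stand-in for the distributed copy and read the nested file back.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "world"))
    with open(os.path.join(root, "world", "hello.txt"), "w") as f:
        f.write("Hello World!\n")
    print(read_distributed_file(root, "world", "hello.txt"), end="")  # Hello World!
```

In an actual PySpark job, the root would come from something like SparkFiles.get("<directory name>") inside a task, after the driver called sc.addFile(path, recursive=True); the join-and-open logic stays the same.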