
Conversation


@jpiper jpiper commented Aug 29, 2016

What changes were proposed in this pull request?

Add the ability to add entire directories using the PySpark interface SparkContext.addFile(dir, recursive=True)
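For context, here is a minimal usage sketch of the proposed interface (the directory layout and file names below are illustrative placeholders, not part of the patch):

```python
from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="addFileRecursiveDemo")

# Distribute a local directory (and everything under it) to the executors.
# "conf_dir" is a hypothetical folder containing nested subdirectories.
sc.addFile("conf_dir", recursive=True)

# Files keep their structure relative to the SparkFiles root, so nested
# paths can be resolved on the executors with SparkFiles.get().
def read_nested(_):
    with open(SparkFiles.get("conf_dir/nested/settings.txt")) as f:
        return f.read()

print(sc.parallelize([0]).map(read_nested).first())
```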

How was this patch tested?

I've added a test file in a nested folder under python/test_support. I use addFile to distribute this folder, and then read the file back through the directory structure.
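Roughly, the test follows this pattern (a sketch reusing the sc from the snippet above; the folder and file names are assumptions, not the exact layout in the patch):

```python
import os
from pyspark import SparkFiles

# Assumed layout under python/test_support:
#   hello/
#     sub_hello/
#       sub_hello.txt   -> "Hello World!"
SPARK_HOME = os.environ.get("SPARK_HOME", ".")
test_dir = os.path.join(SPARK_HOME, "python", "test_support", "hello")

# Ship the whole folder, including the nested subdirectory.
sc.addFile(test_dir, recursive=True)

def read_back(_):
    # Resolve the nested file through the distributed directory structure.
    with open(SparkFiles.get("hello/sub_hello/sub_hello.txt")) as f:
        return f.read().strip()

assert sc.parallelize([1]).map(read_back).first() == "Hello World!"
```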

@jpiper jpiper changed the title [SPARK-17287] [PySpark] Add recursive kwarg to Java Python SparkContext.addFile [SPARK-17287] [PYSPARK] Add recursive kwarg to Python SparkContext.addFile Aug 29, 2016

MLnick commented Aug 29, 2016

ok to test


SparkQA commented Aug 29, 2016

Test build #64591 has finished for PR 14861 at commit cabcca3.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jpiper jpiper force-pushed the jpiper/pyspark_addfiles branch from cabcca3 to 03d29b8 Compare August 29, 2016 22:54

jpiper commented Aug 29, 2016

I fixed the Python style issue and amended the last commit - tests should be good now


SparkQA commented Aug 30, 2016

Test build #64606 has finished for PR 14861 at commit 03d29b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


jpiper commented Sep 1, 2016

Is anyone available to review this small change? What's the proper process here, @MLnick?

It seems @JoshRosen committed the original addFile implementation; maybe he's the most appropriate person to review?

@@ -0,0 +1 @@
Hello World!

Please remove this file

@zjffdu zjffdu Sep 1, 2016


Sorry, I didn't notice this is for a test. But why not use the existing folder test_folder directly?

Author

I wanted to make sure the recursive behaviour was working, and it seemed a bit heavy-handed to distribute the entire /test_support/sql/ folder using addFile. However, I'm happy to just use that if you think it's better practice.

FTP URI.
A directory can be given if the recursive option is set to true.
Currently directories are onlysupported for Hadoop-supported filesystems.

Minor nit: typo (onlysupported needs a space)


holdenk commented Oct 7, 2016

@jpiper, are you still working on this PR? If so, could you merge in the latest version of master so we can continue the review? (If you're no longer interested, that's OK, just let us know :)).


jpiper commented Oct 8, 2016

@holdenk sorry I've been on vacation! I'll fix the typo and merge in the latest master for you later today or tomorrow.

Cheers!

@jpiper jpiper force-pushed the jpiper/pyspark_addfiles branch from 03d29b8 to 190c63b Compare October 9, 2016 05:10

jpiper commented Oct 9, 2016

Looks like this was actually added in #15140, so we can close this :)

@jpiper jpiper closed this Oct 9, 2016

SparkQA commented Oct 9, 2016

Test build #66593 has finished for PR 14861 at commit 190c63b.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
