
Conversation


@jpiper jpiper commented Aug 29, 2016

What changes were proposed in this pull request?

Add the ability to add entire directories using the PySpark interface SparkContext.addFile(dir, recursive=True)
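For context, here is a minimal usage sketch of the proposed interface (the directory layout and file names below are illustrative placeholders, not part of the patch):

```python
from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="addFileRecursiveDemo")

# Distribute a local directory (and everything under it) to the executors.
# "conf_dir" is a hypothetical folder containing nested subdirectories.
sc.addFile("conf_dir", recursive=True)

# Files keep their structure relative to the SparkFiles root, so nested
# paths can be resolved on the executors with SparkFiles.get().
def read_nested(_):
    with open(SparkFiles.get("conf_dir/nested/settings.txt")) as f:
        return f.read()

print(sc.parallelize([0]).map(read_nested).first())
```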

How was this patch tested?

I've added a test file in a nested folder under python/test_support. I use addFile to distribute this folder, and then read the file back through the directory structure.
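Roughly, the test follows this pattern (a sketch reusing the sc from the snippet above; the folder and file names are assumptions, not the exact layout in the patch):

```python
import os
from pyspark import SparkFiles

# Assumed layout under python/test_support:
#   hello/
#     sub_hello/
#       sub_hello.txt   -> "Hello World!"
SPARK_HOME = os.environ.get("SPARK_HOME", ".")
test_dir = os.path.join(SPARK_HOME, "python", "test_support", "hello")

# Ship the whole folder, including the nested subdirectory.
sc.addFile(test_dir, recursive=True)

def read_back(_):
    # Resolve the nested file through the distributed directory structure.
    with open(SparkFiles.get("hello/sub_hello/sub_hello.txt")) as f:
        return f.read().strip()

assert sc.parallelize([1]).map(read_back).first() == "Hello World!"
```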

@jpiper jpiper changed the title [SPARK-17287] [PySpark] Add recursive kwarg to Java Python SparkContext.addFile [SPARK-17287] [PYSPARK] Add recursive kwarg to Python SparkContext.addFile Aug 29, 2016

MLnick commented Aug 29, 2016

ok to test


SparkQA commented Aug 29, 2016

Test build #64591 has finished for PR 14861 at commit cabcca3.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jpiper jpiper force-pushed the jpiper/pyspark_addfiles branch from cabcca3 to 03d29b8 Compare August 29, 2016 22:54

jpiper commented Aug 29, 2016

I fixed the Python style issue and amended the last commit - tests should be good now


SparkQA commented Aug 30, 2016

Test build #64606 has finished for PR 14861 at commit 03d29b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


jpiper commented Sep 1, 2016

Is anyone available to review this small change? What's the proper process here, @MLnick?

It seems @JoshRosen committed the original addFile implementation; maybe he's the most appropriate person to review?

@@ -0,0 +1 @@
Hello World!

Please remove this file

@zjffdu zjffdu Sep 1, 2016


Sorry, I didn't notice this is for a test. But why not use the existing folder test_folder directly?

Author

I wanted to make sure the recursive behaviour was working, and it seemed a bit heavy-handed to distribute the entire /test_support/sql/ folder using addFile. However, I'm happy to just use that if you think it's better practice.

FTP URI.
A directory can be given if the recursive option is set to true.
Currently directories are onlysupported for Hadoop-supported filesystems.

Minor nit: typo (onlysupported needs a space)


holdenk commented Oct 7, 2016

@jpiper, are you still working on this PR? If so, could you merge in the latest version of master so we can continue the review? (If you're no longer interested, that's OK, just let us know :)).


jpiper commented Oct 8, 2016

@holdenk sorry I've been on vacation! I'll fix the typo and merge in the latest master for you later today or tomorrow.

Cheers!

@jpiper jpiper force-pushed the jpiper/pyspark_addfiles branch from 03d29b8 to 190c63b Compare October 9, 2016 05:10

jpiper commented Oct 9, 2016

Looks like this was actually added in #15140, so we can close this :)

@jpiper jpiper closed this Oct 9, 2016

SparkQA commented Oct 9, 2016

Test build #66593 has finished for PR 14861 at commit 190c63b.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
