@zsxwing
Member

@zsxwing zsxwing commented Jun 2, 2015

Closes #2765

@SparkQA

SparkQA commented Jun 2, 2015

Test build #33990 has finished for PR 6588 at commit 2d85159.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 1, 2015

Test build #36271 has finished for PR 6588 at commit d7f42c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@GordonWang

@zsxwing Do you have any further plans for this PR? When is it going to be merged?

Thanks.

@zsxwing
Member Author

zsxwing commented Jul 6, 2015

> @zsxwing Do you have any further plans for this PR? When is it going to be merged?
>
> Thanks.

@tdas will look at this one soon. I think it should be able to land in 1.5.0.

Contributor

This (and the other recursive call on l.203 above) may overflow the call stack on very deep hierarchies (think Maildir++, etc.). I realise there's a depth limit set, but users may want to set it extremely high precisely for some of those cases.

How about doing the exploration in a @tailrec auxiliary function that takes a queue of yet-to-be-explored directories as an argument?
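
To make the suggestion concrete, here is a minimal sketch of a tail-recursive, queue-driven traversal. It is not the PR's code: it uses `java.io.File` as a stand-in for the Hadoop `FileSystem` calls the patch works with, and the names `DirWalk`, `listFiles`, and `maxDepth` are hypothetical, chosen only for illustration.

```scala
import java.io.File
import scala.annotation.tailrec
import scala.collection.immutable.Queue

// Hypothetical helper, not part of Spark or of this PR.
object DirWalk {
  /** Lists regular files under `root`, descending at most `maxDepth` levels
    * below it (maxDepth = 0 means only files directly inside `root`). */
  def listFiles(root: File, maxDepth: Int): Vector[File] = {
    // The pending directories live in a heap-allocated queue, so the JVM
    // call stack stays flat however deep the hierarchy is.
    @tailrec
    def loop(pending: Queue[(File, Int)], found: Vector[File]): Vector[File] =
      pending.dequeueOption match {
        case None => found
        case Some(((dir, depth), rest)) =>
          // File.listFiles() returns null on I/O errors or non-directories.
          val children = Option(dir.listFiles()).map(_.toVector).getOrElse(Vector.empty)
          val (dirs, others) = children.partition(_.isDirectory)
          // Only enqueue subdirectories while the depth limit allows it.
          val next =
            if (depth < maxDepth) rest ++ dirs.map(d => (d, depth + 1))
            else rest
          loop(next, found ++ others.filter(_.isFile))
      }

    loop(Queue((root, 0)), Vector.empty)
  }
}
```

Because the queue is first-in, first-out, directories are visited breadth-first; the depth limit still applies, but raising it no longer risks a `StackOverflowError`.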

@andrewor14
Contributor

@tdas @zsxwing any updates on this? Is this targeted for 1.6 now?

@gprivitera

@tdas: any update on this PR? Is there anything that still needs to be done?

@gprivitera

@zsxwing: your solution is not going to work on S3, since org.apache.hadoop.fs.FileStatus.getModificationTime() always returns 0L for directories on S3.

@tdas
Contributor

tdas commented Nov 2, 2015

Good point!

@srowen
Member

srowen commented Jan 20, 2016

@zsxwing is this stalled now? It doesn't look like this will proceed.

@zsxwing
Member Author

zsxwing commented Jan 20, 2016

Let me just close it. Maybe I'll revisit it later.

@zsxwing zsxwing closed this Jan 20, 2016

@stshruthi

Is there a plan to support nested directory streaming in the latest versions of Spark?

10 participants