@gf53520
Contributor

@gf53520 gf53520 commented Feb 18, 2017

What changes were proposed in this pull request?

SPARK-19617
When a Structured Streaming job is restarted, Spark recomputes the WAL offsets and generates the same HDFS delta file, named "currentBatchId.delta", that was already generated before the restart.

How was this patch tested?

manual tests

@gf53520
Contributor Author

gf53520 commented Feb 18, 2017

cc @zsxwing

@gf53520
Contributor Author

gf53520 commented Feb 18, 2017

test this please

@uncleGen
Contributor

retest this please.

@uncleGen
Contributor

Isn't the JIRA ID SPARK-19645?

@gf53520
Contributor Author

gf53520 commented Feb 26, 2017

@uncleGen The JIRA ID is 19617, but JIRA 19677 describes the same problem as this one.

@zsxwing
Member

zsxwing commented Feb 28, 2017

@gf53520 SPARK-19617 is a different issue. I guess you meant SPARK-19645. Thanks a lot for doing this. However, I merged #17012 as the approach is better and it has a unit test to cover this issue. Could you close this one please?

@gf53520
Contributor Author

gf53520 commented Mar 1, 2017

@zsxwing No problem. But I think that PR has a small problem: a temp file for the delta file is still left behind in HDFS.

@zsxwing
Member

zsxwing commented Mar 1, 2017

Good catch. Could you submit a follow-up PR to fix it?

@gf53520
Contributor Author

gf53520 commented Mar 1, 2017

@zsxwing okay, I will submit a PR to fix it.

@gf53520 gf53520 closed this Mar 1, 2017
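
For readers following the thread, here is a minimal sketch of the two concerns raised above: a restarted job recomputing a batch whose "<batchId>.delta" file already exists, and a temp file being left behind when that happens. It is not taken from this PR or from #17012. It uses the local filesystem as a stand-in for HDFS, and the function name `commit_delta`, the `temp-` prefix, and the single-writer assumption are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def commit_delta(state_dir: Path, batch_id: int, payload: bytes) -> bool:
    """Write payload to a temp file, then publish it as <batch_id>.delta.

    Returns True if this call published the file, and False if a delta for
    batch_id already existed (for example, a batch recomputed after a restart).
    Assumes a single writer per directory: the exists-then-rename check below
    is not atomic across concurrent writers.
    """
    final_path = state_dir / f"{batch_id}.delta"
    fd, tmp_name = tempfile.mkstemp(dir=state_dir, prefix="temp-", suffix=".tmp")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        if final_path.exists():
            # The batch was already committed before the restart; keep the old file.
            return False
        os.rename(tmp_path, final_path)
        return True
    finally:
        # After a successful rename the temp path no longer exists; otherwise
        # this removes the leftover temp file instead of leaving it behind.
        tmp_path.unlink(missing_ok=True)
```

Calling `commit_delta` twice with the same `batch_id` keeps the first file's contents and leaves no temp file in the directory, which is the behavior the follow-up discussed above is after.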