Skip to content

Conversation

@ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

DynamicPartitionDataConcurrentWriter it uses temp file path to get file status after commit task. However, the temp file has already moved to new path during commit task.

This pr calls closeFile before commit task.

Why are the changes needed?

fix bug

Does this PR introduce any user-facing change?

yes, after this pr the metrics is correct

How was this patch tested?

add test

@github-actions github-actions bot added the SQL label Apr 26, 2023
@ulysses-you
Copy link
Contributor Author

cc @cloud-fan

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Is this a long-standing bug?

@ulysses-you
Copy link
Contributor Author

@cloud-fan , it happened since #32198 and with concurrent writer on.

val missing = new Path(tempDirPath, "missing")
val tracker = new BasicWriteTaskStatsTracker(conf)
tracker.newFile(missing.toString)
tracker.closeFile(missing.toString)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after refactor of #32198, one newFile should have one closeFile

@ulysses-you
Copy link
Contributor Author

@cloud-fan any comments ?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun
Copy link
Member

cc @mridulm , too.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.4!

@cloud-fan cloud-fan closed this in 592e922 May 16, 2023
cloud-fan pushed a commit that referenced this pull request May 16, 2023
### What changes were proposed in this pull request?

`DynamicPartitionDataConcurrentWriter` it uses temp file path to get file status after commit task. However, the temp file has already moved to new path during commit task.

This pr calls `closeFile` before commit task.

### Why are the changes needed?

fix bug

### Does this PR introduce _any_ user-facing change?

yes, after this pr the metrics is correct

### How was this patch tested?

add test

Closes #40952 from ulysses-you/SPARK-43281.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 592e922)
Signed-off-by: Wenchen Fan <[email protected]>
@ulysses-you ulysses-you deleted the SPARK-43281 branch May 16, 2023 07:44
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
### What changes were proposed in this pull request?

`DynamicPartitionDataConcurrentWriter` it uses temp file path to get file status after commit task. However, the temp file has already moved to new path during commit task.

This pr calls `closeFile` before commit task.

### Why are the changes needed?

fix bug

### Does this PR introduce _any_ user-facing change?

yes, after this pr the metrics is correct

### How was this patch tested?

add test

Closes apache#40952 from ulysses-you/SPARK-43281.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 592e922)
Signed-off-by: Wenchen Fan <[email protected]>
GladwinLee pushed a commit to lyft/spark that referenced this pull request Oct 10, 2023
### What changes were proposed in this pull request?

`DynamicPartitionDataConcurrentWriter` it uses temp file path to get file status after commit task. However, the temp file has already moved to new path during commit task.

This pr calls `closeFile` before commit task.

### Why are the changes needed?

fix bug

### Does this PR introduce _any_ user-facing change?

yes, after this pr the metrics is correct

### How was this patch tested?

add test

Closes apache#40952 from ulysses-you/SPARK-43281.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 592e922)
Signed-off-by: Wenchen Fan <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Oct 10, 2023
### What changes were proposed in this pull request?

`DynamicPartitionDataConcurrentWriter` it uses temp file path to get file status after commit task. However, the temp file has already moved to new path during commit task.

This pr calls `closeFile` before commit task.

### Why are the changes needed?

fix bug

### Does this PR introduce _any_ user-facing change?

yes, after this pr the metrics is correct

### How was this patch tested?

add test

Closes apache#40952 from ulysses-you/SPARK-43281.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 592e922)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants