[SPARK-43281][SQL] Fix concurrent writer does not update file metrics #40952
cc @cloud-fan
cloud-fan
left a comment
LGTM. Is this a long-standing bug?
@cloud-fan, it has happened since #32198, when the concurrent writer is on.
val missing = new Path(tempDirPath, "missing")
val tracker = new BasicWriteTaskStatsTracker(conf)
tracker.newFile(missing.toString)
tracker.closeFile(missing.toString)
After the refactor in #32198, each `newFile` call should be paired with exactly one `closeFile` call.
@cloud-fan any comments?
dongjoon-hyun
left a comment
+1, LGTM.
cc @mridulm, too.
Thanks, merging to master/3.4!
### What changes were proposed in this pull request?

`DynamicPartitionDataConcurrentWriter` it uses temp file path to get file status after commit task. However, the temp file has already moved to new path during commit task.

This pr calls `closeFile` before commit task.

### Why are the changes needed?

fix bug

### Does this PR introduce _any_ user-facing change?

yes, after this pr the metrics is correct

### How was this patch tested?

add test

Closes #40952 from ulysses-you/SPARK-43281.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 592e922)
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?

`DynamicPartitionDataConcurrentWriter` uses the temp file path to get the file status after the task commit. However, by then the temp file has already been moved to its new path during the task commit, so the status lookup no longer finds it.

This PR calls `closeFile` before the task commit.

### Why are the changes needed?

To fix a bug: the concurrent writer does not update file metrics.

### Does this PR introduce _any_ user-facing change?

Yes. After this PR, the file metrics are correct.

### How was this patch tested?

Added a test.
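
The ordering problem described above can be illustrated with a small, self-contained sketch. It does not use Spark or Hadoop: `ToyStatsTracker`, `CloseBeforeCommit`, and `commit` are hypothetical stand-ins, and the "commit" is simulated as a plain file rename. It only assumes what the description states, namely that the tracker reads the file status at `closeFile` time using the temp path.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical stand-in for a write-task stats tracker.
// This is NOT Spark's BasicWriteTaskStatsTracker.
final class ToyStatsTracker {
  private var submitted = Set.empty[Path]
  var numFiles: Int = 0
  var numBytes: Long = 0L

  // Registers a file that is about to be written.
  def newFile(path: Path): Unit = submitted += path

  // Reads the file size at close time; a missing file contributes nothing.
  def closeFile(path: Path): Unit = {
    if (submitted.contains(path)) {
      submitted -= path
      if (Files.exists(path)) {
        numFiles += 1
        numBytes += Files.size(path)
      }
    }
  }
}

object CloseBeforeCommit {
  // Simulated task commit (an assumption): move the temp file to its final path.
  private def commit(temp: Path, dest: Path): Unit =
    Files.move(temp, dest, StandardCopyOption.REPLACE_EXISTING)

  // Writes one 5-byte file and returns the bytes recorded by the tracker.
  def run(closeBeforeCommit: Boolean): Long = {
    val dir = Files.createTempDirectory("toy-writer")
    val temp = dir.resolve("part-0.tmp")
    val dest = dir.resolve("part-0")
    val tracker = new ToyStatsTracker

    tracker.newFile(temp)
    Files.write(temp, "hello".getBytes(StandardCharsets.UTF_8))

    if (closeBeforeCommit) {
      tracker.closeFile(temp) // temp file still exists, so its size is recorded
      commit(temp, dest)
    } else {
      commit(temp, dest)
      tracker.closeFile(temp) // temp path is gone, so nothing is recorded
    }
    tracker.numBytes
  }

  def main(args: Array[String]): Unit = {
    println(s"close after commit:  ${run(closeBeforeCommit = false)} bytes")
    println(s"close before commit: ${run(closeBeforeCommit = true)} bytes")
  }
}
```

In this sketch, closing after the simulated commit records 0 bytes, while closing before it records the 5 bytes that were written, which mirrors the ordering change this PR makes.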