[SPARK-34064][SQL] Cancel the running broadcast sub-jobs when SQL statement is cancelled #31227
| */ | ||
| def setJobGroup(groupId: String, | ||
| description: String, interruptOnCancel: Boolean = false): Unit = { | ||
| val actualGroupId = getJobGroupId(groupId) |
This looks like a silent API behavior change.
I will add this to the description.
|
This change is a bit too low-level and may have a big impact. Can we do something in We also need to update the broadcast exchange to read the special local property. |
|
Test build #134186 has finished for PR 31227 at commit
|
|
I'm a little confused: why not just add a local property such as |
|
ah that sounds simpler |
| /** | ||
| * Statement id is only used for thrift server | ||
| */ | ||
| private[spark] val SPARK_STATEMENT_ID = "spark.statement.id" |
We can define it in sql/core module, as it's only used there and the STS module. Probably in SQLExecution object.
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #134195 has finished for PR 31227 at commit
|
|
@LantaoJin any updates? |
|
Test build #137564 has finished for PR 31227 at commit
|
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
The refactor PR for #31119.
In this PR, we add a local property key, spark.statement.id, which is set by STS (the Thrift server). STS sets the statement ID on this local property; in the broadcast exchange, runId reads its value from this property if it is defined, and otherwise uses a random UUID.
Why are the changes needed?
When broadcasting a table takes too long and the SQL statement is cancelled, the background Spark job keeps running and wastes resources.
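To make the proposed mechanism concrete, here is a minimal sketch of the run-id selection described in the summary above. It is not the code from this PR: the object name RunIdSketch, the method resolveRunId, and the getLocalProperty function parameter are hypothetical, with the parameter standing in for SparkContext.getLocalProperty (which returns null when a key is unset), so the sketch runs without a Spark dependency.

```scala
import java.util.UUID

object RunIdSketch {
  // Local property key named in the PR description; per the PR, STS sets it per statement.
  val StatementIdKey = "spark.statement.id"

  // `getLocalProperty` is a hypothetical stand-in for SparkContext.getLocalProperty.
  // Assumption: it returns null when the key has not been set on the current thread.
  def resolveRunId(getLocalProperty: String => String): String =
    Option(getLocalProperty(StatementIdKey))   // Some(id) if the server set a statement id
      .getOrElse(UUID.randomUUID().toString)   // otherwise fall back to a random UUID

  def main(args: Array[String]): Unit = {
    // Simulated local properties of a thread that is running a Thrift server statement.
    val props = Map(StatementIdKey -> "stmt-42")
    println(resolveRunId(key => props.get(key).orNull)) // prints "stmt-42"

    // No statement id defined: a fresh random UUID string is used instead.
    println(resolveRunId(_ => null))
  }
}
```

The intent, as far as the description goes, is that a broadcast sub-job started for a statement carries the same id the server knows about, so cancelling the statement can also reach the broadcast job; queries outside the Thrift server keep the existing random-UUID behavior.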
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually tested.
Broadcasting a table is too fast to cancel in a unit test, but the behavior is easy to verify manually: