Skip to content

Conversation

@xi-db
Copy link
Contributor

@xi-db xi-db commented Apr 15, 2024

(Original PR)

What changes were proposed in this pull request?

Expired sessions are regularly checked and cleaned up by a maintenance thread. However, currently, this process is synchronous. Therefore, in rare cases, interrupting the execution thread of a query in a session can take hours, causing the entire maintenance process to stall, resulting in a large amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, avoiding synchronous joins of execution threads, and preventing the maintenance thread from stalling in the above scenarios. To be more specific, instead of calling runner.join() in ExecutorHolder.close(), we set a post-cleanup function as the callback through runner.processOnCompletion, which will be called asynchronously once the execution runner is completed or interrupted. In this way, the maintenance thread won't get blocked on joining an execution thread.

Why are the changes needed?

In the rare cases mentioned above, performance can be severely affected.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests and a new test Async cleanup callback gets called after the execution is closed in ReattachableExecuteSuite.scala.

Was this patch authored or co-authored using generative AI tooling?

No.

Copy link
Contributor

@vicennial vicennial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM cc @hvanhovell

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want this in branch-3.5, we need to change the issue type first, @xi-db and @hvanhovell .

Screenshot 2024-04-15 at 08 41 30

@hvanhovell
Copy link
Contributor

@dongjoon-hyun I updated the ticket. I am a bit mystified by this though, this was never an issue before. This particular case can be both seen as a bug/improvement.

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun I updated the ticket.

Thank you for updating.

I am a bit mystified by this though, this was never an issue before. This particular case can be both seen as a bug/improvement.

I agree with you. We can choose our path.

  • If we want to have this in branch-3.5, we need to categorize it as a Bug.
  • Otherwise, we can keep it in master only as Improvement and wait and see the customer report at Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun dismissed their stale review April 15, 2024 16:52

Resolved.

@xi-db xi-db changed the title [WIP][SPARK-47819][CONNECT][3.5] Use asynchronous callback for execution cleanup [SPARK-47819][CONNECT][3.5] Use asynchronous callback for execution cleanup Apr 15, 2024
hvanhovell pushed a commit that referenced this pull request Apr 24, 2024
…leanup

([Original PR](#46027))

### What changes were proposed in this pull request?

Expired sessions are regularly checked and cleaned up by a maintenance thread. However, currently, this process is synchronous. Therefore, in rare cases, interrupting the execution thread of a query in a session can take hours, causing the entire maintenance process to stall, resulting in a large amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, avoiding synchronous joins of execution threads, and preventing the maintenance thread from stalling in the above scenarios. To be more specific, instead of calling `runner.join()` in `ExecutorHolder.close()`, we set a post-cleanup function as the callback through `runner.processOnCompletion`, which will be called asynchronously once the execution runner is completed or interrupted. In this way, the maintenance thread won't get blocked on joining an execution thread.

### Why are the changes needed?

In the rare cases mentioned above, performance can be severely affected.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests and a new test `Async cleanup callback gets called after the execution is closed` in `ReattachableExecuteSuite.scala`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46064 from xi-db/SPARK-47819-async-cleanup-3.5.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
@hvanhovell hvanhovell closed this Apr 24, 2024
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…leanup (apache#383)

([Original PR](apache#46027))

### What changes were proposed in this pull request?

Expired sessions are regularly checked and cleaned up by a maintenance thread. However, currently, this process is synchronous. Therefore, in rare cases, interrupting the execution thread of a query in a session can take hours, causing the entire maintenance process to stall, resulting in a large amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, avoiding synchronous joins of execution threads, and preventing the maintenance thread from stalling in the above scenarios. To be more specific, instead of calling `runner.join()` in `ExecutorHolder.close()`, we set a post-cleanup function as the callback through `runner.processOnCompletion`, which will be called asynchronously once the execution runner is completed or interrupted. In this way, the maintenance thread won't get blocked on joining an execution thread.

### Why are the changes needed?

In the rare cases mentioned above, performance can be severely affected.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests and a new test `Async cleanup callback gets called after the execution is closed` in `ReattachableExecuteSuite.scala`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46064 from xi-db/SPARK-47819-async-cleanup-3.5.

Authored-by: Xi Lyu <[email protected]>

Signed-off-by: Herman van Hovell <[email protected]>
Co-authored-by: Xi Lyu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants