Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Oct 17, 2016

What changes were proposed in this pull request?

Calling Await.result will allow other tasks to be run on the same thread when using ForkJoinPool. However, SQL uses a ThreadLocal execution id to trace Spark jobs launched by a query, which doesn't work perfectly in ForkJoinPool.

This PR just uses Awaitable.result instead to prevent ForkJoinPool from running other tasks in the current waiting thread.

How was this patch tested?

Jenkins

@jodersky
Copy link
Member

I'm not sure I completely understand why you have to use Awaitable instead of Await. Is it to avoid the blocking() call in Await?

@zsxwing
Copy link
Member Author

zsxwing commented Oct 17, 2016

Is it to avoid the blocking() call in Await?

Yep. Without blocking(), other tasks cannot run in the current thread during waiting for result, then ThreadLocal will not be leaked to other tasks.

@jodersky
Copy link
Member

jodersky commented Oct 17, 2016

That makes sense, thanks for explaining. Would installing a custom BlockingContext on threads that run tasks also work? I'm not very familiar with the internals of the Awaitable api so I'm just throwing this out there in the hopes you know more :)

@SparkQA
Copy link

SparkQA commented Oct 17, 2016

Test build #67088 has finished for PR 15520 at commit 54078ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 17, 2016

Would installing a custom BlockingContext on threads that run tasks also work?

Yes, I think so. It just needs more lines :)

@zsxwing
Copy link
Member Author

zsxwing commented Oct 17, 2016

retest this please

@jodersky
Copy link
Member

Yes, I think so. It just needs more lines :)

I think the solution you propose is sound. Do you think that a quick mention about avoiding the BlockingContext in a comment would help in case someone sees this code in the future and is puzzled as to what it does?

@SparkQA
Copy link

SparkQA commented Oct 18, 2016

Test build #67094 has finished for PR 15520 at commit 54078ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 18, 2016

cc @andrewor14

@SparkQA
Copy link

SparkQA commented Oct 18, 2016

Test build #67112 has finished for PR 15520 at commit 6aa9e2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// scalastyle:off awaitresult
Await.result(...)
// scalastyle:on awaitresult
If your codes use ThreadLocal and run in a user's thread, use ThreadUtils.awaitResultInForkJoinSafely instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the user's thread mean at here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant threads created by the user. Fixed the description.

@SparkQA
Copy link

SparkQA commented Oct 24, 2016

Test build #67461 has finished for PR 15520 at commit e9753ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Oct 26, 2016

lgtm

@zsxwing
Copy link
Member Author

zsxwing commented Oct 26, 2016

Thanks! Merging to master and 2.0.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 26, 2016

There are conflicts with 2.0. #15646 is the backport for 2.0.

@zsxwing zsxwing deleted the SPARK-13747 branch October 26, 2016 17:43
asfgit pushed a commit that referenced this pull request Oct 26, 2016
…(branch 2.0)

## What changes were proposed in this pull request?

Backport #15520 to 2.0.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <[email protected]>

Closes #15646 from zsxwing/SPARK-13747-2.0.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
## What changes were proposed in this pull request?

Calling `Await.result` will allow other tasks to be run on the same thread when using ForkJoinPool. However, SQL uses a `ThreadLocal` execution id to trace Spark jobs launched by a query, which doesn't work perfectly in ForkJoinPool.

This PR just uses `Awaitable.result` instead to  prevent ForkJoinPool from running other tasks in the current waiting thread.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <[email protected]>

Closes apache#15520 from zsxwing/SPARK-13747.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

Calling `Await.result` will allow other tasks to be run on the same thread when using ForkJoinPool. However, SQL uses a `ThreadLocal` execution id to trace Spark jobs launched by a query, which doesn't work perfectly in ForkJoinPool.

This PR just uses `Awaitable.result` instead to  prevent ForkJoinPool from running other tasks in the current waiting thread.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <[email protected]>

Closes apache#15520 from zsxwing/SPARK-13747.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants