-
Couldn't load subscription status.
- Fork 28.9k
[SPARK-29046][SQL] Fix NPE in SQLConf.get when active SparkContext is stopping #25790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc. @dongjoon-hyun @srowen @vanzin @HyukjinKwon who reviewed previous PR. This is revised version of #25753 - please refer #25753 (comment) to see why the patch were reverted and new patch doesn't contain test. |
|
Test build #110591 has finished for PR 25790 at commit
|
|
Thank you, @HeartSaVioR . BTW, if there is a person to investigate the previous failure, I believe it is you. Could you take a look at that? |
|
There's leakage on threads after running previous test: and it affects other suites. and That was actually due to not stop dagScheduler before assigning null. But even I fixed it, it didn't resolve other side-effects coming from Spark's ground rule - Spark doesn't support multiple SparkContext instances. (Other things were broken.) I just learned more for this time, it doesn't work even one is being used actively and other one is sitting down to wait. Never have two SparkContext instances in JVM. |
|
Another possible code of test which don't swap SparkContext: but I wouldn't guarantee the test can be run concurrently with other tests - so if possible I'll not update the patch. I'd even suspect how tests have been able to run concurrently where we make a change with static val (I meant object), but I don't have unlimited bandwidth and there're lots of remaining works on me (still have many ideas on SS area, and I've done only first part of SPARK-28594) so I'd like to stop putting efforts on this and revisit later when my backlog is empty. |
|
Ya. All of us learned more at this. Thank you for spending your time on this. |
|
Thanks for the kind words :) Really appreciated your consistent efforts on INFRA/background things like stabilizing builds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to be clear we think this is a correct change (the test in isolation worked) but it's hard to test without creating problems for other tests? OK, this change seems fine still by itself.
|
Ya. I believe so, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master back.
Thank you, @HeartSaVioR and @srowen .
|
Thanks all for reviewing and merging again. :) |
… stopping ### What changes were proposed in this pull request? This patch fixes the bug regarding NPE in SQLConf.get, which is only possible when SparkContext._dagScheduler is null due to stopping SparkContext. The logic doesn't seem to consider active SparkContext could be in progress of stopping. Note that it can't be encountered easily as SparkContext.stop() blocks the main thread, but there're many cases which SQLConf.get is accessed concurrently while SparkContext.stop() is executing - users run another threads, or listener is accessing SQLConf.get after dagScheduler is set to null (this is the case what I encountered.) ### Why are the changes needed? The bug brings NPE. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Previous patch apache#25753 was tested with new UT, and due to disruption with other tests in concurrent test run, the test is excluded in this patch. Closes apache#25790 from HeartSaVioR/SPARK-29046-v2. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
|
FYI: I've raised the patch for #25798 against branch-2.4 as the branch I've originally found the issue was 2.4.x. (Sorry I missed to add these versions in affected versions.) Please let me know if we wouldn't port back like this case - I'll close it. |
What changes were proposed in this pull request?
This patch fixes the bug regarding NPE in SQLConf.get, which is only possible when SparkContext._dagScheduler is null due to stopping SparkContext. The logic doesn't seem to consider active SparkContext could be in progress of stopping.
Note that it can't be encountered easily as SparkContext.stop() blocks the main thread, but there're many cases which SQLConf.get is accessed concurrently while SparkContext.stop() is executing - users run another threads, or listener is accessing SQLConf.get after dagScheduler is set to null (this is the case what I encountered.)
Why are the changes needed?
The bug brings NPE.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Previous patch #25753 was tested with new UT, and due to disruption with other tests in concurrent test run, the test is excluded in this patch.