Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -139,7 +139,8 @@ object SQLConf {
}
} else {
val isSchedulerEventLoopThread = SparkContext.getActive
.map(_.dagScheduler.eventProcessLoop.eventThread)
.flatMap { sc => Option(sc.dagScheduler) }
.map(_.eventProcessLoop.eventThread)
.exists(_.getId == Thread.currentThread().getId)
if (isSchedulerEventLoopThread) {
// DAGScheduler event loop thread does not have an active SparkSession, the `confGetter`
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ package org.apache.spark.sql.internal

import org.apache.hadoop.fs.Path

import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.internal.StaticSQLConf._
import org.apache.spark.sql.test.{SharedSparkSession, TestSQLContext}
Expand Down Expand Up @@ -320,4 +321,22 @@ class SQLConfSuite extends QueryTest with SharedSparkSession {
assert(e2.getMessage.contains("spark.sql.shuffle.partitions"))
}

test("SPARK-29046: SQLConf.get shouldn't throw NPE when active SparkContext is stopping") {
// Logically, there's only one case SQLConf.get throws NPE: there's active SparkContext,
// but SparkContext is stopping - especially it sets dagScheduler as null.

val oldSparkContext = SparkContext.getActive
Utils.tryWithSafeFinally {
// this is necessary to set new SparkContext as active: it cleans up active SparkContext
oldSparkContext.foreach(_ => SparkContext.clearActiveContext())

val conf = new SparkConf().setAppName("test").setMaster("local")
LocalSparkContext.withSpark(new SparkContext(conf)) { sc =>
sc.dagScheduler = null
SQLConf.get
}
} {
oldSparkContext.orElse(Some(null)).foreach(SparkContext.setActiveContext)
Copy link
Member

@dongjoon-hyun dongjoon-hyun Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, All.

Ur, I'm wondering if this PR's test case is safe in concurrent testing environments?
This might be the root cause of the current outage in Apache Spark Jenkins jobs.
After this PR, all Jenkins on master branch never succeeds.

Could you take a look a little bit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is late Friday, just revert this if you thinks it's breaking things.

Copy link
Member

@dongjoon-hyun dongjoon-hyun Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I've been investigating this and the other commits. But, this is the usual suspect given the error message. Since master branch has been broken over 2 days and it seems that we cannot fix this during weekend, I'll revert this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding. I haven't noticed concurrent tests are executed, as there're many tests leveraging single object like SparkContext. I'll just exclude test code in new PR as I don't see other way to test this.

Btw, do we run different options between PR and master build? I'm curious about the reason - as we tend to guess PR build pass as "OK".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is probably Maven vs SBT testing differences. I personally would prefer to standardize on the Maven build for everything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HeartSaVioR . PRBuilder is the same. PRBuilder has been very flaky last two days. I hit the same failure on my independent PRs too many times. That was the reason I investigated this and reached here to ask help.

@srowen . According to the Jenkins status, All Jenkins job combinations failed. This is irrelevant to SBT and Maven difference.

}
}
}