[SPARK-3377] [SPARK-3610] Metrics can be accidentally aggregated / History server log name should not be based on user input #2432
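Background for the two JIRAs: metric names were previously derived from the user-supplied app name (see the DAGSchedulerSource change below), so two applications submitted with the same name could report under identical keys and have their values silently aggregated by a metrics sink; the history server log name had the same user-input problem. The snippet below is a small editorial illustration of the naming collision and of the app-ID-based scheme this PR moves toward; the concrete IDs are hypothetical.

```scala
import com.codahale.metrics.MetricRegistry

object MetricNameCollisionSketch {
  def main(args: Array[String]): Unit = {
    // Before: the source name embedded the user-chosen app name, so two apps
    // both submitted as "MyApp" published under exactly the same key.
    val appName = "MyApp"
    val before = MetricRegistry.name(s"$appName.DAGScheduler", "stage", "failedStages")

    // After: the prefix comes from the cluster-assigned application ID plus the
    // driver/executor ID, which is unique per running application.
    val appId = "app-20140918123456-0001"  // hypothetical app ID
    val after = MetricRegistry.name(appId, "driver", "DAGScheduler", "stage", "failedStages")

    println(before)  // MyApp.DAGScheduler.stage.failedStages
    println(after)   // app-20140918123456-0001.driver.DAGScheduler.stage.failedStages
  }
}
```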
SparkContext.scala
@@ -187,6 +187,15 @@ class SparkContext(config: SparkConf) extends Logging {
   val master = conf.get("spark.master")
   val appName = conf.get("spark.app.name")

+  private[spark] val isEventLogEnabled = conf.getBoolean("spark.eventLog.enabled", false)
+  private[spark] val eventLogDir: Option[String] = {
+    if (isEventLogEnabled) {
+      Some(conf.get("spark.eventLog.dir", EventLoggingListener.DEFAULT_LOG_DIR).stripSuffix("/"))
+    } else {
+      None
+    }
+  }
+
   // Generate the random name for a temp folder in Tachyon
   // Add a timestamp as the suffix here to make it more safe
   val tachyonFolderName = "spark-" + randomUUID.toString()
@@ -200,6 +209,7 @@ class SparkContext(config: SparkConf) extends Logging {
   private[spark] val listenerBus = new LiveListenerBus

   // Create the Spark execution environment (cache, map output tracker, etc)
+  conf.set("spark.executor.id", "driver")
Contributor:
What is this used for? The driver is technically not an executor (yes, we conflate the two elsewhere, but that's bad too).

Contributor:
I think the idea is that certain types of metrics can be generated by components that run on both drivers and executors (e.g. BlockManager metrics), and we'd like to capture the location where the metric is generated as part of the metric name.

Contributor:
Yeah, I don't have a good alternative suggestion for this.
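A hedged sketch of the kind of source being described: the trait stand-in and class below are hypothetical (modeled loosely on how a component like BlockManagerSource is shaped), not code from this PR. The same source class runs in the driver and in each executor; only the process-level `spark.executor.id` distinguishes the two registrations once it is folded into the reported metric name.

```scala
import com.codahale.metrics.{Gauge, MetricRegistry}

// Minimal stand-in for org.apache.spark.metrics.source.Source so the sketch is
// self-contained; the real trait exposes the same two members.
trait SourceLike {
  def sourceName: String
  def metricRegistry: MetricRegistry
}

// Hypothetical source instantiated both in the driver and in every executor.
// The class itself carries no information about where it runs; that comes from
// spark.executor.id ("driver" on the driver, per the conf.set call above, and
// the executor's own ID elsewhere), which the metrics system can prepend.
class MemoryStatusSource(usedBytes: () => Long) extends SourceLike {
  override val sourceName = "MemoryStatus"
  override val metricRegistry = new MetricRegistry()

  metricRegistry.register(
    MetricRegistry.name("memory", "usedBytes"),
    new Gauge[Long] { override def getValue: Long = usedBytes() })
}
```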
   private[spark] val env = SparkEnv.create(
     conf,
     "<driver>",
@@ -232,19 +242,6 @@ class SparkContext(config: SparkConf) extends Logging {
   /** A default Hadoop Configuration for the Hadoop code (e.g. file systems) that we reuse. */
   val hadoopConfiguration = SparkHadoopUtil.get.newConfiguration(conf)

-  // Optionally log Spark events
-  private[spark] val eventLogger: Option[EventLoggingListener] = {
-    if (conf.getBoolean("spark.eventLog.enabled", false)) {
-      val logger = new EventLoggingListener(appName, conf, hadoopConfiguration)
-      logger.start()
-      listenerBus.addListener(logger)
-      Some(logger)
-    } else None
-  }
-
-  // At this point, all relevant SparkListeners have been registered, so begin releasing events
-  listenerBus.start()
-
   val startTime = System.currentTimeMillis()

   // Add each JAR given through the constructor
@@ -309,6 +306,29 @@ class SparkContext(config: SparkConf) extends Logging {
   // constructor
   taskScheduler.start()

+  val applicationId: String = taskScheduler.applicationId()
+  conf.set("spark.app.id", applicationId)
+
+  val metricsSystem = env.metricsSystem
+
+  // The metrics system for Driver need to be set spark.app.id to app ID.
+  // So it should start after we get app ID from the task scheduler and set spark.app.id.
+  metricsSystem.start()
Contributor:
How about putting these three into
+  // Optionally log Spark events
+  private[spark] val eventLogger: Option[EventLoggingListener] = {
+    if (isEventLogEnabled) {
+      val logger =
+        new EventLoggingListener(applicationId, eventLogDir.get, conf, hadoopConfiguration)
+      logger.start()
+      listenerBus.addListener(logger)
+      Some(logger)
+    } else None
+  }
+
+  // At this point, all relevant SparkListeners have been registered, so begin releasing events
+  listenerBus.start()

   private[spark] val cleaner: Option[ContextCleaner] = {
     if (conf.getBoolean("spark.cleaner.referenceTracking", true)) {
       Some(new ContextCleaner(this))
@@ -411,8 +431,8 @@ class SparkContext(config: SparkConf) extends Logging {
   // Post init
   taskScheduler.postStartHook()

-  private val dagSchedulerSource = new DAGSchedulerSource(this.dagScheduler, this)
-  private val blockManagerSource = new BlockManagerSource(SparkEnv.get.blockManager, this)
+  private val dagSchedulerSource = new DAGSchedulerSource(this.dagScheduler)
+  private val blockManagerSource = new BlockManagerSource(SparkEnv.get.blockManager)

   private def initDriverMetrics() {
     SparkEnv.get.metricsSystem.registerSource(dagSchedulerSource)
@@ -1278,7 +1298,7 @@ class SparkContext(config: SparkConf) extends Logging {
   private def postApplicationStart() {
     // Note: this code assumes that the task scheduler has been initialized and has contacted
     // the cluster manager to get an application ID (in case the cluster manager provides one).
-    listenerBus.post(SparkListenerApplicationStart(appName, taskScheduler.applicationId(),
+    listenerBus.post(SparkListenerApplicationStart(appName, Some(applicationId),
       startTime, sparkUser))
   }
SparkEnv.scala
@@ -259,11 +259,15 @@ object SparkEnv extends Logging {
     }

     val metricsSystem = if (isDriver) {
+      // Don't start metrics system right now for Driver.
+      // We need to wait for the task scheduler to give us an app ID.
+      // Then we can start the metrics system.
       MetricsSystem.createMetricsSystem("driver", conf, securityManager)
Contributor:
Need to add a comment here to explain why we don't initialize it right away if it's the driver (i.e. we need to wait for the task scheduler to give us an app ID).
     } else {
-      MetricsSystem.createMetricsSystem("executor", conf, securityManager)
+      val ms = MetricsSystem.createMetricsSystem("executor", conf, securityManager)
+      ms.start()
+      ms
     }
-    metricsSystem.start()
Contributor:
remove random new line
     // Set the sparkFiles directory, used when downloading dependencies. In local mode,
     // this is a temporary directory; in distributed mode, this is the executor's current working
DAGSchedulerSource.scala
@@ -22,10 +22,10 @@ import com.codahale.metrics.{Gauge,MetricRegistry}
 import org.apache.spark.SparkContext
 import org.apache.spark.metrics.source.Source

-private[spark] class DAGSchedulerSource(val dagScheduler: DAGScheduler, sc: SparkContext)
+private[spark] class DAGSchedulerSource(val dagScheduler: DAGScheduler)
   extends Source {
   override val metricRegistry = new MetricRegistry()
-  override val sourceName = "%s.DAGScheduler".format(sc.appName)
+  override val sourceName = "DAGScheduler"
Contributor:
Why this change? I'd say we want at least some relationship with the application here. Since we rely on the application ID we can solve SPARK-3377, but as a backup it's still not a bad idea to use the app name.

Contributor:
After this patch, I don't think that this

Contributor:
I see. So really this is the
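For readers following this exchange: after the change the per-source name stays short ("DAGScheduler"), and any application-level qualifier is expected to come from the metrics system itself, now that spark.app.id and spark.executor.id are set before it starts. The sketch below is a simplified illustration of that prefixing idea under those assumptions, not the exact helper added by this PR.

```scala
import com.codahale.metrics.MetricRegistry
import org.apache.spark.SparkConf

// Simplified illustration: derive a fully qualified registry name for a source
// once spark.app.id and spark.executor.id are available in the SparkConf.
object RegistryNamingSketch {
  def buildRegistryName(conf: SparkConf, sourceName: String): String =
    (conf.getOption("spark.app.id"), conf.getOption("spark.executor.id")) match {
      case (Some(appId), Some(execId)) => MetricRegistry.name(appId, execId, sourceName)
      case _ => MetricRegistry.name(sourceName)  // fall back to the bare source name
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(false)
      .set("spark.app.id", "app-20140918123456-0001")  // hypothetical app ID
      .set("spark.executor.id", "driver")
    // Prints: app-20140918123456-0001.driver.DAGScheduler
    println(buildRegistryName(conf, "DAGScheduler"))
  }
}
```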
   metricRegistry.register(MetricRegistry.name("stage", "failedStages"), new Gauge[Int] {
     override def getValue: Int = dagScheduler.failedStages.size
Now that you don't pass this into the task scheduler, you don't even need this variable anymore. Then you can do the check for whether event logging is enabled down there where you instantiate the EventLoggingListener.

I think those are needed because SparkDeploySchedulerBackend references sc.eventLogDir.
We cannot pass the eventLogDir only to EventLoggingListener and have SparkDeploySchedulerBackend read it via EventLoggingListener, because EventLoggingListener has to be instantiated after the task scheduler is created, so the task scheduler cannot refer to it.
SparkDeploySchedulerBackend and EventLoggingListener could each get eventLogDir with conf.get themselves, but with that approach EventLoggingListener would have to check whether event logging is enabled before reading eventLogDir, even though that check is already done in SparkContext.

Oh I see, it's because of this new dependency order. Alright, I guess we have no choice but to keep them then. Either way these should be private[spark] instead of completely public.

What we have to do is make isEventLogEnabled and eventLogDir private[spark], right?

yes
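To make the initialization-order point in this thread easier to follow, here is a heavily simplified sketch; the class shapes and values are hypothetical stand-ins for SparkContext, the scheduler backend, and EventLoggingListener, not the real implementations.

```scala
// Hypothetical stand-ins illustrating the initialization order discussed above.
class EventLogListenerSketch(appId: String, logDir: String) {
  // The log is named after the cluster-assigned app ID, not user-supplied input.
  def start(): Unit = println(s"logging events for $appId under $logDir")
}

class SchedulerBackendSketch(eventLogDir: Option[String]) {
  // A deploy-mode backend needs the event log directory when the application
  // is registered, i.e. before any EventLoggingListener exists.
  def applicationId(): String = "app-0001"  // normally assigned by the cluster manager
}

class ContextSketch(conf: Map[String, String]) {
  // 1. Resolved eagerly on the context, because the scheduler backend reads it.
  private[this] val isEventLogEnabled =
    conf.getOrElse("spark.eventLog.enabled", "false").toBoolean
  private[this] val eventLogDir: Option[String] =
    if (isEventLogEnabled) Some(conf.getOrElse("spark.eventLog.dir", "/tmp/spark-events")) else None

  // 2. The scheduler backend is created next and consumes eventLogDir directly.
  private[this] val backend = new SchedulerBackendSketch(eventLogDir)

  // 3. Only once the app ID is known can the event logger be created and started.
  private[this] val eventLogger: Option[EventLogListenerSketch] =
    if (isEventLogEnabled) {
      val logger = new EventLogListenerSketch(backend.applicationId(), eventLogDir.get)
      logger.start()
      Some(logger)
    } else None
}
```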