Skip to content

Conversation

@JoshRosen
Copy link
Contributor

What changes were proposed in this pull request?

This change updates JsonProtocol to add logic to exclude the "Task Executor Metrics" field from SparkListenerTaskEnd events in cases where all metric values are zero.

Why are the changes needed?

This is done to save space from event logs when Spark runs under its default out-of-the-box configuration and tasks are shorter than the executor hearbeat interval.

SPARK-26329 added "Task Executor Metrics" to JsonProtocol SparkListenerTaskEnd JSON. With the default spark.executor.metrics.pollingInterval = 0 configuration these metric values are only updated when heartbeats occur. If a task launches and finishes between executor heartbeats then all of the "Task Executor Metrics" values will be zero. For jobs with large numbers of short tasks, this contributes to significant event log bloat.

JsonProtocol already knows how to handle the absence of the "Task Executor Metrics" field, so I think it's safe for us to omit this field when all values are zero.

There is a possibility that third-party code which directly consumes Spark event logs might be relying on the presence of this field. As an "escape-hatch" to avoid breaking such workloads, I have introduced a spark.eventLog.includeAllZeroTaskExecutorMetrics (default false) which can be set to true to restore the old behavior.

Does this PR introduce any user-facing change?

No user-facing changes in history server.

This could be considered a user-facing change from the perspective of third-party code which does its own processing of Spark logs, hence the config. I think it's reasonable to set a sensible default which shrinks event logs for most users instead of keeping a conservative default to support a hypothetical third-party use case of our event logs.

How was this patch tested?

Added new test cases in JsonProtocolSuite.

@github-actions github-actions bot added the CORE label Jan 27, 2023
* We use this instead of passing SparkConf directly because it lets us avoid
* repeated re-parsing of configuration values on each read.
*/
private[spark] class JsonProtocolOptions(conf: SparkConf) {
Copy link
Contributor Author

@JoshRosen JoshRosen Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same flagging pattern used in #39763, so these PRs will conflict and will need updating after one of them merges. I've marked this PR as WIP pending discussion on how / whether we want to flag this change (perhaps I'm being overly conservative and we'd be okay with saying that third-party code directly consuming event logs should update itself to be robust to the new / changed format and not go out of our way to provide escape hatches for this use-case).

@mridulm
Copy link
Contributor

mridulm commented Jan 29, 2023

Are you actually observing this case ? I am quite surprised that all the metrics are zero - since it includes jvm heap memory, execution memory, etc.


test("SPARK-42206: skip logging of all-zero Task Executor Metrics") {
val allZeroTaskExecutorMetrics = new ExecutorMetrics(Map.empty[String, Long])
val allZeroTaskExecutorMetrics = new ExecutorMetrics(Array.emptyLongArray)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map.empty[String, Long] should also ok for Scala 2.13 after #39772 merged

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label May 15, 2023
@github-actions github-actions bot closed this May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants