[WIP][SPARK-42206][CORE] Omit "Task Executor Metrics" field in eventlogs if values are all zero #39770
What changes were proposed in this pull request?
This change updates JsonProtocol to add logic to exclude the "Task Executor Metrics" field from SparkListenerTaskEnd events in cases where all metric values are zero.
Why are the changes needed?
This saves space in event logs when Spark runs under its default out-of-the-box configuration and tasks are shorter than the executor heartbeat interval.
SPARK-26329 added "Task Executor Metrics" to JsonProtocol's SparkListenerTaskEnd JSON. With the default spark.executor.metrics.pollingInterval = 0 configuration, these metric values are only updated when heartbeats occur. If a task launches and finishes between executor heartbeats, then all of the "Task Executor Metrics" values will be zero. For jobs with large numbers of short tasks, this contributes to significant event log bloat.
JsonProtocol already knows how to handle the absence of the "Task Executor Metrics" field, so I think it's safe for us to omit this field when all values are zero.
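To make the intended behavior concrete, here is a minimal Python model of the write and read paths. It is only a sketch and not the actual Scala change in JsonProtocol: the function names, the `include_all_zero` parameter, and the flattened event shape are all hypothetical, and the two metric names are used purely as sample keys.

```python
FIELD = "Task Executor Metrics"


def task_end_to_json(executor_metrics, include_all_zero=False):
    """Build a simplified task-end event (not the real schema).

    The metrics field is written only if at least one value is non-zero,
    or if the caller explicitly asks to keep all-zero metrics.
    """
    event = {"Event": "SparkListenerTaskEnd"}
    all_zero = all(value == 0 for value in executor_metrics.values())
    if include_all_zero or not all_zero:
        event[FIELD] = dict(executor_metrics)
    return event


def read_executor_metrics(event, metric_names):
    """Read metrics back, treating a missing field as all zeros."""
    stored = event.get(FIELD, {})
    return {name: stored.get(name, 0) for name in metric_names}


# Sample keys for illustration only.
names = ["JVMHeapMemory", "JVMOffHeapMemory"]
short_task = dict.fromkeys(names, 0)

event = task_end_to_json(short_task)
# The field is absent on the wire, but the reader reconstructs the zeros.
round_tripped = read_executor_metrics(event, names)
```

Because the reader fills in zeros for a missing field, omitting an all-zero field loses no information in this model, which is the property the change relies on.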
There is a possibility that third-party code which directly consumes Spark event logs might be relying on the presence of this field. As an "escape-hatch" to avoid breaking such workloads, I have introduced a spark.eventLog.includeAllZeroTaskExecutorMetrics configuration (default false) which can be set to true to restore the old behavior.
Does this PR introduce any user-facing change?
No user-facing changes in history server.
This could be considered a user-facing change from the perspective of third-party code which does its own processing of Spark logs, hence the config. I think it's reasonable to set a sensible default which shrinks event logs for most users instead of keeping a conservative default to support a hypothetical third-party use case of our event logs.
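For third-party tools that parse event logs themselves, tolerating the missing field is usually a small change. The sketch below assumes the event log is read as one JSON object per line; the function name `peak_metric` and the sample metric key are hypothetical and only illustrate the pattern of defaulting to zero when the field is absent.

```python
import json

FIELD = "Task Executor Metrics"


def peak_metric(lines, metric_name):
    """Return the largest value of metric_name across task-end events.

    Events without the metrics field are treated as reporting zero.
    """
    peak = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        # A missing field means every metric was zero for this task.
        metrics = event.get(FIELD) or {}
        peak = max(peak, metrics.get(metric_name, 0))
    return peak


sample_log = [
    '{"Event": "SparkListenerTaskEnd"}',
    '{"Event": "SparkListenerTaskEnd", "Task Executor Metrics": {"JVMHeapMemory": 2048}}',
    '{"Event": "SparkListenerJobEnd"}',
]
result = peak_metric(sample_log, "JVMHeapMemory")
```

Consumers that cannot be updated can instead set spark.eventLog.includeAllZeroTaskExecutorMetrics to true to keep the field in every event.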
How was this patch tested?
Added new test cases in JsonProtocolSuite.