
Commit 007a42b

Shreyesh Shaju Arangath authored and dongjoon-hyun committed
[SPARK-43002][YARN] Modify yarn client application report logging frequency to reduce noise
### What changes were proposed in this pull request?

* Added a new config property, `spark.yarn.report.loggingFrequency`.
* Limit the number of times the YARN application report is logged based on the number of reports processed.

### Why are the changes needed?

Currently, an application report is generated every second, which adds a lot of noise, especially for long-running applications. This bloats the log file and makes it hard to navigate. With this change, we can limit the number of times the application status report is logged based on the number of reports processed.

### Does this PR introduce _any_ user-facing change?

With the default settings, the application report logs are now ~30s apart:

```
31-03-2023 15:00:08 PDT countByCountryFlow_countByCountry INFO - 23/03/31 22:00:08 INFO yarn.Client: Application report for application_1676870052658_5870144 (state: RUNNING)
31-03-2023 15:00:38 PDT countByCountryFlow_countByCountry INFO - 23/03/31 22:00:38 INFO yarn.Client: Application report for application_1676870052658_5870144 (state: RUNNING)
31-03-2023 15:01:08 PDT countByCountryFlow_countByCountry INFO - 23/03/31 22:01:08 INFO yarn.Client: Application report for application_1676870052658_5870144 (state: RUNNING)
31-03-2023 15:01:38 PDT countByCountryFlow_countByCountry INFO - 23/03/31 22:01:38 INFO yarn.Client: Application report for application_1676870052658_5870144 (state: RUNNING)
31-03-2023 15:02:09 PDT countByCountryFlow_countByCountry INFO - 23/03/31 22:02:09 INFO yarn.Client: Application report for application_1676870052658_5870144 (state: RUNNING)
31-03-2023 15:02:31 PDT countByCountryFlow_countByCountry INFO - 23/03/31 22:02:31 INFO yarn.Client: Application report for application_1676870052658_5870144 (state: FINISHED)
```

### How was this patch tested?

Tested locally to ensure the behavior was as expected.

Closes #40637 from ShreyeshArangath/ShreyeshArangath/yarn-log-frequency-patch.

Authored-by: Shreyesh Shaju Arangath <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
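The "~30s apart" figure follows from simple arithmetic: one report per interval, one log line per `loggingFrequency` reports while the state is unchanged. The sketch below works that out. It is illustrative only; `LogSpacing` and `approxGap` are hypothetical names, and the 1-second report interval is taken from the commit message's statement that a report is generated every second.

```scala
import scala.concurrent.duration._

object LogSpacing {
  // Approximate gap between two throttled log lines while the state is unchanged.
  // reportInterval: time between application reports (assumed 1 second here).
  // loggingFrequency: value of spark.yarn.report.loggingFrequency.
  def approxGap(reportInterval: FiniteDuration, loggingFrequency: Int): FiniteDuration = {
    require(loggingFrequency > 0, "logging frequency should be positive")
    reportInterval * loggingFrequency.toLong
  }
}
```

With a 1-second interval and the default frequency of 30, `LogSpacing.approxGap(1.second, 30)` is 30 seconds. The sample log above shows gaps of 30 to 31 seconds, so the real spacing can run slightly longer than this estimate.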
1 parent 74713b8 commit 007a42b

3 files changed, with 29 additions and 2 deletions

docs/running-on-yarn.md

Lines changed: 10 additions & 0 deletions
@@ -646,6 +646,16 @@ To use a custom metrics.properties for the application master and executors, upd
   </td>
   <td>0.9.0</td>
 </tr>
+<tr>
+  <td><code>spark.yarn.report.loggingFrequency</code></td>
+  <td><code>30</code></td>
+  <td>
+    Maximum number of application reports processed until the next application status
+    is logged. If there is a change of state, the application status will be logged regardless
+    of the number of application reports processed.
+  </td>
+  <td>3.5.0</td>
+</tr>
 <tr>
   <td><code>spark.yarn.clientLaunchMonitorInterval</code></td>
   <td><code>1s</code></td>

resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

Lines changed: 7 additions & 2 deletions
@@ -1142,6 +1142,8 @@ private[spark] class Client(
       logApplicationReport: Boolean = true,
       interval: Long = sparkConf.get(REPORT_INTERVAL)): YarnAppReport = {
     var lastState: YarnApplicationState = null
+    val reportsTillNextLog: Int = sparkConf.get(REPORT_LOG_FREQUENCY)
+    var reportsSinceLastLog: Int = 0
     while (true) {
       Thread.sleep(interval)
       val report: ApplicationReport =
@@ -1160,9 +1162,12 @@ private[spark] class Client(
             Some(msg))
         }
       val state = report.getYarnApplicationState
-
+      reportsSinceLastLog += 1
       if (logApplicationReport) {
-        logInfo(s"Application report for $appId (state: $state)")
+        if (lastState != state || reportsSinceLastLog >= reportsTillNextLog) {
+          logInfo(s"Application report for $appId (state: $state)")
+          reportsSinceLastLog = 0
+        }

         // If DEBUG is enabled, log report details every iteration
         // Otherwise, log them every time the application changes state
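
The throttling rule in this hunk can be exercised in isolation. The following is a minimal sketch, not Spark code: `ReportLogThrottle` and `loggedIndices` are hypothetical names, states are plain strings instead of `YarnApplicationState`, and it assumes `lastState` is updated after every report, which happens outside the lines shown in this hunk.

```scala
object ReportLogThrottle {
  // Returns the indices of the reports that would produce a log line.
  // A report is logged when the state changed since the previous report,
  // or when reportsTillNextLog reports have been seen since the last log line.
  def loggedIndices(states: Seq[String], reportsTillNextLog: Int): List[Int] = {
    require(reportsTillNextLog > 0, "logging frequency should be positive")
    var lastState: String = null // mirrors the initial null in the hunk above
    var reportsSinceLastLog = 0
    val logged = List.newBuilder[Int]
    for ((state, index) <- states.zipWithIndex) {
      reportsSinceLastLog += 1
      if (lastState != state || reportsSinceLastLog >= reportsTillNextLog) {
        logged += index
        reportsSinceLastLog = 0
      }
      lastState = state // assumption: the real loop records the state each iteration
    }
    logged.result()
  }
}
```

For seven `RUNNING` reports followed by one `FINISHED` report with a frequency of 3, this logs reports 0, 3, 6 and 7: the first report counts as a state change, every third report is logged after that, and the transition to `FINISHED` is logged immediately.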

resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala

Lines changed: 12 additions & 0 deletions
@@ -225,6 +225,18 @@ package object config extends Logging {
     .timeConf(TimeUnit.MILLISECONDS)
     .createWithDefaultString("1s")

+  private[spark] val REPORT_LOG_FREQUENCY = {
+    ConfigBuilder("spark.yarn.report.loggingFrequency")
+      .doc("Maximum number of application reports processed " +
+        "until the next application status is logged. " +
+        "If there is a change of state, the application status will be logged " +
+        "regardless of the number of application reports processed.")
+      .version("3.5.0")
+      .intConf
+      .checkValue(_ > 0, "logging frequency should be positive")
+      .createWithDefault(30)
+  }
+
   private[spark] val CLIENT_LAUNCH_MONITOR_INTERVAL =
     ConfigBuilder("spark.yarn.clientLaunchMonitorInterval")
       .doc("Interval between requests for status the client mode AM when starting the app.")
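
The new entry combines an integer type, a default of 30, and a positivity check. The sketch below is a plain-Scala stand-in for those three rules and does not use Spark's `ConfigBuilder`; `LoggingFrequencySetting` and `resolve` are hypothetical names, and the exception type and message format of Spark's own validation may differ.

```scala
object LoggingFrequencySetting {
  val Default: Int = 30

  // Resolves an optional raw string value to a validated logging frequency.
  // None falls back to the default; non-integers and non-positive values are rejected.
  def resolve(raw: Option[String]): Int = {
    val value = raw.map(_.trim.toInt).getOrElse(Default)
    require(value > 0, "logging frequency should be positive")
    value
  }
}
```

Here `resolve(None)` yields 30 and `resolve(Some("60"))` yields 60, while `resolve(Some("0"))` throws an `IllegalArgumentException`, which matches the intent of the `_ > 0` check in the hunk.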
