[SPARK-25222][K8S] Improve container status logging #22215

rvesse · 2018-08-24T10:13:34Z

What changes were proposed in this pull request?

Currently, when running Spark on Kubernetes, the client runs a logger that watches the K8S API for events related to the driver pod and logs them. However, for the container status part of the logging, it simply dumps the raw object, which is not human readable.

This happens even though the logging class in question already has methods to pretty-print this information; it only invokes them at the end of a job.

This PR improves the logging to always use the pretty printing methods, additionally modifying them to include further useful information provided by the K8S API.

A similar issue also exists when tasks are lost; it will be addressed by further commits to this PR.

Improved LoggingPodStatusWatcher
Improved container status on task failure

How was this patch tested?

Built and launched jobs with the updated Spark client and observed the new human readable output.

Suggested reviewers: @liyinan926 @mccheah

Actually log human readable container status information rather than dumping the raw status object returned by the K8S API

Moves the methods for logging pod and container statuses into the KubernetesUtils class so they can be reused elsewhere

Modifies ExecutorPodsLifecycleManager so that it uses the container status formatting methods as part of its output on task failure. It also avoids outputting null reasons and messages.
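
The null handling described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in ExecutorPodsLifecycleManager: `PodStatusInfo`, `exitMessage`, and the message wording are all hypothetical stand-ins, and the real values would come from the pod status returned by the K8S API.

```scala
object ExitMessageSketch {
  // Hypothetical stand-in for the nullable fields read from a pod status
  final case class PodStatusInfo(reason: String, message: String)

  def exitMessage(execId: Long, exitCode: Int, status: PodStatusInfo): String = {
    val header = s"The executor with id $execId exited with exit code $exitCode."
    // Option(null) is None, so an absent field contributes no line at all
    val reasonLine = Option(status.reason).map(r => s"Reason: $r")
    val messageLine = Option(status.message).map(m => s"Message: $m")
    (Seq(header) ++ reasonLine ++ messageLine).mkString("\n")
  }
}
```

Wrapping each nullable field in `Option` keeps the literal text "null" out of the log without any explicit null checks.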

liyinan926 · 2018-08-24T18:16:28Z

...ce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala

+          .mkString(", ")),
+      ("phase", pod.getStatus.getPhase),
+      ("status", pod.getStatus.getContainerStatuses.asScala.map { status =>
+        Seq(


Looks like you can use containersDescription here.

liyinan926 · 2018-08-24T18:17:28Z

...ce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala

+  def containersDescription(p: Pod): String = {
+    p.getStatus.getContainerStatuses.asScala.map { status =>
+      Seq(
+        ("Container name", status.getName),


Should use all lowercase for consistency with other rows.

liyinan926 · 2018-08-24T18:18:19Z

...ce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala

+      .map {
+        case running: ContainerStateRunning =>
+          Seq(
+            ("Container state", "Running"),


Ditto. Please use all lowercase.
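
For illustration, the lowercase convention requested here might look like the sketch below. The `ContainerState` hierarchy and `describe` are hypothetical simplifications introduced for this example; the code under review matches on the Kubernetes client's own classes such as `ContainerStateRunning`, and the exact row labels are assumptions.

```scala
object ContainerStateSketch {
  // Hypothetical simplified model of a container's state
  sealed trait ContainerState
  final case class Running(startedAt: String) extends ContainerState
  final case class Waiting(reason: String) extends ContainerState
  final case class Terminated(exitCode: Int, reason: String) extends ContainerState

  // Every row label is lowercase so it is consistent with the other rows
  def describe(state: ContainerState): Seq[(String, String)] = state match {
    case Running(startedAt) =>
      Seq("container state" -> "running", "container started at" -> startedAt)
    case Waiting(reason) =>
      Seq("container state" -> "waiting", "pending reason" -> reason)
    case Terminated(exitCode, reason) =>
      Seq(
        "container state" -> "terminated",
        "exit code" -> exitCode.toString,
        "termination reason" -> reason)
  }
}
```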

nrchakradhar · 2018-08-25T09:20:47Z

...ce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala

+  }
+
+  /**
+    * Given a pod output a human readable representation of its state


nit: Adding a "," after "pod" makes it read better: "Given a pod, output ..."

- Address PR comments about consistent formatting and method reuse
- Update tests to check for new improved log format

rvesse · 2018-08-28T10:23:26Z

@liyinan926 @nrchakradhar Addressed all your comments, thanks for the reviews.

Is someone able to kick off the Jenkins testing on this PR?

liyinan926 · 2018-08-30T17:41:59Z

@mccheah can you give ok to test to this one and help merge it?

SparkQA · 2018-08-30T17:59:18Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2703/

SparkQA · 2018-08-30T18:10:03Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2703/

mccheah · 2018-08-31T22:52:47Z

ok to test

mccheah

Definitely looks a lot prettier - some small comments but otherwise looks good.

mccheah · 2018-08-31T22:49:41Z

...ce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala

+
+  def formatPairsBundle(pairs: Seq[(String, String)], indent: Int = 1) : String = {
+    // Use more loggable format if value is null or empty
+    val indentStr = "\t" * indent


Can we prefer space-based indentation? Curious as to whether others have an opinion about this.

I just preserved the original code's choice here; I would happily change to spaces if preferred.
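
To make the tabs versus spaces trade-off concrete, here is a rough sketch of how the choice could be isolated in one place. The `indentUnit` parameter and the `formatPairs` name are hypothetical and not part of this PR; the default of `"\t"` mirrors the tab-based behaviour shown in the diff.

```scala
object IndentSketch {
  // `indentUnit` is an assumption added for illustration: "\t" keeps the
  // existing tab indentation, while "  " would switch to spaces
  def formatPairs(
      pairs: Seq[(String, String)],
      indent: Int = 1,
      indentUnit: String = "\t"): String = {
    val indentStr = indentUnit * indent
    // One "key: value" row per pair, each on its own indented line
    pairs.map { case (k, v) => s"\n$indentStr$k: $v" }.mkString
  }
}
```

With this shape, switching to spaces would be a one-argument change, for example `IndentSketch.formatPairs(rows, indent = 2, indentUnit = "  ")`, where `rows` is any sequence of label and value pairs.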

...ce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala

...ore/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala

SparkQA · 2018-08-31T23:07:12Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2753/

SparkQA · 2018-08-31T23:10:23Z

Test build #95568 has finished for PR 22215 at commit 6f6442f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-08-31T23:15:05Z

Test build #95569 has finished for PR 22215 at commit 6f6442f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-08-31T23:15:42Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2753/

When the API supplies no reason/message use N/A instead of the empty string

rvesse · 2018-09-03T08:41:39Z

@mccheah Thanks for the review, have made the change you suggested to use N/A instead of empty string.

I have left indentation as tabs for now. As I said in a previous comment, this was just what the existing code used, and I am happy to switch to spaces if others also want that change.
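
A minimal sketch of the N/A fallback, assuming a standalone helper: `orNA` is a hypothetical name used only for this example, and the actual change may be inlined in the formatting methods rather than factored out like this.

```scala
object NotAvailableSketch {
  // Treat both null (field absent from the API response) and the empty
  // string as "no value supplied" and show N/A instead
  def orNA(value: String): String =
    Option(value).filter(_.nonEmpty).getOrElse("N/A")
}
```

Values that are present pass through unchanged, so only missing reasons and messages are rewritten.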

SparkQA · 2018-09-03T08:52:42Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2789/

SparkQA · 2018-09-03T08:59:51Z

Test build #95616 has finished for PR 22215 at commit 4c39a81.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-09-03T09:02:59Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2789/

rvesse · 2018-09-06T13:33:44Z

I think this is pretty much ready to merge. Can folks take another look when they get a chance?

mccheah · 2018-09-06T23:14:19Z

Yup this can merge now, thanks!

rvesse added 3 commits August 24, 2018 11:03

[SPARK-25222][K8S] Improve container status logging

ebcbf05

Actually log human readable container status information rather than dumping the raw status object returned by the K8S API

[SPARK-25222][K8S] Move pod & container logging to utils

842a0b3

Moves the methods for logging pod and container statuses into the KubernetesUtils class so they can be reused elsewhere

[SPARK-25222][K8S] Human readable container status on task failure

dcb5b00

Modifies ExecutorPodsLifecycleManager so that it uses the container status formatting methods as part of its output on task failure. It also avoids outputting null reasons and messages.

rvesse changed the title ~~[SPARK-25222][K8S][WIP] Improve container status logging~~ [SPARK-25222][K8S] Improve container status logging Aug 24, 2018

liyinan926 reviewed Aug 24, 2018

nrchakradhar reviewed Aug 25, 2018

[SPARK-25222][K8S] Address comments and update tests

355e66d

- Address PR comments about consistent formatting and method reuse
- Update tests to check for new improved log format

liyinan926 approved these changes Aug 28, 2018

[SPARK-25222][K8S] Fix scalastyle for utils methods

6f6442f

mccheah suggested changes Aug 31, 2018

[SPARK-25222][K8S] Use N/A inside of empty

4c39a81

When the API supplies no reason/message use N/A instead of the empty string

asfgit closed this in 27d3b0a Sep 6, 2018

rvesse deleted the SPARK-25222 branch October 31, 2018 11:16
