
Conversation

@HeartSaVioR
Contributor

What changes were proposed in this pull request?

This patch proposes to change the approach for extracting log URLs and attributes from the YARN executor:

  • AS-IS: extract information from the Container API and include it in the container launch context
  • TO-BE: let the YARN executor self-extract the information

This approach lets us populate more attributes, like the NodeManager's IPC port, which can let us configure the custom log URL to point directly to the JHS log URL.
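As a rough sketch of the TO-BE direction (helper names follow the diff context later in this thread, but the exact code here is an assumption, not the merged implementation verbatim), the YARN backend derives the values itself instead of receiving them through the launch context:

import java.net.URL

import org.apache.spark.SparkEnv
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.rpc.RpcEnv
import org.apache.spark.util.YarnContainerInfoHelper

private[spark] class YarnCoarseGrainedExecutorBackend(
    rpcEnv: RpcEnv,
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    userClassPath: Seq[URL],
    env: SparkEnv)
  extends CoarseGrainedExecutorBackend(
    rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env) {

  // Hadoop conf built on the executor side, instead of values baked into the launch context.
  private lazy val hadoopConfiguration = SparkHadoopUtil.get.newConfiguration(env.conf)

  // container = None: we run inside the container, so the helper self-extracts from NM env vars.
  override def extractLogUrls: Map[String, String] =
    YarnContainerInfoHelper.getLogUrls(hadoopConfiguration, container = None).getOrElse(Map())

  override def extractAttributes: Map[String, String] =
    YarnContainerInfoHelper.getAttributes(hadoopConfiguration, container = None).getOrElse(Map())
}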

How was this patch tested?

Existing unit tests.

import org.apache.spark.serializer.SerializerInstance
import org.apache.spark.util.{ThreadUtils, Utils}

private[spark] abstract class BaseCoarseGrainedExecutorBackend(
Contributor Author

This is exactly the same as the original CoarseGrainedExecutorBackend (so please consider it a rename), but it is now abstract and lets derived classes implement extractLogUrls and extractAttributes.

I'd like the abstract class name to carry a prefix that makes clear it is abstract, but I'm open to other options, like keeping this as CoarseGrainedExecutorBackend and naming the new concrete class DefaultCoarseGrainedExecutorBackend.
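For illustration, the two hooks the derived classes implement might look like this (a sketch, not the exact declarations):

// Abstract extraction points; e.g. the YARN backend reads NodeManager env vars here.
def extractLogUrls: Map[String, String]
def extractAttributes: Map[String, String]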

Contributor

Why touch CoarseGrainedExecutorBackend at all? Can't you keep it as is, and override what you need in the YARN version?

Contributor Author

Nope, I just wanted to guarantee the executor log URLs and attributes are overridable. It looks like we want to keep the diff minimal, so I'll just let the YARN executor backend override them.

if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://"
}

def getNodeManagerHttpAddress(container: Option[Container]): String = container match {
Contributor Author

The same applies to all methods: we no longer have an actual usage that passes a Container as a parameter, but it's still safer to keep it, given that it makes the method independent of the process context. Would we want to leave this as it is, or remove it and add a note to the class (or method) javadoc?
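For illustration, a sketch of the two branches (the env-var lookups are assumptions based on YARN's standard NodeManager variables):

def getNodeManagerHttpAddress(container: Option[Container]): String = container match {
  // Driver side: the Container API already knows the NodeManager HTTP address.
  case Some(c) => c.getNodeHttpAddress
  // Executor side: self-extract from the environment the NodeManager sets up.
  case None => sys.env("NM_HOST") + ":" + sys.env("NM_HTTP_PORT")
}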

private[spark] object YarnCoarseGrainedExecutorBackend extends Logging {

def main(args: Array[String]) {
var driverUrl: String = null
Contributor Author

I'm not sure we also want to refactor this, as we might end up with redundant code again once one of the cases needs additional arguments. But I can also refactor it if we think we can do that later when really needed.

Contributor

This is the same as the core argument parser and should remain the same.

If and when a new argument needs to be added just for the YARN side, then you can think about refactoring this.

@HeartSaVioR
Contributor Author

cc. @vanzin

@SparkQA

SparkQA commented Jan 31, 2019

Test build #101919 has finished for PR 23706 at commit a9579b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RetrieveSparkAppConfig, SparkAppConfig}
import org.apache.spark.util.{Utils, YarnContainerInfoHelper}

private[spark] class YarnCoarseGrainedExecutorBackend(


Can we have some comments here explaining what this class does?

Contributor Author

Sorry, I missed this. Just addressed.

@pgandhi999

As far as I understand this PR, you are allowing each resource manager to extend the BaseCoarseGrainedExecutorBackend class and provide its own implementation. Shouldn't this PR have a broader scope in that case, and a different title? Something like [SPARK-26790][CORE] - Create a low-level executor backend interface for plugging in different executor backends. Just thinking out loud.

@HeartSaVioR
Contributor Author

@pgandhi999
Your understanding is right, but it has already been possible for other resource managers to provide their own implementation, like what I did for YarnCoarseGrainedExecutorBackend. While I created a base class for CoarseGrainedExecutorBackend, it only lets derived classes override the executor log URLs and attributes, which will effectively differ between YARN and others.

So the intention of this PR is honestly not to target a broader scope. I just want to ensure the executor log URLs and attributes are overridable.

}

object BaseCoarseGrainedExecutorBackend {
private[spark] def run(
Contributor

Do you actually need different command line parsing for the YARN version? Up to now they've been the same, so it seems to me they should remain the same.

So if instead of this, you add main(args, backendCreateFn) to CoarseGrainedExecutorBackend, you could share more code.
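A sketch of the shape being suggested (the parameter types are an assumption; only the backend construction would differ per cluster manager):

def main(
    args: Array[String],
    backendCreateFn: (RpcEnv, SparkEnv) => CoarseGrainedExecutorBackend): Unit = {
  // shared command line parsing stays here in core;
  // the caller supplies how to construct the concrete backend
}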

@HeartSaVioR
Contributor Author

Just addressed the review comments about keeping the changeset small.

|Usage: CoarseGrainedExecutorBackend [options]
| --worker-url <workerUrl>
| --user-class-path <url>
|""".stripMargin)
Contributor Author

IntelliJ automatically corrects these indentations. I guess we could push this along with the change, but please let me know if we would rather not fix it here.

Contributor

Spark uses the previous indentation style in many places, so I guess that answers your question. I wouldn't exactly call this "correcting the indentation"...

Contributor Author

Yeah, I wouldn't mind reverting this. Will revert.

Btw, I have a feeling that some style rules conflict with the IDE's auto-correction, and fixing them back is another kind of annoyance. Do we have a recommended IDE configuration for avoiding this? Or do we have to deal with it manually?

Contributor

I don't use an IDE, can't help you there.

@SparkQA

SparkQA commented Feb 13, 2019

Test build #102264 has finished for PR 23706 at commit 6a3ef10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2019

Test build #102265 has finished for PR 23706 at commit 23d5432.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.deploy.worker.WorkerWatcher
import org.apache.spark.internal.Logging
import org.apache.spark.rpc.RpcEnv
import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RetrieveSparkAppConfig, SparkAppConfig}
Contributor

Line too long, just use a wildcard.

Contributor Author

Looks like there are unused imports, including this line. Will sort them out.
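For reference, the wildcard form suggested above collapses the long import to:

import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._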

* properties are available for container being set via YARN.
*/
private[spark] class YarnCoarseGrainedExecutorBackend(
rpcEnv: RpcEnv,
Contributor

nit: indent args and extends one more level.

Contributor Author
@HeartSaVioR commented Feb 13, 2019

Sorry, I'm not sure I understand correctly. All args are indented 4 spaces and the extends line 2 spaces, which doesn't seem to violate the style. Could you illustrate your suggestion with an actual code change?

Contributor

Hmm, I swear I saw 2 spaces here when I looked at this code before. Ignore me.


private[spark] object YarnCoarseGrainedExecutorBackend extends Logging {

def main(args: Array[String]) {
Contributor

: Unit =
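That is, spell out the result type instead of using Scala's procedure syntax:

def main(args: Array[String]): Unit = {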


Contributor
@vanzin left a comment

Minor things only. The list of arguments repeated in a bunch of places is a little noisy, but well, it's not that bad.

// This is a very fast action so we can use "ThreadUtils.sameThread"
case Success(msg) =>
// Always receive `true`. Just ignore it
// Always receive `true`. Just ignore it
Contributor

Undo.

cores: Int,
appId: String,
workerUrl: Option[String],
userClassPath: Seq[URL],
Contributor

This is actually not used now. I'm almost suggesting you should just have CoarseGrainedExecutorBackendArguments as an argument here...

Contributor Author

Ah yes, nice suggestion. I missed it. Will address.
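A sketch of the suggestion (the field set mirrors the parameter list in the diff above, but is an assumption):

import java.net.URL

// Bundle the long, repeated parameter list into one case class so that run()
// and the per-cluster-manager backend factory can share a single signature.
case class Arguments(
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    appId: String,
    workerUrl: Option[String],
    userClassPath: Seq[URL])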

@SparkQA

SparkQA commented Feb 14, 2019

Test build #102314 has finished for PR 23706 at commit cf44bb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CoarseGrainedExecutorBackendArguments(

@SparkQA

SparkQA commented Feb 14, 2019

Test build #102322 has finished for PR 23706 at commit 6970d50.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

retest this, please

@SparkQA

SparkQA commented Feb 14, 2019

Test build #102333 has finished for PR 23706 at commit 6970d50.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor

retest this please

@SparkQA

SparkQA commented Feb 14, 2019

Test build #102337 has finished for PR 23706 at commit 6970d50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor
@vanzin left a comment

Very minor issue; otherwise looks good.

System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
// scalastyle:on println
printUsageAndExit()
printUsageAndExit(classNameForEntry)
Contributor

You want classNameForEntry.stripSuffix("$"), either here or in the caller, or the help message will be wrong.

Contributor Author
@HeartSaVioR commented Feb 14, 2019

Nice find! Maybe I'm still thinking about many things in a Java way. Will address.
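A runnable illustration of the point, with a hypothetical object name:

object Demo {
  def main(args: Array[String]): Unit = {
    // A Scala object's runtime class name carries a trailing "$" (here "Demo$"),
    // so strip it before using the name in the usage message.
    val classNameForEntry = this.getClass.getCanonicalName.stripSuffix("$")
    println(classNameForEntry) // prints "Demo"
  }
}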


def main(args: Array[String]) {
def parseArguments(args: Array[String], classNameForEntry: String)
: CoarseGrainedExecutorBackendArguments = {
Contributor

nit: indent more

(or, given CoarseGrainedExecutorBackendArguments is a nested class, you could avoid the redundancy and just call it Arguments.)

Contributor Author

Actually, I'm not 100% sure this breaks the style (I assumed we use two spaces for the return type) and am now a bit confused, but your suggestion to shorten the class name sounds nicer! Will address.

Contributor

Don't think of it as "2 spaces here, 4 spaces there". Think of it as: method declarations should stand apart from the method body. If you indent both the same, that doesn't happen.
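A style-only sketch of the convention (the return type is simplified to keep the snippet self-contained):

def parseArguments(
    args: Array[String],
    classNameForEntry: String): String = {
  // continuation lines of the declaration use 4 spaces, the body uses 2,
  // so the signature visually stands apart from the body
  classNameForEntry
}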

private[spark] object YarnCoarseGrainedExecutorBackend extends Logging {

def main(args: Array[String]): Unit = {
val createFn: (RpcEnv, CoarseGrainedExecutorBackend.Arguments, SparkEnv) =>
Contributor Author

Here I'm using CoarseGrainedExecutorBackend.Arguments instead of Arguments for clarity. Please let me know if we'd rather use Arguments directly.
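For context, a sketch of how this wires up (the Arguments field names are assumptions based on the thread):

def main(args: Array[String]): Unit = {
  val createFn: (RpcEnv, CoarseGrainedExecutorBackend.Arguments, SparkEnv) =>
    CoarseGrainedExecutorBackend = { case (rpcEnv, arguments, env) =>
    new YarnCoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, arguments.executorId,
      arguments.hostname, arguments.cores, arguments.userClassPath, env)
  }
  val backendArgs = CoarseGrainedExecutorBackend.parseArguments(args,
    this.getClass.getCanonicalName.stripSuffix("$"))
  CoarseGrainedExecutorBackend.run(backendArgs, createFn)
}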

@SparkQA

SparkQA commented Feb 15, 2019

Test build #102368 has finished for PR 23706 at commit 7992091.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Arguments(

@vanzin
Contributor

vanzin commented Feb 15, 2019

Merging to master.

@vanzin vanzin closed this in b6c6875 Feb 15, 2019
@HeartSaVioR HeartSaVioR deleted the SPARK-26790 branch February 16, 2019 05:27
@HeartSaVioR
Contributor Author

Thanks all for reviewing and merging!

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…attributes: self-retrieve

## What changes were proposed in this pull request?

This patch proposes to change the approach for extracting log URLs and attributes from the YARN executor:

- AS-IS: extract information from the `Container` API and include it in the container launch context
- TO-BE: let the YARN executor self-extract the information

This approach lets us populate more attributes, like the NodeManager's IPC port, which can let us configure the custom log URL to point directly to the JHS log URL.

## How was this patch tested?

Existing unit tests.

Closes apache#23706 from HeartSaVioR/SPARK-26790.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>