
Conversation

@HeartSaVioR
Contributor

What changes were proposed in this pull request?

This patch proposes to change the approach for extracting log URLs and attributes from the YARN executor:

  • AS-IS: extract information from the Container API and include it in the container launch context
  • TO-BE: let the YARN executor self-extract the information

This approach lets us populate more attributes, like the NodeManager's IPC port, which can let us configure the custom log URL to point directly to the JHS log URL.
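As a rough sketch of the TO-BE direction (helper names follow the diff context later in this thread, but the exact code here is an assumption, not the merged implementation verbatim), the YARN backend derives the values itself instead of receiving them through the launch context:

import java.net.URL

import org.apache.spark.SparkEnv
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.rpc.RpcEnv
import org.apache.spark.util.YarnContainerInfoHelper

private[spark] class YarnCoarseGrainedExecutorBackend(
    rpcEnv: RpcEnv,
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    userClassPath: Seq[URL],
    env: SparkEnv)
  extends CoarseGrainedExecutorBackend(
    rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env) {

  // Hadoop conf built on the executor side, instead of values baked into the launch context.
  private lazy val hadoopConfiguration = SparkHadoopUtil.get.newConfiguration(env.conf)

  // container = None: we run inside the container, so the helper self-extracts from NM env vars.
  override def extractLogUrls: Map[String, String] =
    YarnContainerInfoHelper.getLogUrls(hadoopConfiguration, container = None).getOrElse(Map())

  override def extractAttributes: Map[String, String] =
    YarnContainerInfoHelper.getAttributes(hadoopConfiguration, container = None).getOrElse(Map())
}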

How was this patch tested?

Existing unit tests.

import org.apache.spark.serializer.SerializerInstance
import org.apache.spark.util.{ThreadUtils, Utils}

private[spark] abstract class BaseCoarseGrainedExecutorBackend(
Contributor Author

This is exactly the same as the original CoarseGrainedExecutorBackend (so please consider it a rename), but it is now abstract and lets derived classes implement extractLogUrls and extractAttributes.

I'd like the abstract class name to carry a prefix that makes clear it is abstract, but I'm open to other options, like keeping this as CoarseGrainedExecutorBackend and naming the new concrete class DefaultCoarseGrainedExecutorBackend.
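For illustration, the two hooks the derived classes implement might look like this (a sketch, not the exact declarations):

// Abstract extraction points; e.g. the YARN backend reads NodeManager env vars here.
def extractLogUrls: Map[String, String]
def extractAttributes: Map[String, String]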

Contributor

Why touch CoarseGrainedExecutorBackend at all? Can't you keep it as is, and override what you need in the YARN version?

Contributor Author

Nope, I just wanted to guarantee the executor log URLs and attributes are overridable. It looks like we want to keep the diff minimal, so I'll just let the YARN executor backend override them.

if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://"
}

def getNodeManagerHttpAddress(container: Option[Container]): String = container match {
Contributor Author

The same applies to all methods: we no longer have an actual usage that passes a Container as a parameter, but it's still safer to keep it, given that it makes the method independent of the process context. Would we want to leave this as it is, or remove it and add a note to the class (or method) javadoc?
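For illustration, a sketch of the two branches (the env-var lookups are assumptions based on YARN's standard NodeManager variables):

def getNodeManagerHttpAddress(container: Option[Container]): String = container match {
  // Driver side: the Container API already knows the NodeManager HTTP address.
  case Some(c) => c.getNodeHttpAddress
  // Executor side: self-extract from the environment the NodeManager sets up.
  case None => sys.env("NM_HOST") + ":" + sys.env("NM_HTTP_PORT")
}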

private[spark] object YarnCoarseGrainedExecutorBackend extends Logging {

def main(args: Array[String]) {
var driverUrl: String = null
Contributor Author

I'm not sure we also want to refactor this, as we might end up with redundant code again once one of the cases needs additional arguments. But I can also refactor it if we think we can do that later when really needed.

Contributor

This is the same as the core argument parser and should remain the same.

If and when a new argument needs to be added just for the YARN side, then you can think about refactoring this.

@HeartSaVioR
Contributor Author

cc. @vanzin

@SparkQA

SparkQA commented Jan 31, 2019

Test build #101919 has finished for PR 23706 at commit a9579b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RetrieveSparkAppConfig, SparkAppConfig}
import org.apache.spark.util.{Utils, YarnContainerInfoHelper}

private[spark] class YarnCoarseGrainedExecutorBackend(


Can we have some comments here explaining what this class does?

Contributor Author

Sorry, I missed this. Just addressed.

@pgandhi999

As far as I understand this PR, you are allowing each resource manager to extend the BaseCoarseGrainedExecutorBackend class and provide its own implementation. Shouldn't this PR have a broader scope in that case, and a different title? Something like [SPARK-26790][CORE] - Create a low-level executor backend interface for plugging in different executor backends. Just thinking out loud.

@HeartSaVioR
Contributor Author

@pgandhi999
Your understanding is right, but it has already been possible for other resource managers to provide their own implementation, like what I did for YarnCoarseGrainedExecutorBackend. While I created a base class for CoarseGrainedExecutorBackend, it only lets derived classes override the executor log URLs and attributes, which will effectively differ between YARN and others.

So the intention of this PR is honestly not to target a broader scope. I just want to ensure the executor log URLs and attributes are overridable.

}

object BaseCoarseGrainedExecutorBackend {
private[spark] def run(
Contributor

Do you actually need different command line parsing for the YARN version? Up to now they've been the same, so it seems to me they should remain the same.

So if instead of this, you add main(args, backendCreateFn) to CoarseGrainedExecutorBackend, you could share more code.
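A sketch of the shape being suggested (the parameter types are an assumption; only the backend construction would differ per cluster manager):

def main(
    args: Array[String],
    backendCreateFn: (RpcEnv, SparkEnv) => CoarseGrainedExecutorBackend): Unit = {
  // shared command line parsing stays here in core;
  // the caller supplies how to construct the concrete backend
}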

@HeartSaVioR
Contributor Author

Just addressed the review comments about keeping the changeset small.

|Usage: CoarseGrainedExecutorBackend [options]
| --worker-url <workerUrl>
| --user-class-path <url>
|""".stripMargin)
Contributor Author

IntelliJ automatically corrects these indentations. I guess we could push this along with the change, but please let me know if we would rather not fix it here.

Contributor

Spark uses the previous indentation style in many places, so I guess that answers your question. I wouldn't exactly call this "correcting the indentation"...

Contributor Author

Yeah, I wouldn't mind reverting this. Will revert.

Btw, I have a feeling that some style rules conflict with the IDE's auto-correction, and fixing them back is another kind of annoyance. Do we have a recommended IDE configuration for avoiding this? Or do we have to deal with it manually?

Contributor

I don't use an IDE, can't help you there.

@SparkQA

SparkQA commented Feb 13, 2019

Test build #102264 has finished for PR 23706 at commit 6a3ef10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2019

Test build #102265 has finished for PR 23706 at commit 23d5432.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.deploy.worker.WorkerWatcher
import org.apache.spark.internal.Logging
import org.apache.spark.rpc.RpcEnv
import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RetrieveSparkAppConfig, SparkAppConfig}
Contributor

Line too long, just use a wildcard.

Contributor Author

Looks like there are unused imports, including this line. Will sort them out.
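For reference, the wildcard form suggested above collapses the long import to:

import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._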

* properties are available for container being set via YARN.
*/
private[spark] class YarnCoarseGrainedExecutorBackend(
rpcEnv: RpcEnv,
Contributor

nit: indent args and extends one more level.

Contributor Author
@HeartSaVioR commented Feb 13, 2019

Sorry, I'm not sure I understand correctly. All args are indented 4 spaces and the extends line 2 spaces, which doesn't seem to violate the style. Could you illustrate your suggestion with an actual code change?

Contributor

Hmm, I swear I saw 2 spaces here when I looked at this code before. Ignore me.


private[spark] object YarnCoarseGrainedExecutorBackend extends Logging {

def main(args: Array[String]) {
Contributor

: Unit =
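That is, spell out the result type instead of using Scala's procedure syntax:

def main(args: Array[String]): Unit = {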


Contributor
@vanzin left a comment

Minor things only. The list of arguments repeated in a bunch of places is a little noisy, but well, it's not that bad.

// This is a very fast action so we can use "ThreadUtils.sameThread"
case Success(msg) =>
// Always receive `true`. Just ignore it
// Always receive `true`. Just ignore it
Contributor

Undo.

cores: Int,
appId: String,
workerUrl: Option[String],
userClassPath: Seq[URL],
Contributor

This is actually not used now. I'm almost suggesting you should just have CoarseGrainedExecutorBackendArguments as an argument here...

Contributor Author

Ah yes, nice suggestion. I missed it. Will address.
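A sketch of the suggestion (the field set mirrors the parameter list in the diff above, but is an assumption):

import java.net.URL

// Bundle the long, repeated parameter list into one case class so that run()
// and the per-cluster-manager backend factory can share a single signature.
case class Arguments(
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    appId: String,
    workerUrl: Option[String],
    userClassPath: Seq[URL])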

@SparkQA

SparkQA commented Feb 14, 2019

Test build #102314 has finished for PR 23706 at commit cf44bb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CoarseGrainedExecutorBackendArguments(

@SparkQA

SparkQA commented Feb 14, 2019

Test build #102322 has finished for PR 23706 at commit 6970d50.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

retest this, please

@SparkQA

SparkQA commented Feb 14, 2019

Test build #102333 has finished for PR 23706 at commit 6970d50.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor

retest this please

@SparkQA

SparkQA commented Feb 14, 2019

Test build #102337 has finished for PR 23706 at commit 6970d50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor
@vanzin left a comment

Very minor issue; otherwise looks good.

System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
// scalastyle:on println
printUsageAndExit()
printUsageAndExit(classNameForEntry)
Contributor

You want classNameForEntry.stripSuffix("$"), either here or in the caller, or the help message will be wrong.

Contributor Author
@HeartSaVioR commented Feb 14, 2019

Nice find! Maybe I'm still thinking about many things in a Java way. Will address.
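A runnable illustration of the point, with a hypothetical object name:

object Demo {
  def main(args: Array[String]): Unit = {
    // A Scala object's runtime class name carries a trailing "$" (here "Demo$"),
    // so strip it before using the name in the usage message.
    val classNameForEntry = this.getClass.getCanonicalName.stripSuffix("$")
    println(classNameForEntry) // prints "Demo"
  }
}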


def main(args: Array[String]) {
def parseArguments(args: Array[String], classNameForEntry: String)
: CoarseGrainedExecutorBackendArguments = {
Contributor

nit: indent more

(or, given CoarseGrainedExecutorBackendArguments is a nested class, you could avoid the redundancy and just call it Arguments.)

Contributor Author

Actually, I'm not 100% sure this breaks the style (I assumed we use two spaces for the return type) and am now a bit confused, but your suggestion to shorten the class name sounds nicer! Will address.

Contributor

Don't think of it as "2 spaces here, 4 spaces there". Think of it as: method declarations should stand apart from the method body. If you indent both the same, that doesn't happen.
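A style-only sketch of the convention (the return type is simplified to keep the snippet self-contained):

def parseArguments(
    args: Array[String],
    classNameForEntry: String): String = {
  // continuation lines of the declaration use 4 spaces, the body uses 2,
  // so the signature visually stands apart from the body
  classNameForEntry
}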

private[spark] object YarnCoarseGrainedExecutorBackend extends Logging {

def main(args: Array[String]): Unit = {
val createFn: (RpcEnv, CoarseGrainedExecutorBackend.Arguments, SparkEnv) =>
Contributor Author

Here I'm using CoarseGrainedExecutorBackend.Arguments instead of Arguments for clarity. Please let me know if we'd rather use Arguments directly.
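For context, a sketch of how this wires up (the Arguments field names are assumptions based on the thread):

def main(args: Array[String]): Unit = {
  val createFn: (RpcEnv, CoarseGrainedExecutorBackend.Arguments, SparkEnv) =>
    CoarseGrainedExecutorBackend = { case (rpcEnv, arguments, env) =>
    new YarnCoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, arguments.executorId,
      arguments.hostname, arguments.cores, arguments.userClassPath, env)
  }
  val backendArgs = CoarseGrainedExecutorBackend.parseArguments(args,
    this.getClass.getCanonicalName.stripSuffix("$"))
  CoarseGrainedExecutorBackend.run(backendArgs, createFn)
}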

@SparkQA

SparkQA commented Feb 15, 2019

Test build #102368 has finished for PR 23706 at commit 7992091.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Arguments(

@vanzin
Contributor

vanzin commented Feb 15, 2019

Merging to master.

@vanzin vanzin closed this in b6c6875 Feb 15, 2019
@HeartSaVioR HeartSaVioR deleted the SPARK-26790 branch February 16, 2019 05:27
@HeartSaVioR
Contributor Author

Thanks all for reviewing and merging!

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…attributes: self-retrieve

## What changes were proposed in this pull request?

This patch proposes to change the approach for extracting log URLs and attributes from the YARN executor:

- AS-IS: extract information from the `Container` API and include it in the container launch context
- TO-BE: let the YARN executor self-extract the information

This approach lets us populate more attributes, like the NodeManager's IPC port, which can let us configure the custom log URL to point directly to the JHS log URL.

## How was this patch tested?

Existing unit tests.

Closes apache#23706 from HeartSaVioR/SPARK-26790.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>