-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-26790][CORE] Change approach for retrieving executor logs and attributes: self-retrieve #23706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…attributes: self-retrieve
| import org.apache.spark.serializer.SerializerInstance | ||
| import org.apache.spark.util.{ThreadUtils, Utils} | ||
|
|
||
| private[spark] abstract class BaseCoarseGrainedExecutorBackend( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is completely same as original CoarseGrainedExecutorBackend (so please consider it as renaming) but added abstract and let derived classes implement extractLogUrls and extractAttributes.
I'd like to make clear abstract class has its prefix to determine whether it is abstract class, but I'm open to other option like keeping this as CoarseGrainedExecutorBackend and rename new CoarseGrainedExecutorBackend as DefaultCoarseGrainedExecutorBackend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why touch CoarseGrainedExecutorBackend at all? Can't you keep it as is, and override what you need in the YARN version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, just wanted to guarantee executor log url and attributes are overridable. Looks like we would want to have minimized diff, then I'll just let YARN executor backend override them.
| if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://" | ||
| } | ||
|
|
||
| def getNodeManagerHttpAddress(container: Option[Container]): String = container match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies to all methods: now we don't have an actual usage to pass Container as parameter, but it's still safer given that it's independent on context of process. Would we want to leave this as it is, or remove this and add a note to class (or methods) javadoc?
| private[spark] object YarnCoarseGrainedExecutorBackend extends Logging { | ||
|
|
||
| def main(args: Array[String]) { | ||
| var driverUrl: String = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we also want to refactor this, as we might need to have redundant code again when one of case needs to have additional arguments. But I can also refactor if we think we can do it later when really needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as the core argument parser and should remain the same.
If and when a new argument needs to be added just for the YARN side, then you can think about refactoring this.
|
cc. @vanzin |
|
Test build #101919 has finished for PR 23706 at commit
|
| import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RetrieveSparkAppConfig, SparkAppConfig} | ||
| import org.apache.spark.util.{Utils, YarnContainerInfoHelper} | ||
|
|
||
| private[spark] class YarnCoarseGrainedExecutorBackend( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have some comments here as to what does this class do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry missed this. Just addressed.
|
As far as I understand this PR, you are allowing each resource manager to override the |
|
@pgandhi999 So the intention of PR is honestly not target to broader scope. I just want to ensure |
| import org.apache.spark.serializer.SerializerInstance | ||
| import org.apache.spark.util.{ThreadUtils, Utils} | ||
|
|
||
| private[spark] abstract class BaseCoarseGrainedExecutorBackend( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why touch CoarseGrainedExecutorBackend at all? Can't you keep it as is, and override what you need in the YARN version?
| } | ||
|
|
||
| object BaseCoarseGrainedExecutorBackend { | ||
| private[spark] def run( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you actually need different command line parsing for the YARN version? Up to now they've been the same, so it seems to me they should remain the same.
So if instead of this, you add main(args, backendCreateFn) to CoarseGrainedExecutorBackend, you could share more code.
|
Just addressed review comments regarding making changeset small. |
| | --worker-url <workerUrl> | ||
| | --user-class-path <url> | ||
| |""".stripMargin) | ||
| |Usage: CoarseGrainedExecutorBackend [options] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntelliJ automatically corrects these indentations: I guess we could push along with the change, but please let me know if we would not want to fix this in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark uses the previous indentation style in many places, so I guess that answers your question. I wouldn't exactly call this "correcting the indentation"...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wouldn't mind reverting this. Will revert.
Btw, I have a feeling that some style rule is conflicted with auto-correction (in point of IDE's view) of IDE and fixing it back is another kind of annoying. Does we have any recommend IDE configuration for avoiding this? Or does we have to deal with this manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't use an IDE, can't help you there.
|
Test build #102264 has finished for PR 23706 at commit
|
|
Test build #102265 has finished for PR 23706 at commit
|
| import org.apache.spark.deploy.worker.WorkerWatcher | ||
| import org.apache.spark.internal.Logging | ||
| import org.apache.spark.rpc.RpcEnv | ||
| import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RetrieveSparkAppConfig, SparkAppConfig} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line too long, just use a wildcard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there're unused imports including this line. Will sort them out.
| * properties are available for container being set via YARN. | ||
| */ | ||
| private[spark] class YarnCoarseGrainedExecutorBackend( | ||
| rpcEnv: RpcEnv, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent args and extends one more level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry I'm not sure I understand correctly. All args are 4 spaces and extends line is 2 spaces which doesn't seem to violate the style. Could you please guide your suggestion via actual code change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I swear I saw 2 spaces here when I looked at this code before. Ignore me.
|
|
||
| private[spark] object YarnCoarseGrainedExecutorBackend extends Logging { | ||
|
|
||
| def main(args: Array[String]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: Unit =
| private[spark] object YarnCoarseGrainedExecutorBackend extends Logging { | ||
|
|
||
| def main(args: Array[String]) { | ||
| var driverUrl: String = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as the core argument parser and should remain the same.
If and when a new argument needs to be added just for the YARN side, then you can think about refactoring this.
vanzin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor things only. The list of arguments repeated in a bunch of places is a little noisy, but well, it's not that bad.
| // This is a very fast action so we can use "ThreadUtils.sameThread" | ||
| case Success(msg) => | ||
| // Always receive `true`. Just ignore it | ||
| // Always receive `true`. Just ignore it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undo.
| cores: Int, | ||
| appId: String, | ||
| workerUrl: Option[String], | ||
| userClassPath: Seq[URL], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not used now. I'm almost suggesting you should just have CoarseGrainedExecutorBackendArguments as an argument here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes nice suggestion. I missed it. Will address.
|
Test build #102314 has finished for PR 23706 at commit
|
|
Test build #102322 has finished for PR 23706 at commit
|
|
retest this, please |
|
Test build #102333 has finished for PR 23706 at commit
|
|
retest this please |
|
Test build #102337 has finished for PR 23706 at commit
|
vanzin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor issue otherwise looks good.
| System.err.println(s"Unrecognized options: ${tail.mkString(" ")}") | ||
| // scalastyle:on println | ||
| printUsageAndExit() | ||
| printUsageAndExit(classNameForEntry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want classNameForEntry.stripSuffix("$"), either here or in the caller, or the help message will be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice finding! Maybe I'm still thinking many things in Java way. Will address.
|
|
||
| def main(args: Array[String]) { | ||
| def parseArguments(args: Array[String], classNameForEntry: String) | ||
| : CoarseGrainedExecutorBackendArguments = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent more
(or, given CoarseGrainedExecutorBackendArguments is a nested class, you could avoid the redundancy and just call it Arguments.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm not 100% sure it breaks style (I assume we have two spaces for return) and now being confused, but your suggestion on shorten class name sounds nice and better! Will address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think of it as "2 spaces here, 4 spaces there". Think that method declarations should stand about from the method body. And if you indent both the same, that does not happen.
| private[spark] object YarnCoarseGrainedExecutorBackend extends Logging { | ||
|
|
||
| def main(args: Array[String]): Unit = { | ||
| val createFn: (RpcEnv, CoarseGrainedExecutorBackend.Arguments, SparkEnv) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm using CoarseGrainedExecutorBackend.Arguments instead of Arguments for clarity. Please let me know once we would want to use Arguments directly.
|
Test build #102368 has finished for PR 23706 at commit
|
|
Merging to master. |
|
Thanks all for reviewing and merging! |
…attributes: self-retrieve ## What changes were proposed in this pull request? This patch proposes to change the approach on extracting log urls as well as attributes from YARN executor: - AS-IS: extract information from `Container` API and include them to container launch context - TO-BE: let YARN executor self-extracting information This approach leads us to populate more attributes like nodemanager's IPC port which can let us configure custom log url to JHS log url directly. ## How was this patch tested? Existing unit tests. Closes apache#23706 from HeartSaVioR/SPARK-26790. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]>
What changes were proposed in this pull request?
This patch proposes to change the approach on extracting log urls as well as attributes from YARN executor:
ContainerAPI and include them to container launch contextThis approach leads us to populate more attributes like nodemanager's IPC port which can let us configure custom log url to JHS log url directly.
How was this patch tested?
Existing unit tests.