-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-26843][MESOS] Use ConfigEntry for hardcoded configs for "mesos" resource manager #23743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #102076 has finished for PR 23743 at commit
|
…sClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While #23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when #23743 is not applied and succeeds when #23743 is applied. Closes #23744 from HeartSaVioR/SPARK-26082-add-unit-test. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit b4e1d14) Signed-off-by: Dongjoon Hyun <[email protected]>
…sClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While #23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when #23743 is not applied and succeeds when #23743 is applied. Closes #23744 from HeartSaVioR/SPARK-26082-add-unit-test. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit b4e1d14) Signed-off-by: Dongjoon Hyun <[email protected]>
…sClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While #23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when #23743 is not applied and succeeds when #23743 is applied. Closes #23744 from HeartSaVioR/SPARK-26082-add-unit-test. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HeartSaVioR .
Why do we change this function signature silently in this Use ConfigEntry ... PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used for parsing config value, and we are changing the type of config to Seq[String] instead of comma-separated String cause ConfigEntry will handle it better. Same applies for below methods as well.
Similar change was suggested while working on ConfigEntry for others (don't remember which one was...).
I'm not sure changing them would hurt because the object is mesos package private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HeartSaVioR . You should mention that in the PR description at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I'm not sure we're requiring to mention every incompatible changes for non-public API, but I'm not strongly against it. I'll add it to the PR description.
Please consider that dealing with arbitrary reviewers who have different views on reviewing is not easy. Similar change was done in other PR without any mention and comment. I'd be happy if we have some rules/policies which are across the community in project, and if it doesn't fall into the case, ideally I'd like to see reviewers just kindly and politely suggest/ask it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was about the mismatch between PR title and content itself. If you feel like that, sorry about that. I must be careful. Maybe, silently was too strong word?
Why do we change this function signature silently in this
Use ConfigEntry ...PR?
BTW, why do you think my comment is about incompatible changes?
I'm not sure we're requiring to mention every incompatible changes for non-public API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was about the mismatch between PR title and content itself.
I didn't know the reason of the question, and why without reason makes me feeling that I got non-positive reviews. I tried to find the reason behind the comment, and thought you were concerning about method signature change hence incompatible - turned out I was wrong. We would have better understanding of each other when we try to ensure explaining rationalization.
Btw, the rationalization of method signature change is follow: at the first time I thought I have to add new methods with different signatures (maybe overloaded) but another thought in my mind it would end up letting old methods being not used anymore, and reviewers would ask me to remove them. So flatten the steps into one would be changing signature anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the purpose of these small comments, which can vary across reviewers, is that it's a way for all of us to sync on how to handle the small things.
The change to the method seems fine, and isn't part of any API. I don't think it needs to be part of the title as it is in the description, even. Let's conclude there.
|
Could you handle the followings? MesosClusterScheduler.scala: v => s"$v -Dspark.mesos.driver.frameworkId=${getDriverFrameworkID(desc)}"
MesosClusterScheduler.scala: logWarning(s"Unknown spark.mesos.appJar.local.resolution.mode $other, using host.")
MesosSchedulerUtils.scala: sc.conf.remove("spark.mesos.driver.frameworkId")
MesosSchedulerUtils.scala: System.clearProperty("spark.mesos.driver.frameworkId") |
c06795c to
c2d0e04
Compare
|
Thanks @dongjoon-hyun for finding missing spots! I addressed review comment. |
|
Test build #102084 has finished for PR 23743 at commit
|
| .booleanConf | ||
| .createWithDefault(false) | ||
|
|
||
| private[spark] val APPLICATION_JAR_LOCAL_RESOLUTION_MODE = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. Can we use a shorter name like APP_JAR_LOCAL_RESOLUTION_MODE like APP_STATUS_METRICS_ENABLED?
| } | ||
|
|
||
| private[spark] val CREDENTIAL_PRINCIPAL = | ||
| ConfigBuilder("spark.mesos.principal") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we had better have .doc like the previous conf SECRET_FILENAMES at line 50 ~ 56?
|
|
||
| private[spark] val CONTAINERIZER = | ||
| ConfigBuilder("spark.mesos.containerizer") | ||
| .stringConf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add .checkValues(Set("docker", "mesos")).
|
|
||
| private[spark] val ROLE = | ||
| ConfigBuilder("spark.mesos.role") | ||
| .stringConf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add .checkValues(Set("host", "container")).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I have no idea about available values. Could you refer where I could find available values? I can't find it in Mesos doc - http://mesos.apache.org/documentation/latest/roles/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah looks like you're referring APP_JAR_LOCAL_RESOLUTION_MODE in below comment. I'll address it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It seems that I clicked a wrong one.
| logWarning(s"Unknown spark.mesos.appJar.local.resolution.mode $other, using host.") | ||
| logWarning(s"Unknown ${config.APPLICATION_JAR_LOCAL_RESOLUTION_MODE} $other, using host.") | ||
| false | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have checkValue, we can simplify the above logic line 422 ~ 426, too.
|
|
||
| private[spark] val COARSE_SHUTDOWN_TIMEOUT = | ||
| ConfigBuilder("spark.mesos.coarse.shutdownTimeout") | ||
| .timeConf(TimeUnit.MILLISECONDS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add .checkValue(_ >= 0, "spark.mesos.coarse.shutdownTimeout must be >= 0") here.
| private[this] val shutdownTimeoutMS = | ||
| conf.getTimeAsMs("spark.mesos.coarse.shutdownTimeout", "10s") | ||
| .ensuring(_ >= 0, "spark.mesos.coarse.shutdownTimeout must be >= 0") | ||
| conf.get(COARSE_SHUTDOWN_TIMEOUT).ensuring(_ >= 0, s"$COARSE_SHUTDOWN_TIMEOUT must be >= 0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have checkValue in config definition, we can remove this ensuring here.
| .stringConf | ||
| .createWithDefault("") | ||
|
|
||
| private[spark] val SLAVE_OFFER_CONSTRAINTS = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this comes from the variable name slaveOfferConstraints. But, shall we simply use CONSTRAINTS because most Mesos offer configurations are actually about the slave?
| sparkConf.set("spark.driver.host", "driverHost") | ||
| sparkConf.set("spark.driver.port", "1234") | ||
| sparkConf.set(DRIVER_HOST_ADDRESS.key, "driverHost") | ||
| sparkConf.set(DRIVER_PORT.key, "1234") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- sparkConf.set(DRIVER_HOST_ADDRESS.key, "driverHost")
- sparkConf.set(DRIVER_PORT.key, "1234")
+ sparkConf.set(DRIVER_HOST_ADDRESS, "driverHost")
+ sparkConf.set(DRIVER_PORT, 1234)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments including .doc.
…sClusterScheduler This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. Newly added UTs. Test "supports setting fetcher cache" fails when apache#23743 is not applied and succeeds when apache#23743 is applied.
…sClusterScheduler This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. Newly added UTs. Test "supports setting fetcher cache" fails when apache#23743 is not applied and succeeds when apache#23743 is applied.
…sClusterScheduler This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. Newly added UTs. Test "supports setting fetcher cache" fails when apache#23743 is not applied and succeeds when apache#23743 is applied.
|
@clems4ever This PR is almost done. I'm waiting for a few minor |
|
Alright @dongjoon-hyun, I'll keep an eye on this PR. Thanks for letting me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG mostly, 2 questions
|
|
||
| test("Use configured mesosExecutor.cores for ExecutorInfo") { | ||
| val mesosExecutorCores = 3 | ||
| val mesosExecutorCores = 3.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of EXECUTOR_CORES is actually double instead of int or long and if my memory is right compiler complained. If we don't want to touch this we may need to change below line to conf.set(EXECUTOR_CORES.key, mesosExecutorCores.toString)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of a number of cores is logically an integer and can be used where a double is expected. In a few cases a cast may be needed but doesn't require thinking of number of cores as a float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly not a expert on this but the documentation of spark.mesos.mesosExecutor.cores clarify that it can receive floating point number so the type still makes sense.
https://github.com/apache/spark/blob/master/docs/running-on-mesos.md
<tr>
<td><code>spark.mesos.mesosExecutor.cores</code></td>
<td><code>1.0</code></td>
<td>
(Fine-grained mode only) Number of cores to give each Mesos executor. This does not
include the cores used to run the Spark tasks. In other words, even if no Spark task
is being run, each Mesos executor will occupy the number of cores configured here.
The value can be a floating point number.
</td>
</tr>
The default value is also explained as 1.0 instead of 1. I'll update the config entry to reflect the sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right mesos can specify fractional cores. OK this makes sense then.
| ).flatten | ||
| val confUris = (conf.get(config.URIS_TO_DOWNLOAD) ++ | ||
| desc.conf.get(config.URIS_TO_DOWNLOAD) ++ | ||
| desc.conf.get(SUBMIT_PYTHON_FILES)).toList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is exactly the same? _.split(",").map(_.trim) - before URIS_TO_DOWNLOAD should be a comma separated list (probably not intentional, but it works) and URIS_TO_DOWNLOAD & SUBMIT_PYTHON_FILES could have whitespaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Element values will be trimmed as same as it was - ConfigBuilder handles it:
spark/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
Lines 123 to 126 in dc46fb7
| /** Turns the config entry into a sequence of values of the underlying type. */ | |
| def toSequence: TypedConfigBuilder[Seq[T]] = { | |
| new TypedConfigBuilder(parent, stringToSeq(_, converter), seqToString(_, stringConverter)) | |
| } |
spark/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
Lines 48 to 50 in dc46fb7
| def stringToSeq[T](str: String, converter: String => T): Seq[T] = { | |
| Utils.stringToSeq(str).map(converter) | |
| } |
spark/core/src/main/scala/org/apache/spark/util/Utils.scala
Lines 2619 to 2621 in dc46fb7
| def stringToSeq(str: String): Seq[String] = { | |
| str.split(",").map(_.trim()).filter(_.nonEmpty) | |
| } |
|
Documentation addressed. Please note that I only copied the contents from |
|
Test build #102162 has finished for PR 23743 at commit
|
|
Test build #102163 has finished for PR 23743 at commit
|
…tion on MesosClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While #23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when #23734 is not applied and succeeds when #23734 is applied. Closes #23753 from HeartSaVioR/SPARK-26082-branch-2.4. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…tion on MesosClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While #23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when #23734 is not applied and succeeds when #23734 is applied. Closes #23754 from HeartSaVioR/SPARK-26082-branch-2.3. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
| ConfigBuilder("spark.mesos.constraints") | ||
| .doc("Attribute-based constraints on mesos resource offers. By default, all resource " + | ||
| "offers will be accepted. This setting applies only to executors. Refer to Mesos " + | ||
| "Attributes & Resources doc for more information on attributes. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @HeartSaVioR .
Could you remove the followings line 338 ~ 347? Since it says Refer to Mesos ... already, it looks okay to remove the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agreed. Just addressed it.
|
Test build #102166 has finished for PR 23743 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master.
|
Thank you, @HeartSaVioR , @srowen , @felixcheung ! |
|
Thanks all for reviewing and merging! |
…sClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when apache#23743 is not applied and succeeds when apache#23743 is applied. Closes apache#23744 from HeartSaVioR/SPARK-26082-add-unit-test. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…" resource manager ## What changes were proposed in this pull request? This patch makes hardcoded configs in "mesos" module to use ConfigEntry, avoiding issues on mistake like SPARK-26082. Please note that there're some changes on type while migrating to ConfigEntry: specifically "comma-separated list on a string" becomes "sequence of strings". While SparkConf smoothly handles on the change (comma-separated list on a string is still supported so backward compatible), there're some methods in utility class (`mesos` package private) to depend on the type change, so this patch also modifies the method signature for them a bit. ## How was this patch tested? Existing tests. Closes apache#23743 from HeartSaVioR/SPARK-26843. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…sClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when apache#23743 is not applied and succeeds when apache#23743 is applied. Closes apache#23744 from HeartSaVioR/SPARK-26082-add-unit-test. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit b4e1d14) Signed-off-by: Dongjoon Hyun <[email protected]>
…tion on MesosClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when apache#23734 is not applied and succeeds when apache#23734 is applied. Closes apache#23753 from HeartSaVioR/SPARK-26082-branch-2.4. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…sClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when apache#23743 is not applied and succeeds when apache#23743 is applied. Closes apache#23744 from HeartSaVioR/SPARK-26082-add-unit-test. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit b4e1d14) Signed-off-by: Dongjoon Hyun <[email protected]>
…tion on MesosClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when apache#23734 is not applied and succeeds when apache#23734 is applied. Closes apache#23753 from HeartSaVioR/SPARK-26082-branch-2.4. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…sClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when apache#23743 is not applied and succeeds when apache#23743 is applied. Closes apache#23744 from HeartSaVioR/SPARK-26082-add-unit-test. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit b4e1d14) Signed-off-by: Dongjoon Hyun <[email protected]>
…tion on MesosClusterScheduler ## What changes were proposed in this pull request? This patch adds UT on testing SPARK-26082 to avoid regression. While apache#23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them. ## How was this patch tested? Newly added UTs. Test "supports setting fetcher cache" fails when apache#23734 is not applied and succeeds when apache#23734 is applied. Closes apache#23753 from HeartSaVioR/SPARK-26082-branch-2.4. Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This patch makes hardcoded configs in "mesos" module to use ConfigEntry, avoiding issues on mistake like SPARK-26082.
Please note that there're some changes on type while migrating to ConfigEntry: specifically "comma-separated list on a string" becomes "sequence of strings". While SparkConf smoothly handles on the change (comma-separated list on a string is still supported so backward compatible), there're some methods in utility class (
mesospackage private) to depend on the type change, so this patch also modifies the method signature for them a bit.How was this patch tested?
Existing tests.