-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics #22279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics #22279
Conversation
|
Hi @LucaCanali do you have an output current AM metrics? I would like to know what kind of metrics will be output for now. |
| sinks.foreach(_.start) | ||
| } | ||
|
|
||
| // Same as start but this method only registers sinks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please explain why only registering sinks could solve the problem here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am trying to do is to avoid the registration of the "static metrics", for CodeGeneration and HiveExternalCatalog and also for JVM.
|
Hi @jerryshao you can find here below an example of metrics currently reported by applicationMaster, illustrating the issue reported here. You can find there the list of AM metrics reported (with the application ID as a prefix by default). In addition metrics for CodeGeneration and HiveExternalCatalog are also reported, these metrics do not make sense in this context, in addition they have no prefix. Metrics for JVM are reported too (without application_id prefix), which I am not sure it is wanted either. I have used InfluxDB to collect the metrics. This is the output of "show measurements" in InfluxDB: |
|
@jerryshao would you have any additional comments on this? |
|
@attilapiros would you be interested to review this as a follow-up of your work on [SPARK-24594][YARN] Introducing metrics for YARN ? |
| sinks.foreach(_.start) | ||
| } | ||
|
|
||
| // Same as start but this method only registers sinks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see it is a duplication of the start() method without the registerSources() but what about extending the start() method with a new boolean flag (like registerStaticSources:Boolean = true) and using this flag to decide registerSources() should be called or not. Then in AM you can call it with false. This way there is no code duplication and more clear what is the different between the two usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @attilapiros for looking at this. I agree with your proposal. I'll provide an update to the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking a bit more about this PR and in case of client mode it could make sense to have some of this static metrics as the driver and AM are separated. What about (assuming the boolean flag is there) calling the start method with false only in case of cluster mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would split the case in 2 topics:
- I belive that CodeGenerator and HiveExternalCatalog metrics don't make sense in the context of AM, so they can be safely removed.
- The JVM metrics may be relevant as you mentioned. Although in the current version I see the problem that the JVM metrics for the AM appear without any application id nor prefix, so they are difficult to process. I guess this part can be improved if we think JVM metrics for AM can be of interest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can just fully agree with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to close the loop here: I like the extra flag to start instead of the separate method.
As for AM vs. driver, the way I understand things, in cluster mode, there will be two MetricSystem instances - one for the AM and one for the driver, so you shouldn't lose any driver metrics.
JVM metrics for the client-mode AM can be interesting, but that's generally not a source of problems that I've noticed, so we can probably punt on it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented the extra flag to start and tested the patch on a YARN cluster using a graphite sink for the metrics.
|
ok to test |
|
Test build #99939 has finished for PR 22279 at commit
|
|
Thanks @vanzin for looking at this. |
|
Test build #100017 has finished for PR 22279 at commit
|
|
retest this please |
| running = true | ||
| StaticSources.allSources.foreach(registerSource) | ||
| registerSources() | ||
| if(registerStaticSources) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space before (
|
Test build #100038 has finished for PR 22279 at commit
|
|
Test build #100044 has finished for PR 22279 at commit
|
|
Giving up on flaky tests... interesting ones all passed. Merging to master. |
…r static metrics ## What changes were proposed in this pull request? YARN applicationMaster metrics registration introduced in SPARK-24594 causes further registration of static metrics (Codegenerator and HiveExternalCatalog) and of JVM metrics, which I believe do not belong in this context. This looks like an unintended side effect of using the start method of [[MetricsSystem]]. A possible solution proposed here, is to introduce startNoRegisterSources to avoid these additional registrations of static sources and of JVM sources in the case of YARN applicationMaster metrics (this could be useful for other metrics that may be added in the future). ## How was this patch tested? Manually tested on a YARN cluster, Closes apache#22279 from LucaCanali/YarnMetricsRemoveExtraSourceRegistration. Lead-authored-by: Luca Canali <[email protected]> Co-authored-by: LucaCanali <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]>
…r static metrics ## What changes were proposed in this pull request? YARN applicationMaster metrics registration introduced in SPARK-24594 causes further registration of static metrics (Codegenerator and HiveExternalCatalog) and of JVM metrics, which I believe do not belong in this context. This looks like an unintended side effect of using the start method of [[MetricsSystem]]. A possible solution proposed here, is to introduce startNoRegisterSources to avoid these additional registrations of static sources and of JVM sources in the case of YARN applicationMaster metrics (this could be useful for other metrics that may be added in the future). ## How was this patch tested? Manually tested on a YARN cluster, Closes apache#22279 from LucaCanali/YarnMetricsRemoveExtraSourceRegistration. Lead-authored-by: Luca Canali <[email protected]> Co-authored-by: LucaCanali <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]>
|
Hi, All. Sorry for commenting on this old PR. This looks safe and worthy for backporting. |
…r static metrics ## What changes were proposed in this pull request? YARN applicationMaster metrics registration introduced in SPARK-24594 causes further registration of static metrics (Codegenerator and HiveExternalCatalog) and of JVM metrics, which I believe do not belong in this context. This looks like an unintended side effect of using the start method of [[MetricsSystem]]. A possible solution proposed here, is to introduce startNoRegisterSources to avoid these additional registrations of static sources and of JVM sources in the case of YARN applicationMaster metrics (this could be useful for other metrics that may be added in the future). ## How was this patch tested? Manually tested on a YARN cluster, Closes #22279 from LucaCanali/YarnMetricsRemoveExtraSourceRegistration. Lead-authored-by: Luca Canali <[email protected]> Co-authored-by: LucaCanali <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]>
What changes were proposed in this pull request?
YARN applicationMaster metrics registration introduced in SPARK-24594 causes further registration of static metrics (Codegenerator and HiveExternalCatalog) and of JVM metrics, which I believe do not belong in this context.
This looks like an unintended side effect of using the start method of [[MetricsSystem]].
A possible solution proposed here, is to introduce startNoRegisterSources to avoid these additional registrations of static sources and of JVM sources in the case of YARN applicationMaster metrics (this could be useful for other metrics that may be added in the future).
How was this patch tested?
Manually tested on a YARN cluster,