[SPARK-5847][CORE] Allow for configuring MetricsSystem's use of app ID to namespace all metrics #14270
Conversation
A few design choices I made along the way are:
Test build #62549 has finished for PR 14270 at commit
Tests passed! I'd really appreciate a review!
It seems that the namespace must be the name of a configuration key, and cannot be an arbitrary string? That sounds a little limiting. You could change the code to use the value of the namespace config as the default, in case there's no config with the name the user provided, to allow some more flexibility. Or maybe if you instead rely on the changes in #14022, then the user would have even more flexibility.
+1 on using the config refs provided by #14022
OK, thanks, I will rely on the changes in #14022 (SPARK-16272), now that it's been committed.
OK, I have pushed changes to use the expansion capabilities brought in by SPARK-16272. Overall, I think it was a very good call to use that, so thanks for the suggestions! Would appreciate a review.
Test build #62744 has finished for PR 14270 at commit
Instead of `createOptional` you can use `createWithDefault("${spark.app.id}")`.
Thanks, but I think that changes the semantics in cases where `spark.app.id` is not defined.
Currently, the code in this PR creates an Option with value None if `spark.app.id` is not defined. With the proposed change, it would instead create an Option holding the literal string `${spark.app.id}` when `spark.app.id` is not defined, and that changes the existing behavior. Even though in practice, I believe, `spark.app.id` is always defined, there are some tests in MetricsSystemSuite.scala that cover the scenario where it is not, and I am hesitant to change that behavior.
I suppose I could use `createWithDefault()` in package.scala for METRICS_NAMESPACE, but then I'd have to check for unresolved `spark.app.id` references in the value, which would be too big a pain for not much benefit. So I'd prefer to keep this the way it is.
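To make the semantic difference concrete, here is a minimal sketch in Python (Spark's actual code is Scala; the function name and shape here are illustrative assumptions, not Spark's API) contrasting the two config styles when `spark.app.id` is undefined:

```python
def resolve_namespace(conf, use_default):
    """Return the metrics root namespace, or None if nothing resolves.

    use_default=False models createOptional: the namespace falls back to
    the app ID, and to None when spark.app.id is unset (the PR's behavior).
    use_default=True models createWithDefault("${spark.app.id}"): when the
    reference cannot be resolved, the literal placeholder leaks through.
    """
    if use_default:
        raw = conf.get("spark.metrics.namespace", "${spark.app.id}")
        if raw == "${spark.app.id}":
            # The literal placeholder survives if spark.app.id is unset.
            return conf.get("spark.app.id", raw)
        return raw
    ns = conf.get("spark.metrics.namespace")
    return ns if ns is not None else conf.get("spark.app.id")

empty = {}  # spark.app.id not defined, as in the MetricsSystemSuite tests
assert resolve_namespace(empty, use_default=False) is None
assert resolve_namespace(empty, use_default=True) == "${spark.app.id}"

with_id = {"spark.app.id": "app-123"}
assert resolve_namespace(with_id, use_default=False) == "app-123"
assert resolve_namespace(with_id, use_default=True) == "app-123"
```

The two styles only diverge when `spark.app.id` is missing, which is exactly the scenario the existing tests exercise.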
Minor nit, otherwise LGTM. Tests could use shorter names. Also, now there's a conflict...
…D to namespace all metrics

Adding a new property to SparkConf called `spark.metrics.namespace` that allows users to set a custom namespace for executor and driver metrics in the metrics system.

By default, the root namespace used for driver or executor metrics is the value of `spark.app.id`. However, users often want to track driver and executor metrics across apps, which is hard to do with the application ID (i.e. `spark.app.id`) since it changes with every invocation of the app. For such use cases, users can set the `spark.metrics.namespace` property to another Spark configuration key like `spark.app.name`, whose value is then used to populate the root namespace of the metrics system (the app name in our example). `spark.metrics.namespace` can be set to any arbitrary Spark property key, whose value would be used as the root namespace of the metrics system. Metrics for instances other than the driver and executors are never prefixed with `spark.app.id`, nor does the `spark.metrics.namespace` property have any effect on them.
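The prefixing rule described above can be sketched as follows. This is an illustrative Python model, not Spark's actual implementation (which lives in MetricsSystem, in Scala); the function name and shape are assumptions:

```python
def metric_registry_name(conf, instance, source_name):
    """Build a fully qualified metric name for a metrics-system instance."""
    # Root namespace defaults to the app ID; spark.metrics.namespace may name
    # another config key (e.g. spark.app.name) whose value is used instead.
    ns_key = conf.get("spark.metrics.namespace")
    namespace = conf.get(ns_key) if ns_key else conf.get("spark.app.id")
    if instance in ("driver", "executor") and namespace:
        return f"{namespace}.{instance}.{source_name}"
    # Other instances (e.g. master, worker) are never prefixed.
    return f"{instance}.{source_name}"

conf = {"spark.app.id": "app-20160719-0001",
        "spark.app.name": "daily_report",
        "spark.metrics.namespace": "spark.app.name"}
assert metric_registry_name(conf, "driver", "jvm.heap.used") == \
    "daily_report.driver.jvm.heap.used"
assert metric_registry_name(conf, "master", "workers") == "master.workers"
```

With the namespace pointed at `spark.app.name`, the driver metric keeps the same fully qualified name across runs, which is the whole point of the feature.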
Fixed the nits, resolved the merge conflict. Shortened some test names.
…r newly added tests
Test build #62903 has finished for PR 14270 at commit
Test build #62904 has finished for PR 14270 at commit
LGTM, merging to master.
Thanks a lot, @vanzin!
Can this configuration be set in spark-defaults.conf as `spark.metrics.namespace=${spark.app.name}`?
Sure.
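For reference, the question above maps to a single entry in spark-defaults.conf; the `${spark.app.name}` reference is expanded when the metrics system initializes:

```properties
# spark-defaults.conf
# Namespace driver/executor metrics by the stable app name rather than the
# per-run app ID, so metrics line up across invocations of the same job.
spark.metrics.namespace    ${spark.app.name}
```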
…use of app ID to namespace all metrics

This is a backport of apache#14270. Because the spark.internal.config system does not exist in branch 1.6, a simpler substitution scheme for `${}` references in the `spark.metrics.namespace` value, using only the Spark configuration, had to be added to preserve the behaviour discussed in the tickets and tested. This backport is contributed by Criteo SA under the Apache v2 licence.

Adding a new property to SparkConf called `spark.metrics.namespace` that allows users to set a custom namespace for executor and driver metrics in the metrics system. By default, the root namespace used for driver or executor metrics is the value of `spark.app.id`. However, users often want to track driver and executor metrics across apps, which is hard to do with the application ID (i.e. `spark.app.id`) since it changes with every invocation of the app. For such use cases, users can set the `spark.metrics.namespace` property to any given value or to a Spark configuration key reference like `${spark.app.name}`, which is then used to populate the root namespace of the metrics system (the app name in our example). Metrics for instances other than the driver and executors are never prefixed with `spark.app.id`, nor does the `spark.metrics.namespace` property have any effect on them.

Added new unit tests, modified existing unit tests.
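The backport's actual substitution code is not shown here; as a rough illustration only, a minimal `${...}` expansion pass over a config value, resolving references against the Spark configuration itself, might look like this (names are assumptions, not the backport's API):

```python
import re

# Matches ${some.key} occurrences in a config value.
_REF = re.compile(r"\$\{([^}]+)\}")

def expand_refs(value, conf):
    """Replace every ${key} in value with conf[key]; keep unknown refs as-is."""
    def repl(match):
        key = match.group(1)
        # Leave the literal placeholder in place when the key is unresolved,
        # mirroring the behavior discussed earlier in the review.
        return conf.get(key, match.group(0))
    return _REF.sub(repl, value)

conf = {"spark.app.name": "daily_report"}
assert expand_refs("${spark.app.name}", conf) == "daily_report"
assert expand_refs("${spark.app.id}", conf) == "${spark.app.id}"  # unresolved
```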
Hello @markgrover @vanzin! As this is just a backport of your work, would you please consider reviewing it?
What changes were proposed in this pull request?

Adding a new property to SparkConf called `spark.metrics.namespace` that allows users to set a custom namespace for executor and driver metrics in the metrics system. By default, the root namespace used for driver or executor metrics is the value of `spark.app.id`. However, users often want to track driver and executor metrics across apps, which is hard to do with the application ID (i.e. `spark.app.id`) since it changes with every invocation of the app. For such use cases, users can set the `spark.metrics.namespace` property to another Spark configuration key like `spark.app.name`, whose value is then used to populate the root namespace of the metrics system (the app name in our example). `spark.metrics.namespace` can be set to any arbitrary Spark property key, whose value would be used as the root namespace of the metrics system. Metrics for instances other than the driver and executors are never prefixed with `spark.app.id`, nor does the `spark.metrics.namespace` property have any effect on them.

How was this patch tested?

Added new unit tests, modified existing unit tests.