
Conversation

@markgrover
Member

What changes were proposed in this pull request?

Adds a new property to SparkConf called spark.metrics.namespace that allows users to
set a custom root namespace for executor and driver metrics in the metrics system.

By default, the root namespace used for driver and executor metrics is
the value of spark.app.id. However, users often want to track these metrics
across application runs, which is hard to do with the application ID
(i.e. spark.app.id) since it changes with every invocation of the app. For such use cases,
users can set the spark.metrics.namespace property to another Spark configuration key, such as
spark.app.name, whose value is then used as the root namespace of the metrics system
(the app name, in this example). The spark.metrics.namespace property can be set to any
arbitrary Spark property key, and that property's value becomes the root namespace of the
metrics system. Metrics other than driver and executor metrics are never prefixed with
spark.app.id, and the spark.metrics.namespace property has no effect on them.

How was this patch tested?

Added new unit tests, modified existing unit tests.

@markgrover
Member Author

A few design choices I made along the way are:

  1. Use a SparkConf property to control the namespace instead of a MetricsConfig property. The reason is that metrics config properties mean something particular and follow a particular pattern (see here, for example). Regardless of what we name this property, it doesn't follow that paradigm: the first part of the property name, before the first dot, doesn't represent an instance (like master, worker, executor, etc.) the way it does for legit metrics properties. Also, the properties defined in MetricsConfig are obtained via the getInstance() call, which takes the instance as a parameter; such a retrieval doesn't apply to this configuration property since it doesn't belong to an instance. I understand it's one more property in SparkConf, but I really feel it belongs there, right by spark.metrics.conf.
  2. Based on the previous point, this property shouldn't fall under the spark.metrics.conf. prefix. If it did, it would mistakenly be understood as a legit metrics property (see code here, for example).
  3. The proposed spark.metrics.namespace property allows one to specify an arbitrary property to be used as the namespace. I did consider whitelisting the allowed values so users could only use spark.app.name or spark.app.id, but then we'd have to deal with magic strings (app.name|app.id), and I didn't feel inclined to do that. @ryan-williams, who took a stab at the same issue in [SPARK-5847] Allow for namespacing metrics by conf params other than spark.app.id #4632, made the same call, and I agree with him.
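For contrast, the instance-based pattern that MetricsConfig properties follow looks like this (sample entries in the style of Spark's metrics.properties template; the first segment before the dot names an instance such as `*`, `master`, or `executor`):

```properties
# <instance>.sink|source.<name>.<option>=<value>
*.sink.console.period=10
*.sink.console.unit=seconds
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```

A namespace setting has no such instance component, which is why it fits better in SparkConf.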

@SparkQA

SparkQA commented Jul 19, 2016

Test build #62549 has finished for PR 14270 at commit 605e690.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@markgrover
Member Author

Tests passed! I'd really appreciate a review!

@vanzin
Contributor

vanzin commented Jul 20, 2016

It seems that the namespace must be the name of a configuration key, and cannot be an arbitrary string?

That sounds a little limiting. You could change the code to use the value of the namespace config itself as the default when there's no config with the name the user provided, to allow some more flexibility. Or, if you instead rely on the changes in #14022, the user would have even more flexibility.

@ericl
Contributor

ericl commented Jul 20, 2016

+1 on using the config refs provided by #14022

@markgrover
Member Author

ok, thanks, I will rely on changes in #14022 (SPARK-16272), now that it's been committed.

@markgrover
Member Author

Ok, I have pushed changes to use the expansion capabilities brought in by SPARK-16272. Overall, I think it was a very good call to use that, so thanks for the suggestions! Would appreciate a review.
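With the expansion support from SPARK-16272, the setting can be written as a reference that is resolved against the configuration, e.g. in spark-defaults.conf:

```properties
# Use the stable app name, rather than the per-run app ID,
# as the root namespace for driver/executor metrics
spark.metrics.namespace=${spark.app.name}
```

A literal string (e.g. `spark.metrics.namespace=prod-pipeline`) works as well, addressing the flexibility concern raised above.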

@SparkQA

SparkQA commented Jul 23, 2016

Test build #62744 has finished for PR 14270 at commit 3c8ea96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Instead of createOptional you can use createWithDefault("${spark.app.id}").

Member Author

Thanks, but I think that changes the semantics in cases where spark.app.id is not defined.
Currently, the code in this PR creates an Option with value None if spark.app.id is not defined. With the proposed change, it would create an Option containing the literal string ${spark.app.id} instead, which changes the existing behavior. Even though in practice spark.app.id is, I believe, always defined, there are some tests in MetricsSystemSuite.scala that cover the scenario where spark.app.id is not defined, and I am hesitant to change that behavior.

I suppose I could use createWithDefault() in package.scala for METRICS_NAMESPACE, but then I'd have to check for unresolved spark.app.id references in the value, and that would be too big a pain for not much benefit. So I'd prefer to keep this the way it is.

@vanzin
Contributor

vanzin commented Jul 26, 2016

Minor nit, otherwise LGTM. Tests could use shorter names. Also, there's now a conflict...

…D to namespace all metrics

@markgrover
Member Author

Fixed the nits, resolved the merge conflict. Shortened some test names.

@SparkQA

SparkQA commented Jul 27, 2016

Test build #62903 has finished for PR 14270 at commit b9c9a7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 27, 2016

Test build #62904 has finished for PR 14270 at commit 8923c58.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Jul 27, 2016

LGTM, merging to master.

@asfgit asfgit closed this in 70f846a Jul 27, 2016
@markgrover
Member Author

Thanks a lot, @vanzin!

@Ianwww

Ianwww commented Aug 25, 2016

Can this configuration be set in spark-defaults.conf as spark.metrics.namespace=${spark.app.name}?
@markgrover

@markgrover
Member Author

Sure.
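As an illustration (the app name and metric below are hypothetical), the resulting sink-side metric paths would change roughly like this:

```text
# default namespace (spark.app.id changes every run):
app-20160825123456-0001.driver.jvm.heap.used
# with spark.metrics.namespace=${spark.app.name}:
MyNightlyJob.driver.jvm.heap.used
```

The stable prefix makes it possible to chart the same metric across runs in tools like Graphite or Ganglia.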

AnthonyTruchet pushed a commit to AnthonyTruchet/spark that referenced this pull request Sep 1, 2016
…use of app ID to namespace all metrics

This is a backport of apache#14270. Because the spark.internal.config system
does not exist in branch 1.6, a simpler substitution scheme for ${} references in
the spark.metrics.namespace value, based only on the Spark configuration, had to
be added to preserve the behaviour discussed in the tickets and tested.

This backport is contributed by Criteo SA under the Apache v2 licence.

Adds a new property to SparkConf called spark.metrics.namespace that allows users to
set a custom root namespace for executor and driver metrics in the metrics system.

By default, the root namespace used for driver and executor metrics is
the value of `spark.app.id`. However, users often want to track these metrics
across application runs, which is hard to do with the application ID
(i.e. `spark.app.id`) since it changes with every invocation of the app. For such use cases,
users can set the `spark.metrics.namespace` property either to a literal value
or to a Spark configuration key reference like `${spark.app.name}`,
which is then resolved and used as the root namespace of the metrics system
(the app name, in this example). Metrics other than driver and executor metrics are never
prefixed with `spark.app.id`, and the `spark.metrics.namespace` property has no effect on them.

Added new unit tests, modified existing unit tests.
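The backport's simpler ${} substitution can be sketched as follows. This is an illustrative Python model of the resolution logic, not the actual Scala implementation; the function name and the conf dict are assumptions for the sketch:

```python
import re

def resolve_metrics_namespace(conf):
    """Resolve the metrics root namespace from a Spark-conf-like dict.

    Sketch of the backport's substitution scheme: ${key} references in
    spark.metrics.namespace are looked up in the configuration itself,
    and the namespace falls back to spark.app.id when the property is unset.
    """
    raw = conf.get("spark.metrics.namespace")
    if raw is None:
        # Default behavior: namespace is the per-run application ID (may be None)
        return conf.get("spark.app.id")

    def expand(match):
        # Replace ${key} with conf[key]; leave unresolved references as-is
        return conf.get(match.group(1), match.group(0))

    return re.sub(r"\$\{([^}]+)\}", expand, raw)

# Usage: namespace taken from the stable app name instead of the per-run app ID.
conf = {"spark.app.id": "app-123", "spark.app.name": "MyNightlyJob",
        "spark.metrics.namespace": "${spark.app.name}"}
print(resolve_metrics_namespace(conf))  # -> MyNightlyJob
```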
AnthonyTruchet pushed a commit to criteo-forks/spark that referenced this pull request Sep 9, 2016
…use of app ID to namespace all metrics

@AnthonyTruchet

Hello @markgrover @vanzin! As this is just a backport of your work, would you please consider reviewing it?

ianlcsd pushed a commit to ianlcsd/spark that referenced this pull request Jul 1, 2017
…use of app ID to namespace all metrics
