-
Notifications
You must be signed in to change notification settings - Fork 554
fix: Explore how to achieve telemetry suppression with OTLP #3084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
static SUPPRESS_GUARD: RefCell<Option<opentelemetry::ContextGuard>> = const { RefCell::new(None) }; | ||
} | ||
|
||
// #[tokio::main] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user's application relies on tokio, then they can create a Rt themselves, and wrap their main method inside rt.blockon({...app code..}), ensuring telemetry initialization is outside of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR demonstrates how to suppress telemetry-induced-telemetry loops in OTLP exporters by creating a dedicated Tokio runtime with telemetry suppression enabled. The approach prevents the infinite telemetry generation that occurs when OpenTelemetry exporters using tonic/hyper generate their own telemetry data, which then gets exported again in a loop.
- Creates a dedicated Tokio runtime with thread start/stop hooks that enable telemetry suppression
- Removes manual log filtering that was previously suppressing hyper/tonic logs globally
- Moves all OpenTelemetry initialization to use the dedicated runtime
static SUPPRESS_GUARD: RefCell<Option<opentelemetry::ContextGuard>> = const { RefCell::new(None) }; | ||
} | ||
|
||
// #[tokio::main] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Remove the commented out #[tokio::main]
attribute as it's no longer needed and adds unnecessary clutter to the code.
// #[tokio::main] |
Copilot uses AI. Check for mistakes.
.worker_threads(1) // Don't think this matters as no matter how many threads | ||
// are created, we intercept the thread start to set suppress guard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The comment spans multiple lines but uses single-line comment syntax. Consider using proper multi-line comment format or clarify the reasoning more concisely in a single line.
.worker_threads(1) // Don't think this matters as no matter how many threads | |
// are created, we intercept the thread start to set suppress guard. | |
.worker_threads(1) /* Don't think this matters as no matter how many threads | |
are created, we intercept the thread start to set suppress guard. */ |
Copilot uses AI. Check for mistakes.
}) | ||
.build() | ||
.expect("Failed to create tokio runtime"); | ||
let logger_provider = rt.block_on(async { init_logs() }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init_logs()
function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let logger_provider = init_logs();
let logger_provider = rt.block_on(async { init_logs() }); | |
let logger_provider = init_logs(); |
Copilot uses AI. Check for mistakes.
@@ -104,7 +109,7 @@ async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> { | |||
// At this point Logs (OTel Logs and Fmt Logs) are initialized, which will | |||
// allow internal-logs from Tracing/Metrics initializer to be captured. | |||
|
|||
let tracer_provider = init_traces(); | |||
let tracer_provider = rt.block_on(async { init_traces() }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init_traces()
function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let tracer_provider = init_traces();
let tracer_provider = rt.block_on(async { init_traces() }); | |
let tracer_provider = init_traces(); |
Copilot uses AI. Check for mistakes.
@@ -113,7 +118,7 @@ async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> { | |||
// shutdown on it when application ends. | |||
global::set_tracer_provider(tracer_provider.clone()); | |||
|
|||
let meter_provider = init_metrics(); | |||
let meter_provider = rt.block_on(async { init_metrics() }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init_metrics()
function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let meter_provider = init_metrics();
let meter_provider = rt.block_on(async { init_metrics() }); | |
let meter_provider = init_metrics(); |
Copilot uses AI. Check for mistakes.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3084 +/- ##
=====================================
Coverage 80.1% 80.1%
=====================================
Files 126 126
Lines 21957 21957
=====================================
Hits 17603 17603
Misses 4354 4354 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Although I can see that this works, it feels like a big leak of impl details into the user's domain. What would it look like as a helper in OTel itself? E.g. something like I expect this would still require the user to not use a My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments? |
It feels like our first suggestion to users should be:
Then this is is only necessary for a subset of users |
One way of addressing #2877
This PR does not introduce a “fix” inside the OTLP Exporters themselves, but instead demonstrates how users can address the issue without requiring changes in OpenTelemetry.
Background
OpenTelemetry provides a mechanism to suppress telemetry based on the current Context. However, this suppression only works if every component involved properly propagates OpenTelemetry’s Context. Libraries like tonic and hyper are not aware of OTel’s Context and therefore do not propagate it across threads.
As a result, OTel’s suppression can fail, leading to telemetry-induced-telemetry—where the act of exporting telemetry (e.g., sending data via tonic/hyper) itself generates additional telemetry. This newly generated telemetry is then exported again, triggering yet more telemetry in a loop, potentially overwhelming the system.
What this PR does
OTLP/gRPC exporters rely on the tonic client, which captures the current runtime at creation time and uses it to drive futures. Instead of reusing the application’s existing runtime, this PR creates a dedicated Tokio runtime exclusively for the OTLP Exporter.
In this dedicated runtime:
1. We intercept on_start / on_stop events.
2. Sets OTel’s suppression flag in the context.
This ensures that telemetry generated by libraries such as hyper/tonic will be suppressed only within the exporter’s dedicated runtime. If those same libraries are used elsewhere for application logic, they continue to function normally and emit telemetry as expected.
Depending on the feedback, we could either address this purely through documentation and examples, or we could enhance the OTLP Exporter itself to expose a feature flag that, when enabled, would automatically create the tonic client within its own dedicated runtime.