
fix: Explore how to achieve telemetry suppression with OTLP #3084


Open · wants to merge 2 commits into base: main

Conversation

@cijothomas (Member) commented Jul 25, 2025

One way of addressing #2877

This PR does not introduce a “fix” inside the OTLP Exporters themselves, but instead demonstrates how users can address the issue without requiring changes in OpenTelemetry.

Background

OpenTelemetry provides a mechanism to suppress telemetry based on the current Context. However, this suppression only works if every component involved properly propagates OpenTelemetry’s Context. Libraries like tonic and hyper are not aware of OTel’s Context and therefore do not propagate it across threads.

As a result, OTel’s suppression can fail, leading to telemetry-induced-telemetry—where the act of exporting telemetry (e.g., sending data via tonic/hyper) itself generates additional telemetry. This newly generated telemetry is then exported again, triggering yet more telemetry in a loop, potentially overwhelming the system.

What this PR does

OTLP/gRPC exporters rely on the tonic client, which captures the current runtime at creation time and uses it to drive futures. Instead of reusing the application’s existing runtime, this PR creates a dedicated Tokio runtime exclusively for the OTLP Exporter.

In this dedicated runtime:
1. We intercept the runtime's thread start/stop events.
2. We set OTel's suppression flag in the Context for every worker thread.

This ensures that telemetry generated by libraries such as hyper/tonic will be suppressed only within the exporter’s dedicated runtime. If those same libraries are used elsewhere for application logic, they continue to function normally and emit telemetry as expected.

Depending on the feedback, we could either address this purely through documentation and examples, or we could enhance the OTLP Exporter itself to expose a feature flag that, when enabled, would automatically create the tonic client within its own dedicated runtime.

@cijothomas cijothomas requested a review from a team as a code owner July 25, 2025 16:37
@cijothomas cijothomas changed the title Fix: Explore how to achieve telemetry suppression with OTLP fix: Explore how to achieve telemetry suppression with OTLP Jul 25, 2025
static SUPPRESS_GUARD: RefCell<Option<opentelemetry::ContextGuard>> = const { RefCell::new(None) };
}

// #[tokio::main]
@cijothomas (Member, Author) commented:
If the user's application relies on tokio, they can create a runtime themselves and wrap their main logic inside rt.block_on(async { /* app code */ }), ensuring telemetry initialization happens outside of it.

@cijothomas cijothomas requested a review from Copilot July 25, 2025 16:41
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR demonstrates how to suppress telemetry-induced-telemetry loops in OTLP exporters by creating a dedicated Tokio runtime with telemetry suppression enabled. The approach prevents the infinite telemetry generation that occurs when OpenTelemetry exporters using tonic/hyper generate their own telemetry data, which then gets exported again in a loop.

  • Creates a dedicated Tokio runtime with thread start/stop hooks that enable telemetry suppression
  • Removes manual log filtering that was previously suppressing hyper/tonic logs globally
  • Moves all OpenTelemetry initialization to use the dedicated runtime

static SUPPRESS_GUARD: RefCell<Option<opentelemetry::ContextGuard>> = const { RefCell::new(None) };
}

// #[tokio::main]
Copilot AI commented Jul 25, 2025:
[nitpick] Remove the commented out #[tokio::main] attribute as it's no longer needed and adds unnecessary clutter to the code.

Suggested change
// #[tokio::main]


Comment on lines +70 to +71
.worker_threads(1) // Don't think this matters as no matter how many threads
// are created, we intercept the thread start to set suppress guard.
Copilot AI commented Jul 25, 2025:

[nitpick] The comment spans multiple lines but uses single-line comment syntax. Consider using proper multi-line comment format or clarify the reasoning more concisely in a single line.

Suggested change
.worker_threads(1) // Don't think this matters as no matter how many threads
// are created, we intercept the thread start to set suppress guard.
.worker_threads(1) /* Don't think this matters as no matter how many threads
are created, we intercept the thread start to set suppress guard. */


})
.build()
.expect("Failed to create tokio runtime");
let logger_provider = rt.block_on(async { init_logs() });
Copilot AI commented Jul 25, 2025:

The init_logs() function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let logger_provider = init_logs();

Suggested change
let logger_provider = rt.block_on(async { init_logs() });
let logger_provider = init_logs();


@@ -104,7 +109,7 @@ async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
// At this point Logs (OTel Logs and Fmt Logs) are initialized, which will
// allow internal-logs from Tracing/Metrics initializer to be captured.

let tracer_provider = init_traces();
let tracer_provider = rt.block_on(async { init_traces() });
Copilot AI commented Jul 25, 2025:

The init_traces() function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let tracer_provider = init_traces();

Suggested change
let tracer_provider = rt.block_on(async { init_traces() });
let tracer_provider = init_traces();


@@ -113,7 +118,7 @@ async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
// shutdown on it when application ends.
global::set_tracer_provider(tracer_provider.clone());

let meter_provider = init_metrics();
let meter_provider = rt.block_on(async { init_metrics() });
Copilot AI commented Jul 25, 2025:

The init_metrics() function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let meter_provider = init_metrics();

Suggested change
let meter_provider = rt.block_on(async { init_metrics() });
let meter_provider = init_metrics();


codecov bot commented Jul 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.1%. Comparing base (e9ca158) to head (54e05d5).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #3084   +/-   ##
=====================================
  Coverage   80.1%   80.1%           
=====================================
  Files        126     126           
  Lines      21957   21957           
=====================================
  Hits       17603   17603           
  Misses      4354    4354           


@scottgerring (Contributor) commented:

Although I can see that this works, it feels like a big leak of impl details into the user's domain. What would it look like as a helper in OTel itself? E.g. something like withTelemetrySuppression(_ => { /* setup otel here */ } ) ?

I expect this would still require the user to not use #[tokio::main] but rather explicitly create their runtime after setting up OTel using this helper to wrap, but this would still go some way toward making it feel a bit less leaky.

My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?

@scottgerring (Contributor) commented:

It feels like our first suggestion to users should be:

If you're not using tonic/hyper as a HTTP client in your app, simply tune your tracing subscriber to suppress their telemetry

Then this is only necessary for a subset of users.
