Skip to content

Conversation

lucamuscat
Copy link
Contributor

@lucamuscat lucamuscat commented Sep 10, 2025

Fixes #3161
Design discussion issue (if applicable) #

Changes

When integrating the opentelemetry crate with the tracing_subscriber crate, calling tracing::event! and its derivates after the SdkLoggerProvider has been called leads to a stack overflow due to BatchLogProcessor::emit using otel_warn! when trying to report the mpsc::TrySendError:Disconnected error.

This PR prevents otel_warn! from being called when handling the mpsc::TrySendError:Disconnected error in BatchLogProcessor::emit if the name of the log record is the same as that emitted by the otel_warn! in the same branch.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@lucamuscat lucamuscat requested a review from a team as a code owner September 10, 2025 10:07
Copy link

linux-foundation-easycla bot commented Sep 10, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: lucamuscat / name: Luca Muscat (2f95832)

Copy link

codecov bot commented Sep 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.6%. Comparing base (6f75c58) to head (2f95832).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #3162   +/-   ##
=====================================
  Coverage   80.6%   80.6%           
=====================================
  Files        126     126           
  Lines      22195   22198    +3     
=====================================
+ Hits       17898   17901    +3     
  Misses      4297    4297           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lucamuscat lucamuscat force-pushed the fix-stack-overflow-with-tracing-integration branch from f48bcd5 to 407c081 Compare September 10, 2025 10:47
@lucamuscat
Copy link
Contributor Author

I am aware that this is likely not the most optimal solution, as the bug ultimately lies in the integration between the otel_sdk and the tracing subscriber.

An alternative would be to add a public method is_shutdown which returns whether the LogProvider is shutdown (or in the process of being shutdown), which will ultimately be used by the tracing layer to discard any logs produced after shutdown. Of course, this alternative would require adding new public APIs and would require more changes in general.

@lucamuscat lucamuscat force-pushed the fix-stack-overflow-with-tracing-integration branch from 407c081 to 92fa1ba Compare September 10, 2025 12:19
// cycle. This may occur when integrating with the `tracing` crate due to
// `otel_warn!` possibly calling `tracing::warn!` under the hood, which will emit
// the warning back into the otel log processor.
if matches!((record.event_name, record.target()), (Some(event_name), Some(target_name)) if !(event_name == Self::AFTER_SHUTDOWN_WARNING_NAME && target_name == env!("CARGO_CRATE_NAME")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think we can hardcode our way out. We need to find why Context based suppression is not working and fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer, it seems that unlike SdkTracer, SdkLogger does not discard logs when the provider is shutdown. The PR has been adjusted accordingly

@lucamuscat lucamuscat force-pushed the fix-stack-overflow-with-tracing-integration branch from 92fa1ba to 63b6e18 Compare September 10, 2025 15:25
// cycle. This may occur when integrating with the `tracing` crate due to
// `otel_warn!` possibly calling `tracing::warn!` under the hood, which will emit
// the warning back into the otel log processor.
if matches!((record.event_name, record.target()), (Some(event_name), Some(target_name)) if !(event_name == Self::AFTER_SHUTDOWN_WARNING_NAME && target_name == env!("CARGO_CRATE_NAME")))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer, it seems that unlike SdkTracer, SdkLogger does not discard logs when the provider is shutdown. The PR has been adjusted accordingly

@@ -33,7 +33,7 @@ impl opentelemetry::logs::Logger for SdkLogger {

/// Emit a `LogRecord`.
fn emit(&self, mut record: Self::LogRecord) {
if Context::is_current_telemetry_suppressed() {
if Context::is_current_telemetry_suppressed() || self.provider.is_shutdown() {
Copy link
Contributor Author

@lucamuscat lucamuscat Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also suppresses the warning logs emitted when a log is sent after the LoggerProvider is shutdown, effectively hiding the log.

@lucamuscat lucamuscat force-pushed the fix-stack-overflow-with-tracing-integration branch from 63b6e18 to 4eac469 Compare September 15, 2025 13:26
@lucamuscat
Copy link
Contributor Author

@lalitb @cijothomas The PR is ready for you guys to have another look.

@cijothomas Is this what you had in mind when you mentioned "Context based suppression"?

Do note that since logs sent to a BatchLogProcessor with a shutdown provider are no longer propagated to the underlying loggers, the original log warning of a log being emitted after shutdown is no longer emitted. Is this an acceptable trade off for you guys?

@lucamuscat
Copy link
Contributor Author

Note that the build is currently failing due to a recent version of serde including some breaking changes. This issue is tackled in another PR: #3165

…`tracing::event!` after `SdkLoggerProvider::shutdown`
@lucamuscat lucamuscat force-pushed the fix-stack-overflow-with-tracing-integration branch from 4eac469 to 2f95832 Compare September 15, 2025 15:27
@lucamuscat
Copy link
Contributor Author

lucamuscat commented Sep 17, 2025

Closing in favour of #3172, which follows the maintainer's suggestions of fixing context suppression for BatchLogProcessor::emit.

@lucamuscat lucamuscat closed this Sep 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Stack overflow when calling unfiltered tracing::event! after SdkLogProvider is shutdown.
2 participants