
Conversation

lucamuscat
Contributor

@lucamuscat lucamuscat commented Sep 17, 2025

Fixes #3161
Design discussion issue (if applicable) #

Changes

The BatchLogProcessor background thread itself runs with a telemetry-suppressed context; however, BatchLogProcessor::emit does not, because it executes on the caller's thread rather than on the suppressed background thread.

Entering a telemetry-suppressed context while calling BatchLogProcessor::emit prevents telemetry-induced telemetry from being generated when emit is called on an already shut-down BatchLogProcessor, avoiding a stack overflow.
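For reference, here is a minimal sketch of the suppression mechanism this change relies on. It assumes the suppression helpers live on opentelemetry::Context under the names that also appear in the benchmarks later in this thread (enter_telemetry_suppressed_scope, is_current_telemetry_suppressed); treat it as an illustration rather than the exact API surface.

```rust
use opentelemetry::Context;

fn main() {
    // Outside any suppressed scope, telemetry flows normally.
    assert!(!Context::is_current_telemetry_suppressed());

    {
        // Entering the scope returns a guard; suppression lasts until it drops.
        let _guard = Context::enter_telemetry_suppressed_scope();
        assert!(Context::is_current_telemetry_suppressed());
        // Telemetry produced here (e.g. an internal warning bridged back into
        // the SDK) can be detected via the flag and dropped, breaking the
        // telemetry-induced-telemetry feedback loop.
    }

    // The guard has been dropped, so suppression no longer applies.
    assert!(!Context::is_current_telemetry_suppressed());
}
```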

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@lucamuscat lucamuscat requested a review from a team as a code owner September 17, 2025 21:43

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.7%. Comparing base (9bd2c1b) to head (72e02aa).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #3172     +/-   ##
=======================================
- Coverage   80.7%   80.7%   -0.1%     
=======================================
  Files        126     126             
  Lines      22328   22329      +1     
=======================================
- Hits       18028   18025      -3     
- Misses      4300    4304      +4     

☔ View full report in Codecov by Sentry.

@lucamuscat lucamuscat force-pushed the fix-buggy-context-suppression-batch-log-processor branch from 970f74c to 96359fc on September 17, 2025 21:52
@lucamuscat lucamuscat changed the title from "Suppress telemetry emitted inside of BatchLogProcessor::emit" to "fix: Suppress telemetry emitted inside of BatchLogProcessor::emit" on Sep 17, 2025
@lalitb
Member

lalitb commented Sep 17, 2025

This approach seems fine to me. Another option would be to use an atomic flag (similar to dropped_logs_count) to prevent flooding. I'm not proposing anything specific, just mentioning it as something we could consider.

@lalitb
Member

lalitb commented Sep 18, 2025

Thinking more, I would be cautious about adding any latency to the hot path on the user thread. The BatchLogProcessor::emit() method is called directly from application code for every log record, so its performance is critical. From the benchmark results, suppressing the entire method would add ~8.3 ns to every call.

| Benchmark                              | Time   |
|----------------------------------------|--------|
| enter_telemetry_suppressed_scope       | 8.3 ns |
| normal_attach                          | 9.1 ns |
| is_current_telemetry_suppressed_false  | 750 ps |
| is_current_telemetry_suppressed_true   | 750 ps |
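For context, a rough sketch of how micro-benchmarks like these could be written with criterion; the harness below is illustrative, not the crate's actual benchmark file.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use opentelemetry::Context;
use std::hint::black_box;

fn suppression_benchmarks(c: &mut Criterion) {
    // Cost of entering and leaving a suppressed scope on every call.
    c.bench_function("enter_telemetry_suppressed_scope", |b| {
        b.iter(|| {
            let _guard = Context::enter_telemetry_suppressed_scope();
        })
    });

    // Cost of merely checking the flag, as the read side does.
    c.bench_function("is_current_telemetry_suppressed", |b| {
        b.iter(|| black_box(Context::is_current_telemetry_suppressed()))
    });
}

criterion_group!(benches, suppression_benchmarks);
criterion_main!(benches);
```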

Member

@lalitb lalitb left a comment


as discussed here - #3172 (comment)

Better to have perf numbers for the change.

@lucamuscat
Contributor Author

lucamuscat commented Sep 18, 2025

as discussed here - #3172 (comment)

Better to have perf numbers for the change.

@lalitb

I would like to run some more alternative fixes by you before finalizing this approach.

I had initially prevented the call to otel_warn! (see here) from causing the infinite feedback loop by checking whether the warning about to be emitted via otel_warn! (BatchLogProcessor.Emit.AfterShutdown) is the same as the log record currently being emitted; however, a maintainer told me that we cannot hardcode our way out of the bug.

If you wish, I can move the context suppression into the same branch as the otel_warn! that causes the stack overflow, keeping the suppression out of the hot path. I don't think the slight extra overhead on an error case would be a concern.

@bantonsson
Contributor

@lucamuscat I think that moving the suppression around the offending otel_warn! sounds like a good strategy.

…avoid the added overhead of suppressing telemetry in the happy path
@lucamuscat lucamuscat requested a review from lalitb September 18, 2025 17:35
@lucamuscat
Contributor Author

Telemetry suppression has been moved out of the happy path to right before the offending otel_warn!; the happy path no longer bears the overhead of telemetry suppression.
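For illustration, a simplified sketch of the resulting shape of emit; the struct and field names here are assumptions, not the actual diff.

```rust
use opentelemetry::{otel_warn, Context};
use std::sync::atomic::{AtomicBool, Ordering};

struct Processor {
    is_shutdown: AtomicBool,
}

impl Processor {
    fn emit(&self /* , record, instrumentation scope */) {
        if self.is_shutdown.load(Ordering::Relaxed) {
            // Error path only: suppress telemetry around the warning so a
            // bridged internal log cannot recurse back into emit().
            let _guard = Context::enter_telemetry_suppressed_scope();
            otel_warn!(name: "BatchLogProcessor.Emit.AfterShutdown");
            return;
        }
        // Happy path: hand the record to the background thread with no
        // suppression overhead.
    }
}
```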

@lalitb

@cijothomas
Member

@lucamuscat I think that moving the suppression around the offending otel_warn! sounds like a good strategy.

+1. That was the plan for other places hitting (or potentially hitting) this issue.
https://github.com/open-telemetry/opentelemetry-rust-contrib/blob/main/opentelemetry-user-events-logs/src/logs/processor.rs#L53-L54

Member

@cijothomas cijothomas left a comment


Thanks!

@lalitb lalitb merged commit 5250df2 into open-telemetry:main Sep 19, 2025
27 checks passed
@lucamuscat lucamuscat deleted the fix-buggy-context-suppression-batch-log-processor branch September 19, 2025 07:43
Development

Successfully merging this pull request may close these issues.

[Bug]: Stack overflow when calling unfiltered tracing::event! after SdkLogProvider is shutdown.