Skip to content

Conversation

vaind
Copy link
Collaborator

@vaind vaind commented Mar 2, 2023

Part of #1955

This uses Diagnostic IPC to connect to the CLR in-process, capture Microsoft-DotNETCore-SampleProfiler trace events and transforms them to the format Sentry understands.

In its current state, the feature is in a good shape for alfa testing but there's a major followup to resolve and that is how TraceLog, which is a tool to process raw events and get figure out stack traces, works, i.e. that it requires files and needs to be saved & then loaded from file to be actually usable (see here). I'm starting to lean towards a fork/copy of relevant parts where we gut it to get what we need and drop the rest. That way we could also do without the rundown provider that gives as all past .net runtime events when we start a profile. Functionally (from the user perspective), the profile should look the same, but the implementation should be more efficient than it is now, especially the processing that happens after the profiling stops and also it would be most noticeable for short transactions if we could avoid requesting all past CLR runtime events (for simple app, it's about 3 MB of data).

TODOs

  • test full SDK setup & finish right after a transaction is finished to make sure the processing is waited-for in the BackgroundWorker FlushAsync() and the profile appears in the envelope.
  • handle exceptions gracefully
  • limit long running transactions (30s)
  • maybe make processing asynchronous, or run the whole thing on a different thread
  • fix to support interleaving transactions
  • downsample during processing (currently downsampled while converting nettrace to ETLX)
  • configure in-app

Followups

  • maybe filter-out threads that don't run any user code? marked as in-app=false
  • should we filter-out sentry-specific/profiling-specific samples? marked as in-app=false
  • provide debug_meta images if we want to show line numbers - this may need to be opt-in as it will likely add some runtime overhead
  • downsample during ingest, if possible - currently, the runtime samples at 1000 Hz (1 ms) without any way to configure that yet, but we want it to be closer to 101 Hz - we should be able to filter samples directly in the _session.EventStream before even transforming to ETLX.
  • filter out repeating stack_id for a given thread (i.e. don't send new timestamps unless the stack trace changes) - this may be a follow-up and should also be considered for other SDKs because it may significantly reduce the payload size in multithreaded environments. For example, this thread could be significantly reduced: https://sentry-sdks.sentry.io/profiling/profile/sentry-dotnet/3f359ed615d94111bd927e821917d1c8/flamechart/?colorCoding=by+symbol+name&query=&sorting=call+order&tid=2&view=top+down&xAxis=transaction
  • it may be possible to handle GCstart & GCend events to show garbage collection
  • collect Microsoft-Windows-DotNETRuntime rundown provider (loaded libs, etc) in a separate session and merge with actual samples later during processing, if possible (merging nettrace files, or having a custom provider for TraceLog).
  • Or maybe we don't even need to collect these because we're in the same process? can we hack our way around and get the libraries from the current process instead? Probably not so easy because it also somehow stack walking if I'm not mistaken.
  • Get rid of writing the intermediary files if at all possible - doesn't seem to be doable at the moment, raised Create TraceLog (ETLX) without writing to file microsoft/perfview#1829
  • Figure out thread names (we only have native thread IDs though so it seems to be a problem)
  • Consider capturing only the current "activity", i.e. not all threads but only what belongs to the transaction.
  • profile & optimize the capture and processing

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2023

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 9f89b92

@vaind vaind force-pushed the feat/profiling branch 2 times, most recently from 514c623 to a7911d2 Compare March 16, 2023 09:08
@vaind vaind marked this pull request as ready for review March 16, 2023 14:29
@bruno-garcia
Copy link
Member

bruno-garcia commented Mar 20, 2023

Outcome of a chat between @vaind and I:

  • We don't want to publish Sentry.Profiling. We can add it to Sentry but that's be only under netX.0 (.NET Core) target we support. We need to communicate this clearly in docs.
  • There's a lot more we can/want to do but this PR is at a good state to get something alpha and feedback but not through NuGet since the final product will be under Sentry package
  • The option to turn it on should be available on all TFMs but with a [Obsolete or whatever gives a warning/error at compile time that the target is not supported to avoid support request (why option not available).
  • We'll use GlobalMode to set the behavior: all threads in process (global Mode) or only activity/request (server mode)
  • There are changes that need to be done on the TraceEvent so we can have a fork. But we need to figure out if we can have 0 dependencies at the end (right now we have 2 dependencies on Sentry.Profiling) do we need 2 forks or just vendor 2 files? The motivation: We want to possibly have profiling on by default if tracing is turned on. Same approach we use for Ben.Demystifier with changing types to internal so we can change approach without breaking change.
  • We should support the lowest .NET Core version possible but for as long as we have no friction to do so. If supporting .NET 6 is the simplest/what we get OOTB, we should take that route. It's a new feature and we expect new customers/those upgrading to take advantage of. in other words: No .NET Framework or unsupported .NET Core as a goal.
  • To be clear: This is only for client mode right now (desktop/cli). Server mode (Global mode/aspnet) will come in follow up PRs. Mobile isn't supported either since it's not based on CoreCLR. We can revisit this in the future.

@mattjohnsonpint
Copy link
Contributor

Looks like something in the latest changes broke things. Everything else looks good, so after resolving that all that's needed is the license file and and then I will approve. Thanks!!!

@mattjohnsonpint
Copy link
Contributor

Oh, and a changelog entry also. Let's go with "Initial work to support profiling in a future release." - since we're not making it available just yet.

And we should also add a <IsPackable>false</IsPackable> to Sentry.Profiling.csproj so we don't attempt to publish a nupkg for it yet.

@mattjohnsonpint mattjohnsonpint merged commit 3364647 into main Apr 14, 2023
@mattjohnsonpint mattjohnsonpint deleted the feat/profiling branch April 14, 2023 05:31
@mattjohnsonpint
Copy link
Contributor

Merged. Thanks!!!

@filipnavara
Copy link
Contributor

Mobile isn't supported either since it's not based on CoreCLR.

JFYI MonoVM does have the identical sampling profiler mechanism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants