From 2b99161a22d2d0909d4bf280ad150537867fedb8 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Wed, 21 Jul 2021 18:29:35 +0200 Subject: [PATCH 01/19] Document tracing issues related to ingestion model --- src/docs/sdk/research/performance/index.mdx | 34 ++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 34cbc0cfbd..d81b9cee06 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -192,6 +192,38 @@ In the example above, if an error bubbles up the call stack we want to be able t All those different expectations makes it hard to reuse, in an understandable way, the current notion of `scope`, how breadcrumbs are recorded, and how those different concepts interact. +Finally, it is worth noting that the changes to restructure scope management most likely cannot be done without breaking existing SDK APIs. Existing SDK concepts, like hubs, scopes, breadcrumbs, user, tags, contexts, would all have to be remodeled. + ## Span Ingestion Model -Coming soon. +Consider a trace depicted by the following span tree: + +``` +F +├─ B* +│ ├─ B +│ ├─ B +│ ├─ B +│ │ ├─ S* +│ │ ├─ S* +│ ├─ B +│ ├─ B +│ │ ├─ S* +│ ├─ B +│ ├─ B +│ ├─ B +│ │ ├─ S* + +where +F: span created on frontend service +B: span created on backend service +S: span created on storage service +``` + +This trace illustrates 3 services instrumented such that a user clicks a button on a Web page (`F`) and a backend (`B`) performs some work which involves making several queries to a storage service (`S`). Spans that are at the entry point to a given service are marked with a `*` to denote that they are transactions. + +Now we can use this example to compare and understand the difference between Sentry's span ingestion model and the model used by OpenTelemetry and other similar tracing systems. + +In Sentry's span ingestion model, all spans that belong to a transaction must be sent all together in a single request. That means that all `B` spans must be kept in-memory for the whole duration of the `B*` transaction, including time spent on downstream services (the storage service in the example). + +In OpenTelemetry's model, spans are batched together as they are finished and batches are sent as soon as either there is a certain number of spans in the batch or a certain time has passed. In our example, it could mean that the first 3 `B` spans are batched together and sent while the first `S*` transaction is still in progress in the storage service. Subsequently, other `B` spans are batched together and sent as they finish, until eventually the `B*` transaction span is also sent. 
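+
+To make the contrast concrete, here is a minimal sketch of OpenTelemetry's batch-based export, written against the OpenTelemetry Python SDK. The processor and exporter classes are OpenTelemetry's own; the threshold values are illustrative assumptions, not recommendations:
+
+```python
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
+
+provider = TracerProvider()
+provider.add_span_processor(
+    BatchSpanProcessor(
+        ConsoleSpanExporter(),       # any exporter fits here
+        max_export_batch_size=512,   # flush when 512 spans are buffered...
+        schedule_delay_millis=5000,  # ...or after 5 seconds, whichever comes first
+    )
+)
+trace.set_tracer_provider(provider)
+
+# Each span is handed to the processor as it ends; export does not wait for
+# the root span (`B*` in the tree above) to finish.
+with trace.get_tracer(__name__).start_as_current_span("B"):
+    pass
+```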
From 08ebe37aff47011119cc57103b52cfdb988ffba7 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Fri, 23 Jul 2021 08:55:49 +0200 Subject: [PATCH 02/19] F* --- src/docs/sdk/research/performance/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index d81b9cee06..b257118de9 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -199,7 +199,7 @@ Finally, it is worth noting that the changes to restructure scope management mos Consider a trace depicted by the following span tree: ``` -F +F* ├─ B* │ ├─ B │ ├─ B From 7ab39801c4a49742fca6ac726e011cd410a2c71b Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Fri, 23 Jul 2021 11:53:50 +0200 Subject: [PATCH 03/19] No numbers in headings The numbers seemed fine when there were just 2 items, but when there will be several items under "span ingestion model", numbering all of them is silly and error prone. --- src/docs/sdk/research/performance/index.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index b257118de9..d010c58806 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -66,7 +66,7 @@ The implementation of the actual `trace` function is relatively simple (see [a P The following two examples synthesize the scope propagation issues. -### 1. Cannot Determine Current Span +### Cannot Determine Current Span Consider some auto-instrumentation code that needs to get a reference to the current `span`, a case in which manual scope propagation is not available. @@ -146,7 +146,7 @@ Note that other tracing libraries have the same kind of challenge. There are sev - [OpenTracing shim doesn't change context #2016](https://github.com/open-telemetry/opentelemetry-js/issues/2016) - [Http Spans are not linked / does not set parent span #2333](https://github.com/open-telemetry/opentelemetry-js/issues/2333) -### 2. Conflicting Data Propagation Expectations +### Conflicting Data Propagation Expectations There is a conflict of expectations that appear whenever we add a `trace` function as discussed earlier, or simply try to address scope propagation with Zones. From abd6fec64b71b06d96ec0a411604f6463970f996 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Fri, 23 Jul 2021 11:55:53 +0200 Subject: [PATCH 04/19] Span ingestion model issues --- src/docs/sdk/research/performance/index.mdx | 124 ++++++++++++++++++++ 1 file changed, 124 insertions(+) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index d010c58806..04fb029fa8 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -227,3 +227,127 @@ Now we can use this example to compare and understand the difference between Sen In Sentry's span ingestion model, all spans that belong to a transaction must be sent all together in a single request. That means that all `B` spans must be kept in-memory for the whole duration of the `B*` transaction, including time spent on downstream services (the storage service in the example). In OpenTelemetry's model, spans are batched together as they are finished and batches are sent as soon as either there is a certain number of spans in the batch or a certain time has passed. 
In our example, it could mean that the first 3 `B` spans are batched together and sent while the first `S*` transaction is still in progress in the storage service. Subsequently, other `B` spans are batched together and sent as they finish, until eventually the `B*` transaction span is also sent. + +While transactions are notably useful to group together spans and to explore operations of interest in Sentry, end users and instrumentation developers have an extra cognitive burden to understand and choose between a transaction or a span. + +The issues that follow in the next few sections have been identified in the current ingestion model, and are all related to this dichotomy. + +### Complex JSON Serialization of Transactions + +In OpenTelemetry's model, all [spans follow the same logical format](https://github.com/open-telemetry/opentelemetry-proto/blob/ebef7c999f4dea62b5b033e92a221411c49c0966/opentelemetry/proto/trace/v1/trace.proto#L56-L235). Users and instrumentation libraries can provide more meaning to a span by attaching key-value attributes to any span. The wire protocol uses lists of spans to send data from one system to another. + +Sentry's model, unlike OpenTelemetry's, makes a hard distinction between two types of span: transaction spans (often referred to as "transactions") and regular spans. + +In memory, transaction spans and regular spans have one distinction: transaction spans have one extra attribute, the transaction `name`. + +When serialized as JSON for ingestion, though, the differences are greater. Sentry SDKs must serialize regular spans to JSON in a format that directly resembles the in-memory spans. However, the serialization of a transaction span requires mapping its span attributes to a Sentry `Event` (originally used to report errors, expanded to fit new fields of exclusive use for transactions), while all child spans go as a list embedded in the `Event`. + +### Transaction Spans Gain Event Attributes + +When transactions are transformed from their in-memory representation to an `Event`, they gain more attributes that are not assignable to regular spans, such as `breadcrumbs`, `extra`, `contexts`, `event_id`, `fingerprint`, `release`, `environment`, `user`, etc. + +### Lifecycle Hooks + +Sentry SDKs expose a `BeforeSend` hook for error events, which allows users to modify and/or drop events before they are sent to Sentry. + +When the new type of event was introduced, `transaction`, it was soon decided that they would not go through the `BeforeSend` hook, for essentially two reasons: + +- Prevent user code from relying on the dual form of transactions (sometimes looking like a span, sometimes like an event, see earlier sections above); +- Avoid existing `BeforeSend` functions that were written with only errors in mind from interfering with transactions, be it mutating them accidentally, dropping them altogether, or some other unexpected side-effect. + +However, it was also clear that some form of lifecycle hook was necessary to allow users to do things like updating a transaction name. + +We ended up with the middle ground of allowing the mutation/dropping of transaction events through `EventProcessor` (a more general form of `BeforeSend`), which solves problems by giving users immediate access to their data before it goes out of the SDK, but is also the worse of both worlds in that it exposes the transaction duality that was never intended to leak and is more complicated to use than `BeforeSend`.
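+
+As a rough illustration, this is what the two hooks look like in the Python SDK. The sketch relies on `sentry_sdk`'s `before_send` option and scope-level event processors; the processor logic itself is hypothetical:
+
+```python
+import sentry_sdk
+
+def before_send(event, hint):
+    # Runs for error events only; transaction events never reach this hook.
+    return event
+
+def rename_transaction(event, hint):
+    # An event processor also sees transaction events, exposing their duality:
+    # the event carries a `transaction` name plus an embedded list of spans.
+    if event.get("type") == "transaction":
+        event["transaction"] = "normalized-name"  # hypothetical rewrite
+    return event  # returning None would drop the event
+
+sentry_sdk.init(dsn="...", traces_sample_rate=1.0, before_send=before_send)
+with sentry_sdk.configure_scope() as scope:
+    scope.add_event_processor(rename_transaction)
+```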
+ +In OpenTelemetry, spans go through span processors, which are two lifecycle hooks: one when a span is started and one when it is ended. + +### Nested Transactions + +On one hand, Sentry's ingestion model does not allow for nested transactions. Transactions are meant to mark service transitions. + +On the other hand, Sentry as a product requires that transactions exist. Data outside of a transaction is lost. + +In practice, SDKs have to way to prevent transactions from becoming nested. The end result is possibly surprising to users, as each transaction starts a new tree. The only way to relate those trees is through the `trace_id`. + +Sentry's billing model is per event/transaction. That means that a transaction within a transaction generates two billable events. + +In SDKs, a transaction within a transaction causes inner spans to be "swallowed" by the innermost transaction surrounding them. In some situations, automatic instrumentation creating spans will only make them appear in one of the two transactions, but not both, causing apparent gaps in instrumentation. + +The product does not know how to deal with nested transactions yet. If they share a `trace_id` they can be visualized in the trace view, but when looking into one transaction it is as if all the others did not exist (either ancestor or descendent transactions). + +There is also user confusion about what one would expect in the UI for such a situation (pseudocode): + +```python +# if do_a_database_query returns 10 results, is the user +# - seeing 11 transactions in the UI? +# - billed for 11 transactions? +# - see spans within create_thumbnail in the innermost transaction only? +with transaction("index-page"): + results = do_a_database_query() + for result in results: + if result["needs_thumbnail"]: + with transaction("create-thumbnail", {"resource": result["id"]}): + create_thumbnail(result) +``` + +### Spans Cannot Exist Outside of a Transaction + +Sentry's tracing experience is centered entirely around the part of a trace that exists inside transactions. This means that data cannot exist outside of a transaction even if it exists in a trace. + +If the SDK does not have a transaction going, a manual instrumentation by the user with a regular span is entirely lost. This is less of a concern on instrumented web servers, where a transaction is live most of the time, because automatic instrumentation starts and finishes a transaction for every incoming request. + +The requirement of a transaction is specially challenging on frontends (browser, mobile, and desktop applications), because in those cases auto-instrumented transactions are less reliable as they typically only last for some time before being automatically finished. + +In our [example trace](#span-ingestion-model), the first span that originates the trace is due to a button click. If the button click `F*` was instrumented as a regular `span`, most likely no data from the frontend would be captured, but the other `B` and `S` spans would still be captured. + +In Sentry's model, if a span is not a transaction and there is no ancestor span that is a transaction, then the span won't be ingested. + +This, in turn, means there are many situations where a trace is missing crucial information that can help debug issues, particularly on the frontend where transactions need to end at some point but execution might continue. 
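+
+A short sketch with the Python SDK illustrates the asymmetry (the `op`, `name`, and `description` values are hypothetical; the calls are `sentry_sdk`'s `start_span` and `start_transaction`):
+
+```python
+import sentry_sdk
+
+# Without a transaction on the scope, this span has no carrier and is dropped:
+with sentry_sdk.start_span(op="http.client", description="GET /api/results"):
+    ...
+
+# The same span is only ingested when wrapped in a transaction:
+with sentry_sdk.start_transaction(op="ui.action", name="button-click"):
+    with sentry_sdk.start_span(op="http.client", description="GET /api/results"):
+        ...
+```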
+ +Automatic and manual instrumentation have a challenge deciding whether to start a span or a transaction, and the decision is particularly difficult considering that poor [scope propagation](#scope-propagation) limits the existence of concurrent transactions. + +### Missing Web Vital Measurements + +Sentry's browser instrumentation collects Web Vital measurements. But, because those measurements are sent along to Sentry using the automatically instrumented transaction as a carrier, measurements that are made available by the browser after the automatic transaction has finished are lost. + +This causes transactions to be missing some Web Vitals or to have non-final measurements, for example for the LCP measurement. + +### Unreliable Frontend Transaction Duration + +Because all data must go in a transaction, Sentry's browser SDK creates a transaction for every page load and every navigation. Those transactions must end at some time. + +If the browser tab is closed before the transaction is finished and sent to Sentry, all collected data is lost. Therefore, the SDK needs to balance the risk of losing all data and capturing as much useful data as possible. + +Transactions are finished after a certain idle time after the last activity has been observed (for example, outgoing HTTP requests). This means that the duration of the page load / navigation transaction is a rather arbitrary value that can't necessarily be compared or improved, as it doesn't indicate any concrete and undestandable duration. + +We counter the limitation by focusing on the LCP Web Vital as the default performance metric for browser, but, as discussed above, the final LCP value might be lost. + +### In-memory Buffering Affects Servers + +Any system that may operate with several concurrent transactions at one time, which is the case of web servers, will be required to buffer complete span trees in memory. + +This means recording 100% of spans and 100% of transactions for many server-side applications, even in a simplified form, is not feasible due to the overhead incurred. + +### Lack of Batch Ingestion + +When multiple concurrent transactions finish nearly at the same time, SDKs must send as many requests as there are transactions. There is no provision to batching multiple transactions into a single request. + +This inneficiency means at least extra bandwidth consumption for frontends (browser, mobile) and extra overhead for backends. + +### Compatibility + +The special treatment of transactions is incompatible with OpenTelemetry which means we cannot implement an OpenTelemetry Exporter that can feed data into Sentry (though we have a [Sentry Exporter with a major correctness limitation](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/sentryexporter#known-limitations)). Likewise we cannot leverage OpenTelemetry SDKs and instrumentations. + +## Summary + +We have learned a lot with the current tracing implementation at Sentry. This document is an attempt to capture many of the known limitations to serve as basis for future improvement. + +Tracing is a complex subject, and taming that complexity is no easy feat. + +The first group of issues related to **scope propagation** is a concern exclusive to SDKs and how they are designed. Addressing that will require internal architecture changes to all SDKs, including the redesign of old features like breadcrumbs.
Those would be pre-requisites to implementing simple-to-use tracing helpers like a `trace` function that works in any context and captures accurate and reliable performance data. + +Note that such changes would almost necessarily mean new major versions of SDKs that break compatibility with existing versions. + +The second group of issues related to the **span ingestion model** is a lot more complex as it affects more parts of the product and requires a coordinated effort from multiple teams to change. + +Nonetheless, changes to the ingestion model would have an immeasurable impact on the product, as it could reduce the cognitive burden of instrumentation, collect more data, improve efficiency. From 43c73d49da33694b22d346906821be420f167aba Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Fri, 23 Jul 2021 12:02:32 +0200 Subject: [PATCH 05/19] Fix typo --- src/docs/sdk/research/performance/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 04fb029fa8..e477611bf4 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -267,7 +267,7 @@ On one hand, Sentry's ingestion model does not allow for nested transactions. Tr On the other hand, Sentry as a product requires that transactions exist. Data outside of a transaction is lost. -In practice, SDKs have to way to prevent transactions from becoming nested. The end result is possibly surprising to users, as each transaction starts a new tree. The only way to relate those trees is through the `trace_id`. +In practice, SDKs have no way of preventing transactions from becoming nested. The end result is possibly surprising to users, as each transaction starts a new tree. The only way to relate those trees is through the `trace_id`. Sentry's billing model is per event/transaction. That means that a transaction within a transaction generates two billable events. From d6f7fb2822d6f9d49e4554752b6c87ff671f26d8 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Fri, 23 Jul 2021 12:07:33 +0200 Subject: [PATCH 06/19] Review wording --- src/docs/sdk/research/performance/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index e477611bf4..18fc235f55 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -273,7 +273,7 @@ Sentry's billing model is per event/transaction. That means that a transaction w In SDKs, a transaction within a transaction causes inner spans to be "swallowed" by the innermost transaction surrounding them. In some situations, automatic instrumentation creating spans will only make them appear in one of the two transactions, but not both, causing apparent gaps in instrumentation. -The product does not know how to deal with nested transactions yet. If they share a `trace_id` they can be visualized in the trace view, but when looking into one transaction it is as if all the others did not exist (either ancestor or descendent transactions). +The product does not know how to deal with nested transactions yet. If the transactions share a `trace_id` they can be visualized in the trace view. To get to the trace view, one must follow a link from one of the transactions. 
However, when looking at any one transaction it is as if all the others did not exist (either ancestor or descendent transactions are not represented on screen). There is also user confusion about what one would expect in the UI for such a situation (pseudocode): From 12995f03dd8373c31d4c87effca2606d7a5bb56c Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Fri, 23 Jul 2021 12:10:21 +0200 Subject: [PATCH 07/19] Combine paragraphs --- src/docs/sdk/research/performance/index.mdx | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 18fc235f55..c762d20e97 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -300,9 +300,7 @@ The requirement of a transaction is specially challenging on frontends (browser, In our [example trace](#span-ingestion-model), the first span that originates the trace is due to a button click. If the button click `F*` was instrumented as a regular `span`, most likely no data from the frontend would be captured, but the other `B` and `S` spans would still be captured. -In Sentry's model, if a span is not a transaction and there is no ancestor span that is a transaction, then the span won't be ingested. - -This, in turn, means there are many situations where a trace is missing crucial information that can help debug issues, particularly on the frontend where transactions need to end at some point but execution might continue. +In Sentry's model, if a span is not a transaction and there is no ancestor span that is a transaction, then the span won't be ingested. This, in turn, means there are many situations where a trace is missing crucial information that can help debug issues, particularly on the frontend where transactions need to end at some point but execution might continue. Automatic and manual instrumentation have a challenge deciding whether to start a span or a transaction, and the decision is particularly difficult considering that poor [scope propagation](#scope-propagation) limits the existence of concurrent transactions. From 09a0465b6dc338abf1dc424bcda237be78c47d3f Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Fri, 23 Jul 2021 16:12:47 +0200 Subject: [PATCH 08/19] Update src/docs/sdk/research/performance/index.mdx Co-authored-by: Abhijeet Prasad --- src/docs/sdk/research/performance/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index c762d20e97..0826d6bb64 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -348,4 +348,4 @@ Note that such changes would almost necessarily mean new major versions of SDKs The second group of issues related to the **span ingestion model** is a lot more complex as it affects more parts of the product and requires a coordinated effort from multiple teams to change. -Nonetheless, changes to the ingestion model would have an immeasurable impact on the product, as it could reduce the cognitive burden of instrumentation, collect more data, improve efficiency. +Nonetheless, changes to the ingestion model would have an immeasurable impact on the product, as it could reduce the cognitive burden of instrumentation, collect more data, and improve efficiency. 
From acad893a10d9f02bd77c8d4d02f42f98ba48f9a6 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 11:34:57 +0200 Subject: [PATCH 09/19] Apply suggestions from code review Co-authored-by: Katie Byers --- src/docs/sdk/research/performance/index.mdx | 50 ++++++++++----------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 0826d6bb64..1291393b3f 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -148,7 +148,7 @@ Note that other tracing libraries have the same kind of challenge. There are sev ### Conflicting Data Propagation Expectations -There is a conflict of expectations that appear whenever we add a `trace` function as discussed earlier, or simply try to address scope propagation with Zones. +There is a conflict of expectations that appears whenever we add a `trace` function as discussed earlier, or simply try to address scope propagation with Zones. The fact that the current `span` is stored in the `scope`, along with `tags`, `breadcrumbs` and more, makes data propagation messy as some parts of the `scope` are intended to propagate only into inner functions calls (for example, tags), while others are expected to propagate back into callers (for example, breadcrumbs), specially when there is an error. @@ -192,7 +192,7 @@ In the example above, if an error bubbles up the call stack we want to be able t All those different expectations makes it hard to reuse, in an understandable way, the current notion of `scope`, how breadcrumbs are recorded, and how those different concepts interact. -Finally, it is worth noting that the changes to restructure scope management most likely cannot be done without breaking existing SDK APIs. Existing SDK concepts, like hubs, scopes, breadcrumbs, user, tags, contexts, would all have to be remodeled. +Finally, it is worth noting that the changes to restructure scope management most likely cannot be done without breaking existing SDK APIs. Existing SDK concepts - like hubs, scopes, breadcrumbs, user, tags, and contexts - would all have to be remodeled. ## Span Ingestion Model @@ -220,13 +220,13 @@ B: span created on backend service S: span created on storage service ``` -This trace illustrates 3 services instrumented such that a user clicks a button on a Web page (`F`) and a backend (`B`) performs some work which involves making several queries to a storage service (`S`). Spans that are at the entry point to a given service are marked with a `*` to denote that they are transactions. +This trace illustrates 3 services instrumented such that when a user clicks a button on a Web page (`F`), a backend (`B`) performs some work, which then requires making several queries to a storage service (`S`). Spans that are at the entry point to a given service are marked with a `*` to denote that they are transactions. -Now we can use this example to compare and understand the difference between Sentry's span ingestion model and the model used by OpenTelemetry and other similar tracing systems. +We can use this example to compare and understand the difference between Sentry's span ingestion model and the model used by OpenTelemetry and other similar tracing systems. -In Sentry's span ingestion model, all spans that belong to a transaction must be sent all together in a single request. 
That means that all `B` spans must be kept in-memory for the whole duration of the `B*` transaction, including time spent on downstream services (the storage service in the example). +In Sentry's span ingestion model, all spans that belong to a transaction must be sent all together in a single request. That means that all `B` spans must be kept in memory for the whole duration of the `B*` transaction, including time spent on downstream services (the storage service in the example). -In OpenTelemetry's model, spans are batched together as they are finished and batches are sent as soon as either there is a certain number of spans in the batch or a certain time has passed. In our example, it could mean that the first 3 `B` spans are batched together and sent while the first `S*` transaction is still in progress in the storage service. Subsequently, other `B` spans are batched together and sent as they finish, until eventually the `B*` transaction span is also sent. +In OpenTelemetry's model, spans are batched together as they are finished, and batches are sent as soon as either a) there are a certain number of spans in the batch or b) a certain amount of time has passed. In our example, it could mean that the first 3 `B` spans would be batched together and sent while the first `S*` transaction is still in progress in the storage service. Subsequently, other `B` spans would be batched together and sent as they finish, until eventually the `B*` transaction span is also sent. While transactions are notably useful to group together spans and to explore operations of interest in Sentry, end users and instrumentation developers have an extra cognitive burden to understand and choose between a transaction or a span. The issues that follow in the next few sections have been identified in the current ingestion model, and are all related to this dichotomy. ### Complex JSON Serialization of Transactions -In OpenTelemetry's model, all [spans follow the same logical format](https://github.com/open-telemetry/opentelemetry-proto/blob/ebef7c999f4dea62b5b033e92a221411c49c0966/opentelemetry/proto/trace/v1/trace.proto#L56-L235). Users and instrumentation libraries can provide more meaning to a span by attaching key-value attributes to any span. The wire protocol uses lists of spans to send data from one system to another. +In OpenTelemetry's model, all [spans follow the same logical format](https://github.com/open-telemetry/opentelemetry-proto/blob/ebef7c999f4dea62b5b033e92a221411c49c0966/opentelemetry/proto/trace/v1/trace.proto#L56-L235). Users and instrumentation libraries can provide more meaning to any span by attaching key-value attributes to it. The wire protocol uses lists of spans to send data from one system to another. Sentry's model, unlike OpenTelemetry's, makes a hard distinction between two types of span: transaction spans (often referred to as "transactions") and regular spans. In memory, transaction spans and regular spans have one distinction: transaction spans have one extra attribute, the transaction `name`. @@ -250,14 +250,14 @@ When transactions are transformed from their in-memory representation to an `Eve Sentry SDKs expose a `BeforeSend` hook for error events, which allows users to modify and/or drop events before they are sent to Sentry.
-When the new type of event was introduced, `transaction`, it was soon decided that they would not go through the `BeforeSend` hook, for essentially two reasons: +When the new `transaction` type event was introduced, it was soon decided that such events would not go through the `BeforeSend` hook, essentially for two reasons: -- Prevent user code from relying on the dual form of transactions (sometimes looking like a span, sometimes like an event, see earlier sections above); -- Avoid existing `BeforeSend` functions that were written with only errors in mind from interfering with transactions, be it mutating them accidentally, dropping them altogether, or some other unexpected side-effect. +- To prevent user code from relying on the dual form of transactions (sometimes looking like a span, sometimes like an event, as discussed in earlier sections); +- To prevent existing `BeforeSend` functions that were written with only errors in mind from interfering with transactions, be it mutating them accidentally, dropping them altogether, or causing some other unexpected side effect. -However, it was also clear that some form of lifecycle hook was necessary to allow users to do things like updating a transaction name. +However, it was also clear that some form of lifecycle hook was necessary, to allow users to do things like updating a transaction's name. -We ended up with the middle ground of allowing the mutation/dropping of transaction events through `EventProcessor` (a more general form of `BeforeSend`), which solves problems by giving users immediate access to their data before it goes out of the SDK, but is also the worse of both worlds in that it exposes the transaction duality that was never intended to leak and is more complicated to use than `BeforeSend`. +We ended up with the middle ground of allowing the mutation/dropping of transaction events through the use of an `EventProcessor` (a more general form of `BeforeSend`). This solves problems by giving users immediate access to their data before it goes out of the SDK, but it also has drawbacks in that it's more complicated to use than `BeforeSend` and also exposes the transaction duality, which was never intended to leak. In OpenTelemetry, spans go through span processors, which are two lifecycle hooks: one when a span is started and one when it is ended. @@ -269,13 +269,13 @@ On the other hand, Sentry as a product requires that transactions exist. Data ou In practice, SDKs have no way of preventing transactions from becoming nested. The end result is possibly surprising to users, as each transaction starts a new tree. The only way to relate those trees is through the `trace_id`. -Sentry's billing model is per event/transaction. That means that a transaction within a transaction generates two billable events. +Sentry's billing model is per event, be it an error event or a transaction event. That means that a transaction within a transaction generates two billable events. -In SDKs, a transaction within a transaction causes inner spans to be "swallowed" by the innermost transaction surrounding them. In some situations, automatic instrumentation creating spans will only make them appear in one of the two transactions, but not both, causing apparent gaps in instrumentation. +In SDKs, having a transaction within a transaction will cause inner spans to be "swallowed" by the innermost transaction surrounding them. 
In these situations, the code creating the spans will only add them to one of the two transactions, causing instrumentation gaps in the other. The product does not know how to deal with nested transactions yet. If the transactions share a `trace_id` they can be visualized in the trace view. To get to the trace view, one must follow a link from one of the transactions. However, when looking at any one transaction it is as if all the others did not exist (either ancestor or descendent transactions are not represented on screen). -There is also user confusion about what one would expect in the UI for such a situation (pseudocode): +There is also user confusion about what one would expect in the UI for a situation such as this one (pseudocode): ```python # if do_a_database_query returns 10 results, is the user @@ -296,29 +296,29 @@ Sentry's tracing experience is centered entirely around the part of a trace that If the SDK does not have a transaction going, a manual instrumentation by the user with a regular span is entirely lost. This is less of a concern on instrumented web servers, where a transaction is live most of the time, because automatic instrumentation starts and finishes a transaction for every incoming request. -The requirement of a transaction is specially challenging on frontends (browser, mobile, and desktop applications), because in those cases auto-instrumented transactions are less reliable as they typically only last for some time before being automatically finished. +The requirement of a transaction is especially challenging on frontends (browser, mobile, and desktop applications), because in those cases auto-instrumented transactions less reliably capture all spans, as they only last for a limited time before being automatically finished. -In our [example trace](#span-ingestion-model), the first span that originates the trace is due to a button click. If the button click `F*` was instrumented as a regular `span`, most likely no data from the frontend would be captured, but the other `B` and `S` spans would still be captured. +Another problem arises in situations where the trace starts with an operation which is only instrumented as a span, not a transaction. In our [example trace](#span-ingestion-model), the first span that originates the trace is due to a button click. If the button click `F*` were instrumented as a regular `span` rather than a transaction, most likely no data from the frontend would be captured. The `B` and `S` spans would still be captured, however, leading to an incomplete trace. In Sentry's model, if a span is not a transaction and there is no ancestor span that is a transaction, then the span won't be ingested. This, in turn, means there are many situations where a trace is missing crucial information that can help debug issues, particularly on the frontend where transactions need to end at some point but execution might continue. Automatic and manual instrumentation have a challenge deciding whether to start a span or a transaction, and the decision is particularly difficult considering that poor [scope propagation](#scope-propagation) limits the existence of concurrent transactions. -### Missing Web Vital Measurements +### Missing Web Vitals Measurements -Sentry's browser instrumentation collects Web Vital measurements. But, because those measurements are sent along to Sentry using the automatically instrumented transaction as a carrier, measurements that are made available by the browser after the automatic transaction has finished are lost. 
+Sentry's browser instrumentation collects Web Vitals measurements. But, because those measurements are sent along to Sentry using the automatically instrumented transaction as a carrier, measurements that are made available by the browser after the automatic transaction has finished are lost. -This causes transactions to be missing some Web Vitals or to have non-final measurements, for example for the LCP measurement. +This causes transactions to be missing some Web Vitals or to have non-final measurements for metrics like LCP. ### Unreliable Frontend Transaction Duration Because all data must go in a transaction, Sentry's browser SDK creates a transaction for every page load and every navigation. Those transactions must end at some time. -If the browser tab is closed before the transaction is finished and sent to Sentry, all collected data is lost. Therefore, the SDK needs to balance the risk of losing all data and capturing as much useful data as possible. +If the browser tab is closed before the transaction is finished and sent to Sentry, all collected data is lost. Therefore, the SDK needs to balance the risk of losing all data with the risk of collecting incomplete and potentially inaccurate data. -Transactions are finished after a certain idle time after the last activity has been observed (for example, outgoing HTTP requests). This means that the duration of the page load / navigation transaction is a rather arbitrary value that can't necessarily be compared or improved, as it doesn't indicate any concrete and undestandable duration. +Transactions are finished after a set time spent idle after the last activity (such as an outgoing HTTP request) is observed. This means that the duration of a page load or navigation transaction is a rather arbitrary value that can't necessarily be improved or compared to that of other transactions, as it doesn't accurately represent the duration of any concrete and understandable process. -We counter the limitation by focusing on the LCP Web Vital as the default performance metric for browser, but, as discussed above, the final LCP value might be lost. +We counter this limitation by focusing on the LCP Web Vital as the default performance metric for browsers. But, as discussed above, the LCP value may be sent before it's final, making this a less than ideal solution. @@ -338,7 +338,7 @@ The special treatment of transactions is incompatible with OpenTelemetry which m ## Summary -We have learned a lot with the current tracing implementation at Sentry. This document is an attempt to capture many of the known limitations to serve as basis for future improvement. +We have learned a lot through building the current tracing implementation at Sentry. This document is an attempt to capture many of the known limitations, in order to serve as the basis for future improvement. Tracing is a complex subject, and taming that complexity is no easy feat. The first group of issues related to **scope propagation** is a concern exclusive to SDKs and how they are designed. Addressing that will require internal architecture changes to all SDKs, including the redesign of old features like breadcrumbs. Those would be pre-requisites to implementing simple-to-use tracing helpers like a `trace` function that works in any context and captures accurate and reliable performance data. Note that such changes would almost necessarily mean new major versions of SDKs that break compatibility with existing versions. The second group of issues related to the **span ingestion model** is a lot more complex as it affects more parts of the product and requires a coordinated effort from multiple teams to change. -Nonetheless, changes to the ingestion model would have an immeasurable impact on the product, as it could reduce the cognitive burden of instrumentation, collect more data, and improve efficiency.
+Nonetheless, making changes to the ingestion model would have an immeasurable, positive impact on the product, as doing so would improve efficiency, allow us to collect more data, and reduce the cognitive burden of instrumentation. From 817fbfb15f407d75ae68918004b9b96867ba9a52 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 11:42:10 +0200 Subject: [PATCH 10/19] Revert changes unrelated to ingestion model --- src/docs/sdk/research/performance/index.mdx | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 409a9965f8..ceccd6890b 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -68,7 +68,7 @@ The implementation of the actual `trace` function is relatively simple (see [a P The following two examples synthesize the scope propagation issues. -### Cannot Determine Current Span +### 1. Cannot Determine Current Span Consider some auto-instrumentation code that needs to get a reference to the current `span`, a case in which manual scope propagation is not available. @@ -148,9 +148,9 @@ Note that other tracing libraries have the same kind of challenge. There are sev - [OpenTracing shim doesn't change context #2016](https://github.com/open-telemetry/opentelemetry-js/issues/2016) - [Http Spans are not linked / does not set parent span #2333](https://github.com/open-telemetry/opentelemetry-js/issues/2333) -### Conflicting Data Propagation Expectations +### 2. Conflicting Data Propagation Expectations -There is a conflict of expectations that appears whenever we add a `trace` function as discussed earlier, or simply try to address scope propagation with Zones. +There is a conflict of expectations that appear whenever we add a `trace` function as discussed earlier, or simply try to address scope propagation with Zones. The fact that the current `span` is stored in the `scope`, along with `tags`, `breadcrumbs` and more, makes data propagation messy as some parts of the `scope` are intended to propagate only into inner functions calls (for example, tags), while others are expected to propagate back into callers (for example, breadcrumbs), specially when there is an error. @@ -194,8 +194,6 @@ In the example above, if an error bubbles up the call stack we want to be able t All those different expectations makes it hard to reuse, in an understandable way, the current notion of `scope`, how breadcrumbs are recorded, and how those different concepts interact. -Finally, it is worth noting that the changes to restructure scope management most likely cannot be done without breaking existing SDK APIs. Existing SDK concepts - like hubs, scopes, breadcrumbs, user, tags, and contexts - would all have to be remodeled. 
- ## Span Ingestion Model Consider a trace depicted by the following span tree: From dbd7e30ffae5ea0a6d8cbfdaf5905e42dcaebaab Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 12:13:23 +0200 Subject: [PATCH 11/19] Rewording --- src/docs/sdk/research/performance/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index ceccd6890b..1ccc453c65 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -228,7 +228,7 @@ In Sentry's span ingestion model, all spans that belong to a transaction must be In OpenTelemetry's model, spans are batched together as they are finished, and batches are sent as soon as either a) there are a certain number of spans in the batch or b) a certain amount of time has passed. In our example, it could mean that the first 3 `B` spans would be batched together and sent while the first `S*` transaction is still in progress in the storage service. Subsequently, other `B` spans would be batched together and sent as they finish, until eventually the `B*` transaction span is also sent. -While transactions are notably useful to group together spans and to explore operations of interest in Sentry, end users and instrumentation developers have an extra cognitive burden to understand and choose between a transaction or a span. +While transactions are notably useful as a way to group together spans and to explore operations of interest in Sentry, the form in which they currently exist imposes extra cognitive burden. Both SDK maintainers and end users have to understand and choose between a transaction or a span when writing instrumentation code. The issues that follow in the next few sections have been identified in the current ingestion model, and are all related to this dichotomy. From 12c3d5827753f3ab3df055b23adb1a45fb651f24 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 12:18:50 +0200 Subject: [PATCH 12/19] Rewording --- src/docs/sdk/research/performance/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 1ccc453c65..1e9b514903 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -240,7 +240,7 @@ Sentry's model, unlike OpenTelemetry's, makes a hard distinction between two typ In memory, transaction spans and regular spans have one distinction: transaction spans have one extra attribute, the transaction `name`. -When serialized as JSON for ingestion, though, the differences are greater. Sentry SDKs must serialize regular spans to JSON in a format that directly resembles the in-memory spans. However, the serialization of a transaction span requires mapping its span attributes to a Sentry `Event` (originally used to report errors, expanded to fit new fields of exclusive use for transactions), while all child spans go as a list embedded in the `Event`. +When serialized as JSON, though, the differences are greater. Sentry SDKs serialize regular spans to JSON in a format that directly resembles the in-memory spans. By contrast, the serialization of a transaction span requires mapping its span attributes to a Sentry `Event` (originally used to report errors, expanded with new fields exclusively used for transactions), with all child spans embedded as a list in the `Event`. 
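+
+The difference in shape can be sketched with simplified payloads (illustrative fields and IDs only, not the complete protocol):
+
+```python
+# A regular span serializes to a flat object resembling its in-memory form:
+span = {
+    "trace_id": "771a43a4192642f0b136d5159a501700",
+    "span_id": "9c2a6db8c79068a2",
+    "parent_span_id": "8f5a2b7c9d3e1f04",
+    "op": "db.query",
+    "start_timestamp": 1626854400.0,
+    "timestamp": 1626854400.2,
+}
+
+# A transaction span serializes as a whole Event that embeds its children:
+transaction_event = {
+    "type": "transaction",
+    "event_id": "4f6c7d8e9a0b1c2d3e4f5a6b7c8d9e0f",
+    "transaction": "GET /users",  # the extra transaction `name`
+    "contexts": {"trace": {"trace_id": "771a43a4192642f0b136d5159a501700", "span_id": "..."}},
+    "spans": [span],  # child spans embedded as a list
+}
+```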
### Transaction Spans Gain Event Attributes From bf2411c40312af1b6a39a9a4d2debc29311f4f8c Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 12:26:06 +0200 Subject: [PATCH 13/19] Rewording --- src/docs/sdk/research/performance/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 1e9b514903..8c637ddd23 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -244,7 +244,7 @@ When serialized as JSON, though, the differences are greater. Sentry SDKs serial ### Transaction Spans Gain Event Attributes -When transactions are transformed from their in-memory representation to an `Event`, they gain more attributes that are not assignable to regular spans, such as `breadcrumbs`, `extra`, `contexts`, `event_id`, `fingerprint`, `release`, `environment`, `user`, etc. +When a transaction is transformed from its in-memory representation to an `Event`, it gains more attributes not assignable to regular spans, such as `breadcrumbs`, `extra`, `contexts`, `event_id`, `fingerprint`, `release`, `environment`, `user`, etc. ### Lifecycle Hooks From c0fc0a92d91236ff7f30594607c2914c71903f9f Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 12:28:44 +0200 Subject: [PATCH 14/19] Rewording --- src/docs/sdk/research/performance/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 8c637ddd23..b01965b408 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -259,7 +259,7 @@ However, it was also clear that some form of lifecycle hook was necessary, to al We ended up with the middle ground of allowing the mutation/dropping of transaction events through the use of an `EventProcessor` (a more general form of `BeforeSend`). This solves problems by giving users immediate access to their data before it goes out of the SDK, but it also has drawbacks in that it's more complicated to use than `BeforeSend` and also exposes the transaction duality, which was never intended to leak. -In OpenTelemetry, spans go through span processors, which are two lifecycle hooks: one when a span is started and one when it is ended. +By constrast, in OpenTelemetry spans go through span processors, which are two lifecycle hooks: one when a span is started and one when it is ended. ### Nested Transactions From 925258f91fb21fa22c6656fff84e6e95df43aea7 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 12:35:03 +0200 Subject: [PATCH 15/19] Rewording --- src/docs/sdk/research/performance/index.mdx | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index b01965b408..0369e18c06 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -259,13 +259,11 @@ However, it was also clear that some form of lifecycle hook was necessary, to al We ended up with the middle ground of allowing the mutation/dropping of transaction events through the use of an `EventProcessor` (a more general form of `BeforeSend`). 
This solves problems by giving users immediate access to their data before it goes out of the SDK, but it also has drawbacks in that it's more complicated to use than `BeforeSend` and also exposes the transaction duality, which was never intended to leak. -By constrast, in OpenTelemetry spans go through span processors, which are two lifecycle hooks: one when a span is started and one when it is ended. +By contrast, in OpenTelemetry spans go through span processors, which are two lifecycle hooks: one when a span is started and one when it is ended. ### Nested Transactions -On one hand, Sentry's ingestion model does not allow for nested transactions. Transactions are meant to mark service transitions. - -On the other hand, Sentry as a product requires that transactions exist. Data outside of a transaction is lost. +Sentry's ingestion model was not designed for nested transactions within a service. Transactions were meant to mark service transitions. In practice, SDKs have no way of preventing transactions from becoming nested. The end result is possibly surprising to users, as each transaction starts a new tree. The only way to relate those trees is through the `trace_id`. From 02eb411619d429f8001f74eb99840901667aab95 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 13:10:46 +0200 Subject: [PATCH 16/19] More rewrites --- src/docs/sdk/research/performance/index.mdx | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 0369e18c06..a9f6c4732f 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -271,7 +271,7 @@ Sentry's billing model is per event, be it an error event or a transaction event In SDKs, having a transaction within a transaction will cause inner spans to be "swallowed" by the innermost transaction surrounding them. In these situations, the code creating the spans will only add them to one of the two transactions, causing instrumentation gaps in the other. -The product does not know how to deal with nested transactions yet. If the transactions share a `trace_id` they can be visualized in the trace view. To get to the trace view, one must follow a link from one of the transactions. However, when looking at any one transaction it is as if all the others did not exist (either ancestor or descendent transactions are not represented on screen). +Sentry's UI is not designed to deal with nested transactions in a useful way. When looking at any one transaction it is as if all the other transactions in the trace did not exist (no other transaction is directly represented on the tree view). There is a trace view feature to visualize all transactions that share a `trace_id`, but the trace view only gives an overview of the trace by showing transactions and no child spans. There is no way to navigate to the trace view without first visiting some transaction. There is also user confusion about what one would expect in the UI for a situation such as this one (pseudocode): @@ -292,7 +292,7 @@ with transaction("index-page"): Sentry's tracing experience is centered entirely around the part of a trace that exists inside transactions. This means that data cannot exist outside of a transaction even if it exists in a trace. -If the SDK does not have a transaction going, a manual instrumentation by the user with a regular span is entirely lost. 
This is less of a concern on instrumented web servers, where a transaction is live most of the time, because automatic instrumentation starts and finishes a transaction for every incoming request. +If the SDK does not have a transaction going, regular spans created by instrumentation are entirely lost. That said, this is less of a concern on web servers, where automatically instrumented transactions start and finish with every incoming request. The requirement of a transaction is especially challenging on frontends (browser, mobile, and desktop applications), because in those cases auto-instrumented transactions less reliably capture all spans, as they only last for a limited time before being automatically finished. @@ -300,7 +300,9 @@ Another problem arises in situations where the trace starts with an operation wh In our [example trace](#span-ingestion-model), the first span that originates the trace is due to a button click. If the button click `F*` were instrumented as a regular `span` rather than a transaction, most likely no data from the frontend would be captured. The `B` and `S` spans would still be captured, however, leading to an incomplete trace. In Sentry's model, if a span is not a transaction and there is no ancestor span that is a transaction, then the span won't be ingested. This, in turn, means there are many situations where a trace is missing crucial information that can help debug issues, particularly on the frontend where transactions need to end at some point but execution might continue. -Automatic and manual instrumentation have a challenge deciding whether to start a span or a transaction, and the decision is particularly difficult considering that poor [scope propagation](#scope-propagation) limits the existence of concurrent transactions. +Automatic and manual instrumentation have the challenge of deciding whether to start a span or a transaction, and the decision is particularly difficult considering that: +- If there is no transaction, then the span is lost. +- If there is already a transaction, then there is the [nested transactions](#nested-transactions) issue. ### Missing Web Vitals Measurements From 03815f01de834e0c47c3df2e037d55a91f909eb6 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 13:29:39 +0200 Subject: [PATCH 17/19] Rewrite --- src/docs/sdk/research/performance/index.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index a9f6c4732f..98d99426ce 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -322,15 +322,15 @@ We counter this limitation by focusing on the LCP Web Vital as the default perfo ### In-memory Buffering Affects Servers -Any system that may operate with several concurrent transactions at one time, which is the case of web servers, will be required to buffer complete span trees in memory. +As discussed earlier, the current ingestion model requires Sentry SDKs to observe complete span trees in memory. Applications that operate with a constant flow of concurrent transactions will require considerable system resources to collect and process tracing data. Web servers are the typical case that exhibits this problem. -This means recording 100% of spans and 100% of transactions for many server-side applications, even in a simplified form, is not feasible due to the overhead incurred. +This means that recording 100% of spans and 100% of transactions is not feasible for many server-side applications, because the overhead incurred is just too high. -### Lack of Batch Ingestion +### Inability to Batch Transactions -When multiple concurrent transactions finish nearly at the same time, SDKs must send as many requests as there are transactions.
There is no provision to batching multiple transactions into a single request. +Sentry's ingestion model does not support ingesting multiple events at once. In particular, SDKs cannot batch multiple transactions into a single request. -This inneficiency means at least extra bandwidth consumption for frontends (browser, mobile) and extra overhead for backends. +As a result, when multiple transactions finish at nearly the same time, SDKs are required to make a separate request for each transaction. This behavior is at best highly inefficient and at worst a significant and problematic drain on resources such as network bandwidth and CPU cycles. ### Compatibility From 15ac50cfb47628e7e39264724f0047c59da603fb Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 14:07:07 +0200 Subject: [PATCH 18/19] Rewrite --- src/docs/sdk/research/performance/index.mdx | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 98d99426ce..85a8633060 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -334,7 +334,9 @@ As a result, when multiple transactions finish at nearly the same time, SDKs are ### Compatibility -The special treatment of transactions is incompatible with OpenTelemetry which means we cannot implement an OpenTelemetry Exporter that can feed data into Sentry (though we have a [Sentry Exporter with a major correctness limitation](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/sentryexporter#known-limitations)). Likewise we cannot leverage OpenTelemetry SDKs and instrumentations. +The special treatment of transaction spans is incompatible with OpenTelemetry. Users with existing applications instrumented with OpenTelemetry SDKs cannot easily use Sentry to ingest and analyze their data. + +Sentry does provide a Sentry Exporter for the OpenTelemetry Collector, but, due to the current ingestion model, [the Sentry Exporter has a major correctness limitation](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/sentryexporter#known-limitations). ## Summary From c88b4928d7a081ed0b60be26b655a3ab764b3592 Mon Sep 17 00:00:00 2001 From: Rodolfo Carvalho Date: Mon, 26 Jul 2021 14:30:03 +0200 Subject: [PATCH 19/19] Rewrite --- src/docs/sdk/research/performance/index.mdx | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/docs/sdk/research/performance/index.mdx b/src/docs/sdk/research/performance/index.mdx index 85a8633060..a80d607d77 100644 --- a/src/docs/sdk/research/performance/index.mdx +++ b/src/docs/sdk/research/performance/index.mdx @@ -344,10 +344,8 @@ We have learned a lot through building the current tracing implementation at Sen Tracing is a complex subject, and taming that complexity is no easy feat. -The first group of issues related to **scope propagation** is a concern exclusive to SDKs and how they are designed. Addressing that will require internal architecture changes to all SDKs, including the redesign of old features like breadcrumbs. Those would be pre-requisites to implementing simple-to-use tracing helpers like a `trace` function that works in any context and captures accurate and reliable performance data. +Issues in the first group - those related to **scope propagation** - are a concern exclusive to SDKs and how they are designed. 
Addressing them will require internal architecture changes to all SDKs, including the redesign of old features like breadcrumbs, but making such changes is a prerequisite for implementing simple-to-use tracing helpers like a `trace` function that works in any context and captures accurate and reliable performance data. Note that such changes would almost certainly mean releasing new major versions of SDKs that break compatibility with existing versions. -Note that such changes would almost necessarily mean new major versions of SDKs that break compatibility with existing versions. - -The second group of issues related to the **span ingestion model** is a lot more complex as it affects more parts of the product and requires a coordinated effort from multiple teams to change. +Issues in the second group - those related to the **span ingestion model** - are a lot more complex, as any changes made to solve them would affect more parts of the product and require a coordinated effort from multiple teams. Nonetheless, making changes to the ingestion model would have an immeasurable, positive impact on the product, as doing so would improve efficiency, allow us to collect more data, and reduce the cognitive burden of instrumentation.