-
-
Couldn't load subscription status.
- Fork 220
ref(DSC): Only include user_id in DSC if the send-default-pii option is enabled
#625
Conversation
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
| - `public_key` (string) - Public key from the DSN used by the SDK. It allows Sentry to sample traces spanning multiple projects, by resolving the same set of rules based on the starting project. | ||
| - `sample_rate` (string) - The sample rate as defined by the user on the SDK. This string should always be a number between (and including) 0 and 1 in basic float notation (`0.04242`) - no funky business like exponents or anything similar. If a `tracesSampler` callback was used for the sampling decision, its result should be used for `sample_rate` instead of the `tracesSampleRate` from `SentryOptions`. In case `tracesSampler` returns `True` it should be sent as `1.0`, `False` should be sent as `0.0`. | ||
| - `release` (string) - The release name as specified in client options`. | ||
| - `environment` (string) - The environment name as specified in client options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So only sending this in the envelope header if tracingOrigins allows would mean we have to check tracingOrigins against the DSN or proxy to see if we're allowed to send it, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to user_id? Could you elaborate what you mean here? Generally, in the context of this PR we're not including tracingOrigins in any way or condition. Just wanted to mention in the description that discussions around it are in the works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this referred to user_id. Once discussed we could add something to clarify whether we need to check tracingOrigins against DSN / proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So currently, the JS SDK's BrowserTracing integration does not check for the DSN (or a proxy) but it only determines if the URL of the outgoing request matches a string or regex specified in tracingOrigins. So far, we haven't considered changing this but sincetracingOrigins is gonna be standardized, we can certainly discuss this. I would suggest though, discussing this in a separate issue about the tracingOrigins spec/changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Java we have our own transport(s) for communication with Sentry so we'd have to separately add the check there if we so choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracingOrigins doesn't really matter when determining what is sent to Sentry. tracingOrigins should simply control which targets should receive a) a sentry-trace header b) DSC in a baggage header.
This shouldn't concern transports but rather request instrumentation. I might be wrong though since I don't know specifics on the Java SDK.
| All of the values in the payloads below are required (non-optional) in a sense, that when they are known to an SDK at the time a transaction envelope is sent to Sentry, or at the time a baggage header is propagated, they must also be included in said envelope or baggage. | ||
|
|
||
| The **only exception** is the `user_id` field. | ||
| To avoid sensitive data being leaked to third parties, `user_id` should only be included, if the <Link to="/sdk/data-handling/#sensitive-data">`send-default-pii`</Link> option was enabled in the init options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify what send-default-pii controls exactly? Only new DSC or also DSC from incoming requests?
I'm really not sure about this one. If we're not propagating user_id when it's incoming, we're mutating the DSC which is bad. On the other hand, if we just propagate it, we give users a PII foot gun in the form of a tracing product - if not even a foot bazooka.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make it for new DSC - especially since the data here is based on the head SDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also rename this to send_default_pii to match snake case (then folks know to use camel case if needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only new DSC or also DSC from incoming requests
Let's make it for new DSC
Yes, I strongly agree with only new DSC. We have to adapt this in all SDKs and hence, eventually, we should not get a user_id in incoming DSC. Also, I don't want to allow exceptions in the DSC immutability for now.
Can we also rename this to send_default_pii
Certainly, the only reason why I wrote it like this is because it was already written in that format in the dev docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're also good for the case that downstream SDKs set different sendDefaultPii values than the head-of-trace SDK. Users have to set DS rules on the DSC values from the head SDK anyway, meaning this is the only SDK where the Pii decision should matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Note that this only applies for the head-of-trace SDK that populates the DSC content. | ||
| In case downstream SDKs receive a `user_id` in incoming DSC, they should continue to propagate it. The rationale behind this is | ||
| that incoming DSC [must be frozen](#freezing-dynamic-sampling-context) and users must make DS rules based on the head SDK's DSC data anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for clarifying this!
This PR adds a condition to including the
user_idfield in the dynamic sampling context (DSC) that is sent via thetraceenvelope header to relay or propagated via thebaggageHttp header in outgoing requests.As discussed in TSC and in Slack, adding
user_idto the DSC raises a PII concern because many SDKs currently propagate the DSC viabaggagein all outgoing Http requests. This means, that PII could be sent to third parties. To avoid this, the decision was made to only include theuser_idif thesend-default-piiinit option was set totrue.It's important to note that another way of addressing issues like this one is in discussion, which is making the
tracingOriginsoption part of every SDK. With this option, users can clearly specify where tracing data should be sent to, and thereby also disable the propagation of DSC to third parties. Until this is defined more clearly and ready for development, we propose the solution added with this PR in the meantime.Resolves #626