feat(pii): Sanitize URLs in Span description and breadcrumbs #1876
Merged. antonpirker merged 31 commits into master from antonpirker/1742-remove-sensitive-data-from-urls on Feb 16, 2023.
Commits
74e9c08 Strip sensitive data from URLs. refs 1742 (antonpirker)
71d5e4a Better function name (antonpirker)
743c3d1 Check send_default_pii before sanitizing url. (antonpirker)
0560891 Ignore typing on named tuples (antonpirker)
5d25063 Make it run in Python 2 (antonpirker)
a16b5ab Merge branch 'master' into antonpirker/1742-remove-sensitive-data-fro… (antonpirker)
63df676 Split url into url, query and fragment (antonpirker)
4db535f Some type fixes (antonpirker)
ce56e93 Preventing circular import (antonpirker)
e418033 Fixed some tests (antonpirker)
5bbd781 Make url a string to fix tests (antonpirker)
fcbd8d7 Fixing httpx tests again (antonpirker)
2bd870c Fixing tests (antonpirker)
72a4675 Fix tests for old Python versions (antonpirker)
e8e05e9 Merge branch 'master' into antonpirker/1742-remove-sensitive-data-fro… (antonpirker)
1639cc4 Fix tests with fragments in old Python versions (antonpirker)
11b9bf2 Merge branch 'antonpirker/1742-remove-sensitive-data-from-urls' of gi… (antonpirker)
c67be60 Merge branch 'master' into antonpirker/1742-remove-sensitive-data-fro… (antonpirker)
90eb4db Fixed utf8 chars in Python 2.7 (antonpirker)
9215f45 Cleanup (antonpirker)
68dda23 Merge branch 'master' into antonpirker/1742-remove-sensitive-data-fro… (antonpirker)
8a17864 Merge branch 'master' into antonpirker/1742-remove-sensitive-data-fro… (antonpirker)
72a9305 Merge branch 'master' into antonpirker/1742-remove-sensitive-data-fro… (antonpirker)
5c074d1 Merge branch 'master' into antonpirker/1742-remove-sensitive-data-fro… (antonpirker)
1482ac6 Moved import outside of function (antonpirker)
6a82959 Revert "Moved import outside of function" (antonpirker)
773ed80 Always remove authority, but for now to not filter query values (antonpirker)
51ab32d Moved import to the bottom of file to prevent circular import (antonpirker)
4eaafc0 Revert "Moved import to the bottom of file to prevent circular import" (antonpirker)
3022143 Moved SENSITIVE_DATA_SUBSTITUTE to utils.py to prevent circular imports (antonpirker)
75bea04 Merge branch 'master' into antonpirker/1742-remove-sensitive-data-fro… (antonpirker)
@@ -8,6 +8,25 @@
import sys
import threading
import time
from collections import namedtuple

try:
    # Python 3
    from urllib.parse import parse_qs
    from urllib.parse import unquote
    from urllib.parse import urlencode
    from urllib.parse import urlsplit
    from urllib.parse import urlunsplit

except ImportError:
    # Python 2
    from cgi import parse_qs  # type: ignore
    from urllib import unquote  # type: ignore
    from urllib import urlencode  # type: ignore
    from urlparse import urlsplit  # type: ignore
    from urlparse import urlunsplit  # type: ignore


from datetime import datetime
from functools import partial

@@ -43,13 +62,14 @@ | |
|
||
epoch = datetime(1970, 1, 1) | ||
|
||
|
||
# The logger is created here but initialized in the debug support module | ||
logger = logging.getLogger("sentry_sdk.errors") | ||
|
||
MAX_STRING_LENGTH = 1024 | ||
BASE64_ALPHABET = re.compile(r"^[a-zA-Z0-9/+=]*$") | ||
|
||
SENSITIVE_DATA_SUBSTITUTE = "[Filtered]" | ||
|
||
|
||
def json_dumps(data): | ||
# type: (Any) -> bytes | ||
|
@@ -374,8 +394,6 @@ def removed_because_over_size_limit(cls):
    def substituted_because_contains_sensitive_data(cls):
        # type: () -> AnnotatedValue
        """The actual value was removed because it contained sensitive information."""
        from sentry_sdk.consts import SENSITIVE_DATA_SUBSTITUTE

        return AnnotatedValue(
            value=SENSITIVE_DATA_SUBSTITUTE,
            metadata={
@@ -1163,6 +1181,79 @@ def from_base64(base64_string):
    return utf8_string


Components = namedtuple("Components", ["scheme", "netloc", "path", "query", "fragment"])


def sanitize_url(url, remove_authority=True, remove_query_values=True):
    # type: (str, bool, bool) -> str
    """
    Removes the authority and query parameter values from a given URL.
    """
    parsed_url = urlsplit(url)
    query_params = parse_qs(parsed_url.query, keep_blank_values=True)

    # strip username:password (netloc can be usr:pwd@example.com)
    if remove_authority:
        netloc_parts = parsed_url.netloc.split("@")
        if len(netloc_parts) > 1:
            netloc = "%s:%s@%s" % (
                SENSITIVE_DATA_SUBSTITUTE,
                SENSITIVE_DATA_SUBSTITUTE,
                netloc_parts[-1],
            )
        else:
            netloc = parsed_url.netloc
    else:
        netloc = parsed_url.netloc

    # strip values from query string
    if remove_query_values:
        query_string = unquote(
            urlencode({key: SENSITIVE_DATA_SUBSTITUTE for key in query_params})
        )
    else:
        query_string = parsed_url.query

    safe_url = urlunsplit(
        Components(
            scheme=parsed_url.scheme,
            netloc=netloc,
            query=query_string,
            path=parsed_url.path,
            fragment=parsed_url.fragment,
        )
    )

    return safe_url


ParsedUrl = namedtuple("ParsedUrl", ["url", "query", "fragment"])


def parse_url(url, sanitize=True):

    # type: (str, bool) -> ParsedUrl
    """
    Splits a URL into a url (including path), query and fragment. If sanitize is True, the query
    parameters will be sanitized to remove sensitive data. The authority (username and password)
    in the URL will always be removed.
    """
    url = sanitize_url(url, remove_authority=True, remove_query_values=sanitize)

    parsed_url = urlsplit(url)
    base_url = urlunsplit(
        Components(
            scheme=parsed_url.scheme,
            netloc=parsed_url.netloc,
            query="",
            path=parsed_url.path,
            fragment="",
        )
    )

    return ParsedUrl(url=base_url, query=parsed_url.query, fragment=parsed_url.fragment)


if PY37:

    def nanosecond_time():
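To make the behavior of the two helpers in this hunk easier to see, here is a minimal sketch that re-implements them outside the SDK. It is an illustration under stated assumptions, not the SDK's code: it is Python 3 only (the Python 2 fallback imports are dropped), it is condensed relative to the diff, and the example URLs and their hosts are made up.

```python
from collections import namedtuple
from urllib.parse import parse_qs, unquote, urlencode, urlsplit, urlunsplit

SENSITIVE_DATA_SUBSTITUTE = "[Filtered]"

Components = namedtuple("Components", ["scheme", "netloc", "path", "query", "fragment"])
ParsedUrl = namedtuple("ParsedUrl", ["url", "query", "fragment"])


def sanitize_url(url, remove_authority=True, remove_query_values=True):
    """Mask credentials and query values, following the helper in the diff."""
    parsed_url = urlsplit(url)
    query_params = parse_qs(parsed_url.query, keep_blank_values=True)

    netloc = parsed_url.netloc
    if remove_authority and "@" in netloc:
        # Keep only the host[:port] part after the last "@".
        host = netloc.split("@")[-1]
        netloc = "%s:%s@%s" % (SENSITIVE_DATA_SUBSTITUTE, SENSITIVE_DATA_SUBSTITUTE, host)

    if remove_query_values:
        # parse_qs returns a dict, so repeated keys collapse into one entry.
        query = unquote(urlencode({key: SENSITIVE_DATA_SUBSTITUTE for key in query_params}))
    else:
        query = parsed_url.query

    return urlunsplit(
        Components(parsed_url.scheme, netloc, parsed_url.path, query, parsed_url.fragment)
    )


def parse_url(url, sanitize=True):
    """Split a URL into base URL, query and fragment after sanitizing it."""
    parts = urlsplit(sanitize_url(url, remove_authority=True, remove_query_values=sanitize))
    base_url = urlunsplit(Components(parts.scheme, parts.netloc, parts.path, "", ""))
    return ParsedUrl(url=base_url, query=parts.query, fragment=parts.fragment)


# Hypothetical URLs, used only for illustration.
print(sanitize_url("https://user:secret@example.com/search?token=abc&q=x#top"))
# https://[Filtered]:[Filtered]@example.com/search?token=[Filtered]&q=[Filtered]#top

print(parse_url("https://example.com/search?token=abc&q=x#top"))
# ParsedUrl(url='https://example.com/search', query='token=[Filtered]&q=[Filtered]', fragment='top')

print(parse_url("https://example.com/search?token=abc&q=x#top", sanitize=False).query)
# token=abc&q=x
```

One caveat worth checking on your own interpreter: parse_url re-splits the already sanitized URL, and some newer Python releases validate bracketed text in the netloc, so a URL that carried credentials may raise ValueError at that step. The sketch therefore only calls parse_url on a credential-free URL.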