Member

@lynnagara lynnagara commented Jul 16, 2019

This is the second part of #13905, which updates the group model to use events from Snuba instead of Postgres. This PR refactors the filter_by_event_id function.

@lynnagara lynnagara requested a review from a team July 16, 2019 17:52
@lynnagara lynnagara changed the base branch from ref/group-model-1 to master July 16, 2019 17:54
@lynnagara lynnagara changed the base branch from master to ref/group-model-1 July 16, 2019 17:55
Contributor

@fpacifici fpacifici left a comment

A few questions but overall it seems ok

)
from sentry.utils import snuba

group_ids = set([evt['issue'] for evt in snuba.raw_query(
Contributor

nit, I think this code would be more readable without the list comprehension.
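
To illustrate the nit, one option is a set comprehension, which avoids building the intermediate list. This is only a sketch: `rows` is a hypothetical stand-in for the rows returned by the Snuba query, not the real response.

```python
# Hypothetical rows standing in for the result of the Snuba query.
rows = [{'issue': 1}, {'issue': 2}, {'issue': 1}]

# Form used in the diff: builds a throwaway list, then converts it to a set.
group_ids_list_form = set([evt['issue'] for evt in rows])

# Set comprehension: same result without the intermediate list.
group_ids = {evt['issue'] for evt in rows}
```

Both forms yield the same set; the second just reads a little more directly.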

from sentry.utils import snuba

group_ids = set([evt['issue'] for evt in snuba.raw_query(
start=datetime.utcfromtimestamp(0),
Contributor

Is the retention policy we have in Snuba the same as the one we have for the Event and EventMapping tables? Snuba can apply a default maximum timespan.
https://github.com/getsentry/snuba/blob/master/snuba/api.py#L170

@JTCunning Do we use that setting in production?

Contributor

We do have this setting in Production, but this should be scoped to an organization's retention.

You can find other queries scoped to retention like so:

retention = quotas.get_event_retention(organization=projects[0].organization)
if retention:
    retention_window_start = timezone.now() - timedelta(days=retention)
else:
    retention_window_start = None
# TODO: This could be optimized when building querysets to identify
# criteria that are logically impossible (e.g. if the upper bound
# for last seen is before the retention window starts, no results
# exist.)
if retention_window_start:
    group_queryset = group_queryset.filter(last_seen__gte=retention_window_start)

This has also led me to believe that we have most likely not properly scoped other SnubaEvent-based queries to organization retention.
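
As a rough illustration of what scoping a query to retention means, the sketch below filters in-memory records by a retention window. The helper names (`retention_window_start`, `filter_by_retention`) and the dict-based groups are assumptions made for this example; the real code filters a Django queryset as in the snippet above.

```python
from datetime import datetime, timedelta


def retention_window_start(retention_days, now):
    """Return the earliest timestamp still inside retention, or None if unlimited."""
    # A falsy value (None or 0) is treated here as "no retention limit configured".
    if not retention_days:
        return None
    return now - timedelta(days=retention_days)


def filter_by_retention(groups, retention_days, now):
    """Keep only groups last seen inside the retention window (hypothetical helper)."""
    start = retention_window_start(retention_days, now)
    if start is None:
        return list(groups)
    return [g for g in groups if g['last_seen'] >= start]


now = datetime(2019, 7, 16)
groups = [
    {'id': 1, 'last_seen': datetime(2019, 7, 10)},
    {'id': 2, 'last_seen': datetime(2019, 5, 1)},
]
recent = filter_by_retention(groups, 30, now)
unlimited = filter_by_retention(groups, None, now)
```

With a 30-day retention, only the group seen on July 10 survives; with no retention configured, both groups are returned.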

Contributor

As discussed on Slack, Sentry's snuba client shrinks every time window to be inside retention.

# any project will do, as they should all be from the same organization
project = Project.objects.get(pk=project_ids[0])
retention = quotas.get_event_retention(
    organization=Organization(project.organization_id)
)
if retention:
    start = max(start, datetime.utcnow() - timedelta(days=retention))
    if start > end:
        raise QueryOutsideRetentionError
# if `shrink_time_window` pushed `start` after `end` it means the user queried
# a Group for T1 to T2 when the group was only active for T3 to T4, so the query
# wouldn't return any results anyway
new_start = shrink_time_window(query_params.filter_keys.get('issue'), start)
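
The clamping step in the snippet above can be sketched in isolation. This is not the client's actual code: `clamp_to_retention` is a hypothetical helper, `QueryOutsideRetentionError` is redefined locally as a stand-in, and `now` is passed in explicitly so the example is deterministic.

```python
from datetime import datetime, timedelta


class QueryOutsideRetentionError(Exception):
    """Local stand-in for the error raised in the snippet above."""


def clamp_to_retention(start, end, retention_days, now):
    """Move `start` forward so the window lies inside retention (hypothetical helper)."""
    if retention_days:
        start = max(start, now - timedelta(days=retention_days))
        if start > end:
            # The whole requested window is older than the retention period.
            raise QueryOutsideRetentionError
    return start, end


now = datetime(2019, 7, 16)
epoch = datetime(1970, 1, 1)

# A query starting at the epoch, as in this PR, is shrunk to the retention window.
start, end = clamp_to_retention(epoch, now, 90, now)
```

Under these assumptions, a `start` of timestamp 0 is harmless: it is simply pulled forward to 90 days before `now`.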

Contributor

Which goes back to my original question.
Are we going to return fewer results than in the previous implementation, since there is no retention applied to the Postgres query?

Contributor

No. Those are deleted at 90d every 5 minutes, and if we weren't previously capping the 30d retention, now is the time to.

'event_id': [event_id],
'project_id': project_ids,
},
limit=1000,
Contributor

@fpacifici fpacifici Jul 16, 2019

Can this query ever return more than 1 value? We may be able to make it more efficient for clickhouse if we could limit the results.

Member Author

Theoretically I think it can, since event IDs are only unique by project, not org. We only use this code to detect a direct hit here: https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/organization_group_index.py#L129, and we end up discarding all results apart from the first one anyway. I think it would be better to just return a single result here for the direct hit, but I'm not sure we want to tackle that in this PR.

Member Author

@lynnagara lynnagara Jul 17, 2019

I've updated this function to only look for a single direct hit, since that's the only place where we actually use this code. As discussed on Slack, we need to preserve the existing behavior since event IDs are only unique by project. I've updated the limit to len(project_ids).
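
The reasoning behind `limit=len(project_ids)` can be sketched as follows: if an event ID is unique within a project, a lookup across N projects can match at most N rows. The in-memory `query_events` function and the sample data are hypothetical stand-ins for the Snuba query, not the code in this PR.

```python
def query_events(events, event_id, project_ids, limit):
    """Hypothetical in-memory stand-in for the Snuba query in this PR."""
    allowed = set(project_ids)
    matches = [
        e for e in events
        if e['event_id'] == event_id and e['project_id'] in allowed
    ]
    return matches[:limit]


def group_ids_for_event(events, event_id, project_ids):
    # Event IDs are unique per project, so at most one row per project can match.
    rows = query_events(events, event_id, project_ids, limit=len(project_ids))
    return {row['issue'] for row in rows}


events = [
    {'event_id': 'abc', 'project_id': 1, 'issue': 10},
    {'event_id': 'abc', 'project_id': 2, 'issue': 20},
    {'event_id': 'abc', 'project_id': 3, 'issue': 30},
    {'event_id': 'def', 'project_id': 1, 'issue': 11},
]
hits = group_ids_for_event(events, 'abc', [1, 2])
```

Here the same event ID appears in three projects, but only the two requested projects contribute group IDs, and the limit of 2 is never too small.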

@lynnagara lynnagara changed the base branch from ref/group-model-1 to master July 17, 2019 18:17
@lynnagara lynnagara changed the title feat: Use Snuba for Group.filter_by_event_id feat: Update group direct hit behavior to use Snuba Jul 17, 2019
@lynnagara lynnagara force-pushed the feat/group-model-2 branch from 99a56f9 to e9403d9 Compare July 17, 2019 22:58
@lynnagara lynnagara changed the title feat: Update group direct hit behavior to use Snuba feat: Update Group.filter_by_event_id to use Snuba Jul 18, 2019
@lynnagara lynnagara merged commit 076da73 into master Jul 19, 2019
@lynnagara lynnagara deleted the feat/group-model-2 branch July 19, 2019 18:31
@github-actions github-actions bot locked and limited conversation to collaborators Dec 20, 2020