Conversation

@markstory
Member

Eagerly purge events when projects are removed. I've also modified the Project delete task so that it uses the existing group deletion task instead of doing half the work as Project deletion task was missing many relations of Group.

Comment on lines 86 to 87
eventstream_state = eventstream.start_delete_groups(self.project_id, [self.group_id])
eventstream.end_delete_groups(eventstream_state)
Member

I'm not sure if there's a problem with issuing a lot of separate replacements like this, instead of batching them.

Member Author

I'm a bit concerned about the volume of replacements this could generate as well. Projects could have thousands of issues.

Only send a replacement to eventstream after all the node data has been removed. Removing the event data early results in reports and events being left behind.
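
For illustration, the ordering this commit message describes might look roughly like the following (a minimal sketch, reusing the start/end calls from the snippet earlier in the thread; `purge_node_data` is a hypothetical stand-in for the nodestore cleanup, and the `eventstream` import is assumed):

```python
from sentry import eventstream  # assumed import path

def delete_group_events(project_id, group_ids):
    # Open the deletion on the eventstream side.
    state = eventstream.start_delete_groups(project_id, group_ids)
    # Remove event payloads BEFORE finalizing, so reports and events
    # aren't left behind (the bug this commit fixes).
    purge_node_data(group_ids)  # hypothetical helper
    # Only now is the replacement actually sent.
    eventstream.end_delete_groups(state)
```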
@volokluev
Member

This has a big risk of plugging up the replacement queue when a project is deleted if we send them one by one. How hard is it to batch these things? We should not merge this as-is.

@markstory
Member Author

> How hard is it to batch these things?

It isn't simple, as ProjectDeletionTask uses the GroupDeletionTask to remove groups one at a time, and thus each group removal task only knows about the current group. Another complication is that the BulkModelDeletionTask implementation doesn't support deleting child relations.

We would have to replace the GroupDeletionTask with a more specialized implementation that quacks like a bulk deletion but uses a mix of instance and bulk deletion for its relations.
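
A rough sketch of what such a specialized task could look like (the class shape, `chunk` method, and `delete_instance_relations` helper are all hypothetical, not the actual Sentry deletions API; the model names come from elsewhere in this PR):

```python
from sentry.models import Group, GroupMeta, GroupResolution, GroupSnooze

class SpecializedGroupDeletionTask:
    """Quacks like a bulk deletion, but mixes bulk and instance deletion."""

    def __init__(self, project_id, group_ids):
        self.project_id = project_id
        self.group_ids = group_ids

    def chunk(self):
        # Simple child relations can be bulk-deleted across every group at once.
        for model in (GroupMeta, GroupResolution, GroupSnooze):
            model.objects.filter(group_id__in=self.group_ids).delete()
        # Relations that need per-row cleanup still go one instance at a time.
        for group in Group.objects.filter(id__in=self.group_ids):
            delete_instance_relations(group)  # hypothetical helper
```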

@wedamija
Member

wedamija commented Dec 5, 2022

The top projects have 1mm+ issues. Is removing that many issues from snuba via replacements even feasible for projects that large?

@markstory
Member Author

> Is removing that many issues from snuba via replacements even feasible for projects that large?

That could be a reason it has not historically been part of project deletion.

@wedamija
Member

wedamija commented Dec 5, 2022

> > Is removing that many issues from snuba via replacements even feasible for projects that large?
>
> That could be a reason it has not historically been part of project deletion.

Yeah... if we batched them in groups of 1k, then we'd still issue 1k replacements. @volokluev I don't really know the full impact of replacements. How expensive is a replacement with 1 group vs a replacement with, say, 1k-10k groups?

@volokluev
Member

Yes, removing that many issues is feasible via replacements. In terms of the impact of batching vs. not batching replacements, it's huge.

In order to get the rows to delete, a SQL query is run:

https://github.com/getsentry/snuba/blob/master/snuba/datasets/errors_replacer.py#L810-L815

That query will be batched (thus reducing query load).

Then, in order to write the "tombstone rows" that mark them for deletion, an insert call will be made:

https://github.com/getsentry/snuba/blob/master/snuba/datasets/errors_replacer.py#L637-L642

That insert call will happen once per Kafka message on the replacements topic. If you were to do these one by one, it would result in as many insert calls as there are groups. Every insert call is a call to ZooKeeper and a newly created file. We don't want that.
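
Back-of-the-envelope arithmetic makes the stakes concrete: with one insert per replacement message, the batch size directly divides the number of inserts. A quick illustration:

```python
import math

# Tombstone inserts needed for 1M groups at different batch sizes,
# given one ClickHouse insert per replacement message.
groups = 1_000_000
for batch_size in (1, 100, 1_000, 10_000):
    inserts = math.ceil(groups / batch_size)
    print(f"batch size {batch_size:>6}: {inserts:>9,} inserts")
```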

@wedamija
Member

wedamija commented Dec 6, 2022

> Yes, removing that many issues is feasible via replacements. In terms of the impact of batching vs. not batching replacements, it's huge.
>
> In order to get the rows to delete, a SQL query is run:
>
> https://github.com/getsentry/snuba/blob/master/snuba/datasets/errors_replacer.py#L810-L815
>
> That query will be batched (thus reducing query load).
>
> Then, in order to write the "tombstone rows" that mark them for deletion, an insert call will be made:
>
> https://github.com/getsentry/snuba/blob/master/snuba/datasets/errors_replacer.py#L637-L642
>
> That insert call will happen once per Kafka message on the replacements topic. If you were to do these one by one, it would result in as many insert calls as there are groups. Every insert call is a call to ZooKeeper and a newly created file. We don't want that.

Is there a recommended max size for a replacement? I assume we can't shove 100k group ids into a single one.

@markstory
Member Author

I was thinking more about this last night, and I have an idea on how to batch these deletions for more than a single group at a time. I'll make the batch size for groups and events something we can easily change so we can tune deletions if batches are too big or too small.
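
One possible shape for those tunable batch sizes (a sketch under assumed names; the constants and `chunked` helper are illustrations, not the PR's actual code):

```python
# Hypothetical tuning knobs for deletion batch sizes.
GROUP_CHUNK_SIZE = 100
EVENT_CHUNK_SIZE = 10_000

def chunked(ids, size):
    """Yield consecutive slices of at most `size` ids."""
    for i in range(0, len(ids), size):
        yield ids[i : i + size]
```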

By bulk deleting relations with `in` operations, we can delete from multiple groups in a blended batch.
@markstory
Member Author

@wedamija I've revised group deletion so that it operates in batches of 100 groups. Each batch emits a single replacement message.
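
Combined with the hypothetical `chunked` helper sketched earlier, one replacement per batch might look like this (again a sketch, with `purge_group_data` standing in for the relation and nodestore cleanup):

```python
# One replacement message per batch of 100 groups (sketch).
for batch in chunked(group_ids, GROUP_CHUNK_SIZE):
    state = eventstream.start_delete_groups(project_id, batch)
    purge_group_data(batch)  # hypothetical: delete relations + node data
    eventstream.end_delete_groups(state)  # one replacement for the whole batch
```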

@markstory
Member Author

/gcbrun

@wedamija
Member

wedamija commented Dec 8, 2022

> @wedamija I've revised group deletion so that it operates in batches of 100 groups. Each batch emits a single replacement message.

I'm going to try to review this today or tomorrow, but I have a few things going on at the moment. In the meantime, @volokluev, what's a reasonable max number of replacements to generate here?

https://redash.getsentry.net/queries/3373/source: Here's a breakdown of the top projects by active group count. Worst case is 5mm, but there's a long tail of 100k+ orgs. So if we batch in groups of 100, we might issue 50k replacements in the worst case. For other large orgs, we might issue 1k. Wondering if we can bump this up to 1k/10k groups per batch just to avoid these edge cases?

Member

@wedamija left a comment


This looks good to me, but I would like to get clarification from SNS on what a reasonable chunk size is here, given the info in my previous comment. Just don't want to have an incident come up because some project with 1mm+ groups gets deleted.


model_list = (models.GroupMeta, models.GroupResolution, models.GroupSnooze)
Member

Why do we remove these? Are they just already handled by the direct group deletion?

Member Author

Yes. Project now has Group as a child_relation, and Group has these models as child_relations.

This was likely a workaround to be able to bulk delete across multiple groups. That isn't necessary anymore, as I've changed bulk deletions to support `__in` lookups as well.
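
For illustration, this is the kind of single bulk DELETE that `__in` support enables (standard Django ORM usage; `group_id_batch` is an assumed variable holding one batch of group ids):

```python
from sentry.models import GroupMeta

# One DELETE spanning a whole batch of groups instead of one per group.
GroupMeta.objects.filter(group_id__in=group_id_batch).delete()
```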

@getsentry-bot
Contributor

PR reverted: bff380d

getsentry-bot added a commit that referenced this pull request Jan 6, 2023
markstory added a commit that referenced this pull request Jan 6, 2023
… deleted (#41937)""

This reverts commit bff380d, which broke deploys because it used a dev dependency in production code.
markstory added a commit that referenced this pull request Jan 9, 2023
Mulligan on #41937 that doesn't use itertools_more, as that is a dev-only dependency.
github-actions bot locked and limited conversation to collaborators Jan 22, 2023