Reduce and optimize number of product grading calls using a Chord
#12914
base: dev
Chord
The current import and reimport implementation runs product grading for every finding that is imported. This creates a large number of background Celery tasks and puts a non-trivial strain on the database. For large imports it can mean minutes of extra processing spent purely on grading the same product over and over.
The same happens in the `async_dupe_delete` task that runs every minute, and when a product or engagement is deleted.

This PR optimizes these cases. The import/reimport case uses a Celery chord to reduce this to only a few product grading calls.
By design the chord will only start processing once all tasks have been submitted to it. That's why we ramp up processing with multiple chords starting with small sizes increasing exponentially up to 1024.
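The ramp-up described above can be sketched in plain Python. This is only an illustration of the batching idea, not the code in this PR: the helper name `ramped_batches`, the starting size of 1, and the growth factor of 2 are assumptions; only the cap of 1024 comes from the description. In the real code each batch would presumably become the header of its own chord, with a single product grading call as the callback.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def ramped_batches(
    items: Iterable[T],
    start: int = 1,        # assumed initial batch size
    factor: int = 2,       # assumed growth factor
    max_size: int = 1024,  # cap mentioned in the PR description
) -> Iterator[List[T]]:
    """Yield batches whose size grows by `factor` until it reaches `max_size`."""
    iterator = iter(items)
    size = start
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        # Hypothetical use: dispatch one chord per batch here, so the first
        # findings are processed without waiting for the whole import.
        yield batch
        size = min(size * factor, max_size)


# Batch sizes for a hypothetical import of 5000 findings.
sizes = [len(batch) for batch in ramped_batches(range(5000))]
print(sizes)
```

Small early batches let processing begin almost immediately, while the cap keeps the number of grading callbacks low for large imports.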
If product grading is disabled, we use a `group` and skip product grading.

To be able to use a chord, we need to generate signatures, which are similar to lambdas that can be passed around (to Celery tasks/chords/groups). This needs a little custom code because we have the `@dojo_async_task` decorator, which wraps `@app.task`. If we use similar Celery constructs more in the future, we may want to remove `@dojo_async_task` and/or replace it with something simpler.

Notes: