@valentijnscholten (Member) commented Aug 4, 2025

The current implementation of import and reimport triggers product grading for every imported finding. This spawns a large number of background Celery tasks that put a non-trivial load on the database. For large imports it can add minutes of extra processing spent purely on grading the same product over and over.
The same happens in the async_dupe_delete task that runs every minute, and when a product or engagement is deleted.

This PR optimizes these cases. For import/reimport, a Celery chord reduces this to only a handful of product grading calls.
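Celery's `chord(header)(callback)` runs a group of header tasks and invokes the callback once, after all of them have completed. The sketch below mimics those semantics with the standard library's `concurrent.futures` rather than Celery, so it runs without a broker or result backend. `run_chord`, `make_post_process`, and `grade_product` are hypothetical names, not DefectDojo code. It shows 100 per-finding tasks producing a single grading call instead of 100.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_chord(
    header: Iterable[Callable[[], T]],
    callback: Callable[[List[T]], R],
    max_workers: int = 4,
) -> R:
    """Run all header tasks concurrently, then invoke `callback` exactly once
    with their results in submission order (chord-like semantics)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in header]
        # Block until every header task is done; .result() re-raises failures.
        results = [f.result() for f in futures]
    return callback(results)


# Hypothetical stand-ins, not DefectDojo functions.
grading_calls: List[int] = []


def make_post_process(finding_id: int) -> Callable[[], int]:
    def post_process() -> int:
        return finding_id  # placeholder for per-finding post-processing
    return post_process


def grade_product(results: List[int]) -> int:
    grading_calls.append(len(results))  # record one grading run per chord
    return len(results)


processed = run_chord((make_post_process(i) for i in range(100)), grade_product)
print(processed, grading_calls)  # 100 [100]
```

In real Celery the header tasks run on workers and the callback fires once the result backend reports that all of them are done; the thread pool here only stands in for that coordination.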

By design, a chord only starts processing once all of its tasks have been submitted to it. To avoid delaying all post-processing until every finding has been created, we ramp up with multiple chords whose sizes start small and increase exponentially up to 1024.
If product grading is disabled, we use a group and skip product grading.
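The ramp-up can be sketched as a batch-size generator. This is a minimal sketch under assumptions: the starting size of 1 and the doubling factor are not stated in the PR (only "small sizes increasing exponentially up to 1024"), and `chord_batch_sizes` / `plan_batches` are hypothetical helpers. Each planned batch would become a chord with a grading callback, or a plain `group` when grading is disabled.

```python
from typing import Dict, Iterator, List


def chord_batch_sizes(total: int, start: int = 1, cap: int = 1024) -> Iterator[int]:
    """Yield batch sizes that double from `start` up to `cap`, then stay at
    `cap`, until `total` items are covered. The last batch may be smaller."""
    size, remaining = start, total
    while remaining > 0:
        batch = min(size, remaining)
        yield batch
        remaining -= batch
        size = min(size * 2, cap)  # exponential ramp-up, capped


def plan_batches(finding_ids: List[int], grading_enabled: bool) -> List[Dict]:
    """Split findings into batches: a 'chord' (with a grading callback) when
    grading is enabled, otherwise a 'group' with no callback."""
    plan, offset = [], 0
    for size in chord_batch_sizes(len(finding_ids)):
        batch = finding_ids[offset:offset + size]
        offset += size
        plan.append({"kind": "chord" if grading_enabled else "group",
                     "findings": batch})
    return plan


sizes = list(chord_batch_sizes(5000))
print(sizes)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 1024, 1024, 905]
```

Under these assumptions, an import of 5000 findings is graded 14 times (once per chord) instead of 5000 times, while the first small batches let post-processing begin almost immediately.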

To use a chord, we need to generate signatures, which are similar to lambdas that can be passed around (to Celery tasks, chords, and groups). This requires a little custom code because the @dojo_async_task decorator wraps around @app.task. If we use similar Celery constructs more in the future, we may want to remove @dojo_async_task and/or replace it with something simpler.
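To illustrate why a wrapping decorator needs extra code for signatures, here is a hypothetical sketch; it does not reproduce how @dojo_async_task actually works. It assumes a decorator whose wrapper adds its own dispatch logic when called, so the signature must be built against the inner function. `functools.partial` stands in for a Celery immutable signature (`.si()`): arguments are bound now and the call happens later. Real Celery signatures are serializable task descriptions sent via the broker, not in-process partials.

```python
import functools
from typing import Any, Callable


def async_task_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical wrapper decorator (NOT DefectDojo's @dojo_async_task)."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Placeholder for the wrapper's own dispatch logic (e.g. sync vs. async).
        return func(*args, **kwargs)

    def si(*args: Any, **kwargs: Any) -> functools.partial:
        # Signature analogue: bind arguments now, execute the inner task later.
        return functools.partial(func, *args, **kwargs)

    wrapper.si = si  # expose a signature builder on the wrapper
    return wrapper


@async_task_sketch
def grade_product(product_id: int) -> str:
    return f"graded product {product_id}"


sig = grade_product.si(42)  # nothing runs yet
print(sig())                # graded product 42
```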

Notes:

  • Instead of a chord, I also tried launching the tasks directly and awaiting their results. But this required setting timeouts, which are hard to choose for large imports or busy servers. I considered batching/chaining, but then we would just be reimplementing Celery. So I settled on the multi-chord approach. A single-chord approach would defer the start of post-processing until all findings had been created.
  • I tried to solve this by generating and yielding signatures and using the generator to construct a chord. This appears to have worked in Celery 4, but it no longer works in Celery 5. So multi-chord it is.
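The timeout problem from the first note can be illustrated with `concurrent.futures.wait`, which returns `(done, not_done)` sets once its `timeout` expires. This is a stdlib sketch, not the code that was actually tried; in Celery the comparable call would be `AsyncResult.get(timeout=...)`. The task names are hypothetical, and a `threading.Event` simulates a task stuck behind a busy worker.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

release = threading.Event()


def slow_post_process(finding_id: int) -> int:
    # Simulates a task delayed by a busy worker: blocks until released.
    release.wait()
    return finding_id


def quick_post_process(finding_id: int) -> int:
    return finding_id


with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(quick_post_process, 1), pool.submit(slow_post_process, 2)]
    done, not_done = wait(futures, timeout=0.5)
    # The caller must now choose: grade on partial results, wait longer, or fail.
    print(len(done), len(not_done))  # 1 1
    release.set()  # unblock the slow task so the pool can shut down
```

Any fixed timeout is either too short for a large import or too long for a small one. A chord sidesteps the choice because its callback fires when the header actually completes.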

github-actions bot commented Aug 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions bot commented Aug 4, 2025

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions bot commented Sep 6, 2025

Conflicts have been resolved. A maintainer will review the pull request shortly.

@valentijnscholten changed the title "Perf4 chord grade" to "Reduce and optimize number of product grading calls using a Chord" Sep 13, 2025
@valentijnscholten marked this pull request as ready for review September 15, 2025 16:57
@valentijnscholten added this to the 2.51.0 milestone Sep 15, 2025
@valentijnscholten marked this pull request as draft September 16, 2025 17:13