This repository was archived by the owner on Apr 25, 2024. It is now read-only.

Evaluate pending nodes in parallel #524

Closed
wants to merge 23 commits into from

Conversation

@iFrostizz (Contributor) commented on Jul 5, 2023

When a proof branches heavily, evaluating the pending nodes can take a long time because they are processed sequentially.
In that case we can evaluate the pending leaves by running the proofs in parallel, at a small cost.
This PR adds a new parameter, max_workers, to the Prover's advance_proof (defaulting to 1, which preserves the old behavior); it caps the number of proving processes running in parallel whenever there are pending nodes.
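
As a rough illustration of the behavior described above (not the actual pyk implementation; `proof.pending_nodes`, `extend_node`, and `proof.apply` are hypothetical stand-ins for the real Prover/KCFG machinery, and a thread pool is used purely for the sketch):

```python
# Minimal sketch of the idea, not the actual pyk implementation:
# `proof.pending_nodes`, `extend_node`, and `proof.apply` are hypothetical
# stand-ins for the real Prover/KCFG machinery.
from concurrent.futures import ThreadPoolExecutor


def advance_proof(proof, extend_node, max_workers: int = 1) -> None:
    """Advance the proof, extending up to max_workers pending nodes at a time."""
    while True:
        pending = proof.pending_nodes()
        if not pending:
            break
        # With max_workers == 1 this degenerates to the old sequential behavior.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(extend_node, pending))
        for result in results:
            proof.apply(result)  # the KCFG is only updated on the main thread
```

Applying the results on the main thread after the workers return is what keeps the graph updates free of concurrent writes in this sketch.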

@iFrostizz (Contributor, Author) commented:

This may lead to significant efficiency improvements in proofs that branch a lot!
For ArithmeticTest.test_max1, which has one branch:

Before (one kore client executing requests sequentially):
(screenshot)

After (with 3 clients in parallel, although only two were used):
(screenshot)

@iFrostizz iFrostizz self-assigned this Jul 6, 2023
@iFrostizz iFrostizz requested a review from tothtamas28 July 6, 2023 21:34
@tothtamas28 (Collaborator) left a comment

Thank you for this PR. Parallelizing KCFG exploration is a feature we have been contemplating for a long time, but so far no one has actually attempted it. This is a step in the right direction.

Based on your submission, though, I now think that introducing this feature into the codebase might require further planning, and possibly multiple enabling changes. Below are a few thoughts I have on the issue.


  1. We need an approach that makes locking in the client unnecessary. Each thread in the pool should be assigned its own server (more precisely, a port number of a running server). Instantiating a client is not a big overhead, so it can be done on demand inside the handler. We can even introduce an abstraction for this so that algorithms can be coded against it (e.g. KoreServerPool; see the sketch after this list).

  2. Class KCFGExplore has too many responsibilities. On one hand, it implements logic, on the other hand, it manages resources. Resource management should be factored out of this class (i.e. KCFGExplore should receive a KoreClient as a constructor argument). If this is done, instantiating KCFGExplore inside the handler will also become an option.

  3. We need a thread-safe data model. It must be ensured that no race condition occurs on a data write.

  4. If I understand correctly, parallelism in this PR is utilized on branching, i.e. at each moment, the nodes submitted to the pool have a common parent. The ideal model is one where a pending node is submitted to the pool as soon as it is discovered. (This is a remark for the future, though; the branching parallelism is already an improvement.)
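
A minimal sketch of the kind of pool points 1 and 2 suggest (the `start_server` factory, which is assumed to launch a kore-rpc server and return its port, and the task signature are illustrative assumptions, not the real pyk API):

```python
# Illustrative sketch of the KoreServerPool idea from point 1; `start_server`
# and the task signature are assumptions, not the real pyk API.
from concurrent.futures import Future, ThreadPoolExecutor
from queue import Queue
from typing import Callable, TypeVar

T = TypeVar('T')


class KoreServerPool:
    """One running server (port) per worker; clients are created on demand inside tasks."""

    def __init__(self, start_server: Callable[[], int], max_workers: int = 1) -> None:
        self._ports = Queue()
        for _ in range(max_workers):
            self._ports.put(start_server())
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, task: Callable[[int], T]) -> 'Future[T]':
        # Each task checks out a dedicated port and instantiates its own client
        # for it, so no client-side locking is needed.
        def run() -> T:
            port = self._ports.get()
            try:
                return task(port)
            finally:
                self._ports.put(port)

        return self._executor.submit(run)

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

Because each worker checks out its own port, no two tasks ever share a connection, which is what makes the per-client locks unnecessary.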

@tothtamas28 (Collaborator) commented:

@ehildenb, this is probably interesting for you too.

@iFrostizz iFrostizz requested a review from tothtamas28 July 10, 2023 09:51
@ehildenb (Member) commented:

@iFrostizz I discussed this change a bit with @tothtamas28. I think the main thing we want is to leave the parallelization up to the client and to have a thread-safe data model. This is important for avoiding race conditions and for getting maximal utilization of the available processes (instead of launching threads for each pending node and waiting for all of those to synchronize).

I think a better design is to break the proof into subproofs based on the pending nodes. So each time we have a branch, if parallelism is turned on, turn it into a subproof. That way, we are synchronizing (and locking) on the Proof data structure, which should be thread-safe since each proof carries its own KCFG. Then the user can tell us the maximal number of pending nodes allowed before breaking into subproofs, and once that level of branching is reached, launch separate processes (however they wish) for each subproof.
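
Roughly, that splitting strategy might look like the following sketch (the `Proof` methods `pending`, `subproof_for`, and `merge`, and the `advance_one` helper, are invented names for illustration, not existing pyk API):

```python
# Rough illustration of the subproof idea; `pending`, `subproof_for`, `merge`,
# and `advance_one` are invented names, not the existing pyk API.
from concurrent.futures import ProcessPoolExecutor


def advance_with_splitting(proof, advance_one, max_pending_before_split: int = 4):
    """Advance sequentially until branching exceeds the threshold, then split."""
    while proof.pending:
        if len(proof.pending) < max_pending_before_split:
            proof = advance_one(proof)  # old sequential behavior
            continue
        # Branching threshold reached: carve out one subproof (with its own KCFG)
        # per pending node and prove each one in a separate process.
        subproofs = [proof.subproof_for(node) for node in proof.pending]
        with ProcessPoolExecutor() as pool:
            finished = list(pool.map(advance_one, subproofs))
        for sub in finished:
            proof.merge(sub)  # only the parent process touches the parent Proof
    return proof
```

Because each subproof owns its own KCFG, the workers never write to a shared graph; the parent process merges results only after the workers return.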

I think it's best if @tothtamas28 takes the first attempt at this, because parallel/thread-safe programming is closely tied to data modelling, and that is what he has been working on with these data structures anyway.
