Skip to content

Conversation

@AnastasiiaL
Copy link
Contributor

@AnastasiiaL AnastasiiaL commented Jan 26, 2021

Description

With each round of scanning for new nodes to connect, we only take N from that pool, trying to connect to them. This process takes a lot of time as not all peers are easy to connect to and if no nodes we suitable - we have to wait for another round of scanning. And then it's possible that we take N nodes from the new scan and they will contain nodes that were previously tried, but the connection was not successful. So instead of retrying those nodes, first we want to go through the whole list of discovered peers.

Proposed Solution

This PR introduces cashing of the tried nodes with a Map bounded by size to the size of the blacklist. So first we'll be trying to connect to the new nodes, that are not cashed in that map.

If the scanning round didn't find enough nodes for us to be picky - we try to connect to all of them regardless the fact if they were tried before.

Parameters that control blacklisting were reviewed - long blacklisting (in case wrong protocol, incompatible network, timeout during connection) was increased from 30min to 10h, as it's unlikely that nodes from another network/different protocol might suddenly change within 30min timeframe.

@AnastasiiaL AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch from b322739 to 567450f Compare January 26, 2021 10:07
@AnastasiiaL AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch from 567450f to 0e753a7 Compare January 26, 2021 15:37
Copy link
Contributor

@KonradStaniec KonradStaniec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you try running mantis node with those changes to see how fast it traverses the peers it get from discovery ?

@AnastasiiaL
Copy link
Contributor Author

AnastasiiaL commented Jan 27, 2021

did you try running mantis node with those changes to see how fast it traverses the peers it get from discovery ?

I ran to see that nothing got broken, but was not sure how to see if there's any improvement in the speed as it's working only in a specific situation when previously blacklisted modes are tried instead of new ones, for this the network should be large together with a large amount of blacklisted nodes.

Edit: I'm measuring the time to start syncing at the moment

@KonradStaniec
Copy link
Contributor

Time to start syncing is pretty tricky measure which has also other factors in it.

Maybe the best metric would be number of peers we tried to connect to after some time ( maybe as percentage of nodes provided by discovery) ? ( just thinking out loud here as i am also not sure 😅 )

@AnastasiiaL
Copy link
Contributor Author

AnastasiiaL commented Jan 27, 2021

Time to start syncing is pretty tricky measure which has also other factors in it.

Maybe the best metric would be number of peers we tried to connect to after some time ( maybe as percentage of nodes provided by discovery) ? ( just thinking out loud here as i am also not sure 😅 )

Indeed, after running it multiple times I saw the time to sync drastically differs. Ended up looking at the ratio of connected nodes to discovered ones :)

So running on develop branch the percentage of connected nodes after 10min is 0.273%
while on this branch is 0.316 %. That's a 16% improvement.
That's an averaged out percentage out of 5 tries. Not sure it's very reliable but the change definitely didn't make things worse

@AnastasiiaL AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch 3 times, most recently from 1b54e7e to 5ee7d7b Compare January 27, 2021 21:14
Copy link
Contributor

@KonradStaniec KonradStaniec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would nice to check what the ratio looks after 1h i.e when we had acquire more peers, but if it helps even a little bit then LGTM!

When i was thinking about this problem my inital solution was to change our interactions with discovery form: ask discovery for all found peers every 1 minute, and try to connect to X peers
to something like: ask discovery for all found peers, try to connect to all of them (by calling to them in batches as it is now), and ask for discovery for all found peers again only after we check we are sure we checked all peers from previous batch.

But maybe this something for other pr.

@AnastasiiaL AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch from 5ee7d7b to 5beb9bd Compare January 28, 2021 11:09
@AnastasiiaL
Copy link
Contributor Author

Just checked the ration after running for 1h:
develop: Discovered 5343 nodes, Blacklisted 1708 nodes, handshaked to 11/80 ----- 0.206%
etcm-540: Discovered 5336 nodes, Blacklisted 2824 nodes, handshaked to 17/80 ------ 0.319%

this difference could be caused by increased time for the blacklist as well, as seen above the number of blacklisted nodes with the new configuration is 65% higher

@AnastasiiaL
Copy link
Contributor Author

@KonradStaniec I like your idea of changing the interaction with discovery as well. No need asking for new nodes while we didn't process the previously discovered ones. The question in this case would be how fast the discovered batch is processed and if it doesn't leave us with an outdated info. I think the follow-up task could be created, WDYT @dzajkowski?

@AnastasiiaL AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch from 5beb9bd to 49ba7d3 Compare January 29, 2021 09:29
@AnastasiiaL AnastasiiaL merged commit b003d6c into develop Jan 29, 2021
@AnastasiiaL AnastasiiaL deleted the feature/ETCM-540-improve-discovery branch January 29, 2021 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants