[ETCM-540] Improve peer discovery algorithm #903

AnastasiiaL · 2021-01-26T08:55:50Z

Description

With each round of scanning for new nodes to connect, we only take N from that pool, trying to connect to them. This process takes a lot of time as not all peers are easy to connect to and if no nodes we suitable - we have to wait for another round of scanning. And then it's possible that we take N nodes from the new scan and they will contain nodes that were previously tried, but the connection was not successful. So instead of retrying those nodes, first we want to go through the whole list of discovered peers.

Proposed Solution

This PR introduces cashing of the tried nodes with a Map bounded by size to the size of the blacklist. So first we'll be trying to connect to the new nodes, that are not cashed in that map.

If the scanning round didn't find enough nodes for us to be picky - we try to connect to all of them regardless the fact if they were tried before.

Parameters that control blacklisting were reviewed - long blacklisting (in case wrong protocol, incompatible network, timeout during connection) was increased from 30min to 10h, as it's unlikely that nodes from another network/different protocol might suddenly change within 30min timeframe.

src/main/scala/io/iohk/ethereum/network/PeerManagerActor.scala

src/test/scala/io/iohk/ethereum/network/PeerManagerSpec.scala

KonradStaniec

did you try running mantis node with those changes to see how fast it traverses the peers it get from discovery ?

src/main/scala/io/iohk/ethereum/network/PeerManagerActor.scala

AnastasiiaL · 2021-01-27T10:29:44Z

did you try running mantis node with those changes to see how fast it traverses the peers it get from discovery ?

I ran to see that nothing got broken, but was not sure how to see if there's any improvement in the speed as it's working only in a specific situation when previously blacklisted modes are tried instead of new ones, for this the network should be large together with a large amount of blacklisted nodes.

Edit: I'm measuring the time to start syncing at the moment

KonradStaniec · 2021-01-27T14:42:49Z

Time to start syncing is pretty tricky measure which has also other factors in it.

Maybe the best metric would be number of peers we tried to connect to after some time ( maybe as percentage of nodes provided by discovery) ? ( just thinking out loud here as i am also not sure 😅 )

AnastasiiaL · 2021-01-27T14:54:52Z

Time to start syncing is pretty tricky measure which has also other factors in it.

Maybe the best metric would be number of peers we tried to connect to after some time ( maybe as percentage of nodes provided by discovery) ? ( just thinking out loud here as i am also not sure 😅 )

Indeed, after running it multiple times I saw the time to sync drastically differs. Ended up looking at the ratio of connected nodes to discovered ones :)

So running on develop branch the percentage of connected nodes after 10min is 0.273%
while on this branch is 0.316 %. That's a 16% improvement.
That's an averaged out percentage out of 5 tries. Not sure it's very reliable but the change definitely didn't make things worse

KonradStaniec

It would nice to check what the ratio looks after 1h i.e when we had acquire more peers, but if it helps even a little bit then LGTM!

When i was thinking about this problem my inital solution was to change our interactions with discovery form: ask discovery for all found peers every 1 minute, and try to connect to X peers
to something like: ask discovery for all found peers, try to connect to all of them (by calling to them in batches as it is now), and ask for discovery for all found peers again only after we check we are sure we checked all peers from previous batch.

But maybe this something for other pr.

AnastasiiaL · 2021-01-28T11:15:20Z

Just checked the ration after running for 1h:
develop: Discovered 5343 nodes, Blacklisted 1708 nodes, handshaked to 11/80 ----- 0.206%
etcm-540: Discovered 5336 nodes, Blacklisted 2824 nodes, handshaked to 17/80 ------ 0.319%

this difference could be caused by increased time for the blacklist as well, as seen above the number of blacklisted nodes with the new configuration is 65% higher

AnastasiiaL · 2021-01-28T11:21:13Z

@KonradStaniec I like your idea of changing the interaction with discovery as well. No need asking for new nodes while we didn't process the previously discovered ones. The question in this case would be how fast the discovered batch is processed and if it doesn't leave us with an outdated info. I think the follow-up task could be created, WDYT @dzajkowski?

AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch from b322739 to 567450f Compare January 26, 2021 10:07

leo-bogastry reviewed Jan 26, 2021

View reviewed changes

src/main/scala/io/iohk/ethereum/network/PeerManagerActor.scala Outdated Show resolved Hide resolved

leo-bogastry reviewed Jan 26, 2021

View reviewed changes

src/test/scala/io/iohk/ethereum/network/PeerManagerSpec.scala Show resolved Hide resolved

AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch from 567450f to 0e753a7 Compare January 26, 2021 15:37

KonradStaniec reviewed Jan 27, 2021

View reviewed changes

src/main/scala/io/iohk/ethereum/network/PeerManagerActor.scala Outdated Show resolved Hide resolved

src/main/scala/io/iohk/ethereum/network/PeerManagerActor.scala Outdated Show resolved Hide resolved

src/main/scala/io/iohk/ethereum/network/PeerManagerActor.scala Show resolved Hide resolved

AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch 3 times, most recently from 1b54e7e to 5ee7d7b Compare January 27, 2021 21:14

KonradStaniec approved these changes Jan 28, 2021

View reviewed changes

AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch from 5ee7d7b to 5beb9bd Compare January 28, 2021 11:09

dzajkowski approved these changes Jan 29, 2021

View reviewed changes

[ETCM-540] Improve peer discovery algorithm

49ba7d3

AnastasiiaL force-pushed the feature/ETCM-540-improve-discovery branch from 5beb9bd to 49ba7d3 Compare January 29, 2021 09:29

AnastasiiaL merged commit b003d6c into develop Jan 29, 2021

AnastasiiaL deleted the feature/ETCM-540-improve-discovery branch January 29, 2021 10:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ETCM-540] Improve peer discovery algorithm #903

[ETCM-540] Improve peer discovery algorithm #903

Uh oh!

AnastasiiaL commented Jan 26, 2021 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

KonradStaniec left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AnastasiiaL commented Jan 27, 2021 •

edited

Loading

Uh oh!

KonradStaniec commented Jan 27, 2021

Uh oh!

AnastasiiaL commented Jan 27, 2021 •

edited

Loading

Uh oh!

KonradStaniec left a comment

Uh oh!

AnastasiiaL commented Jan 28, 2021

Uh oh!

AnastasiiaL commented Jan 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[ETCM-540] Improve peer discovery algorithm #903

[ETCM-540] Improve peer discovery algorithm #903

Uh oh!

Conversation

AnastasiiaL commented Jan 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Proposed Solution

Uh oh!

Uh oh!

Uh oh!

KonradStaniec left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AnastasiiaL commented Jan 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

KonradStaniec commented Jan 27, 2021

Uh oh!

AnastasiiaL commented Jan 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

KonradStaniec left a comment

Choose a reason for hiding this comment

Uh oh!

AnastasiiaL commented Jan 28, 2021

Uh oh!

AnastasiiaL commented Jan 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

AnastasiiaL commented Jan 26, 2021 •

edited

Loading

AnastasiiaL commented Jan 27, 2021 •

edited

Loading

AnastasiiaL commented Jan 27, 2021 •

edited

Loading