[ETCM-186] Fix strict pick with too few blocks #723

ntallar · 2020-10-05T15:41:47Z

Description

Fixes branch resolution when it is triggered near the beginning of the chain, which caused sync to end up in an unrecoverable state

Issue

Given 2 nodes

If both have chains of the same weight but forked (top block with number X), node 1 won't import any of the last blocks it requests from node 2
This step results in node 1 logging: Imported no blocks
Node 2 extends its chain with several mined blocks
Node 1 asks for blocks starting from block X+1, but as node 2 forked they can't be appended to node 1's chain
This step results in node 1 logging:
```
 Unknown branch, going back to block nr -2 in order to resolve branches
 Invalidate blocks from -2
```
The next request from node 1 will start from a block earlier than X, ideally before the fork. Node 1 currently gets stuck here, logging:
```
 Strict Pick blocks from -2 to 10
 Lowest available block is 0
```

How to reproduce

It's very hard to reproduce this automatically at the integration test level (it may become easier to include a test after ETCM-127)

Change code to:
a. Delay requests for 2 minutes to allow time for mining blocks in between. Replace the fetchHeaders function in BlockFetcher with:

 private def fetchHeaders(state: BlockFetcherState): Unit = {
   val blockNr = state.nextToLastBlock
   val amount = syncConfig.blockHeadersPerRequest

   log.info("Sleeping for 2 minutes before fetching headers")
   Thread.sleep(2 * 60 * 1000)
   fetchHeadersFrom(blockNr, amount) pipeTo self
 }

b. Don't broadcast every block, to simplify the whole process; only the last blocks should be broadcast, so as to trigger a new fetch. Replace the broadcastBlock function in BlockBroadcast with:

 def broadcastBlock(newBlock: NewBlock, handshakedPeers: Map[Peer, PeerInfo]): Unit = {
   val peersWithoutBlock = handshakedPeers.collect {
     case (peer, peerInfo) if shouldSendNewBlock(newBlock, peerInfo) => peer
   }.toSet

   if (newBlock.block.number > 14) {
     broadcastNewBlock(newBlock, peersWithoutBlock)

     if (syncConfig.broadcastNewBlockHashes) {
       // NOTE: the usefulness of this message is debatable, especially in private networks
       broadcastNewBlockHash(newBlock, peersWithoutBlock)
     }
   }
 }

Start 2 nodes connected to each other at the same time
While they are both blocked in their sleep, mine 10 blocks on each
Wait for node 1 to fetch the 10 blocks from node 2 and fail to import them (with log Imported no blocks)
Mine 10 blocks on node 2; broadcasting them should trigger a re-fetch from node 1

That will halt node 1's progress, with these logs repeating indefinitely:

2020-10-05 10:07:59,343 [i.i.e.b.sync.regular.BlockFetcher] - Strict Pick blocks from -2 to 10
2020-10-05 10:07:59,343 [i.i.e.b.sync.regular.BlockFetcher] - Lowest available block is 0

Solution

The from value of a strict pick is capped to 1 whenever it is lower than that.

Our current code wasn't working due to the condition .filter(_.headOption.exists(block => block.number <= lower)) on strictPickBlocks, which is never true when lower is a negative number. That can no longer happen once the from value is capped.
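
To illustrate why a negative lower bound can never be satisfied, here is a minimal, self-contained sketch. Only the filter condition is taken from this PR; the Block case class, the StrictPickSketch object, and the strictPick and capFrom helpers are hypothetical simplifications, not the actual BlockFetcher code.

```scala
// Hypothetical stand-in for the node's block type; only the number matters here.
final case class Block(number: BigInt)

object StrictPickSketch {
  // Simplified pick: take the ready blocks up to `upper`, then keep the result
  // only if its first block is at or below `lower` (the condition quoted above).
  def strictPick(ready: List[Block], lower: BigInt, upper: BigInt): Option[List[Block]] =
    Some(ready.takeWhile(_.number <= upper))
      .filter(_.headOption.exists(block => block.number <= lower))

  // The capping described above: never request a block below 1.
  def capFrom(from: BigInt): BigInt = from.max(1)
}
```

With ready blocks numbered 1 to 10, strictPick with a lower bound of -2 yields None, because no block number can be less than or equal to a negative value. Passing the bound through capFrom first turns it into 1, and the same pick then returns all ten blocks.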

Testing

Attempting to reproduce it after the fix should result in node 1 not halting and importing the last 10 blocks from node 2

Up to discussion

I'm not sure how the node will handle resolving branches of up to 1000 blocks, as configured on the testnet. Maybe we should change it to 100 so that a single message is enough for that resolve? If not, further analysis of the sync process should be done.

ntallar · 2020-10-06T15:35:20Z

src/main/scala/io/iohk/ethereum/blockchain/sync/regular/BlockFetcher.scala

    case StrictPickBlocks(from, atLeastWith) =>
-      val minBlock = from.min(atLeastWith).max(1)
-      log.debug("Strict Pick blocks from {} to {}", from, atLeastWith)
+      // FIXME: Consider having StrictPickBlocks calls guaranteeing this


I finally decided not to touch this in this PR, to minimize the impact it has on our syncing process

kapke · 2020-10-07T12:36:05Z

It all makes sense to me.

Do you think it would be too hard to write a test, in a similar fashion to "go back to earlier block in order to find a common parent with new branch", that exposes that issue?

mirkoAlic · 2020-10-07T18:34:04Z

This is needed as soon as possible for experimental testnet purposes, so I will merge it as it is. Any future improvement can be done later. cc @jmendiola222, @kapke

ntallar added the bug Something isn't working label Oct 5, 2020

ntallar requested review from KonradStaniec, kapke and mirkoAlic October 5, 2020 15:41

ntallar force-pushed the etcm-186-fix-strict-pick branch from 3c6e02f to cb179e9 Compare October 5, 2020 15:47

ntallar added the WIP label Oct 5, 2020

ntallar force-pushed the etcm-186-fix-strict-pick branch 4 times, most recently from 80e62ce to 2c3352f Compare October 5, 2020 18:34

ntallar changed the title ~~[ETCM-186] Fix strict pick~~ [ETCM-186] Fix strict pick with too few blocks Oct 5, 2020

[ETCM-186] Fix strict pick

7c8dd37

ntallar removed the WIP label Oct 6, 2020

ntallar force-pushed the etcm-186-fix-strict-pick branch from 2c3352f to 7c8dd37 Compare October 6, 2020 15:34

Nicolas Tallar added 2 commits October 7, 2020 12:27

[ECTM-186] Added test for fix

137ca32

Merge branch 'develop' into etcm-186-fix-strict-pick

36e4e9a

mirkoAlic merged commit 327d295 into develop Oct 7, 2020

mirkoAlic deleted the etcm-186-fix-strict-pick branch October 7, 2020 18:34
