Initiate gossip sync only after receiving GossipTimestampFilter. #1452
Conversation
A test case is currently failing since the peers are now signalling support for gossip queries but do not send a GossipTimestampFilter.
pub struct TestRoutingMessageHandler {
    pub chan_upds_recvd: AtomicUsize,
    pub chan_anns_recvd: AtomicUsize,
    pub request_full_sync: AtomicBool,
Seems like we should just fix this to honor request_full_sync and send a gossip filter on connected?
Well, the filter is currently already sent in peer_connected(), and since request_full_sync was not used at all, I figured it would make sense to simply remove it rather than add much more logic to the so-far pretty slim TestRoutingMessageHandler. Or are you suggesting introducing request_full_sync in other places, e.g., NetGraphMsgHandler, so it could be used by should_request_full_sync()?
Wait, who is sending the filter in peer_connected when we're using a TestRoutingMessageHandler? Its peer_connected method is empty.
Ah, now I get what you're saying. You're right, it's probably easiest here to let the TestRoutingMessageHandler implementation of peer_connected mirror the functionality of NetGraphMsgHandler.
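For illustration, a rough self-contained sketch of that idea, using hypothetical stand-in types rather than the real lightning structs and message-send machinery; the two-weeks/one-hour lookback mirrors the constants from the non-test code discussed further down:
use std::sync::Mutex;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical stand-ins for the real lightning types, only to show the flow.
struct Init { supports_gossip_queries: bool }
struct GossipTimestampFilter { first_timestamp: u32, timestamp_range: u32 }

struct TestRoutingMessageHandler {
    request_full_sync: AtomicBool,
    pending_filters: Mutex<Vec<GossipTimestampFilter>>,
}

impl TestRoutingMessageHandler {
    // Mirrors what NetGraphMsgHandler does on peer_connected: if the peer
    // supports gossip queries, queue a gossip_timestamp_filter so the
    // counterparty starts relaying gossip to us.
    fn peer_connected(&self, init: &Init) {
        if !init.supports_gossip_queries {
            return;
        }
        // A full sync asks for roughly two weeks of gossip, otherwise only the last hour.
        let lookback = if self.request_full_sync.load(Ordering::Acquire) {
            60 * 60 * 24 * 7 * 2
        } else {
            60 * 60
        };
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("Time must be > 1970")
            .as_secs();
        self.pending_filters.lock().unwrap().push(GossipTimestampFilter {
            first_timestamp: now.saturating_sub(lookback) as u32,
            timestamp_range: u32::MAX,
        });
    }
}

fn main() {
    let handler = TestRoutingMessageHandler {
        request_full_sync: AtomicBool::new(true),
        pending_filters: Mutex::new(Vec::new()),
    };
    handler.peer_connected(&Init { supports_gossip_queries: true });
    assert_eq!(handler.pending_filters.lock().unwrap().len(), 1);
}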
I went that route (replicating the peer_connected functionality in TestRoutingMessageHandler) and it generally seems to work nicely. However, the second peer now receives only 86/100 channel updates and 43/50 channel announcements, i.e., the test now fails at the last two assert_eqs. I'm currently struggling to figure out why that is, especially since the logs show that the gossip is indeed just postponed until the peers have received gossip_timestamp_filter messages. I'll get back to staring at the code next week.
Yes, and before I adapt the test I'd at least like to understand why the same number of channel updates is suddenly no longer delivered, because it's not intuitive to me from the changes I made...
I believe the channel updates will be delivered eventually, but the test currently handles a specific number of round-trips between the peers and then makes sure the messages were all delivered. We now need one more round-trip, I believe (but it's not quite in the same set of steps as the existing loop, so we can't just add one).
In 7a99de2 I now moved some things around in the loop. This hopefully makes the test succeed without reducing coverage too much (i.e., it now checks for B's empty outbound_data once at the end, not in every iteration).
Yea, I mean I think that's fine. That said, I'd kinda prefer to update the pre-loop code to add a few message delivery calls that match the new message flow, but if it's too annoying to figure that out exactly I'd be okay landing this as-is. It's kinda nice to use this test to ensure that we don't write more gossip messages to the buffer after we reach our limit, even if it makes the test kinda brittle.
Ah, you're right. The gossip filter messages basically added half a round trip. a31362a should be a cleaner fix now.
Force-pushed from 4d647d8 to b24ee4c.
TheBlueMatt left a comment:
Thanks, I'd be happy landing this as-is, but see comment.
Force-pushed from 7a99de2 to a31362a.
Codecov Report
@@ Coverage Diff @@
## main #1452 +/- ##
==========================================
+ Coverage 90.87% 92.21% +1.33%
==========================================
Files 75 75
Lines 41474 48770 +7296
Branches 41474 48770 +7296
==========================================
+ Hits 37691 44971 +7280
- Misses 3783 3799 +16
Continue to review full report at Codecov.
TheBlueMatt left a comment:
Good enough for me. LGTM. Let's get another reviewer, but feel free to squash the fixup commits down to clean up the git history in the meantime.
Force-pushed from a31362a to f8a196c.
Squashed!
arik-so left a comment:
lgtm
- vec![]
+ let mut ret = Vec::new();
+ let mut pending_events = self.pending_events.lock().unwrap();
+ core::mem::swap(&mut ret, &mut pending_events);
purely out of curiosity, what's the benefit of mem::swap over drain?
mem::swap exploits the fact that in Rust an object's in-memory representation can always be copied for a move. Thus, this just copies the (pointer, length, capacity) tuple from one Vec to the other, whereas drain(..) will (or could, I'm not sure how LLVM will optimize it in practice) actually allocate a new Vec, move all the old Vec's elements into it, and then return that.
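To make the comparison concrete, here is a small self-contained sketch, with a hypothetical Handler type standing in for the real message handler, contrasting the two approaches (the drain variant collects into a fresh Vec so it can be returned):
use std::mem;
use std::sync::Mutex;

struct Handler {
    pending_events: Mutex<Vec<u32>>,
}

impl Handler {
    // Swap approach: only the Vec's (pointer, length, capacity) triple changes hands;
    // the elements themselves are never touched and nothing new is allocated for them.
    fn take_events_swap(&self) -> Vec<u32> {
        let mut ret = Vec::new();
        let mut pending_events = self.pending_events.lock().unwrap();
        mem::swap(&mut ret, &mut *pending_events);
        ret
    }

    // Drain approach: same observable result, but it goes through an iterator and a
    // collect(), which may allocate a fresh Vec and move each element unless the
    // optimizer elides it.
    fn take_events_drain(&self) -> Vec<u32> {
        self.pending_events.lock().unwrap().drain(..).collect()
    }
}

fn main() {
    let handler = Handler { pending_events: Mutex::new(vec![1, 2, 3]) };
    assert_eq!(handler.take_events_swap(), vec![1, 2, 3]);
    assert!(handler.take_events_drain().is_empty()); // buffer was already emptied above
}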
#[cfg(feature = "std")]
{
    gossip_start_time = SystemTime::now().duration_since(UNIX_EPOCH).expect("Time must be > 1970").as_secs();
    if should_request_full_sync {
would it be more legible to write
let delta = if should_request_full_sync {
    60 * 60 * 24 * 7 * 2 // two weeks ago
} else {
    60 * 60 // an hour ago
};
gossip_start_time = SystemTime::now().duration_since(UNIX_EPOCH).expect("Time must be > 1970").as_secs() - delta;
?
I believe it was copied from the non-test code, so presumably fine enough.
For peers signalling support for gossip queries, this PR delays sending of gossip updates and the initial sync until we receive a GossipTimestampFilter from the counterparty. Closes #1402.
As suggested there, this PR ignores the particular values in the set filter and only considers whether a filter is set or not (i.e., it provides a boolean option to disable receiving gossip messages).
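For a sense of the resulting behaviour, here is a minimal self-contained sketch of the gating idea with hypothetical types and field names (in rust-lightning itself the equivalent state lives in the peer handling code, not in a standalone struct like this):
// Hypothetical per-peer state used only for illustration.
struct Peer {
    // Whether the peer's Init message signalled the gossip_queries feature.
    supports_gossip_queries: bool,
    // Set once any gossip_timestamp_filter is received; per the PR description,
    // the filter's actual values are ignored, only its presence matters.
    received_gossip_timestamp_filter: bool,
}

impl Peer {
    // Gossip (including the initial sync) is only forwarded to query-capable peers
    // after they have set a filter; peers without gossip_queries are unaffected.
    fn should_forward_gossip(&self) -> bool {
        !self.supports_gossip_queries || self.received_gossip_timestamp_filter
    }
}

fn main() {
    let mut peer = Peer { supports_gossip_queries: true, received_gossip_timestamp_filter: false };
    assert!(!peer.should_forward_gossip());

    // Counterparty sends a gossip_timestamp_filter: gossip sync may begin.
    peer.received_gossip_timestamp_filter = true;
    assert!(peer.should_forward_gossip());
}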