
Conversation

@jbesraa (Contributor) commented Oct 3, 2023

resolves #133

When we get an inbound channel request, we don't persist the
info of the peer who initiated the request.
In this commit, we use `LdkEvent::ChannelPending`
to track inbound channels whose counterparty
node id is not yet persisted and, if so,
add their `PeerInfo` to the `PeerStore`.
Note that in order to persist this kind of data,
we first need the listening address of the counterparty
node, which is not always available; in that case we don't
persist the peer info.

@jbesraa (Contributor, Author) commented Oct 22, 2023

@tnull would love your input here (:

@jbesraa force-pushed the persist-inbound-channels branch from 00bfe53 to f994c45 on October 24, 2023 06:16
@tnull (Collaborator) left a comment

Thanks for looking into this and excuse the delay!

I think the approach of persisting the peer data when accepting the channel while handling OpenChannelRequest makes sense. However, note that this event won't currently be emitted if the peer doesn't enable 0conf support. For this to work we also need to always enable user_config.manually_accept_inbound_channels in builder.rs, independently of the 0conf case.
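
For illustration, a minimal sketch of that builder.rs change (the surrounding code here is assumed, not taken from the PR):

    // Sketch: when assembling LDK's UserConfig in builder.rs, always accept
    // inbound channels manually so that OpenChannelRequest is emitted for
    // every inbound channel, not only for trusted 0conf peers.
    let mut user_config = lightning::util::config::UserConfig::default();
    user_config.manually_accept_inbound_channels = true;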

Also, can we add a regression test checking that we would reconnect to a peer from which we accepted an inbound channel after we reinit (or at least stop()/start()) our node?

src/event.rs Outdated
if allow_0conf { " trusted" } else { "" },
counterparty_node_id,
);
// Persist channel
@tnull (Collaborator)

nit: We're actually not persisting the 'channel' here. As the comment is somewhat superfluous, probably best to remove it.

src/event.rs Outdated
};
log_info!(
self.logger,
"Persisted inbound channel from peer {}", counterparty_node_id
@tnull (Collaborator)

nit: Likewise here: we haven't persisted the channel. However, we probably don't need to log this separately in the success case.

src/event.rs Outdated
}
};
let peer = PeerInfo { node_id: counterparty_node_id, address: NetAddress(address) };
if let Err(e) = self.peer_store.add_peer(peer) {
@tnull (Collaborator)

Could you use a match here?
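
For illustration, a hedged sketch of that refactor (assuming add_peer returns a Result):

    let peer = PeerInfo { node_id: counterparty_node_id, address: NetAddress(address) };
    // Handle both outcomes explicitly instead of only the error arm.
    match self.peer_store.add_peer(peer) {
        Ok(_) => {}
        Err(e) => {
            log_error!(self.logger, "Failed to add peer {} to peer store: {}", counterparty_node_id, e);
        }
    }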

src/event.rs Outdated
let address = match peer.1.clone() {
Some(address) => address,
None => {
log_error!(self.logger, "Failed to add peer to peer store: no network address found");
@tnull (Collaborator)

Should we log the counterparty_node_id in this case, too?
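
For instance, something like (sketch):

    log_error!(
        self.logger,
        "Failed to add peer {} to peer store: no network address found",
        counterparty_node_id
    );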

@tnull (Collaborator) commented Nov 13, 2023

@jbesraa Could you make any progress here?

@jbesraa force-pushed the persist-inbound-channels branch 2 times, most recently from da4621d to 4cbd450 on November 16, 2023 10:39
let node_b_peers = node_b_peers.get(0).unwrap();
assert_eq!(node_b_peers.node_id, node_a_id);
assert_eq!(node_b_peers.is_persisted, true);
assert_eq!(node_b_peers.is_connected, true);
@jbesraa (Contributor, Author) commented Nov 16, 2023

this assertion is failing. I tried to debug it, and when I restart the node it tries to reconnect to peer A but returns ConnectionRefused; specifically it fails here https://github.com/lightningdevkit/ldk-node/pull/170/files#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759R584

@jbesraa (Contributor, Author)

any hints why this can happen?

@tnull (Collaborator)

Yes: a) reconnecting might take some time, so you might need a small (500ms, maybe?) sleep before checking, to give the node a chance to reconnect; b) when A connects to B, the port associated with A's outgoing socket is not its listening port. So rather than just reusing the connected address, we'll have to look up the listening address in the network graph, and simply not persist the info if we can't find it.
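
For point a), a rough sketch of the wait in the test (the 500ms figure is from this comment; list_peers and the PeerDetails fields are as seen elsewhere in this thread):

    // Give node B a moment to re-establish the connection after restart
    // before asserting on the peer list.
    std::thread::sleep(std::time::Duration::from_millis(500));
    let node_b_peers = node_b.list_peers();
    assert!(node_b_peers.iter().any(|p| p.node_id == node_a_id && p.is_connected));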

@jbesraa force-pushed the persist-inbound-channels branch from 4cbd450 to 0ea26f9 on November 16, 2023 10:45
builder_b.set_esplora_server(esplora_url.clone());
let node_b = builder_b.build().unwrap();
node_b.start().unwrap();
open_channel(&node_a, &node_b, &bitcoind, &electrsd, true);
@jbesraa (Contributor, Author)

will add another flow to cover allow_0conf = false as well

@tnull (Collaborator)

I don't think we need to cover 0conf as a special case, it shouldn't be connected to persistence at all.

@jbesraa force-pushed the persist-inbound-channels branch from 0ea26f9 to 2d4b66b on November 16, 2023 11:12
@tnull (Collaborator) left a comment

This still won't work if we don't update our manual-accept settings as mentioned above:

I think the approach of persisting the peer data when accepting the channel while handling OpenChannelRequest makes sense. However, note that this event won't currently be emitted if the peer doesn't enable 0conf support. For this to work we also need to always enable user_config.manually_accept_inbound_channels in builder.rs, independently of the 0conf case.

do_channel_full_cycle(node_a, node_b, &bitcoind, &electrsd, true)
}

fn open_channel<K: KVStore + Sync + Send>(
@tnull (Collaborator)

This does way more than just open the channel. I had an open_channel util implemented in #157 and have now pulled it out to #198. Could you rebase once the latter lands (in a bit) and reuse the open_channel util there?

@tnull (Collaborator)

Now merged #198, feel free to rebase on main.


// B should persist A as a peer by adding there
// node_id to peer_store
let (bitcoind, electrsd) = setup_bitcoind_and_electrsd();
println!("== Node A ==");
@tnull (Collaborator)

Please use setup_two_nodes for this.
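
For reference, a hedged sketch of what that could look like (the exact setup_two_nodes signature is assumed, not confirmed by this thread):

    let (bitcoind, electrsd) = setup_bitcoind_and_electrsd();
    // Replaces the manual per-node builder setup; allow_0conf assumed false here.
    let (node_a, node_b) = setup_two_nodes(&electrsd, false);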

src/event.rs Outdated
.iter()
.find(|(node_id, _)| node_id == &counterparty_node_id)
{
let address = match peer.1.clone() {
@tnull (Collaborator)

Rather than using this address, we need to retrieve the listening address from the public network graph and not proceed to persist the peer info if we can't find it.

@jbesraa (Contributor, Author)

yes, I was just wondering why the code is trying to reconnect to the wrong port.
When trying to access self.network_graph.read_only().nodes it returns an empty map; do I need to set up gossip somewhere?

@jbesraa (Contributor, Author)

been trying to resolve this, will get back to it tomorrow.
A couple of questions please:

  1. I feel there is an error in rust-lightning:
    in the following logs, the port shown in node_b_peers is not in the node_a listening addresses array.
    In this test I'm not running my code; this is just starting 2 nodes, creating a channel, and then listing peers without stopping any node.
== Node A ==
Setting network: regtest
Setting random LDK storage dir: /tmp/xrPtejR
Setting random LDK listening addresses: [TcpIpV4 { addr: [127, 0, 0, 1], port: 9937 }, TcpIpV4 { addr: [127, 0, 0, 1], port: 57309 }]

== Node B ==
Setting network: regtest
Setting random LDK storage dir: /tmp/uu1Os5I
Setting random LDK listening addresses: [TcpIpV4 { addr: [127, 0, 0, 1], port: 12888 }, TcpIpV4 { addr: [127, 0, 0, 1], port: 14086 }]
Generating 101 blocks... Done!

Generating 1 blocks... Done!


A -- connect_open_channel -> B
029fca8b65b2b981ca69872a8ec09efd5d1b618e179a63d05a1dbc460b91d6d945 got event ChannelPending { channel_id: ChannelId([70, 215, 99, 34, 66, 47, 198, 223, 183, 31, 221, 47, 60, 193, 73, 26, 144, 183, 150, 22, 160, 33, 188, 229, 222, 222, 250, 236, 216, 40, 60, 92]), user_channel_id: UserChannelId(189294706845638377905056823224737092261), former_temporary_channel_id: ChannelId([180, 20, 248, 6, 111, 228, 250, 185, 45, 159, 200, 208, 14, 64, 2, 62, 15, 102, 60, 100, 168, 118, 239, 64, 165, 230, 199, 102, 126, 70, 42, 192]), counterparty_node_id: PublicKey(25fa24b0a53b330dd5e3ebc3a822c1b2cc4b9f5be27667a7b30bf37ad6061bec0d4b751e8f47b46ac7c1b07e5ca045347febf1a78328b974d69a57d6093ecd95), funding_txo: OutPoint { txid: 5d3c28d8ecfadedee5bc21a01696b7901a49c13c2fdd1fb7dfc62f422263d746, vout: 1 } }
node_b got event ChannelPending { channel_id: ChannelId([70, 215, 99, 34, 66, 47, 198, 223, 183, 31, 221, 47, 60, 193, 73, 26, 144, 183, 150, 22, 160, 33, 188, 229, 222, 222, 250, 236, 216, 40, 60, 92]), user_channel_id: UserChannelId(244287129025742860562500776048931337989), former_temporary_channel_id: ChannelId([180, 20, 248, 6, 111, 228, 250, 185, 45, 159, 200, 208, 14, 64, 2, 62, 15, 102, 60, 100, 168, 118, 239, 64, 165, 230, 199, 102, 126, 70, 42, 192]), counterparty_node_id: PublicKey(45d9d6910b46bc1d5ad0639a178e611b5dfd9ec08e2a8769ca81b9b2658bca9fe6f20a47262070c9128a9623d4e7cf46d7fdb21da90d1a54a1c75003ffad9194), funding_txo: OutPoint { txid: 5d3c28d8ecfadedee5bc21a01696b7901a49c13c2fdd1fb7dfc62f422263d746, vout: 1 } }

 .. generating blocks ..
Generating 6 blocks... Done!

029fca8b65b2b981ca69872a8ec09efd5d1b618e179a63d05a1dbc460b91d6d945 got event ChannelReady { channel_id: ChannelId([70, 215, 99, 34, 66, 47, 198, 223, 183, 31, 221, 47, 60, 193, 73, 26, 144, 183, 150, 22, 160, 33, 188, 229, 222, 222, 250, 236, 216, 40, 60, 92]), user_channel_id: UserChannelId(189294706845638377905056823224737092261), counterparty_node_id: Some(PublicKey(25fa24b0a53b330dd5e3ebc3a822c1b2cc4b9f5be27667a7b30bf37ad6061bec0d4b751e8f47b46ac7c1b07e5ca045347febf1a78328b974d69a57d6093ecd95)) }
node_b got event ChannelReady { channel_id: ChannelId([70, 215, 99, 34, 66, 47, 198, 223, 183, 31, 221, 47, 60, 193, 73, 26, 144, 183, 150, 22, 160, 33, 188, 229, 222, 222, 250, 236, 216, 40, 60, 92]), user_channel_id: UserChannelId(244287129025742860562500776048931337989), counterparty_node_id: Some(PublicKey(45d9d6910b46bc1d5ad0639a178e611b5dfd9ec08e2a8769ca81b9b2658bca9fe6f20a47262070c9128a9623d4e7cf46d7fdb21da90d1a54a1c75003ffad9194)) }
[src/test/functional_tests.rs:57] &node_b_peers = PeerDetails {
    node_id: PublicKey(
        45d9d6910b46bc1d5ad0639a178e611b5dfd9ec08e2a8769ca81b9b2658bca9fe6f20a47262070c9128a9623d4e7cf46d7fdb21da90d1a54a1c75003ffad9194,
    ),
    address: TcpIpV4 {
        addr: [
            127,
            0,
            0,
            1,
        ],
        port: 43256,
    },
    is_persisted: false,
    is_connected: true,
}
  2. When the event OpenChannelRequest fired, the network_graph was not updated yet, so I'm actually not sure we can rely on that.
    When moving the code to ChannelReady, I do see the node info and announcements, but this is still a bit tricky because we only listen to announcements from peers we have channels with.

Maybe we should tackle this here https://github.com/lightningdevkit/ldk-node/blob/main/src/lib.rs#L552 ?

@tnull (Collaborator)

been trying to resolve this, will get back to it tomorrow. A couple of questions please:

  1. I feel there is an error in rust-lightning:
    in the following logs, the port shown in node_b_peers is not in the node_a listening addresses array.
    In this test I'm not running my code; this is just starting 2 nodes, creating a channel, and then listing peers without stopping any node.

No, this is just how basic TCP connection handling usually works: you bind a socket to a particular port to listen for incoming connections, which you then accept, receiving a new per-connection socket. If you connect outbound, you immediately get a per-connection socket with a random port that is required to be different from your listening port (as you already bound to that one).
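
A small self-contained illustration of that point using std::net (not from the PR, just demonstrating the ephemeral-port behavior):

    use std::net::{TcpListener, TcpStream};

    fn main() {
        // "Node B" binds a listening socket (port OS-assigned here).
        let listener = TcpListener::bind("127.0.0.1:0").unwrap();
        let listen_addr = listener.local_addr().unwrap();

        // "Node A" connects outbound and gets a random per-connection port.
        let outbound = TcpStream::connect(listen_addr).unwrap();
        let (_inbound, peer_addr) = listener.accept().unwrap();

        // B sees A under A's ephemeral outbound port...
        assert_eq!(peer_addr, outbound.local_addr().unwrap());
        // ...which is not the port of B's listener, nor a port A listens on.
        assert_ne!(peer_addr.port(), listen_addr.port());
    }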

  2. When the event OpenChannelRequest fired, the network_graph was not updated yet, so I'm actually not sure we can rely on that.
    When moving the code to ChannelReady, I do see the node info and announcements, but this is still a bit tricky because we only listen to announcements from peers we have channels with.

Hm, I don't think this is technically true: we should also receive node announcements from peers we're only connected to. That said, I'm afraid we can't do much about this, as there really is no other way to be informed of the listening address for inbound connections. While this will be a bit tricky to test, it shouldn't be too big of a deal in practice, as we'd generally expect to have synced the gossip state via RGS or P2P during normal operation.

@jbesraa force-pushed the persist-inbound-channels branch 2 times, most recently from 7ab6ca9 to 7bebcc0 on November 17, 2023 14:37
@jbesraa (Contributor, Author) commented Nov 17, 2023

@tnull
Because we can't find the peer info in network_graph.nodes in the current implementation, we are not able to add the peer to peer_store. I think we might need to make address optional in PeerInfo and not set it in this case; then, when reconnecting, if address is_none we can try to fetch it from the graph. wdyt?

I think we should:

  1. move the functionality to ChannelReady, because on a ChannelOpenRequest we are not sure the channel is confirmed; it could be the case that the channel closed and we don't need that peer data anymore.
    On the other hand, if the channel is in progress (between request and ready), we do want to persist the peer.
  2. access the network graph in testing so we can fix the tests

@tnull (Collaborator) commented Nov 21, 2023

  1. move the functionality to ChannelReady, because on a ChannelOpenRequest we are not sure the channel is confirmed; it could be the case that the channel closed and we don't need that peer data anymore.
    On the other hand, if the channel is in progress (between request and ready), we do want to persist the peer.

LDK will only persist channels across restarts whose funding is in-flight; however, they are not required to be confirmed. So if we want to move it to another event, ChannelPending might be more suitable. Note that we'll have to check ChannelDetails::is_outbound there to make sure we only persist for inbound channels.
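
A hedged sketch of what that handler could check (LdkEvent::ChannelPending fields as in the logs above; the channel-lookup shape is an assumption):

    LdkEvent::ChannelPending { channel_id, counterparty_node_id, .. } => {
        // Only persist peer info for inbound channels: look up the channel's
        // details and check ChannelDetails::is_outbound.
        let is_inbound = self
            .channel_manager
            .list_channels()
            .iter()
            .find(|c| c.channel_id == channel_id)
            .map_or(false, |c| !c.is_outbound);
        if is_inbound && self.peer_store.get_peer(&counterparty_node_id).is_none() {
            // ... resolve the listening address and persist the PeerInfo ...
        }
    }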

  2. access the network graph in testing so we can fix the tests

Hm, I'm not sure what you mean by that? I think having this behavior covered in tests might become quite complicated. We would need to make sure that the node opening a channel already has a publicly announced channel and broadcast its announcement before, i.e., set up two nodes, open a channel, give them time to exchange messages, and then disconnect and close the channel. Then we could open the actual channel of interest and test persistence by taking the channel-opening node offline, deleting its peer data so that it won't reconnect itself (i.e., it would need to use FilesystemStore), and bringing it back online, at which point the other node should reconnect to it.

TLDR: After thinking about it once more, I'm not sure testing this behavior is worth adding all this complexity after all.

While it's nice to try to reconnect inbound channels when we come back online, it's really not that crucial: peers establishing outbound connections are expected to reconnect eventually, and nodes accepting inbound channels are expected to be routing nodes with longer uptime, i.e., it's fine if they reconnect some of their channels only after a slight delay.

@jbesraa (Contributor, Author) commented Nov 21, 2023

Ok, thanks for the explanation!
Should I then just remove the test and leave the implementation where it is?

@tnull (Collaborator) commented Nov 22, 2023

Ok, thanks for the explanation! Should I then just remove the test and leave the implementation where it is?

Mh, I think moving it to ChannelPending would probably make sense, but yeah, we can remove the test for now I think.

@jbesraa force-pushed the persist-inbound-channels branch from 7bebcc0 to 964ffd6 on November 22, 2023 15:07
@jbesraa changed the title from "Persist inbound channel coming through OpenChannelRequest" to "Persist inbound channel peer info" on Nov 22, 2023
@jbesraa force-pushed the persist-inbound-channels branch from 964ffd6 to 8da7500 on November 22, 2023 15:14
@jbesraa (Contributor, Author) commented Nov 22, 2023

thanks @tnull
moved the code to ChannelPending; I now do this check before trying to add the peer:

if !pending_channel.is_outbound && self.peer_store.get_peer(&counterparty_node_id).is_none()

and reverted to this as well:

if !config.trusted_peers_0conf.is_empty() {
		// Manually accept inbound channels if we expect 0conf channel requests, avoid
		// generating the events otherwise.
		user_config.manually_accept_inbound_channels = true;
}

src/lib.rs Outdated
.await;
.await {
Ok(_) => {
log_info!(connect_logger, "Successfully reconnected to peer {}", node_id);
@jbesraa (Contributor, Author)

added this while debugging, but I think it's nice to have...

@tnull (Collaborator)

Yeah, why not. However, could you please keep a `let res =` and then match on `res`? That avoids unnecessary indents.
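
i.e., roughly (sketch; do_connect_peer stands in for the actual connect future in lib.rs):

    // Bind the result first, then match on it, avoiding the extra
    // indentation around the awaited expression.
    let res = do_connect_peer(node_id, addr.clone(), Arc::clone(&peer_manager)).await;
    match res {
        Ok(_) => log_info!(connect_logger, "Successfully reconnected to peer {}", node_id),
        Err(e) => log_error!(connect_logger, "Failed to reconnect to peer {}: {}", node_id, e),
    }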

@jbesraa force-pushed the persist-inbound-channels branch from 8da7500 to add5880 on November 22, 2023 15:38
@jbesraa (Contributor, Author) commented Nov 22, 2023

No code changes, just pushed with a verified commit

src/event.rs Outdated
runtime: Arc<RwLock<Option<tokio::runtime::Runtime>>>,
logger: L,
config: Arc<Config>,
peer_store: Arc<PeerStore<K, Arc<FilesystemLogger>>>,
@tnull (Collaborator)

Please use the generic L here and below:

Suggested change
peer_store: Arc<PeerStore<K, Arc<FilesystemLogger>>>,
peer_store: Arc<PeerStore<K, L>>,

src/event.rs Outdated
channel_manager: Arc<ChannelManager<K>>, network_graph: Arc<NetworkGraph>,
keys_manager: Arc<KeysManager>, payment_store: Arc<PaymentStore<K, L>>,
runtime: Arc<RwLock<Option<tokio::runtime::Runtime>>>, logger: L, config: Arc<Config>,
peer_store: Arc<PeerStore<K, Arc<FilesystemLogger>>>,
@tnull (Collaborator)

Suggested change
peer_store: Arc<PeerStore<K, Arc<FilesystemLogger>>>,
peer_store: Arc<PeerStore<K, L>>,

src/event.rs Outdated
if !pending_channel.is_outbound
&& self.peer_store.get_peer(&counterparty_node_id).is_none()
{
let nodes = network_graph.nodes();
@tnull (Collaborator)

You can write this much more compactly (and more readable) as:

						if let Some(address) = network_graph
							.nodes()
							.get(&NodeId::from_pubkey(&counterparty_node_id))
							.and_then(|node_info| node_info.announcement_info.as_ref())
							.and_then(|ann_info| ann_info.addresses().first())
						{
							let peer = PeerInfo {
								node_id: counterparty_node_id,
								address: address.clone(),
							};

							self.peer_store.add_peer(peer).unwrap_or_else(|e| {
								log_error!(
									self.logger,
									"Failed to add peer {} to peer store: {}",
									counterparty_node_id,
									e
								);
							});
						}

Also, given we don't log peer store operations elsewhere we don't need to log in the success case here (at least for now).

@jbesraa (Contributor, Author)

thanks!

src/lib.rs Outdated
log_info!(connect_logger, "Successfully reconnected to peer {}", node_id);
}
Err(e) => {
log_error!(connect_logger, "Failed to connect to peer: {}", e);
@tnull (Collaborator)

nit: Let's also speak of reconnect here then.


@tnull added this to the 0.2 milestone on Nov 27, 2023
@jbesraa force-pushed the persist-inbound-channels branch from add5880 to fa404d1 on November 28, 2023 15:28
@jbesraa (Contributor, Author) commented Nov 28, 2023

Pushed a new version taking into account all the comments mentioned above! Thanks for the feedback & patience!

@tnull (Collaborator) left a comment

LGTM.

@tnull merged commit fc09bfb into lightningdevkit:main on Nov 29, 2023
@jbesraa deleted the persist-inbound-channels branch on April 27, 2024 17:27

Successfully merging this pull request may close these issues.

PeerInfo is not persisted if the channel is inbound
