
Conversation

@jbesraa (Contributor) commented Oct 3, 2023

resolves #133

When we get an inbound channel request, we don't persist the
info of the peer who initiated the request.
In this commit, we use `LdkEvent::ChannelPending`
to track inbound channels whose counterparty
node id is not yet persisted and, if so,
add their `PeerInfo` to the `PeerStore`.
Note that in order to persist this kind of data,
we first need the listening address of the counterparty
node, which is not always available; in that case we don't
persist the peer info.

@jbesraa (Contributor, Author) commented Oct 22, 2023

@tnull would love your input here (:

@jbesraa force-pushed the persist-inbound-channels branch from 00bfe53 to f994c45 on October 24, 2023 06:16
@tnull (Collaborator) left a comment

Thanks for looking into this and excuse the delay!

I think the approach of persisting the peer data when accepting the channel while handling OpenChannelRequest makes sense. However, note that this event won't currently be emitted if the peer doesn't enable 0conf support. For this to work we also need to always enable user_config.manually_accept_inbound_channels in builder.rs, independently of the 0conf case.
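
For illustration, a minimal sketch of that builder.rs change (the surrounding code here is assumed, not taken from the PR):

    // Sketch: when assembling LDK's UserConfig in builder.rs, always accept
    // inbound channels manually so that OpenChannelRequest is emitted for
    // every inbound channel, not only for trusted 0conf peers.
    let mut user_config = lightning::util::config::UserConfig::default();
    user_config.manually_accept_inbound_channels = true;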

Also, can we add a regression test checking that we would reconnect to a peer from which we accepted an inbound channel after we reinit (or at least stop()/start()) our node?

src/event.rs Outdated
if allow_0conf { " trusted" } else { "" },
counterparty_node_id,
);
// Persist channel
@tnull (Collaborator)

nit: We're actually not persisting the 'channel' here. As the comment is somewhat superfluous, probably best to remove it.

src/event.rs Outdated
};
log_info!(
self.logger,
"Persisted inbound channel from peer {}", counterparty_node_id
@tnull (Collaborator)

nit: Likewise here: we haven't persisted the channel. However, we probably don't need to log this separately in the success case.

src/event.rs Outdated
}
};
let peer = PeerInfo { node_id: counterparty_node_id, address: NetAddress(address) };
if let Err(e) = self.peer_store.add_peer(peer) {
@tnull (Collaborator)

Could you use a match here?
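
For illustration, a hedged sketch of that refactor (assuming add_peer returns a Result):

    let peer = PeerInfo { node_id: counterparty_node_id, address: NetAddress(address) };
    // Handle both outcomes explicitly instead of only the error arm.
    match self.peer_store.add_peer(peer) {
        Ok(_) => {}
        Err(e) => {
            log_error!(self.logger, "Failed to add peer {} to peer store: {}", counterparty_node_id, e);
        }
    }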

src/event.rs Outdated
let address = match peer.1.clone() {
Some(address) => address,
None => {
log_error!(self.logger, "Failed to add peer to peer store: no network address found");
@tnull (Collaborator)

Should we log the counterparty_node_id in this case, too?
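
For instance, something like (sketch):

    log_error!(
        self.logger,
        "Failed to add peer {} to peer store: no network address found",
        counterparty_node_id
    );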

@tnull (Collaborator) commented Nov 13, 2023

@jbesraa Could you make any progress here?

@jbesraa force-pushed the persist-inbound-channels branch 2 times, most recently from da4621d to 4cbd450 on November 16, 2023 10:39
let node_b_peers = node_b_peers.get(0).unwrap();
assert_eq!(node_b_peers.node_id, node_a_id);
assert_eq!(node_b_peers.is_persisted, true);
assert_eq!(node_b_peers.is_connected, true);
@jbesraa (Contributor, Author) commented Nov 16, 2023

this assertion is failing. I tried to debug it, and when I restart the node it tries to reconnect to peer A but returns ConnectionRefused; specifically it fails here https://github.com/lightningdevkit/ldk-node/pull/170/files#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759R584

@jbesraa (Contributor, Author)

any hints why this can happen?

@tnull (Collaborator)

Yes: a) reconnecting might take some time, so you might need a small (500ms, maybe?) sleep before checking, to give the node a chance to reconnect; b) when A connects to B, the port associated with A's outgoing socket is not its listening port. So rather than just reusing the connected address, we'll have to look up the listening address in the network graph, and simply not persist the info if we can't find it.
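
For point a), a rough sketch of the wait in the test (the 500ms figure is from this comment; list_peers and the PeerDetails fields are as seen elsewhere in this thread):

    // Give node B a moment to re-establish the connection after restart
    // before asserting on the peer list.
    std::thread::sleep(std::time::Duration::from_millis(500));
    let node_b_peers = node_b.list_peers();
    assert!(node_b_peers.iter().any(|p| p.node_id == node_a_id && p.is_connected));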

@jbesraa force-pushed the persist-inbound-channels branch from 4cbd450 to 0ea26f9 on November 16, 2023 10:45
builder_b.set_esplora_server(esplora_url.clone());
let node_b = builder_b.build().unwrap();
node_b.start().unwrap();
open_channel(&node_a, &node_b, &bitcoind, &electrsd, true);
@jbesraa (Contributor, Author)

will add another flow to cover allow_0conf = false as well

@tnull (Collaborator)

I don't think we need to cover 0conf as a special case, it shouldn't be connected to persistence at all.

@jbesraa force-pushed the persist-inbound-channels branch from 0ea26f9 to 2d4b66b on November 16, 2023 11:12
@tnull (Collaborator) left a comment

This still won't work if we don't update our manual-accept settings as mentioned above:

I think the approach of persisting the peer data when accepting the channel while handling OpenChannelRequest makes sense. However, note that this event won't currently be emitted if the peer doesn't enable 0conf support. For this to work we also need to always enable user_config.manually_accept_inbound_channels in builder.rs, independently of the 0conf case.

do_channel_full_cycle(node_a, node_b, &bitcoind, &electrsd, true)
}

fn open_channel<K: KVStore + Sync + Send>(
@tnull (Collaborator)

This does way more than just open the channel. I had an open_channel util implemented in #157 and have now pulled it out to #198. Could you rebase once the latter lands (in a bit) and reuse the open_channel util there?

@tnull (Collaborator)

Now merged #198, feel free to rebase on main.


// B should persist A as a peer by adding there
// node_id to peer_store
let (bitcoind, electrsd) = setup_bitcoind_and_electrsd();
println!("== Node A ==");
@tnull (Collaborator)

Please use setup_two_nodes for this.
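
For reference, a hedged sketch of what that could look like (the exact setup_two_nodes signature is assumed, not confirmed by this thread):

    let (bitcoind, electrsd) = setup_bitcoind_and_electrsd();
    // Replaces the manual per-node builder setup; allow_0conf assumed false here.
    let (node_a, node_b) = setup_two_nodes(&electrsd, false);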

src/event.rs Outdated
.iter()
.find(|(node_id, _)| node_id == &counterparty_node_id)
{
let address = match peer.1.clone() {
@tnull (Collaborator)

Rather than using this address, we need to retrieve the listening address from the public network graph and not proceed to persist the peer info if we can't find it.

@jbesraa (Contributor, Author)

yes, I was just wondering why the code is trying to reconnect to the wrong port.
When trying to access self.network_graph.read_only().nodes it returns an empty map; do I need to set up gossip somewhere?

@jbesraa (Contributor, Author)

been trying to resolve this, will get back to it tomorrow.
A couple of questions please:

  1. I feel there is an error in rust-lightning:
    in the following logs, the port shown in node_b_peers is not in the node_a listening addresses array.
    In this test I'm not running my code; this is just starting 2 nodes, creating a channel, and then listing peers without stopping any node.
== Node A ==
Setting network: regtest
Setting random LDK storage dir: /tmp/xrPtejR
Setting random LDK listening addresses: [TcpIpV4 { addr: [127, 0, 0, 1], port: 9937 }, TcpIpV4 { addr: [127, 0, 0, 1], port: 57309 }]

== Node B ==
Setting network: regtest
Setting random LDK storage dir: /tmp/uu1Os5I
Setting random LDK listening addresses: [TcpIpV4 { addr: [127, 0, 0, 1], port: 12888 }, TcpIpV4 { addr: [127, 0, 0, 1], port: 14086 }]
Generating 101 blocks... Done!

Generating 1 blocks... Done!


A -- connect_open_channel -> B
029fca8b65b2b981ca69872a8ec09efd5d1b618e179a63d05a1dbc460b91d6d945 got event ChannelPending { channel_id: ChannelId([70, 215, 99, 34, 66, 47, 198, 223, 183, 31, 221, 47, 60, 193, 73, 26, 144, 183, 150, 22, 160, 33, 188, 229, 222, 222, 250, 236, 216, 40, 60, 92]), user_channel_id: UserChannelId(189294706845638377905056823224737092261), former_temporary_channel_id: ChannelId([180, 20, 248, 6, 111, 228, 250, 185, 45, 159, 200, 208, 14, 64, 2, 62, 15, 102, 60, 100, 168, 118, 239, 64, 165, 230, 199, 102, 126, 70, 42, 192]), counterparty_node_id: PublicKey(25fa24b0a53b330dd5e3ebc3a822c1b2cc4b9f5be27667a7b30bf37ad6061bec0d4b751e8f47b46ac7c1b07e5ca045347febf1a78328b974d69a57d6093ecd95), funding_txo: OutPoint { txid: 5d3c28d8ecfadedee5bc21a01696b7901a49c13c2fdd1fb7dfc62f422263d746, vout: 1 } }
node_b got event ChannelPending { channel_id: ChannelId([70, 215, 99, 34, 66, 47, 198, 223, 183, 31, 221, 47, 60, 193, 73, 26, 144, 183, 150, 22, 160, 33, 188, 229, 222, 222, 250, 236, 216, 40, 60, 92]), user_channel_id: UserChannelId(244287129025742860562500776048931337989), former_temporary_channel_id: ChannelId([180, 20, 248, 6, 111, 228, 250, 185, 45, 159, 200, 208, 14, 64, 2, 62, 15, 102, 60, 100, 168, 118, 239, 64, 165, 230, 199, 102, 126, 70, 42, 192]), counterparty_node_id: PublicKey(45d9d6910b46bc1d5ad0639a178e611b5dfd9ec08e2a8769ca81b9b2658bca9fe6f20a47262070c9128a9623d4e7cf46d7fdb21da90d1a54a1c75003ffad9194), funding_txo: OutPoint { txid: 5d3c28d8ecfadedee5bc21a01696b7901a49c13c2fdd1fb7dfc62f422263d746, vout: 1 } }

 .. generating blocks ..
Generating 6 blocks... Done!

029fca8b65b2b981ca69872a8ec09efd5d1b618e179a63d05a1dbc460b91d6d945 got event ChannelReady { channel_id: ChannelId([70, 215, 99, 34, 66, 47, 198, 223, 183, 31, 221, 47, 60, 193, 73, 26, 144, 183, 150, 22, 160, 33, 188, 229, 222, 222, 250, 236, 216, 40, 60, 92]), user_channel_id: UserChannelId(189294706845638377905056823224737092261), counterparty_node_id: Some(PublicKey(25fa24b0a53b330dd5e3ebc3a822c1b2cc4b9f5be27667a7b30bf37ad6061bec0d4b751e8f47b46ac7c1b07e5ca045347febf1a78328b974d69a57d6093ecd95)) }
node_b got event ChannelReady { channel_id: ChannelId([70, 215, 99, 34, 66, 47, 198, 223, 183, 31, 221, 47, 60, 193, 73, 26, 144, 183, 150, 22, 160, 33, 188, 229, 222, 222, 250, 236, 216, 40, 60, 92]), user_channel_id: UserChannelId(244287129025742860562500776048931337989), counterparty_node_id: Some(PublicKey(45d9d6910b46bc1d5ad0639a178e611b5dfd9ec08e2a8769ca81b9b2658bca9fe6f20a47262070c9128a9623d4e7cf46d7fdb21da90d1a54a1c75003ffad9194)) }
[src/test/functional_tests.rs:57] &node_b_peers = PeerDetails {
    node_id: PublicKey(
        45d9d6910b46bc1d5ad0639a178e611b5dfd9ec08e2a8769ca81b9b2658bca9fe6f20a47262070c9128a9623d4e7cf46d7fdb21da90d1a54a1c75003ffad9194,
    ),
    address: TcpIpV4 {
        addr: [
            127,
            0,
            0,
            1,
        ],
        port: 43256,
    },
    is_persisted: false,
    is_connected: true,
}
  2. When the event OpenChannelRequest fired, the network_graph was not updated yet, so I'm actually not sure we can rely on that.
    When moving the code to ChannelReady, I do see the node info and announcements, but this is still a bit tricky because we only listen to announcements from peers we have channels with.

Maybe we should tackle this here https://github.com/lightningdevkit/ldk-node/blob/main/src/lib.rs#L552 ?

@tnull (Collaborator)

been trying to resolve this, will get back to it tomorrow. A couple of questions please:

  1. I feel there is an error in rust-lightning:
    in the following logs, the port shown in node_b_peers is not in the node_a listening addresses array.
    In this test I'm not running my code; this is just starting 2 nodes, creating a channel, and then listing peers without stopping any node.

No, this is just how basic TCP connection handling usually works: you bind a socket to a particular port to listen for incoming connections, which you then accept, receiving a new per-connection socket. If you connect outbound, you immediately get a per-connection socket with a random port that is required to be different from your listening port (as you already bound to that one).
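
A small self-contained illustration of that point using std::net (not from the PR, just demonstrating the ephemeral-port behavior):

    use std::net::{TcpListener, TcpStream};

    fn main() {
        // "Node B" binds a listening socket (port OS-assigned here).
        let listener = TcpListener::bind("127.0.0.1:0").unwrap();
        let listen_addr = listener.local_addr().unwrap();

        // "Node A" connects outbound and gets a random per-connection port.
        let outbound = TcpStream::connect(listen_addr).unwrap();
        let (_inbound, peer_addr) = listener.accept().unwrap();

        // B sees A under A's ephemeral outbound port...
        assert_eq!(peer_addr, outbound.local_addr().unwrap());
        // ...which is not the port of B's listener, nor a port A listens on.
        assert_ne!(peer_addr.port(), listen_addr.port());
    }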

  2. When the event OpenChannelRequest fired, the network_graph was not updated yet, so I'm actually not sure we can rely on that.
    When moving the code to ChannelReady, I do see the node info and announcements, but this is still a bit tricky because we only listen to announcements from peers we have channels with.

Hm, I don't think this is technically true: we should also receive node announcements from peers we're only connected to. That said, I'm afraid we can't do much about this, as there really is no other way to be informed of the listening address for inbound connections. While this will be a bit tricky to test, it shouldn't be too big of a deal in practice, as we'd generally expect to have synced the gossip state via RGS or P2P during normal operation.

@jbesraa force-pushed the persist-inbound-channels branch 2 times, most recently from 7ab6ca9 to 7bebcc0 on November 17, 2023 14:37
@jbesraa (Contributor, Author) commented Nov 17, 2023

@tnull
Because we can't find the peer info in network_graph.nodes in the current implementation, we are not able to add the peer to peer_store. I think we might need to make address optional in PeerInfo and not set it in this case; then, when reconnecting, if address is_none we can try to fetch it from the graph. wdyt?

I think we should:

  1. move the functionality to ChannelReady, because on a ChannelOpenRequest we are not sure the channel is confirmed; it could be the case that the channel closed and we don't need that peer data anymore.
    On the other hand, if the channel is in progress (between request and ready), we do want to persist the peer.
  2. access the network graph in testing so we can fix the tests

@tnull (Collaborator) commented Nov 21, 2023

  1. move the functionality to ChannelReady, because on a ChannelOpenRequest we are not sure the channel is confirmed; it could be the case that the channel closed and we don't need that peer data anymore.
    On the other hand, if the channel is in progress (between request and ready), we do want to persist the peer.

LDK will only persist channels across restarts whose funding is in-flight; however, they are not required to be confirmed. So if we want to move it to another event, ChannelPending might be more suitable. Note that we'll have to check ChannelDetails::is_outbound there to make sure we only persist for inbound channels.
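
A hedged sketch of what that handler could check (LdkEvent::ChannelPending fields as in the logs above; the channel-lookup shape is an assumption):

    LdkEvent::ChannelPending { channel_id, counterparty_node_id, .. } => {
        // Only persist peer info for inbound channels: look up the channel's
        // details and check ChannelDetails::is_outbound.
        let is_inbound = self
            .channel_manager
            .list_channels()
            .iter()
            .find(|c| c.channel_id == channel_id)
            .map_or(false, |c| !c.is_outbound);
        if is_inbound && self.peer_store.get_peer(&counterparty_node_id).is_none() {
            // ... resolve the listening address and persist the PeerInfo ...
        }
    }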

  2. access the network graph in testing so we can fix the tests

Hm, I'm not sure what you mean by that? I think having this behavior covered in tests might become quite complicated. We would need to make sure that the node opening a channel already has a publicly announced channel and broadcast its announcement before, i.e., set up two nodes, open a channel, give them time to exchange messages, and then disconnect and close the channel. Then we could open the actual channel of interest and test persistence by taking the channel-opening node offline, deleting its peer data so that it won't reconnect itself (i.e., it would need to use FilesystemStore), and bringing it back online, at which point the other node should reconnect to it.

TLDR: After thinking about it once more, I'm not sure testing this behavior is worth adding all this complexity after all.

While it's nice to try to reconnect inbound channels when we come back online, it's really not that crucial: peers establishing outbound connections are expected to reconnect eventually, and nodes accepting inbound channels are expected to be routing nodes with longer uptime, i.e., it's fine if they reconnect some of their channels only after a slight delay.

@jbesraa (Contributor, Author) commented Nov 21, 2023

Ok, thanks for the explanation!
Should I then just remove the test and leave the implementation where it is?

@tnull (Collaborator) commented Nov 22, 2023

Ok, thanks for the explanation! Should I then just remove the test and leave the implementation where it is?

Mh, I think moving it to ChannelPending would probably make sense, but yeah, we can remove the test for now I think.

@jbesraa force-pushed the persist-inbound-channels branch from 7bebcc0 to 964ffd6 on November 22, 2023 15:07
@jbesraa changed the title from "Persist inbound channel coming through OpenChannelRequest" to "Persist inbound channel peer info" on Nov 22, 2023
@jbesraa force-pushed the persist-inbound-channels branch from 964ffd6 to 8da7500 on November 22, 2023 15:14
@jbesraa (Contributor, Author) commented Nov 22, 2023

thanks @tnull
moved the code to ChannelPending; I now do this check before trying to add the peer:

if !pending_channel.is_outbound && self.peer_store.get_peer(&counterparty_node_id).is_none()

and reverted to this as well:

if !config.trusted_peers_0conf.is_empty() {
		// Manually accept inbound channels if we expect 0conf channel requests, avoid
		// generating the events otherwise.
		user_config.manually_accept_inbound_channels = true;
}

src/lib.rs Outdated
.await;
.await {
Ok(_) => {
log_info!(connect_logger, "Successfully reconnected to peer {}", node_id);
@jbesraa (Contributor, Author)

added this while debugging, but I think it's nice to have...

@tnull (Collaborator)

Yeah, why not. However, could you please keep a `let res =` and then match on `res`? That avoids unnecessary indents.
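
i.e., roughly (sketch; do_connect_peer stands in for the actual connect future in lib.rs):

    // Bind the result first, then match on it, avoiding the extra
    // indentation around the awaited expression.
    let res = do_connect_peer(node_id, addr.clone(), Arc::clone(&peer_manager)).await;
    match res {
        Ok(_) => log_info!(connect_logger, "Successfully reconnected to peer {}", node_id),
        Err(e) => log_error!(connect_logger, "Failed to reconnect to peer {}: {}", node_id, e),
    }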

@jbesraa force-pushed the persist-inbound-channels branch from 8da7500 to add5880 on November 22, 2023 15:38
@jbesraa (Contributor, Author) commented Nov 22, 2023

No code changes, just pushed with a verified commit

src/event.rs Outdated
runtime: Arc<RwLock<Option<tokio::runtime::Runtime>>>,
logger: L,
config: Arc<Config>,
peer_store: Arc<PeerStore<K, Arc<FilesystemLogger>>>,
@tnull (Collaborator)

Please use the generic L here and below:

Suggested change
peer_store: Arc<PeerStore<K, Arc<FilesystemLogger>>>,
peer_store: Arc<PeerStore<K, L>>,

src/event.rs Outdated
channel_manager: Arc<ChannelManager<K>>, network_graph: Arc<NetworkGraph>,
keys_manager: Arc<KeysManager>, payment_store: Arc<PaymentStore<K, L>>,
runtime: Arc<RwLock<Option<tokio::runtime::Runtime>>>, logger: L, config: Arc<Config>,
peer_store: Arc<PeerStore<K, Arc<FilesystemLogger>>>,
@tnull (Collaborator)

Suggested change
peer_store: Arc<PeerStore<K, Arc<FilesystemLogger>>>,
peer_store: Arc<PeerStore<K, L>>,

src/event.rs Outdated
if !pending_channel.is_outbound
&& self.peer_store.get_peer(&counterparty_node_id).is_none()
{
let nodes = network_graph.nodes();
@tnull (Collaborator)

You can write this much more compactly (and more readable) as:

						if let Some(address) = network_graph
							.nodes()
							.get(&NodeId::from_pubkey(&counterparty_node_id))
							.and_then(|node_info| node_info.announcement_info.as_ref())
							.and_then(|ann_info| ann_info.addresses().first())
						{
							let peer = PeerInfo {
								node_id: counterparty_node_id,
								address: address.clone(),
							};

							self.peer_store.add_peer(peer).unwrap_or_else(|e| {
								log_error!(
									self.logger,
									"Failed to add peer {} to peer store: {}",
									counterparty_node_id,
									e
								);
							});
						}

Also, given we don't log peer store operations elsewhere we don't need to log in the success case here (at least for now).

@jbesraa (Contributor, Author)

thanks!

src/lib.rs Outdated
log_info!(connect_logger, "Successfully reconnected to peer {}", node_id);
}
Err(e) => {
log_error!(connect_logger, "Failed to connect to peer: {}", e);
@tnull (Collaborator)

nit: Let's also speak of reconnect here then.


@tnull added this to the 0.2 milestone on Nov 27, 2023
@jbesraa force-pushed the persist-inbound-channels branch from add5880 to fa404d1 on November 28, 2023 15:28
@jbesraa (Contributor, Author) commented Nov 28, 2023

Pushed a new version taking into account all the comments mentioned above! Thanks for the feedback & patience!

@tnull (Collaborator) left a comment

LGTM.

@tnull merged commit fc09bfb into lightningdevkit:main on Nov 29, 2023
@jbesraa deleted the persist-inbound-channels branch on April 27, 2024 17:27

Successfully merging this pull request may close these issues.

PeerInfo is not persisted if the channel is inbound
