feat(iroh)!: Emit mDNS expiry events #3409

oscartbeaumont · 2025-07-28T08:50:47Z

Description

I'm working on a project where we want to have a list of peers that can be connected to without actually establishing a connection to each of them. The user can then later select a peer and we will establish a connection to the ones that are required.

For this to work we need to be able to tell our frontend about the available peers on the network. Right now Iroh emits events when peers are discovered but it doesn't provide a mechanism to detect when those peers are no longer online so they remain in the frontend UI until the application is restarted.

This PR implements a system for Iroh's discovery system to emit events when a peer is no longer available. This is implemented in the core discovery system; however, mDNS is the only discovery mechanism that currently emits these events.

This PR solves #3040.

Breaking Changes

All methods that previously returned DiscoveryItem now return DiscoveryEvent. This includes the Discovery trait, Endpoint::discovery_stream, etc.
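
A minimal sketch of what this change means for a consumer, assuming the event type has a `Discovered` variant carrying the old `DiscoveryItem` and an `Expired` variant carrying a node id. The types below are self-contained stand-ins written for illustration, not iroh's actual definitions, and `PeerList` with its `provenance` field is a hypothetical helper.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the iroh types named in this PR; the real
// definitions live in iroh and may differ in shape.
type NodeId = u64;

#[derive(Debug, Clone, PartialEq)]
struct DiscoveryItem {
    node_id: NodeId,
    // Assumed field: which discovery mechanism produced the item.
    provenance: &'static str,
}

#[derive(Debug, Clone, PartialEq)]
enum DiscoveryEvent {
    Discovered(DiscoveryItem),
    Expired(NodeId),
}

/// Peer list a frontend could render, keyed by node id.
#[derive(Debug, Default)]
struct PeerList {
    peers: BTreeMap<NodeId, DiscoveryItem>,
}

impl PeerList {
    /// Applies one event and reports whether the visible list changed.
    fn apply(&mut self, event: DiscoveryEvent) -> bool {
        match event {
            DiscoveryEvent::Discovered(item) => {
                let previous = self.peers.insert(item.node_id, item.clone());
                previous.as_ref() != Some(&item)
            }
            DiscoveryEvent::Expired(node_id) => self.peers.remove(&node_id).is_some(),
        }
    }
}

fn main() {
    let mut list = PeerList::default();
    let item = DiscoveryItem { node_id: 1, provenance: "mdns" };
    list.apply(DiscoveryEvent::Discovered(item));
    for peer in list.peers.values() {
        println!("peer {} via {}", peer.node_id, peer.provenance);
    }
    // The peer disappears from the list without restarting the application.
    list.apply(DiscoveryEvent::Expired(1));
    println!("{} peers visible", list.peers.len());
}
```

Code that previously matched on a bare item would now match on the event and handle both variants, which is why the change is breaking.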

Change checklist

Self-review.
Documentation updates following the style guide, if relevant.
Tests if relevant.
All breaking changes documented.
- List all breaking changes in the above "Breaking Changes" section.

flub · 2025-07-28T09:50:31Z

Thanks for making this PR! The code looks really solid.

Mind adding a description of how mDNS expiry works? E.g. is it configurable, how is it triggered etc? I only know the basics of mDNS and some pointers towards those details would help me understand how this fits into the iroh discovery mechanism.

dignifiedquire · 2025-07-30T17:13:53Z

any specific reason this is still a draft?

oscartbeaumont · 2025-07-30T17:58:16Z

I've still got to finish implementing a proper unit test. For some reason the test I wrote, discovery::mdns::tests::run_in_isolation::mdns_expiry, isn't passing. I am fairly sure the problem is in the test itself, as this PR is working fine as a Cargo patch in one of my projects.

@flub I will write up more tomorrow morning!

oscartbeaumont · 2025-07-31T02:36:37Z

Mind adding a description of how mDNS expiry works? E.g. is it configurable, how is it triggered etc? I only know the basics of mDNS and some pointers towards those details would help me understand how this fits into the iroh discovery mechanism.

It's all handled by swarm-discovery, so it follows the logic defined there. The expiry events I have added are emitted when Peer::is_expiry returns true. The expiry logic is defined in swarm-discovery here, but it's a bit convoluted because it takes into account things like the number of peers on the network to reduce traffic on large networks.

iroh/src/endpoint.rs

flub · 2025-07-31T10:20:53Z

Mind adding a description of how mDNS expiry works? E.g. is it configurable, how is it triggered etc? I only know the basics of mDNS and some pointers towards those details would help me understand how this fits into the iroh discovery mechanism.

It's all handled by swarm-discovery, so it follows the logic defined there. The expiry events I have added are emitted when Peer::is_expiry returns true. The expiry logic is defined in swarm-discovery here, but it's a bit convoluted because it takes into account things like the number of peers on the network to reduce traffic on large networks.

To understand how these expiry events fit into iroh discovery, I would like a mental picture of how they are emitted by swarm-discovery. Superficially it seems like it is just a timeout since they last announced themselves? But I'm not sure if there's any more to it. Searching the crate docs you linked for "expiry" doesn't give me anything clear.

I'd like to understand how this works because this is a breaking change to the discovery trait, and we need to be sure this is generic enough and will be future proof.

For example, if my "this is a simple timeout" reasoning above is correct, does it really make sense to let every discovery mechanism decide its own timeout? Maybe! Does it make more sense to instead have a (maybe optional) "announced at" time? Maybe! Is this all a special case of swarm-discovery, and does it make more sense to add the expiry stuff to that itself without involving the discovery trait? Maybe! This is why I'd like to get a better picture of the entire system we're involving here.
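
To make the "announced at" alternative concrete, here is a rough sketch in which the discovery item carries an optional announcement time and the consumer picks its own timeout. Every name here (`AnnouncedItem`, `announced_at`, `is_stale`) is hypothetical and only illustrates the option being weighed; none of it is part of iroh or of this PR.

```rust
use std::time::{Duration, Instant};

// Hypothetical node identifier; iroh uses its own NodeId type.
type NodeId = u64;

/// A discovery item that reports when the node last announced itself.
#[derive(Debug, Clone)]
struct AnnouncedItem {
    node_id: NodeId,
    // `None` models a mechanism that has no notion of announcements.
    announced_at: Option<Instant>,
}

impl AnnouncedItem {
    /// The consumer, not the discovery mechanism, decides what "too old" means.
    fn is_stale(&self, max_age: Duration, now: Instant) -> bool {
        match self.announced_at {
            Some(at) => now.saturating_duration_since(at) > max_age,
            // Without a timestamp there is nothing to judge staleness by.
            None => false,
        }
    }
}

fn main() {
    let start = Instant::now();
    let item = AnnouncedItem { node_id: 1, announced_at: Some(start) };
    let later = start + Duration::from_secs(300);
    println!(
        "node {} stale after 5 min with a 2 min limit: {}",
        item.node_id,
        item.is_stale(Duration::from_secs(120), later)
    );
}
```

With this shape the consumer only learns that a node is gone once its own timer fires, so a graceful shutdown on the remote side cannot be reflected any sooner.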

oscartbeaumont · 2025-08-03T16:30:16Z

If we think about this theoretically, I think an expiry event should be emitted under two conditions:

1. A node doesn't announce itself within an agreed-on timeout (this could happen due to a forced shutdown, disconnection from the network, an application crash, etc.)
2. A node announces it's going offline (the user gracefully shuts down the application on a remote machine)

The first condition is just a timer that aligns with the TTL of the DNS record emitted by the mDNS system. DNS records are normally published with a TTL, which is the amount of time they may be cached for. This is a reasonable value to use for deciding whether the node is still available, as any reasonable implementation would refresh its record on the network before the TTL elapses.

An argument could be made that condition 2, a node going offline, is equivalent to its record expiring (condition 1). The issue with this is that if the node information is user-facing (as it is in my use case), this is a horrible user experience. When someone quits the application on a remote computer, as long as it shuts down gracefully, you would expect it to disappear instantly from the UI on all other machines on the network.
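
The two conditions can be sketched as a small tracker. This is an illustration under stated assumptions, not how swarm-discovery implements expiry: `ExpiryTracker`, `on_announce`, and `sweep` are hypothetical names, and a goodbye is modelled as an announcement with a TTL of zero, which is how RFC 6762 defines mDNS goodbye packets. Whether swarm-discovery actually sends such packets is not established in this thread.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical node identifier; iroh uses its own NodeId type.
type NodeId = u64;

#[derive(Debug, PartialEq)]
enum Event {
    Discovered(NodeId),
    Expired(NodeId),
}

/// Tracks when each peer's latest announcement stops being valid.
#[derive(Default)]
struct ExpiryTracker {
    deadlines: HashMap<NodeId, Instant>,
}

impl ExpiryTracker {
    /// Records an announcement. A zero TTL is treated as a goodbye
    /// (condition 2); anything else refreshes the deadline (condition 1).
    fn on_announce(&mut self, id: NodeId, ttl: Duration, now: Instant) -> Option<Event> {
        if ttl.is_zero() {
            return self.deadlines.remove(&id).map(|_| Event::Expired(id));
        }
        let is_new = self.deadlines.insert(id, now + ttl).is_none();
        is_new.then_some(Event::Discovered(id))
    }

    /// Emits an expiry for every peer whose TTL elapsed without a refresh.
    fn sweep(&mut self, now: Instant) -> Vec<Event> {
        let mut expired: Vec<NodeId> = self
            .deadlines
            .iter()
            .filter(|(_, deadline)| **deadline <= now)
            .map(|(id, _)| *id)
            .collect();
        expired.sort_unstable();
        for id in &expired {
            self.deadlines.remove(id);
        }
        expired.into_iter().map(Event::Expired).collect()
    }
}

fn main() {
    let start = Instant::now();
    let ttl = Duration::from_secs(120);
    let mut tracker = ExpiryTracker::default();
    tracker.on_announce(1, ttl, start);
    tracker.on_announce(2, ttl, start);
    // Peer 2 says goodbye and is removed at once.
    let goodbye = tracker.on_announce(2, Duration::ZERO, start + Duration::from_secs(5));
    // Peer 1 never refreshes and is removed once its TTL has elapsed.
    let timed_out = tracker.sweep(start + ttl);
    println!("{goodbye:?} {timed_out:?}");
}
```

Both paths end in the same `Expired` event, so a consumer does not need to know which condition triggered it.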

Jumping back to Iroh: the design I've implemented here, with the discovery system emitting DiscoveryEvents, is intended to support both of these use cases.

If we relied on userspace timers, I don't think we would be able to have the instant feedback when a node went offline as the timer isn't deeply tied into the information the discovery system has.

I personally think the individual discovery systems should implement the expiry themselves, which means they can use whatever logic they want internally to determine whether the node is still available. Although you could argue this is less flexible from a user perspective, I think there is generally only a single correct answer to whether a node is available or not.

For example, in the case of an mDNS DNS record, when the TTL expires the client should be treated as expired. If the default TTL were a problem, I think it should be adjusted at the point where the DNS record is emitted, not on a node that is receiving it.

I have only been using the mDNS discovery system, so I'm not certain how this fits into the other systems, but I assume all systems would have some form of TTL or expiry signal, as you always need a way of knowing whether the discovery information is still valid.

So tl;dr: this isn't just a simple timeout. It also takes into account a node going offline near-instantly, which is crucial to the UX of the application I'm building.

Let me know if that makes a bit more sense, and if there is anything you would like me to clarify further!

flub · 2025-08-18T12:29:40Z

@oscartbeaumont Thanks for the overview! This helped me a lot. Do you know if mDNS (and the swarm-discovery crate) do announce when they're going offline?

flub · 2025-08-18T12:30:19Z

(Also, apologies for the delay! I've been on holiday and am catching up on things now)

iroh/src/discovery.rs

iroh/src/discovery/static_provider.rs

oscartbeaumont · 2025-08-18T17:40:55Z

Do you know if mDNS (and the swarm-discovery crate) do announce when they're going offline?

I have observed this in testing, but I actually can't figure out what is causing it within swarm-discovery. I am fairly sure the swarm-discovery GC is the cause, which isn't exactly a push-based shutdown event like I thought, but it still detects the device going offline very quickly, so from a UX/use-case perspective I think the exact mechanism is irrelevant.

flub · 2025-09-01T09:46:15Z

Hey, a friendly ping that we would appreciate adding this feature. It does address a need for users and would be a great addition. It took me a while to understand the design and implications of these pieces. But a DiscoveryEvent on subscribe which includes an Expired event does make sense to me now. So would be cool to see this adjusted to that.

If you're on holidays or are otherwise just busy and planning to get back to this later: apologies! Don't mean to hurry you up. Only worried that the discussions may have put you off a bit.

oscartbeaumont · 2025-09-01T18:49:14Z

I will do my best to get to this in the next week or so. I have been meaning to get back to it, but I've been incredibly busy.

Apologies for the delay!

flub · 2025-09-02T08:41:45Z

@oscartbeaumont All good, take your time! I was just worried I might have put you off with the back-and-forth a bit while trying to understand this.

iroh/src/discovery/mdns.rs

flub · 2025-09-08T14:56:25Z

iroh/src/discovery/static_provider.rs

@@ -67,7 +67,7 @@ use super::{Discovery, DiscoveryError, DiscoveryItem, NodeData, NodeInfo};
 #[derive(Debug, Default, Clone)]
 #[repr(transparent)]
 pub struct StaticProvider {
-    nodes: Arc<RwLock<BTreeMap<NodeId, StoredNodeInfo>>>,
+    nodes: Arc<RwLock<BTreeMap<NodeId, Option<StoredNodeInfo>>>>,


I don't think these changes are still needed, are they?

ramfox · 2025-09-12T00:51:05Z

Hey @oscartbeaumont, we are really excited to get these changes into our next release happening early next week. Please let me know if there is anything I can do to help get this merged. For example, if you are okay with me pushing to your branch I can fix the failing checks.

Thanks!

ramfox · 2025-09-12T13:54:08Z

The wasm and cargo deny tests should pass after a rebase, but just a note: the last PR we merged (#3403) also contained mdns changes, so there will be merge conflicts.

Happy to resolve those and push if it's any help!

oscartbeaumont · 2025-09-14T05:46:21Z

Okay, this should be good to rerun CI and get a final review! I merged in main and fixed the final comments!

oscartbeaumont added 2 commits July 28, 2025 16:39

mdns expiry

b382476

wip tests

c1ce6f6

n0bot bot added this to iroh Jul 28, 2025

github-project-automation bot moved this to 🏗 In progress in iroh Jul 28, 2025

flub changed the title from "Emit mDNS expiry events" to "feat(iroh)!: Emit mDNS expiry events" Jul 28, 2025

Merge branch 'main' into mdns-expiry

fc3e87a

oscartbeaumont added 2 commits July 31, 2025 10:29

fix unit test

cae97e3

Merge remote-tracking branch 'origin' into mdns-expiry

e0c5091

oscartbeaumont marked this pull request as ready for review July 31, 2025 02:30

matheus23 reviewed Jul 31, 2025

iroh/src/endpoint.rs (resolved)

oscartbeaumont added 2 commits August 4, 2025 00:02

fix comment

4f297df

Merge remote-tracking branch 'origin' into mdns-expiry

88b50fd

flub reviewed Aug 18, 2025

iroh/src/discovery.rs (resolved)

iroh/src/discovery.rs (outdated, resolved)

dignifiedquire reviewed Aug 18, 2025

iroh/src/discovery/static_provider.rs (outdated, resolved)

oscartbeaumont added 2 commits August 19, 2025 01:28

add expiry to static provider

085f315

Merge branch 'main' into mdns-expiry

a78e6b8

oscartbeaumont added 2 commits September 7, 2025 15:27

apply fixes from PR discussion

80057f1

Merge branch 'main' into mdns-expiry

2ada3d1

flub reviewed Sep 8, 2025

oscartbeaumont added 4 commits September 14, 2025 13:23

nit

20eb572

Merge branch 'main' into mdns-expiry

68206ad

move back to AbortOnDrop

3f24253

revert static provider changes

862ecb9
