Mental model of NetworkBehaviour interaction and state management in rust-libp2p #5883

MatthiasvB · 2025-02-22T14:41:22Z


I migrated this question from the forum, because apparently this is the new place to ask questions.

I'm trying to better understand how the components of a rust-libp2p node interact, especially network behaviours. I've already found this thread very helpful, but some questions remain open.

As context, I'm currently trying to build a file sharing application that should work well on a large number of consumer devices, which are likely behind NATs and firewalls. I expect to need at least:

peer discovery -> kademlia (+ identify?!)
message exchange -> request_response (for now)
direct connection establishment -> dcutr, upnp (other options?)

As explained in the post mentioned above, the identify protocol should enhance peer discovery (as I understand it, by increasing how much the nodes in the network know about each other), but it's not required. So far so good. I also understand that this enhancement (a sort of interaction between kademlia and identify) roughly happens as follows:

1. kademlia opens a bunch of connections to walk the DHT
2. identify "notices" new connections by listening to network events
3. identify exchanges peer info with peers behind those new connections if it deems that beneficial
4. identify updates the routing table (global state), which all (or many) network behaviours use, improving the node's performance

Please correct me if any of that is (very) wrong.

I would characterize this behaviour as "very async": kademlia and identify do not strictly depend on each other, in the sense that no specific sequence of events is particularly important to either one. Is that right?

I have noticed two interesting things in the file-sharing and dcutr (can't link, only allowed 2 links as new user) examples:

1. file-sharing does not use dcutr. I suppose that means it only works between peers with publicly available addresses?!
UPDATE: It does not work over the internet at all. The server listens on 127.0.0.1. I'm trying to extend it to actually work over the internet, but without success so far.
2. dcutr actually consists of two protocols: dcutr and relay. What I kind of expected to find but didn't (though I could have missed it) is some sort of on_dial_attempt_first_hole_punch functionality.

I want to focus on the second point. If such functionality does not exist, I see two ways this behaviour could fulfill its function:

1. It is also totally async, meaning when a dialing attempt is made to a node behind NAT, it would simply pass through the relay (though I believe to have read or heard that relays aren't that "general purpose", but idk). This would just work, just perhaps be slow. While the nodes already communicate, dcutr could do its magic, punch a hole, and then upgrade the connection (I suppose again by manipulating the routing table in some way). The nodes would then eventually smoothly switch to direct communication.
2. It actually hooks into the dialing process, delaying it until hole punching is done. If this is the case, given the event driven nature of rust-libp2p, some fairly intricate inter-network-behaviour communication has to occur for other tasks to wait until dcutr has done its thing.

Could you clarify which of the two is close to the truth, what I still didn't get right, and any additional details you think would be important to understand?

I also don't yet know how the composability of NetworkBehaviours factors into this picture. Could you explain how the view of the network / general capabilities of Behavior B differs in the following two cases?

Swarm
  |- top level Behaviour
    |- Behaviour A
    |- Behaviour B

vs

Swarm
  |- top level Behaviour
    |- Behaviour A
      |- Behaviour B

or in words: When would I nest B in A instead of making B a sibling of A?

And since we are on the topic of kademlia and hole punching, the final piece I'm not clear on is how kademlia acts without any sort of hole punching mechanism: its purpose is peer discovery. Let's look at the case where I'm looking for a specific PeerId. It's supposed to give me back a list of network addresses under which I can (probably) reach that peer, right?! For these addresses to be of use, they need to be dialable. So, assuming that the PeerId I'm querying is behind a firewall and no NAT traversal mechanism is enabled, would kademlia simply return no info on that peer, or would it return addresses that I can never dial successfully? And if e.g. dcutr is active, is it correct that it would return an address to the relay the peer has a reservation with, which I could then use to perform hole punching?

I realize this is a huge topic and I'm sorry to cram all of it into a single post. I just really hope that someone can connect the dots between these interrelated topics.

drHuangMHT · 2025-02-24T03:11:58Z


Hi! Below are all my personal opinions and I am not a maintainer of the repository, so take them with a huge grain of salt.

4. identify updates the routing table (global state), which all (or many) network behaviours use, improving the node's performance

You may have to manage that "global state" manually.

1. file-sharing does not use dcutr. I suppose that means it only works between peers with publicly available addresses?!

Yes. DCUtR (Direct Connection Upgrade through Relay) uses a relay (the R in DCUtR) to hole-punch through NAT firewalls. So there need to be at least three peers, and the process doesn't happen automatically: you need to manually listen on the relay, and the other peer needs to dial the resulting relayed address. libp2p-rendezvous should help with this.

2. dcutr actually consists of two protocols: dcutr and relay.

That's right.

I want to focus on the second point. If such functionality does not exist, I see two ways this behaviour could fulfill its function:
1. It is also totally async, meaning when a dialing attempt is made to a node behind NAT, it would simply pass through the relay 

This typically doesn't happen automatically. At the very least, you need to know which relay to connect to, perhaps discovered via rendezvous.

(though I believe to have read or heard that relays aren't that "general purpose", but idk).

By default, libp2p-relay caps each relayed circuit at 128 KiB of data and closes it once the cap is reached. So if the hole punch is unsuccessful, you may not be able to maintain the relayed connection for high-throughput applications.
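To get a feel for how little that is, here is a std-only model of a capped circuit. It assumes the 128 KiB (1 << 17 bytes) per-circuit default described above; `RelayedCircuit`, `CircuitError`, and `chunks_before_close` are hypothetical illustrations, not libp2p-relay's API (the real limit is enforced inside the relay server).

```rust
/// Per-circuit data cap assumed from the text above: 128 KiB.
const MAX_CIRCUIT_BYTES: u64 = 1 << 17;

#[derive(Debug, PartialEq)]
enum CircuitError {
    /// The circuit was closed because forwarding more data would exceed the cap.
    LimitReached { sent: u64 },
}

/// Hypothetical model of a relayed circuit that counts forwarded bytes.
struct RelayedCircuit {
    sent: u64,
    limit: u64,
    closed: bool,
}

impl RelayedCircuit {
    fn new(limit: u64) -> Self {
        Self { sent: 0, limit, closed: false }
    }

    /// Forward `len` bytes. Once the cumulative total would exceed the limit,
    /// the circuit is closed and every further write fails.
    fn forward(&mut self, len: u64) -> Result<(), CircuitError> {
        if self.closed || self.sent + len > self.limit {
            self.closed = true;
            return Err(CircuitError::LimitReached { sent: self.sent });
        }
        self.sent += len;
        Ok(())
    }
}

/// Number of `chunk`-sized writes that get through before the circuit closes.
fn chunks_before_close(chunk: u64) -> u64 {
    assert!(chunk > 0, "chunk size must be non-zero");
    let mut circuit = RelayedCircuit::new(MAX_CIRCUIT_BYTES);
    let mut n = 0;
    while circuit.forward(chunk).is_ok() {
        n += 1;
    }
    n
}
```

With 16 KiB chunks only 8 writes get through, and a single 1 MiB write never does, so a file transfer that stays on the relay would be cut off early.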

This would just work, just perhaps be slow. While the nodes already communicate, dcutr could do its magic, punch a hole, and then upgrade the connection (I suppose again by manipulating the routing table in some way). The nodes would then eventually smoothly switch to direct communication

The Swarm doesn't maintain a routing table internally; it only manages connections, e.g. how to open and close them. The concept of a routing table primarily exists in Kademlia, which organizes peers into k-buckets by their logical (XOR) distance, found by comparing the bits of their (hashed) peer IDs.
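A minimal std-only sketch of how "comparing bits" places a peer into a k-bucket. It assumes 32-byte keys; as far as I know rust-libp2p's kad derives such keys by hashing peer IDs with SHA-256, but the hashing is omitted here. `Key`, `xor_distance`, and `bucket_index` are hypothetical helpers, not the kad crate's API.

```rust
/// A 256-bit Kademlia key (in practice derived from a hash of the peer ID).
type Key = [u8; 32];

/// XOR distance between two keys: the Kademlia metric.
fn xor_distance(a: &Key, b: &Key) -> Key {
    let mut d = [0u8; 32];
    for ((out, x), y) in d.iter_mut().zip(a.iter()).zip(b.iter()) {
        *out = x ^ y;
    }
    d
}

/// Index of the k-bucket a remote key falls into, relative to the local key.
/// Bucket i holds keys whose distance d satisfies 2^i <= d < 2^(i+1),
/// i.e. i = 255 - (number of leading zero bits of d).
/// Returns None for the local key itself (distance zero).
fn bucket_index(local: &Key, remote: &Key) -> Option<usize> {
    let d = xor_distance(local, remote);
    let mut leading_zeros = 0usize;
    for byte in d.iter() {
        if *byte == 0 {
            leading_zeros += 8;
        } else {
            leading_zeros += byte.leading_zeros() as usize;
            break;
        }
    }
    if leading_zeros == 256 {
        None
    } else {
        Some(255 - leading_zeros)
    }
}
```

A key that differs in the very first bit lands in the farthest bucket (255); one that differs only in the last bit lands in the closest bucket (0).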

2. It actually hooks into the dialing process, delaying it until hole punching is done. If this is the case, given the event driven nature of rust-libp2p, some fairly intricate inter-network-behaviour communication has to occur for other tasks to wait until dcutr has done its thing.

Dialing is an independent process driven solely by the DialOpts passed to it. Currently, the only way to hook into it is to implement handle_pending_outbound_connection, which can either provide additional addresses for the dial or reject the entire dial immediately (by returning Err(ConnectionDenied)).
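The real hook is `NetworkBehaviour::handle_pending_outbound_connection`; its exact parameter list has changed between rust-libp2p versions, so the following is only a std-only model of the semantics described above: every behaviour may contribute extra addresses, and any single behaviour can veto the dial. `DialHook`, the simplified `ConnectionDenied`, `AddressBook`, `BlockList`, and `collect_dial_addresses` are hypothetical, and addresses are plain `String`s.

```rust
/// Simplified stand-in for libp2p's ConnectionDenied error.
#[derive(Debug, PartialEq)]
struct ConnectionDenied(String);

/// Hypothetical model of the outbound-dial hook each behaviour implements.
trait DialHook {
    /// Return extra addresses to try, or Err to abort the whole dial.
    fn handle_pending_outbound_connection(
        &mut self,
        peer: Option<&str>,
        addresses: &[String],
    ) -> Result<Vec<String>, ConnectionDenied>;
}

/// Knows one extra address for one peer (e.g. learned from a DHT lookup).
struct AddressBook {
    peer: String,
    addr: String,
}

impl DialHook for AddressBook {
    fn handle_pending_outbound_connection(
        &mut self,
        peer: Option<&str>,
        _addresses: &[String],
    ) -> Result<Vec<String>, ConnectionDenied> {
        if peer == Some(self.peer.as_str()) {
            Ok(vec![self.addr.clone()])
        } else {
            Ok(vec![])
        }
    }
}

/// Denies dials to blocked peers, similar in spirit to an allow/block list.
struct BlockList {
    blocked: Vec<String>,
}

impl DialHook for BlockList {
    fn handle_pending_outbound_connection(
        &mut self,
        peer: Option<&str>,
        _addresses: &[String],
    ) -> Result<Vec<String>, ConnectionDenied> {
        match peer {
            Some(p) if self.blocked.iter().any(|b| b.as_str() == p) => {
                Err(ConnectionDenied(format!("{p} is blocked")))
            }
            _ => Ok(vec![]),
        }
    }
}

/// Ask every behaviour: the first Err aborts the dial; otherwise the dial
/// proceeds with the original addresses plus everything contributed.
fn collect_dial_addresses(
    behaviours: &mut [Box<dyn DialHook>],
    peer: Option<&str>,
    addresses: &[String],
) -> Result<Vec<String>, ConnectionDenied> {
    let mut all = addresses.to_vec();
    for b in behaviours.iter_mut() {
        all.extend(b.handle_pending_outbound_connection(peer, addresses)?);
    }
    Ok(all)
}
```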

I also don't yet know how the composability of NetworkBehaviours factors into this picture. Could you explain how the view of the network / general capabilities of Behavior B differs in the following two cases?

I am currently working on an example that showcases the composability of NetworkBehaviours and potentially some of the underlying mechanisms of protocol (substream) negotiation.

Swarm
  |- top level Behaviour
    |- Behaviour A
    |- Behaviour B

vs

Swarm
  |- top level Behaviour
    |- Behaviour A
      |- Behaviour B
or in words: When would I nest B in A instead of making B a sibling of A?

As far as I know, if you compose the behaviours correctly, both arrangements can produce the same result, because it eventually comes down to ConnectionHandler and substream negotiation. When you compose NetworkBehaviours, you also compose their ConnectionHandlers. As long as the substreams are negotiated and events are delegated correctly, you can compose the behaviours and handlers however you want.
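As a rough illustration of "composing handlers", here is a std-only model in which a composed handler owns its children and routes each inbound substream to whichever child supports the negotiated protocol name. It loosely mirrors what happens when the `#[derive(NetworkBehaviour)]` macro combines the children's ConnectionHandlers, but `Handler`, `Named`, `Composed`, and the protocol strings are hypothetical; real negotiation uses multistream-select.

```rust
/// Hypothetical, heavily simplified connection-handler interface.
trait Handler {
    /// Protocol names this handler accepts, e.g. "/a/1.0.0".
    fn protocols(&self) -> Vec<&'static str>;
    /// Handle an inbound substream negotiated for one of our protocols.
    fn on_inbound(&mut self, protocol: &str) -> String;
}

/// A leaf handler for a single protocol.
struct Named {
    name: &'static str,
    proto: &'static str,
}

impl Handler for Named {
    fn protocols(&self) -> Vec<&'static str> {
        vec![self.proto]
    }
    fn on_inbound(&mut self, protocol: &str) -> String {
        format!("{} handled {}", self.name, protocol)
    }
}

/// A composed handler: advertises the union of its children's protocols and
/// dispatches by name. Since it is itself a Handler, it can be nested.
struct Composed {
    children: Vec<Box<dyn Handler>>,
}

impl Handler for Composed {
    fn protocols(&self) -> Vec<&'static str> {
        self.children.iter().flat_map(|c| c.protocols()).collect()
    }
    fn on_inbound(&mut self, protocol: &str) -> String {
        for child in self.children.iter_mut() {
            if child.protocols().iter().any(|p| *p == protocol) {
                return child.on_inbound(protocol);
            }
        }
        format!("no handler for {protocol}")
    }
}
```

Whether B sits next to A or inside a nested `Composed`, a substream for B's protocol still reaches B, which is why sibling and nested layouts can behave the same.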


drHuangMHT Feb 24, 2025

#5884 is out! Hope you find it somewhat useful!

elenaf9 (Maintainer) · 2025-02-24T22:34:41Z

When would I nest B in A instead of making B a sibling of A?

If A needs to intercept events from B, or needs to call methods on B.
For example, the autonat-v1 protocol internally wraps the request-response protocol for sending and receiving requests.

Another example would be using the allow-block-list behaviour together with your own custom protocol, when you want to block or unblock certain peers depending on the events in your behaviour.

However, most of our existing protocols are independent, so you usually compose them as siblings.
Does that make sense?
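A std-only sketch of the nesting case described above: the outer behaviour owns the inner one, sees every event the inner one emits before anything else does, and reacts to it. `Inner`, `Outer`, and their event types are hypothetical stand-ins (not rust-libp2p types), modelled loosely on the allow-block-list example: the outer behaviour blocks a peer once the inner protocol reports it misbehaving.

```rust
use std::collections::{HashSet, VecDeque};

#[derive(Debug, PartialEq)]
enum InnerEvent {
    Message { peer: String, ok: bool },
}

#[derive(Debug, PartialEq)]
enum OuterEvent {
    Message { peer: String },
    Blocked { peer: String },
}

/// Inner protocol: queues events, like a NetworkBehaviour's poll output.
struct Inner {
    queue: VecDeque<InnerEvent>,
}

impl Inner {
    fn poll(&mut self) -> Option<InnerEvent> {
        self.queue.pop_front()
    }
}

/// Outer behaviour nesting Inner: it intercepts every inner event.
struct Outer {
    inner: Inner,
    blocked: HashSet<String>,
}

impl Outer {
    fn poll(&mut self) -> Option<OuterEvent> {
        while let Some(ev) = self.inner.poll() {
            match ev {
                // Silently drop anything from peers we already blocked.
                InnerEvent::Message { peer, .. } if self.blocked.contains(&peer) => continue,
                // A misbehaving peer gets blocked, and we report that instead.
                InnerEvent::Message { peer, ok: false } => {
                    self.blocked.insert(peer.clone());
                    return Some(OuterEvent::Blocked { peer });
                }
                InnerEvent::Message { peer, ok: true } => {
                    return Some(OuterEvent::Message { peer });
                }
            }
        }
        None
    }
}
```

As siblings, the two behaviours could only coordinate through the application's event loop; nesting lets Outer make the decision before the event ever leaves the behaviour tree.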


elenaf9 (Maintainer) · 2025-02-24T23:06:52Z

And since we are on the topic of kademlia and hole punching, the final piece I'm not clear on is how kademlia acts without any sort of hole punching mechanism: its purpose is peer discovery. Let's look at the case where I'm looking for a specific PeerId. It's supposed to give me back a list of network addresses under which I can (probably) reach that peer, right?! For these addresses to be of use, they need to be dialable. So, assuming that the PeerId I'm querying is behind a firewall and no NAT traversal mechanism is enabled, would kademlia simply return no info on that peer, or would it return addresses that I can never dial successfully? And if e.g. dcutr is active, is it correct that it would return an address to the relay the peer has a reservation with, which I could then use to perform hole punching?

Well, it depends on your network. It's very common for all the nodes in the network to actually be publicly reachable, as in the IPFS network from which libp2p originates. The libp2p Kademlia spec even states that Kademlia servers should be publicly reachable.

So, assuming that the PeerId I'm querying is behind a firewall and no NAT traversal mechanism is enabled, would kademlia simply return no info on that peer, or would it return addresses that I can never dial successfully?

It should return no info, because nodes (at least in rust-libp2p) only store a remote peer's address if we dialed it, which can never happen if the peer is not publicly reachable. The only exception is if the peer's address was added manually through kad::Behaviour::add_address.
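A std-only model of the rule described above: an address is only recorded when we successfully dialed the peer at that address (which proves it is reachable), or when it is added manually. `AddressBook`, `Role`, and the method names are hypothetical; in rust-libp2p this bookkeeping lives inside `kad::Behaviour`, whose manual path is `add_address`.

```rust
use std::collections::HashMap;

/// Which side of the connection we were (mirrors the dialer/listener distinction).
enum Role {
    Dialer,
    Listener,
}

#[derive(Default)]
struct AddressBook {
    addrs: HashMap<String, Vec<String>>,
}

impl AddressBook {
    /// Only outbound (dialed) connections prove the remote address is reachable;
    /// an inbound connection's source address is typically an ephemeral NAT
    /// mapping, so it is ignored.
    fn on_connection_established(&mut self, peer: &str, remote_addr: &str, role: Role) {
        if let Role::Dialer = role {
            self.add_address(peer, remote_addr);
        }
    }

    /// Manual insertion, analogous to kad's add_address.
    fn add_address(&mut self, peer: &str, addr: &str) {
        let list = self.addrs.entry(peer.to_string()).or_default();
        if !list.iter().any(|a| a.as_str() == addr) {
            list.push(addr.to_string());
        }
    }

    /// What a lookup for `peer` could hand out to other nodes.
    fn lookup(&self, peer: &str) -> &[String] {
        self.addrs.get(peer).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

A NATed peer that only ever connects inbound therefore never shows up in lookups, which matches the "no info" answer above.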

And if e.g. dcutr is active, is it correct that it would return an address to the relay the peer has a reservation with, which I could then use to perform hole punching?

Well, if relay is active (and the remote peer is actually listening via a relay), it would return the relayed address of the remote peer, <relay_peer_address>/p2p-circuit/p2p/<remote_peer_id>, with which you can establish a connection to the remote peer through the relay.
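A std-only sketch that builds and splits such a relayed address as a plain string, assuming the relay part ends in the relay's own `/p2p/<id>`. In real code you would use the `Multiaddr` type and `Protocol::P2pCircuit` from the multiaddr crate; `relayed_address`, `split_relayed`, and the placeholder IDs below are hypothetical.

```rust
/// Build `<relay address>/p2p/<relay id>/p2p-circuit/p2p/<target id>`.
fn relayed_address(relay_addr: &str, relay_id: &str, target_id: &str) -> String {
    format!("{relay_addr}/p2p/{relay_id}/p2p-circuit/p2p/{target_id}")
}

/// Split a relayed address into (address used to reach the relay, target peer id).
/// Returns None if the string is not a circuit address.
fn split_relayed(addr: &str) -> Option<(&str, &str)> {
    let (relay, rest) = addr.split_once("/p2p-circuit")?;
    let target = rest.strip_prefix("/p2p/")?;
    Some((relay, target))
}
```

Dialing such an address first connects to the relay part, then asks the relay to open a circuit to the target peer.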

If both you and the remote peer support the dcutr protocol, the dcutr behaviour will then (after the relayed connection to the remote peer is established) automatically attempt a hole punch for a direct connection.

on_dial_attempt_first_hole_punch

I am not sure what you mean by this. Do you mean that peers should attempt a hole punch for establishing a relayed connection?
It is not possible to do a hole punch without an existing relayed connection, because the timing of the hole punch needs to be coordinated through a second channel.
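To see why that second channel matters, here is a std-only model of the timing step as I understand it from the DCUtR specification: the initiator sends Connect over the relay, measures the round-trip time until the responder's Connect arrives, then sends Sync and starts a timer for half that RTT; the responder dials as soon as Sync arrives, the initiator when its timer fires. It assumes each one-way trip over the relay has a fixed latency; `dial_times` is a hypothetical helper and times are plain milliseconds.

```rust
/// Model of DCUtR's synchronization step. Times are in ms, measured from the
/// moment the initiator sends its Connect message over the relayed connection.
/// `init_to_resp_ms` / `resp_to_init_ms`: one-way latencies over the relay.
/// Returns (time the responder dials, time the initiator dials).
fn dial_times(init_to_resp_ms: u64, resp_to_init_ms: u64) -> (u64, u64) {
    // Connect goes out and the responder's Connect comes back: the initiator
    // measures this round-trip time.
    let rtt = init_to_resp_ms + resp_to_init_ms;
    // The initiator sends Sync as soon as the reply arrives and starts a
    // timer of rtt / 2.
    let sync_sent = rtt;
    // The responder dials immediately when Sync arrives.
    let responder_dials = sync_sent + init_to_resp_ms;
    // The initiator dials when its timer fires.
    let initiator_dials = sync_sent + rtt / 2;
    (responder_dials, initiator_dials)
}
```

With 40 ms each way both sides dial at t = 120 ms; with 30 ms and 50 ms they drift 10 ms apart (110 vs 120). Either way, the RTT can only be measured, and Sync only delivered, over a channel that already exists.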


retrohacker · 2025-02-25T16:08:45Z


identify updates the routing table (global state), which all (or many) network behaviours use, improving the node's performance

I'm not 100% confident in this, but I'm pretty sure there is no global state shared between behaviours beyond what the swarm passes to them as events (each behaviour is expected to maintain that state internally when it receives the event; the event itself is transient). Identify emits events when it identifies new address candidates for a peer, which, IIUC, you need to handle and wire into Kad to update its internal knowledge of peer addresses.
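A std-only model of that wiring: the application's event loop receives an identify event and explicitly copies the reported listen addresses into Kademlia. In real rust-libp2p code this typically corresponds to matching `identify::Event::Received { peer_id, info, .. }` and calling `kad::Behaviour::add_address` for each entry in `info.listen_addrs`; `AppEvent`, `KadStub`, and `handle_event` below are illustrative stand-ins.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the swarm events the application receives.
enum AppEvent {
    IdentifyReceived { peer: String, listen_addrs: Vec<String> },
    Other,
}

/// Hypothetical stand-in for Kademlia's internal address table.
#[derive(Default)]
struct KadStub {
    table: HashMap<String, Vec<String>>,
}

impl KadStub {
    fn add_address(&mut self, peer: &str, addr: String) {
        self.table.entry(peer.to_string()).or_default().push(addr);
    }
}

/// The event loop is the only place where identify's output reaches
/// Kademlia: neither behaviour reads the other's state on its own.
fn handle_event(kad: &mut KadStub, event: AppEvent) {
    match event {
        AppEvent::IdentifyReceived { peer, listen_addrs } => {
            for addr in listen_addrs {
                kad.add_address(&peer, addr);
            }
        }
        AppEvent::Other => {}
    }
}
```

If the loop doesn't forward the event, Kademlia never learns the addresses, which is the "manage that global state manually" point made earlier in the thread.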

direct connection establishment -> dcutr, upnp (other options?)

We have had mixed results with NAT traversal. As @elenaf9 points out, a relay is necessary to handle the signaling coordination for attempting a direct connection between two peers. In our network so far, we find that many peers end up relying on the relay for the entire connection (they never get through the NAT), though it's still early and we don't have enough real-world data to say this definitively. Also, with the pluggable stack rust-libp2p provides, I expect us to be able to significantly reduce the long tail of clients that have to use relays over time. Here is my mental model of how NAT traversal works (in general, not just in libp2p): https://www.blankenship.io/essays/2024-06-11/

file-sharing does not use dcutr

IIUC, all behaviours that use the swarm lifecycle hooks are transport/muxer agnostic. If you implement a file sharing protocol as a behaviour, it will be able to communicate over any end-to-end connection established by your transport layer (including relayed and direct connections, if configured). I also haven't found any file sharing protocols that work with rust-libp2p, though I haven't gone looking in a while; we ended up implementing our own take on BitSwap to move files around our network.
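A std-only sketch of why such a protocol doesn't care about the transport: the protocol logic is written against a generic byte stream, so the same code runs over a direct or a relayed connection. rust-libp2p substreams implement the `futures` crate's async `AsyncRead`/`AsyncWrite` rather than std's `Read`/`Write`; this sketch uses the synchronous std traits, and the length-prefixed framing in `write_chunk`/`read_chunk` is a hypothetical example format.

```rust
use std::io::{self, Read, Write};

/// Write one length-prefixed chunk (u32 big-endian length, then the payload).
fn write_chunk<W: Write>(stream: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "chunk too large"))?;
    stream.write_all(&len.to_be_bytes())?;
    stream.write_all(payload)?;
    stream.flush()
}

/// Read one length-prefixed chunk written by `write_chunk`.
fn read_chunk<R: Read>(stream: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let mut payload = vec![0u8; u32::from_be_bytes(len_buf) as usize];
    stream.read_exact(&mut payload)?;
    Ok(payload)
}
```

Nothing here knows whether the bytes travel over TCP, QUIC, or a relay circuit; that choice is made entirely by the transport stack underneath.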
