ZmnSCPxj » CLBOSS » Handling Connections

Updated: 2022-04-15

Introduction

This writeup is of interest primarily to node operators. Though some programming is mentioned, even those without programming experience should be able to gather useful information from this writeup.

In order to participate in the Lightning Network, your node must first be able to actually connect to the network.

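This writeup will discuss three CLBOSS tasks related to connecting to the Lightning Network:

  1. Internet connection tracking.
  2. Connecting to Lightning Network nodes.
  3. Disconnecting from Lightning Network nodes.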

Note that this writeup will talk about connecting to peers, and not making channels to them. CLBOSS separates the two concepts. First, the node needs to connect to the network so that the node can perform an initial download of the network map. Then when the map is available, other algorithms will figure out where to make channels to.

This writeup is about connecting to the network, not about opening channels to the network. This distinction is vital to some of my goals for CLBOSS.

Internet Connection Tracking

Before we can connect to other nodes on the network, we need to be connected to the Internet.

In particular, CLBOSS also monitors our channel peers for their uptime, and may optionally close channels with peers whose uptime is very low. However, if it is us that actually has no Internet connectivity, CLBOSS should discount the apparently low uptime of our peers: they only look like they are down because we are unable to contact them.

Just because CLBOSS is running, presumably as a plugin of some C-Lightning node, does not mean that the Internet connection is working. If your ISP is like mine, then sometimes you just get no service at all for no reason. Thus, CLBOSS also monitors the Internet connectivity, not just its own uptime.

The Boss::Mod::InternetConnectionMonitor is the module responsible for checking Internet connectivity. Every 10 minutes, it performs a series of tests in order to determine if the Internet is accessible. If one test succeeds, then we consider ourselves to be online. Otherwise, it proceeds to the next test.

  1. If there is at least one connected peer, then select one at random and send a Lightning-level ping to it. If it responds within 30 seconds, we consider ourselves to be online. Otherwise, if the ping fails (for example if the peer disconnected between when we saw it in the list of peers and when we sent the ping request, i.e. a time-of-check-vs-time-of-use race), or we reach the 30-second timeout, we assume this test fails and proceed to the next test.

    The Lightning BOLT protocol has its own ping message, which should be responded to by a Lightning BOLT protocol pong message. This is different from the ICMP-level ping you might be familiar with. C-Lightning exposes a ping command on the lightning-rpc, which is what CLBOSS uses.

    Nodes are not obligated to respond to ping requests, so this potentially could hang indefinitely. Thus, the 30-second timeout.

    Due to the way Internet packets work, our local OS might assume the connection is still live, but if no activity occurs on the connection, it might already be disconnected without our local OS realizing it. Sending an application-level ping allows us to notice this. If the selected peer is properly coded (such as C-Lightning) then it should be able to respond to ping messages with a pong in a few dozen milliseconds. If it instead times out, it is likely that the connection has gotten interrupted somewhere.

  2. Select one of a hardcoded set of https servers, including popular sites such as www.google.com and www.alibaba.com, which are unlikely to ever go offline, then try to connect to it. If we can connect, we consider ourselves online, otherwise we move to the next attempt below.

    CLBOSS only does the TCP-level connect to the port, but drops the connection as soon as it is established, to reduce load on the target server. We do not even start SSL tunnel handshaking, just the TCP-level connection establishment is enough for us to consider ourselves online. Connection is done over a proxy if the C-Lightning node has a proxy configured (and C-Lightning assumes the proxy is Tor, which CLBOSS also assumes); this improves the privacy of the node under management if it is configured with a Tor proxy.

    Because we want to support using a (Tor) proxy as above, we cannot use an ICMP-level ping. The proxy C-Lightning supports is SOCKS5h, meaning it can do a standard DNS lookup and do a connect to a hostname or IP address, but cannot do an ICMP-level ping via proxy (and Tor, which is the proxy we expect to use, does not support this at the Tor protocol level). Besides, ICMP access requires root on most Unixen, and we really do not want CLBOSS to run as root. We could call into the system ping command, but it might not be installed, and its output has no standard (various OSs have various conventions for its output and flags).

    We just want to check that we can connect to the Internet, not DoS the listed servers. Thus, we first try to ping one of our peers, above, before we fall back to trying to connect to a hardcoded server. We also try only one of the servers listed, to distribute the load among them; as of this writing (mid 2022) we have about two dozen servers hardcoded.

  3. If the above selected server fails, we then try all the hardcoded servers. If we can connect to any of them, we consider ourselves online.

    If we are really disconnected from the Internet, then trying all the hardcoded servers will not affect any of them (because we are offline), so this is not a DoS attack, either.

    However, there is a tiny probability that one of these hugely popular servers is down, but the rest of the Internet is somehow still accessible to us. It is also possible that the local ISP or government or whatever of the node under management wants to censor one of these servers, but the rest of the Internet is still accessible. Hence this fallback.
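The three tests above can be sketched roughly as follows. This is only an illustration in Python, not the actual CLBOSS code (which is C++). The names check_online, tcp_connect_ok and ping_peer are made up for this sketch; ping_peer stands in for "send a Lightning-level ping and report whether a pong arrived within the timeout". The sketch also connects directly rather than through a SOCKS5 proxy.

```python
import random
import socket


def tcp_connect_ok(host, port=443, timeout=10.0):
    """Return True if a plain TCP connection can be established.

    The connection is dropped immediately; no TLS handshake is started.
    Assumption: this connects directly and does NOT go through a proxy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_online(peers, servers, ping_peer, can_connect=tcp_connect_ok, rng=random):
    """Run the three tests in order and stop at the first success.

    ping_peer is a hypothetical callable(peer) -> bool that is expected
    to enforce its own timeout (30 seconds in the text).
    """
    # Test 1: ping one randomly selected connected peer, if there is any.
    if peers and ping_peer(rng.choice(peers)):
        return True
    if not servers:
        return False
    # Test 2: try a single randomly selected server, to spread the load.
    if can_connect(rng.choice(servers)):
        return True
    # Test 3: fall back to trying every server on the list.
    return any(can_connect(server) for server in servers)
```

Passing the probes in as callables keeps the ordering logic separate from the actual network access, so the cascade can be exercised without touching the network.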

Note that the above tests are done every 10 minutes if we think we are online. If we think we are offline, we do the tests every 5 seconds instead; if we are really offline, then checking every 5 seconds is not a DoS attack since we are offline. However, this lets us know much more quickly if we are online.

Boss::Mod::InternetConnectionMonitor will then broadcast Boss::Msg::InternetOnline. This message has a single .online field, which is true if we are online and false if we are offline. This message is sent only if we changed from offline to online, or online to offline; we assume at CLBOSS startup that we are offline, and also do a check as soon as CLBOSS starts up, in case we are actually online. Other modules will then listen for the Boss::Msg::InternetOnline message and update their own local variable tracking onlineness.

If we moved from offline to online, then Boss::Mod::InternetConnectionMonitor will also broadcast Boss::Msg::NeedsConnect. Due to CLBOSS starting with the state “offline” and then immediately checking if we actually have Internet connectivity, this behavior also handles the case where a new CLBOSS is set up to manage a C-Lightning node.
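The state tracking described above could look something like the sketch below. It is written in Python purely for illustration; OnlineTracker and the broadcast callable are hypothetical names, and the messages are represented as plain strings and dicts rather than the real Boss::Msg types.

```python
ONLINE_RECHECK_SECONDS = 10 * 60  # while we believe we are online
OFFLINE_RECHECK_SECONDS = 5       # while we believe we are offline


class OnlineTracker:
    """Tracks onlineness and broadcasts only when the state changes."""

    def __init__(self, broadcast):
        # broadcast is a hypothetical callable(message_name, payload).
        self._broadcast = broadcast
        # At startup we assume we are offline.
        self.online = False

    def update(self, now_online):
        """Record one check result; return seconds until the next check."""
        if now_online != self.online:
            self.online = now_online
            self._broadcast("InternetOnline", {"online": now_online})
            # Moving from offline to online also asks for connections.
            if now_online:
                self._broadcast("NeedsConnect", {})
        if self.online:
            return ONLINE_RECHECK_SECONDS
        return OFFLINE_RECHECK_SECONDS
```

Because the tracker starts in the offline state, the first successful check after startup already counts as an offline-to-online transition, so a freshly started instance raises NeedsConnect without any special case.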

This leads to the next section, where we describe how we select nodes to actually connect to.

Connecting to Lightning Network Nodes

In order to make channels on the network, first we have to connect to the network.

Connecting to the network allows us to get the very important network map via the gossip protocol. Without the map, we cannot route payments, and more importantly, the network map is one of the resources we use to discover nodes we want to make channels to.

The Boss::Mod::NeedsConnectSolicitor module reacts to the Boss::Msg::NeedsConnect message. As of this writing (mid 2022) this message is broadcast in two conditions. One of them was described in the previous section: it is raised when we move from “offline” to “online”. The other will be described in a subsection.

When Boss::Mod::NeedsConnectSolicitor finds that we need to connect, it solicits for connection candidates on the bus. This means it broadcasts a message on the bus, which other modules should respond to by broadcasting a message that provides the requested information. This allows the soliciting module to be extended in the future by adding new modules that provide information to the soliciting module.

Boss::Mod::NeedsConnectSolicitor broadcasts a Boss::Msg::SolicitConnectCandidates message to solicit potential connection candidates from other modules, then processes Boss::Msg::ProposeConnectCandidates messages from other modules, adding them to a set of candidates.
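The solicit-then-propose pattern can be illustrated with a toy bus. This is a simplified sketch in Python: the Bus, Solicitor and add_provider names are invented for the example, and the bus here is synchronous, which is an assumption made to keep the sketch short and need not match how the CLBOSS bus behaves.

```python
from collections import defaultdict


class Bus:
    """Tiny synchronous publish/subscribe bus, for illustration only."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, msg_type, handler):
        self._handlers[msg_type].append(handler)

    def broadcast(self, msg_type, payload=None):
        for handler in list(self._handlers[msg_type]):
            handler(payload)


class Solicitor:
    """Solicits candidates and collects every proposal into a set."""

    def __init__(self, bus):
        self._bus = bus
        self._candidates = set()
        # Each proposal carries an iterable of candidates to add.
        bus.subscribe("ProposeConnectCandidates", self._candidates.update)

    def solicit(self):
        self._candidates.clear()
        self._bus.broadcast("SolicitConnectCandidates")
        return set(self._candidates)


def add_provider(bus, nodes):
    """Register a hypothetical module that proposes a fixed list of nodes."""
    bus.subscribe(
        "SolicitConnectCandidates",
        lambda _msg: bus.broadcast("ProposeConnectCandidates", nodes),
    )
```

The soliciting side never needs to know which providers exist: adding a new source of candidates is just one more add_provider call, which is the extensibility the text describes.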

Then it takes this set of candidates and shuffles them, distributing them into two queues.

It then processes both queues, trying candidates one at a time. If we are able to connect to a candidate, we finish processing for that queue. Otherwise, we move to the next candidate on the queue.
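A minimal sketch of the shuffle-and-two-queues step, in Python for illustration. The text does not say how candidates are divided between the two queues, so dealing them out alternately is an assumption here, and try_connect is a hypothetical callable that returns True when a connection attempt succeeds.

```python
import random


def split_into_two_queues(candidates, rng=random):
    """Shuffle the candidates and deal them into two queues."""
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    # Assumption: alternate between the queues so they stay balanced.
    return shuffled[0::2], shuffled[1::2]


def connect_from_queue(queue, try_connect):
    """Try candidates in order; stop at the first successful connection."""
    for candidate in queue:
        if try_connect(candidate):
            return candidate
    return None


def connect_to_network(candidates, try_connect, rng=random):
    """Process both queues; each yields at most one new connection."""
    queues = split_into_two_queues(candidates, rng)
    return [connect_from_queue(queue, try_connect) for queue in queues]
```

With two queues that each stop at their first success, one round ends with at most two new connections, no matter how many candidates were proposed.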

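As of this writing (mid 2022), the following modules propose candidates:

  1. Boss::Mod::ConnectFinderByDns, which queries BOLT 10 DNS servers for nodes.
  2. Boss::Mod::ConnectFinderByHardcode, which proposes nodes from a hardcoded list.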

These have important privacy concerns and eclipsing concerns.

If we rely only on the DNS servers, we should note that there are only a small number of entities running BOLT 10 DNS servers. It is thus possible for all of them to be taken over so that the servers return only nodes from a curated list, effectively eclipsing the real network and censoring the Lightning Network. Thus, we also rely on a hardcoded list of nodes, in the hope that even if the DNS servers are taken over, perhaps one or more of the hardcoded list is not taken over.

Of course, since CLBOSS is open-source, someone who wants to eclipse the network could also look at the open-source hardcoded list of nodes, and target them for takeover. I have thus taken the effort to find nodes from various countries, with various political systems, so that it is difficult to take all of them over. Nodes in Western democracies still dominate the list, however.

Sometimes a node on the list shuts down permanently. Thus, once or twice a year I go over the list and try to connect to them manually. If they are offline, I remove them from the list. Then I add more nodes on this hardcoded list.

Most ISPs have awful default DNS resolvers. For example, lseed.darosior.ninja is often unusable from my ISP default DNS resolver. For this reason, I also hardcode a reliable DNS resolver together with the actual BOLT 10 DNS server.

A problem here is that the DNS resolver you use gets to know your IP address. Thus, if a Tor proxy is configured, we use DNS-over-TCP, and use the torify command to wrap TCP access over Tor. Note that Tor itself only supports TCP transport, so we have to specifically use DNS-over-TCP, and not whatever UDP-based protocol DNS uses by default (I know less about DNS than I ever thought I did).
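To make the DNS-over-TCP-via-torify idea concrete, here is a small Python sketch that only builds the command line. It assumes the lookup is delegated to the dig command-line tool and that an SRV query is sent, neither of which is stated above. The function name is invented, and the resolver address is left as a parameter because the text does not say which resolver is hardcoded.

```python
def build_seed_lookup_command(seed_domain, resolver_ip, use_tor):
    """Build an argv list for querying a BOLT 10 DNS seed.

    Assumptions: `dig` does the lookup and an SRV record is requested.
    resolver_ip is whichever reliable resolver has been chosen.
    """
    # "@<ip>" tells dig to query that resolver instead of the system default.
    cmd = ["dig", "@" + resolver_ip, seed_domain, "SRV", "+short"]
    if use_tor:
        # Tor only carries TCP, so force DNS-over-TCP and wrap in torify.
        cmd = ["torify"] + cmd + ["+tcp"]
    return cmd
```

For example, build_seed_lookup_command("lseed.darosior.ninja", "198.51.100.1", True) returns a list starting with "torify" that could be handed to subprocess.run; the address 198.51.100.1 is from a documentation-only range and is just a placeholder.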

The Reconnector

Most node software, it seems, expects connections to be “used”, by which they mean that they expect channels to be made some time after you connect to them. If you connect to them, but after some time, do not make a channel, they will disconnect.

However, it is important for CLBOSS to not become a kingmaker, i.e. if many people start running CLBOSS near-simultaneously, no single node on the network should suddenly get more unmerited channels and / or payment traffic, as that hurts decentralization and makes the position of such a “king” into a target for those who want to monitor or control the Lightning Network. Thus, just because CLBOSS, for example, hardcodes some node in its list of nodes, that node should not get special bonuses when being considered for channelling.

This means that Boss::Mod::ConnectFinderByHardcode is likely to connect to a node, but the rest of CLBOSS will decide not to make a channel with it after all (since the hardcoded nodes should not be promoted more to avoid becoming a kingmaker).

If that node then decides that you not making a channel with them means they should disconnect from you, then a CLBOSS-managed node can end up with no connections, returning us back to our pre-managed state where C-Lightning just sits there twiddling its thumbs.

Thus, CLBOSS has the Boss::Mod::Reconnector module. This module simply listens for disconnection notices from the node under management, and then checks if the node has no more connections. If the node loses its last connection, it raises the Boss::Msg::NeedsConnect message to find more nodes to connect to, so that the node always has some connections.
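The behaviour of the reconnector is simple enough to sketch in a few lines. This Python version is illustrative only: the class shape and the two callables (list_connected_peers and raise_needs_connect) are hypothetical stand-ins for asking the node for its peers and for broadcasting Boss::Msg::NeedsConnect.

```python
class Reconnector:
    """Raises NeedsConnect when the last connection is lost."""

    def __init__(self, list_connected_peers, raise_needs_connect):
        # Hypothetical callable() -> list of currently connected peers.
        self._list_connected_peers = list_connected_peers
        # Hypothetical callable() that broadcasts NeedsConnect.
        self._raise_needs_connect = raise_needs_connect

    def on_disconnect(self, peer_id):
        """Handle a disconnection notice; return True if we raised."""
        # peer_id is unused: only the remaining connection count matters.
        if not self._list_connected_peers():
            self._raise_needs_connect()
            return True
        return False
```

Checking the current peer list on every notice, instead of keeping a counter, avoids drifting out of sync with the node if a notice is ever missed.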

Disconnecting From Lightning Network Nodes

One purpose for connecting to the network is to download an up-to-date gossip map, so that we can run our heuristics that use the map as reference in order to find nodes to make channels with.

However, there are also other reasons for CLBOSS to connect.

A later writeup will delve into more detail, but in addition to the Boss::Mod::NeedsConnectSolicitor module, another module also connects to nodes. The purpose for this connection is to determine the uptime of other nodes; if we are able to connect to such nodes, then we can judge that the remote node has high uptime (probably).

This means in particular that once the node has a good amount of the gossip map, CLBOSS will start connecting to many nodes, due to checking their uptime.

However, keeping up-to-date with the gossip, once you have already downloaded much of the gossip map, should not require more than a few live connections.

Thus, the Boss::Mod::AutoDisconnector module will limit the number of connections. This triggers every 10 minutes. If there are more than 3 connections without channels, then it will select all but three of them at random, and disconnect the selected ones.

Connections to peers with channels are ignored by Boss::Mod::AutoDisconnector. Connections to those peers are needed to keep up to date with the status of the channel, and to be ready for any forwards via that channel.
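The selection rule can be sketched as below, in Python for illustration. The function name and parameters are invented for this example; only the rule itself (ignore peers with channels, keep three channel-less connections, pick the rest at random) comes from the description above.

```python
import random

MAX_CHANNELLESS_CONNECTIONS = 3


def select_peers_to_disconnect(connected_peers, peers_with_channels,
                               rng=random, keep=MAX_CHANNELLESS_CONNECTIONS):
    """Pick which channel-less connections to drop.

    Peers with channels are never selected. Among the rest, all but
    `keep` are chosen at random.
    """
    channelless = [p for p in connected_peers if p not in peers_with_channels]
    excess = len(channelless) - keep
    if excess <= 0:
        return []
    # sample() picks `excess` distinct peers, leaving `keep` connected.
    return rng.sample(channelless, excess)
```

Running this on a timer (every 10 minutes in the text) means the node can briefly hold many connections while uptime is being probed, and then gets trimmed back down.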

The combination of Boss::Mod::Reconnector and Boss::Mod::AutoDisconnector ensures that the node has between 1 and 3 connections to nodes that it has no channels with, which helps propagate gossip. This is important in case your peers-with-channels are trying to do an eclipse attack on you and trying to hide knowledge about some node from you (for example, if somebody actually owns a large number of nodes on the network, and wants to redirect payments to go through their nodes by withholding gossip about other nodes from your node). At the same time, it also keeps the number of “useless” connections (i.e. those without channels) low.