Updated: 2022-04-15
Introduction
This writeup is of interest primarily to node operators. Though some programming is mentioned, even those without programming experience should be able to gather useful information from this writeup.
In order to participate in the Lightning Network, your node must first be able to actually connect to the network.
This writeup will discuss three CLBOSS tasks related to connecting to the Lightning Network:
- Tracking Internet connectivity.
- Connecting to Lightning Network peers.
- Disconnecting from Lightning Network peers.
Note that this writeup will talk about connecting to peers, and not making channels to them. CLBOSS separates the two concepts. First, the node needs to connect to the network so that the node can perform an initial download of the network map. Then when the map is available, other algorithms will figure out where to make channels to.
This writeup is about connecting to the network, not about opening channels to the network. This distinction is vital to some of my goals for CLBOSS.
Internet Connection Tracking
Before we can connect to other nodes on the network, we need to be connected to the Internet.
In particular, CLBOSS also monitors our channel peers for their uptime, and may optionally close channels with peers whose uptime is very low. However, if it is us that actually has no Internet connectivity, CLBOSS should discount the apparently low uptime of our peers, since their perceived uptime will be low due to us not being able to contact them, because it is us that is actually down.
Just because CLBOSS is running, presumably as a plugin of some C-Lightning node, does not mean that the Internet connection is working. If your ISP is like mine, then sometimes you just get no service at all for no reason. Thus, CLBOSS also monitors the Internet connectivity, not just its own uptime.
The Boss::Mod::InternetConnectionMonitor is the module responsible for checking Internet connectivity. Every 10 minutes, it performs a series of tests in order to determine if the Internet is accessible. If one test succeeds, then we consider ourselves to be online. Otherwise, it proceeds to the next test.
1. If there is at least one connected peer, then select one at random and send a Lightning-level ping to it. If it responds within 30 seconds, we consider ourselves to be online. Otherwise, if ping fails (for example if the peer disconnected between when we saw it in the list of peers and when we sent the ping request, i.e. a time-of-check-vs-time-of-use race), or we reach the 30-second timeout, we assume this test fails and proceed to the next test.
   - The Lightning BOLT protocol has its own ping message, which should be responded to by a Lightning BOLT protocol pong message. This is different from the ICMP-level ping you might be familiar with. C-Lightning exposes a ping command on the lightning-rpc, which is what CLBOSS uses.
   - Nodes are not obligated to respond to ping requests, so this potentially could hang indefinitely. Thus, the 30-second timeout.
   - Due to the way Internet packets work, our local OS might assume the connection is still live, but if no activity occurs on the connection, it might already be disconnected without our local OS realizing it. Sending an application-level ping allows us to notice this. If the selected peer is properly coded (such as C-Lightning) then it should be able to respond to ping messages with a pong in a few dozen milliseconds. If it instead times out, it is likely that the connection has gotten interrupted somewhere.
2. Select one of a hardcoded set of https servers, including popular sites such as www.google.com and www.alibaba.com, which are unlikely to ever go offline, then try to connect to it. If we can connect, we consider ourselves online, otherwise we move to the next attempt below.
   - CLBOSS only does the TCP-level connect to the port, but drops the connection as soon as it is established, to reduce load on the target server. We do not even start SSL tunnel handshaking; just the TCP-level connection establishment is enough for us to consider ourselves online. Connection is done over a proxy if the C-Lightning node has a proxy configured (and C-Lightning assumes the proxy is Tor, which CLBOSS also assumes); this improves the privacy of the node under management if it is configured with a Tor proxy.
   - Because we want to support using a (Tor) proxy as above, we cannot use an ICMP-level ping. The proxy C-Lightning supports is SOCKS5h, meaning it can do a standard DNS lookup and do a connect to a hostname or IP address, but cannot do an ICMP-level ping via proxy (and Tor, which is the proxy we expect to use, does not support this at the Tor protocol level). Besides, ICMP access requires root on most Unixen, and we really do not want CLBOSS to run as root. We could call into the system ping command, but it might not be installed, and its output is not standardized (various OSs have various conventions for its output and flags).
   - We just want to check that we can connect to the Internet, not DoS the listed servers. Thus, we first try to ping one of our peers, above, before we fall back to trying to connect to a hardcoded server. We also try only one of the servers listed, to distribute the load among them; as of this writing (mid 2022) we have about two dozen servers hardcoded.
3. If the above selected server fails, we then try all the hardcoded servers. If we can connect to any of them, we consider ourselves online.
   - If we are really disconnected from the Internet, then trying all the hardcoded servers will not affect any of them (because we are offline), so this is not a DoS attack, either.
   - However, there is a tiny probability that one of these hugely popular servers is down, but the rest of the Internet is somehow still accessible to us. It is also possible that the local ISP or government or whatever of the node under management wants to censor one of these servers, but the rest of the Internet is still accessible. Hence this fallback.
Note that the above tests are done every 10 minutes if we think we are online. If we think we are offline, we do the tests every 5 seconds instead; if we are really offline, then checking every 5 seconds is not a DoS attack since we are offline. However, this lets us know much more quickly if we are online.
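The test sequence and the two polling intervals can be sketched roughly as follows. This is only an illustration in Python, not the CLBOSS code (which is C++): the names check_online, tcp_connect_ok, next_check_delay and SERVERS are made up for this sketch, ping_peer is a hypothetical callable that is expected to enforce the 30-second timeout itself, the 10-second connect timeout is an assumption, and the proxy handling is left out entirely.

```python
import random
import socket

# Hypothetical stand-in for the hardcoded server list (the real list is longer).
SERVERS = ["www.google.com", "www.alibaba.com"]

def tcp_connect_ok(host, port=443, timeout=10.0):
    """Return True if a TCP connection can be established.

    The connection is closed as soon as it opens; no TLS handshake is started.
    The 10-second timeout is an assumption of this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_online(peers, ping_peer, connect=tcp_connect_ok, servers=SERVERS, rng=random):
    """Run the tests in order; the first one that succeeds means we are online."""
    # Test 1: Lightning-level ping to one randomly selected connected peer.
    if peers and ping_peer(rng.choice(peers)):
        return True
    # Test 2: TCP connect to a single randomly selected server.
    if connect(rng.choice(servers)):
        return True
    # Test 3: fall back to trying every server in the list.
    return any(connect(server) for server in servers)

def next_check_delay(online):
    """Seconds until the next check: 10 minutes when online, 5 seconds when offline."""
    return 600.0 if online else 5.0
```

Passing the ping and connect functions in as parameters keeps the ordering logic separate from the actual network access, so the sequence can be exercised without touching the network.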
Boss::Mod::InternetConnectionMonitor will then broadcast Boss::Msg::InternetOnline. This message has a single .online field, which is true if we are online and false if we are offline. This message is sent only if we changed from offline to online, or online to offline; we assume at CLBOSS startup that we are offline, and also do a check as soon as CLBOSS starts up, in case we are actually online.
Other modules will then listen for the Boss::Msg::InternetOnline message and update their own local variable tracking onlineness.
If we moved from offline to online, then Boss::Mod::InternetConnectionMonitor will also broadcast Boss::Msg::NeedsConnect. Due to CLBOSS starting with the state “offline” and then immediately checking if we actually have Internet connectivity, this behavior also handles the case where a new CLBOSS is set up to manage a C-Lightning node.
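The broadcast-on-change behavior can be sketched as a small state tracker. This is a hedged illustration only: OnlineTracker, report and the tuple-shaped messages are hypothetical stand-ins for the real C++ module and message types, and broadcast is any callable supplied by the caller.

```python
class OnlineTracker:
    """Remembers the last known state and broadcasts only on a transition."""

    def __init__(self, broadcast):
        self._broadcast = broadcast
        self._online = False  # assume offline at startup

    def report(self, online):
        """Feed in the result of the latest connectivity check."""
        if online == self._online:
            return  # no change, so nothing is broadcast
        self._online = online
        self._broadcast(("InternetOnline", online))
        if online:
            # Moving from offline to online also asks for new connections.
            self._broadcast(("NeedsConnect", None))
```

Because the tracker starts out offline, the first successful check after startup counts as an offline-to-online transition, which is what makes a freshly installed CLBOSS start looking for peers.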
Connecting to Lightning Network Nodes
In order to channel to the network, first we have to connect to the network.
Connecting to the network allows us to get the very important network map via the gossip protocol. Without the map, we cannot route payments, and more importantly, the network map is one of the resources we use to discover nodes we want to make channels to.
The Boss::Mod::NeedsConnectSolicitor module reacts to the Boss::Msg::NeedsConnect message. As of this writing (mid 2022) this message is broadcast under two conditions. One of them was described in the previous section: it is raised when we move from “offline” to “online”. The other will be described in a subsection.
When Boss::Mod::NeedsConnectSolicitor finds that we need to connect, it solicits for connection candidates on the bus. This means it broadcasts a message on the bus, which other modules should respond to by broadcasting a message that provides the requested information. This allows the soliciting module to be extended in the future by adding new modules that provide information to the soliciting module.
Boss::Mod::NeedsConnectSolicitor broadcasts a Boss::Msg::SolicitConnectCandidates message to solicit potential connection candidates from other modules, then processes Boss::Msg::ProposeConnectCandidates messages from other modules, adding them to a set of candidates.
Then it takes this set of candidates and shuffles them, distributing them into two queues.
It then processes both queues, trying candidates one at a time. If we are able to connect to a candidate, we finish processing for that queue. Otherwise, we move to the next candidate on the queue.
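The shuffle-and-two-queues step might look something like the following sketch. The function names split_into_queues and connect_first are hypothetical, try_connect is a caller-supplied callable, and dealing the shuffled candidates alternately into the two queues is my assumption for illustration; how the candidates are actually distributed between the queues is not specified here.

```python
import random

def split_into_queues(candidates, rng=random):
    """Shuffle the candidate set and deal it into two queues.

    Alternating assignment is an assumption of this sketch.
    """
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    return shuffled[0::2], shuffled[1::2]

def connect_first(queue, try_connect):
    """Try candidates in order and stop at the first successful connection."""
    for candidate in queue:
        if try_connect(candidate):
            return candidate
    return None  # every candidate on this queue failed
```

Running connect_first once per queue gives up to two successful connections per solicitation, one from each queue.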
As of this writing (mid 2022), the following modules propose candidates:
- Boss::Mod::ConnectFinderByHardcode: proposes two nodes at random from a hardcoded list of nodes.
- Boss::Mod::ConnectFinderByDns: selects a BOLT 10 DNS server from a hardcoded list, then proposes the list of nodes and IP addresses the DNS server provides.
These have important privacy concerns and eclipsing concerns.
If we rely only on the DNS servers, we should note that there are only a small number of entities running BOLT 10 DNS servers. It is thus possible for all of them to be taken over so that the servers return only nodes from a curated list, effectively eclipsing the real network and censoring the Lightning Network. Thus, we also rely on a hardcoded list of nodes, in the hope that even if the DNS servers are taken over, perhaps one or more of the hardcoded list is not taken over.
Of course, since CLBOSS is open-source, someone who wants to eclipse the network could also look at the open-source hardcoded list of nodes, and target them for takeover. I have thus taken the effort to find nodes from various countries, with various political systems, so that it is difficult to take all of them over. Nodes in Western democracies still dominate the list, however.
Sometimes a node on the list shuts down permanently. Thus, once or twice a year I go over the list and try to connect to each node manually. If a node is offline, I remove it from the list. Then I add more nodes to this hardcoded list.
Most ISPs have awful default DNS resolvers. For example, lseed.darosior.ninja is often unusable from my ISP default DNS resolver. For this reason, I also hardcode a reliable DNS resolver together with the actual BOLT 10 DNS server.
A problem here is that the DNS resolver you use gets to know your IP address. Thus, if a Tor proxy is configured, we use DNS-over-TCP, and use the torify command to wrap TCP access over Tor. Note that Tor itself only supports TCP transport, so we have to specifically use DNS-over-TCP, and not whatever UDP-based protocol DNS uses by default (I know less about DNS than I ever thought I did).
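For the curious, the wire-level difference between the two transports is small: the DNS message itself is the same, but over TCP it is prefixed with a two-byte length field (RFC 1035, section 4.2.2). The sketch below only illustrates that framing and is not how CLBOSS issues its queries; build_dns_query and frame_for_tcp are hypothetical names, seed.example in the usage note is a placeholder domain, and the SRV query type is picked purely for illustration.

```python
import struct

def build_dns_query(name, qtype=33, query_id=0x1234):
    """Build a minimal DNS query in RFC 1035 wire format.

    qtype 33 is SRV, chosen here only for illustration.
    """
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is preceded by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)

def frame_for_tcp(message):
    """Prefix the message with its length as a 16-bit big-endian integer."""
    return struct.pack("!H", len(message)) + message
```

A query for seed.example built this way would be sent over UDP as-is, while over TCP (and therefore over Tor) the output of frame_for_tcp is what goes on the wire.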
The Reconnector
Most node software, it seems, expects connections to be “used”, meaning that a channel is expected to be made some time after you connect. If you connect to such a node but do not make a channel after some time, it will disconnect.
However, it is important for CLBOSS to not become a kingmaker, i.e. if many people start running CLBOSS near-simultaneously, no single node on the network should suddenly get more unmerited channels and / or payment traffic, as that hurts decentralization and makes the position of such a “king” into a target for those who want to monitor or control the Lightning Network. Thus, just because CLBOSS, for example, hardcodes some node in its list of nodes, that node should not get any special bonus when being considered for channelling.
This means that Boss::Mod::ConnectFinderByHardcode is likely to connect to a node, but the rest of CLBOSS will decide not to make a channel with it after all (since the hardcoded nodes should not be promoted more to avoid becoming a kingmaker).
If that node then decides that you not making a channel with them means they should disconnect from you, then a CLBOSS-managed node can end up with no connections, returning us back to our pre-managed state where C-Lightning just sits there twiddling its thumbs.
Thus, CLBOSS has the Boss::Mod::Reconnector module. This module simply listens for disconnection notices from the node under management, and then checks if the node has no more connections. If the node loses its last connection, it raises the Boss::Msg::NeedsConnect message to find more nodes to connect to, so that the node always has some connections.
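The logic here is tiny, and a sketch of it might look like this. The class shape, the on_disconnect method and the list_connected_peers callable are all hypothetical; the real module is C++ and gets its notifications from the node under management.

```python
class Reconnector:
    """Raises NeedsConnect when the node has lost its last connection."""

    def __init__(self, list_connected_peers, broadcast):
        # list_connected_peers: callable returning the currently connected peers.
        # broadcast: callable used to put a message on the bus.
        self._list_connected_peers = list_connected_peers
        self._broadcast = broadcast

    def on_disconnect(self, peer_id):
        """Handle a disconnection notice for one peer."""
        if not self._list_connected_peers():
            self._broadcast("NeedsConnect")
```

Re-querying the peer list on every notice, rather than keeping a local counter, avoids drifting out of sync with what the node itself thinks is connected.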
Disconnecting From Lightning Network Nodes
One purpose for connecting to the network is to download an up-to-date gossip map, so that we can run the heuristics that use the map as reference to decide which nodes to make channels with.
However, there are also other reasons for CLBOSS to connect.
A later writeup will delve into more detail, but in addition to the Boss::Mod::NeedsConnectSolicitor module, another module also connects to nodes. The purpose for this connection is to determine the uptime of other nodes; if we are able to connect to such nodes, then we can judge that the remote node has high uptime (probably).
This means in particular that once the node has a good amount of the gossip map, CLBOSS will start connecting to many nodes, due to checking their uptime.
However, keeping up-to-date with the gossip, once you have already downloaded much of the gossip map, should not require more than a few live connections.
Thus, the Boss::Mod::AutoDisconnector module will limit the number of connections. This triggers every 10 minutes. If there are more than 3 connections without channels, then it will select all but three of them at random, and disconnect the selected ones.
Connections to peers with channels are ignored by Boss::Mod::AutoDisconnector. Connections to those peers are needed to keep up to date with the status of the channel, and to be ready for any forwards via that channel.
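The selection rule can be sketched as below. This is an illustration under assumptions: select_for_disconnect and MAX_CHANNELLESS are hypothetical names, and representing the peers as a dict from peer id to a has-channel flag is just a convenient shape for the example.

```python
import random

MAX_CHANNELLESS = 3  # connections without channels that we keep

def select_for_disconnect(peers, rng=random):
    """Pick which channel-less peers to disconnect.

    peers maps a peer id to True if we have a channel with that peer.
    Peers with channels are never selected.
    """
    channelless = [peer for peer, has_channel in peers.items() if not has_channel]
    excess = len(channelless) - MAX_CHANNELLESS
    if excess <= 0:
        return []
    # Select all but three of the channel-less peers, at random.
    return rng.sample(channelless, excess)
```

With five channel-less peers and any number of peers with channels, this returns two of the channel-less peers and leaves three of them connected.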
The combination of Boss::Mod::Reconnector and Boss::Mod::AutoDisconnector ensures that the node has between 1 and 3 connections to nodes that it has no channels with, which helps propagate gossip. This is important in case your peers-with-channels are trying to do an eclipse attack on you and trying to withhold knowledge about some node from you (for example, if somebody actually owns a large number of nodes on the network, and wants to redirect payments to go through their nodes by denying your node the gossip about other nodes).
At the same time, it also keeps the number of “useless” connections (i.e. those without channels) low.