Updated: 2022-04-12
Introduction
This writeup serves not only as a description of the overall architecture of CLBOSS, the C-Lightning Automated Node Manager, but also as an introduction to this entire series of writeups about CLBOSS.
First, notice that this document has an “Updated” line at the top. Generally, I think most of the architecture of CLBOSS is unlikely to change, but it can change over time, and this document might thus become dated. Thus, the “Updated” will be present on all documents in this series. While I do not expect CLBOSS to change much in the meantime, if you are reading this several years after the “Updated” date, you might want to take this with some random number of grains of salt.
I intend to write out a few more documents after this, over the next few weeks or months. So, if as of now you are seeing just this document, stay tuned for more.
In this particular writeup, I will describe the overall architecture of CLBOSS. This document is fairly technical, as it mostly describes software architecture, which is generally of interest only to other programmers. However, I hope laymen may glean some more understanding of the overall architecture and some of its advantages and why CLBOSS is structured this way.
TLDR
- The core is a central message bus, which delivers messages from anything that .raise()s a message, to all objects that .subscribe()d to that message type.
- CLBOSS is composed of this central bus, plus a bunch of modules.
- The modules connect to the bus in order to broadcast and listen for messages.
- CLBOSS uses large numbers of so-called “greenthreads”, which are like threads but much more lightweight (no kernelspace involvement, just a few pointers on a stack or heap object). This allows CLBOSS to keep going even if some procedure unexpectedly fails, since the failure will (usually) only crash its greenthread, and makes it convenient to write “blocking” code without actually blocking other greenthreads.
- Some modules also affect / are affected by the real world outside of CLBOSS. These modules listen for events on the message bus and affect the real world, and / or respond to some real world event by broadcasting a message on the message bus.
- Most modules just listen for particular messages, update their internal or on-disk state, and then broadcast other messages for the rest of the modules to consume.
- Much of CLBOSS is really controlled by a timer module that just emits messages on a regular timed schedule; other modules then check the node status and respond to it as appropriate to manage the node.
- The modular architecture makes it easy to add new behavior, or disable some behavior.
- The modular architecture makes it easy to test individual modules in isolation.
- The description here is an idealization to be targeted; the actual CLBOSS does not quite match it perfectly. ^^;v
The rest of this writeup is significantly more technical, though the other writeups in this series should not be as technical as the below.
S::Bus
namespace S { class Bus; }
At the very core of CLBOSS is a simple centralized signal bus, S::Bus.
This bus is really just a massive centralized publish-subscribe pattern, intended for use in a single process rather than across processes or machines.
As a message signalling bus, various clients of the S::Bus can .subscribe() to particular message keys, providing a function to be executed.
When a message (which must have exactly one message key) is .raise()d, all functions registered to that key will then be invoked, but none of the other functions will.
CLBOSS is composed of modules, and each module is “attached” to the bus.
By “attached” I mean that an instance of the module class is constructed and given a non-const reference to the central S::Bus.
The module constructor can then subscribe to particular
messages on the bus, and keep a reference to the bus so the
module can publish its own messages.
The module itself is primarily composed of code and data that the
module needs in order to handle its task.
Messages are keyed by C++ type, using type introspection with std::type_index.
Thus, messages can be of any arbitrary C++ type (whether built-in or class-based); the only requirement imposed is that the type is moveable in C++11 terms (it need not be copyable).
template<typename T> void can_be_raised_on_s_bus(T& t) { (void) std::move(t); }
For CLBOSS specifically, the messages sent over the bus are
always passive data with only public data members, not active
objects with function members.
However, some messages do have data members of type std::function, though the typical intent is that the function itself is the data to be passed around with the message; it does not encapsulate the details of the message, only its own captured variables.
As a convention, CLBOSS modules are placed in the namespace Boss::Mod (i.e. all modules have a prefix of Boss::Mod::).
Some generic classes that are useful across multiple modules are placed in the Boss::ModG namespace, the G meaning “generic”.
Data structures intended to be used as messages are placed in the namespace Boss::Msg.
Such data structures are plain struct-like classes with no function members and only public data members.
This central bus makes CLBOSS easily modular. The bus does not require any particular module to be installed or not installed. New modules can be added with little impact on other modules that have other responsibilities. Old code can be easily removed by simply removing the module, again with little impact on unrelated modules. Modularity also means that behavior can be modified or changed by injecting messages into the bus.
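To make the type-keying concrete, below is a minimal, self-contained sketch of a publish-subscribe bus in the same spirit. It is not the actual S::Bus code (among other things, the real bus ties message delivery into the Ev::Io greenthread machinery described in the next section); the MiniBus name and the exact subscribe/raise signatures here are purely illustrative.

#include <functional>
#include <iostream>
#include <map>
#include <typeindex>
#include <vector>

/* Illustrative sketch only, not the real S::Bus.  */
class MiniBus {
private:
	/* Type-erased handlers, keyed by the C++ type of the message.  */
	std::map<std::type_index, std::vector<std::function<void(void*)>>> handlers;
public:
	/* Register a function to be executed whenever a T is raised.  */
	template<typename T>
	void subscribe(std::function<void(T const&)> f) {
		handlers[std::type_index(typeid(T))].push_back(
			[f](void* p) { f(*static_cast<T const*>(p)); }
		);
	}
	/* Deliver a message to every subscriber of its type, and only them.  */
	template<typename T>
	void raise(T t) {
		auto it = handlers.find(std::type_index(typeid(T)));
		if (it == handlers.end())
			return;
		for (auto& h : it->second)
			h(&t);
	}
};

/* A plain struct-like message type, and a tiny usage example.  */
struct Ping { int counter; };

int main() {
	MiniBus bus;
	bus.subscribe<Ping>([](Ping const& p) {
		std::cout << "got ping " << p.counter << std::endl;
	});
	bus.raise(Ping{42});
	return 0;
}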
Execution
namespace Ev { template<typename a> class Io; }
The templated class Ev::Io<a> is used to represent pending executable code which will yield the templated type a later when execution completes.
Ev::Io<a> is a CPS monad, in Haskell terms (and is named after the Haskell IO a type).
In JavaScript terms it is a Promise.
In C terms, it represents a function that accepts a callback, and is tied to some context pointer for that function.
It uses a syntax inspired by the JavaScript Promise type: the .then() method / member function.
The .then() is equivalent to >>= in Haskell.
For an object of type Ev::Io<a> with any type a, the .then() / >>= will combine that object with a function that accepts a plain type a and returns an object of type Ev::Io<b>, where b can be any type (and can be the same as, or different from, a).
The result of the .then() combinator is of type Ev::Io<b>, and executing the result is equivalent to:
- Executing the original Ev::Io<a> object.
- Extracting the resulting a (in callback terms, getting the resulting a via an argument to the callback).
- Executing the given function, which returns an Ev::Io<b>.
- Executing the resulting Ev::Io<b>, and yielding the resulting b.
So how do we “execute” an Ev::Io<a>?
We invoke its .run() member function, which requires a callback (a function that takes a single argument of type a and returns nothing).
When the Ev::Io<a> finishes execution, the callback gets invoked with the result.
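Below is a minimal, self-contained sketch of the shape of this API. It is not the actual Ev::Io implementation (which also deals with exceptions, greenthread bookkeeping, and event-loop glue); the Io and lift names here are illustrative stand-ins.

#include <functional>
#include <iostream>
#include <string>
#include <utility>

/* Simplified illustration of the CPS idea behind Ev::Io<a>.  */
template<typename A>
struct Io {
	typedef A value_type;
	/* The pending computation: a function that accepts a callback
	 * to be invoked later with the result.  */
	std::function<void(std::function<void(A)>)> run_;

	/* "Execute" by providing the final callback.  */
	void run(std::function<void(A)> cb) { run_(std::move(cb)); }

	/* Combine with a function A -> Io<B>, yielding an Io<B>.  */
	template<typename F>
	auto then(F f) -> Io<typename decltype(f(std::declval<A>()))::value_type> {
		typedef typename decltype(f(std::declval<A>()))::value_type B;
		auto self = run_;
		Io<B> rv;
		rv.run_ = [self, f](std::function<void(B)> cb) {
			/* Run the first action, feed its result to f,
			 * then run the resulting Io<B> with the final callback.  */
			self([f, cb](A a) { f(std::move(a)).run(cb); });
		};
		return rv;
	}
};

/* Wrap a plain value, like Haskell's return or JavaScript's Promise.resolve.  */
template<typename A>
Io<A> lift(A a) {
	Io<A> rv;
	rv.run_ = [a](std::function<void(A)> cb) { cb(a); };
	return rv;
}

int main() {
	lift(21)
		.then([](int n) { return lift(n * 2); })
		.then([](int n) { return lift(std::string("answer: ") + std::to_string(n)); })
		.run([](std::string s) { std::cout << s << std::endl; });
	return 0;
}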
At its heart, this is a CPS monad, i.e. this is basically a nice syntax for C code using callbacks.
If you have seen some of the more complex C plugins built into C-Lightning, such as multifundchannel, then you know how much of a PITA callback-using style is to code in C.
The .then() syntax, together with C++11 lambda functions, makes it look significantly nicer: there is no need to write multiple short file-local functions, each with its own long boilerplate declaration.
This is important since callback-using style (i.e. continuation-passing style or CPS) allows greenthreads to be implemented in userspace.
Greenthreads are like threads, but context switching is cooperative rather than preemptive, and launching a new greenthread is cheap: it is just an Ev::Io<a> having its .run() member function invoked, and does not involve a context switch from userspace to kernelspace, the OS updating its thread tables, or the allocation of a fresh (and large) C stack.
Greenthreads are thus lightweight.
In fact, some higher-level languages use greenthreads in their runtime for concurrency, and encourage programmers to launch new greenthreads for all tasks.
CLBOSS makes extensive use of multiple parallel greenthreads running at the same time. All greenthreads in CLBOSS run in a single process-level thread, the main thread of the CLBOSS process.
By using greenthreads, we avoid getting bogged down by the heavy
weight of actual preemptive OS-level threads.
Preemptive OS threads are not only heavyweight, but also require proper mutex usage.
All greenthreads, in contrast, run on the same main process thread and so do not require mutexes: as long as you do not run an Ev::Io<a> that blocks and returns to the main loop, you have exclusive access to all memory, and variables will not change out from under you.
Another advantage is that it makes integrating into event loops
much neater.
Suppose we want to wait for an input to arrive on some pipe or socket.
If we were to read() directly, the entire main thread would block and the rest of CLBOSS would stop executing.
However, because Ev::Io<a> accepts a callback, we can instead make a read()-like function that results in an Ev::Io<a>.
That Ev::Io<a> object will then take its callback, register it into the event loop as “waiting for this pipe / socket to be ready for reading”, and then return without invoking any callback.
This causes execution to resume back to the top-level main event loop, which then handles the new registration and adds it to its select() or poll() or whatever.
Then, when the event loop triggers the waiting event, the callback is invoked, and the rest of the greenthread resumes execution.
The main event loop can thus handle large numbers of greenthreads executing simultaneously.
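To sketch what that read()-like function can look like, here is a hypothetical wait_readable built on the simplified Io type from the earlier sketch. The register_read_interest function is a made-up stand-in (declared but not implemented here) for whatever registration call the real event loop glue exposes (in CLBOSS's case, ultimately libev watchers).

/* Builds on the simplified Io<A> and lift() from the earlier sketch.  */
#include <unistd.h>

/* Hypothetical stand-in for the event loop's registration API:
 * arrange for on_ready to be called once fd is readable.  */
void register_read_interest(int fd, std::function<void()> on_ready);

/* Completes with the fd once it is readable, without ever blocking
 * the main thread: it only registers the continuation and returns.  */
Io<int> wait_readable(int fd) {
	Io<int> rv;
	rv.run_ = [fd](std::function<void(int)> cb) {
		register_read_interest(fd, [fd, cb]() { cb(fd); });
	};
	return rv;
}

/* A greenthread can then be written in apparently blocking style.  */
void example_greenthread(int fd) {
	wait_readable(fd)
		.then([](int readable_fd) {
			char buf[256];
			auto n = read(readable_fd, buf, sizeof(buf));
			/* ... handle the data ... */
			return lift((int) n);
		})
		.run([](int) { /* greenthread done */ });
}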
This style makes it easy to code CLBOSS as if everything were blocking instead of asynchronous, even though at the low level we are actually treating everything as asynchronous.
The Ev::Io<a>, plus some small amount of glue code to connect to the main loop, serves as a bridge between our apparently blocking multithreaded surface syntax and our actual asynchronous non-blocking single-threaded event loop implementation.
CLBOSS uses the libev library for its main loop, hence the Ev::Io<a> type being in the Ev namespace.
However, the actual type itself does not strictly require libev and can be trivially adapted to any event-system main-loop library.
Aspiring C++ template metaprogrammers should check out how it
implements type introspection of function return types, too (warning:
template metaprogramming can drive you insane, learn at your own risk,
Ph'nglui mglw'nafh template metaprogramming R'lyeh wgah'nagl fhtagn).
CLBOSS is an automated node manager, and a node manager needs to do many things, and even if it fails at one of those things, it should still try its best to handle all the other things we expect it to do. Greenthreads make it easy for CLBOSS to handle many things, and exceptions are specially handled so that an exception thrown in one greenthread does not bring down other greenthreads. This makes CLBOSS resilient against failures in one of its tasks; other tasks we have for it remain operational.
Oracle Modules And Compute Modules
Roughly speaking, we can classify CLBOSS modules into two large categories:
- Oracle modules connect to the outside world and cause messages to occur on the bus in reaction to outside-world events, or respond to bus messages by manipulating the outside world.
- Compute modules only “think”: they listen only to on-bus messages, possibly store some local state (or possibly on-database state), perform a computation, and emit messages on the bus depending on what the module is intended to do.
(OOP: There are no base classes for modules, as I prefer composition over inheritance. Thus, there are also no base classes for oracle or compute modules. Whether a module is oracle or compute is based on whether it is possible to test it only by attaching it to the message bus (i.e. compute) together with dummy modules, or if we need to somehow emulate some real outside world (i.e. oracle).)
An example of an oracle module is Boss::Mod::Timers.
This module simply waits for certain set amounts of time, and then launches a new greenthread to raise a timer message.
This module is responsible for emitting the messages:
- Boss::Msg::Timer10Minutes, sent once every 10 minutes from the time CLBOSS starts.
- Boss::Msg::TimerRandomHourly, sent approximately once an hour, but with some randomness on its exact timing.
- Boss::Msg::TimerRandomDaily, sent approximately once a day with some randomness.
A lot of “oracle” modules actually just listen for one of the timer events, then execute RPC commands to the managed C-Lightning node in order to check its status. Based on that status, the module then emits messages reporting the status of the C-Lightning node.
A new greenthread is launched just to broadcast each of the messages.
This makes the Boss::Mod::Timers module robust, as any exceptions caused by code triggered by the timers will not cause the timer loops to crash, so the timer will always raise the message at the next scheduled time.
This makes CLBOSS resilient against temporary failures that were not handled properly by my code; if something triggered by a timer fails now, then later the Boss::Mod::Timers will re-trigger it, hopefully with the failure already resolved, and other parts of the system will continue to run and operate normally.
There are random hourly and daily messages; for example, the random hourly is broadcast between 30 and 90 minutes from the previous broadcast. The intent is to be resilient in case of observable time-related behavior that might be exploited by sentient nodes / node managers in attacks. No known attacks exist, but it helps to be prepared for them by making at least time-based attacks harder through some randomization. The 10-minute timer always has a fixed duration between broadcasts and is intended for monitoring the node, while changes to the node state (feerates, rebalances, etc.) are triggered by the random hourly and/or random daily.
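The delay computation involved is simple to picture. Below is an illustrative sketch (not the actual Boss::Mod::Timers code) of picking the next random-hourly delay, uniformly between 30 and 90 minutes:

#include <chrono>
#include <random>

/* Illustrative only: the average period is about an hour, but the
 * exact broadcast time is unpredictable to outside observers.  */
std::chrono::seconds next_random_hourly_delay(std::mt19937& rng) {
	std::uniform_int_distribution<int> dist(30 * 60, 90 * 60);
	return std::chrono::seconds(dist(rng));
}

The real module would, of course, perform the actual waiting through the non-blocking Ev::Io machinery rather than by sleeping the main thread.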
Having a separate “compute” category makes
testing of algorithms much simpler, in combination with the
central bus architecture.
A test of a single compute module involves just instantiating
the S::Bus
together with the module under test,
plus some dummy modules that emulate the other modules the
compute module would interact with.
Since the oracle modules can be replaced, it is easy to
provide arbitrary stimulus to the compute module under test,
broadcasting arbitrary messages and checking for arbitrary
responses from the module under test.
In short, this just showcases the advantage of the dependency
inversion principle: instead of compute modules depending
directly on oracle modules, or on other compute modules, they
all depend on a common interface, the central message
signalling bus.
The central bus then eases testing of the module, and
replacing its dependencies with alternate versions.
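As a rough sketch of what such a test can look like (the module and message names here are hypothetical, and the subscribe/raise calls are simplified rather than the exact S::Bus signatures):

#include <cassert>

/* Hypothetical test: drive a compute module purely through the bus.  */
void test_some_compute_module() {
	S::Bus bus;
	Boss::Mod::SomeComputeModule module(bus);   /* module under test (hypothetical) */

	/* Dummy "oracle": just record whatever the module emits in response.  */
	auto got_response = false;
	bus.subscribe<Boss::Msg::SomeResponse>([&](Boss::Msg::SomeResponse const&) {
		got_response = true;
	});

	/* Arbitrary stimulus, as if an oracle module had observed something.  */
	bus.raise(Boss::Msg::SomeStimulus{});

	assert(got_response);
}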
Testing the oracle modules is not as straightforward,
since we would need to somehow emulate the outside world
that the oracle modules talk to.
However, it is acceptable, in this context, to use “testing on the field”, i.e. just test oracle modules by running CLBOSS on a live system.
Oracle modules tend to have much, much simpler code; they are effectively “just” translators between the outer world and the inner S::Bus world.
If something happens on the node that is supposed to be noticed by the oracle module, and it does not emit a message (i.e. none of the already-well-tested compute modules react to the event), then we know it is the oracle module at fault, and scanning through the code of the oracle module is usually sufficient, since the oracle module is typically a straightforward translator.
Non-idealities
The described architecture was developed during the development of the actual CLBOSS program itself. This means that some code in CLBOSS was written before some of the architecture had been solidified.
For example, the central bus, and the dependency inversion it represents, implies that modules should not have references or pointers to other modules. Modules should have references or pointers to (i.e. depend on) only the abstract bus, not on any concrete modules.
However, the Boss::Mod::Rpc module was written before this principle was extended to all parts of the overall design.
The Boss::Mod::Rpc module exposes the .command() member function, which is accessible only via a direct reference to the actual module.
Thus, modules that want to send RPC commands to the C-Lightning node being managed have to acquire a pointer to the RPC object.
A better design would have a message to initiate an RPC command and a message to represent the RPC command result, and have the Boss::Mod::Rpc instead listen for those messages over the bus.
The convenience of the .command()
member
function can be reacquired by implementing a
“proxy” in the Boss::ModG
namespace; any RPC-using module could then instantiate
this proxy, which would provide a .command()
member function that emits the do-this-RPC-command
message and listens for the corresponding result.
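A sketch of what the proposed pieces might look like (all of the types and members below are hypothetical shapes for this proposed design, not existing CLBOSS code):

#include <string>

/* Hypothetical request/response messages for the proposed design.  */
namespace Boss { namespace Msg {

struct RequestRpcCommand {
	void* requester;        /* identifies which proxy instance asked */
	std::string command;    /* e.g. "listpeers" */
	std::string params;     /* JSON text; a real design would use a proper JSON type */
};

struct ResponseRpcCommand {
	void* requester;        /* echoed back so the right proxy picks it up */
	std::string result;     /* JSON text of the RPC result */
};

}}

namespace Boss { namespace ModG {

/* Hypothetical proxy: restores a convenient .command() interface,
 * implemented purely in terms of bus messages.  */
class RpcProxy {
private:
	S::Bus& bus;
public:
	explicit RpcProxy(S::Bus& bus_) : bus(bus_) {
		/* Subscribe to Boss::Msg::ResponseRpcCommand here and match
		 * responses (by requester) back to pending .command() calls.  */
	}
	Ev::Io<std::string> command(std::string const& command, std::string const& params);
};

}}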
This would make some “oracle” modules
into “compute” modules, since they can
now be tested independently of the real
Boss::Mod::Rpc, due to the dependency decoupling.
It would also allow other modules to introspect on
what commands are being sent by the rest of CLBOSS,
by simply listening for the RPC messages on the bus.