A few days ago I posted about how the differences between Tor and Lightning Network topologies might undermine the privacy that users are able to achieve with the Lightning implementation of onion routing. Despite many disparaging remarks about my intentions, both Adam Back (u/adam3us) and Rusty Russell (u/rustyreddit) have replied and indicated that there is at least some validity to the concerns raised. Additional discussion of this topic in various comment threads has inevitably led to questions about what is at risk and what users can do to minimize those risks. I’ve had time to formulate the beginnings of a response to the former, which is a necessary precursor to eventually answering the latter. It may not be very satisfying to get the answers in this order, but it is the natural result of posting this work as it evolves. So let’s get right down to it and explore some of the risk areas I’ve been able to identify for Lightning Network operators.
Lightning Network gives an analyst many opportunities to correlate data across several domains and tie them back to a single pseudonym. Let’s call this single identity the operator’s nym. For purposes of this post, the nym represents the complete anonymous persona of its associated operator; every Lightning operator has only one nym. An analyst may end up identifying multiple sub-nyms before they’re able to link them to a single operator.
The primary sub-nym on Lightning Network is the node. Nodes have many properties which can uniquely identify them over time and space. This is necessary to ensure you're transacting with whom you intend, even if you don't know their real identity. Long-term node identities are also a requirement for payment channels. Because these properties cover different domains but all link back to the same node identity, deanonymization in one domain affects activity across all domains that can be associated with the node. These properties can also be leveraged by an analyst to associate sub-nyms with their operator’s nym.
So what makes up a node identity?
- Node ID
- IP addresses
- Node customizations
- Channels
- On-chain transactions
- Lightning transactions
Node ID. The most obvious identifier for a node is its node_id, the public key the node uses when signing messages on the network. A node’s node_id is known by all of its peers. A transaction sender does not need to expose their node_id; however, the sender must know the receiver’s. A node that wants to service third-party transactions must broadcast channel_announcement messages for the channels which can be used for routing, which exposes its node_id to the whole network.
IP addresses. While the node_id is certainly the strongest node identifier, it is not the only property that could identify a node or link multiple nodes to a nym. Lightning transactions are active, requiring bi-directional communication to complete. To communicate with peers on the internet, nodes require an IP address. At a minimum, this IP address is known to a node’s peers and, if the operator wants to invite other nodes to open channels, it may be broadcast to the network in node_announcement messages. Although IP addresses do not prove who is behind them, they can provide a lot of information about the operator’s identity and link multiple nodes to a single nym. Connecting to Lightning over anonymizing solutions such as VPNs and Tor can help disassociate the IP addresses from the operator, but these solutions also introduce new correlation data for observers of those domains.
Node customizations. The node_announcement messages carry some customizable fields (features) which are not unique, but could still serve to fingerprint nodes if an operator regularly uses a unique or identifiable combination.
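As a rough illustration of fingerprinting by field combination, here is a minimal sketch. It assumes the analyst has already collected announcements into plain dictionaries; the function name rare_fingerprints, the rgb_color field, and all of the values are made up for this example and are not taken from any real node or implementation.

```python
from collections import defaultdict


def rare_fingerprints(announcements, fields, max_group=2):
    """Return field combinations shared by a small number of nodes.

    A combination used by exactly one node links nothing, and one used by
    many nodes (e.g. software defaults) is too common to be useful, so only
    groups of size 2..max_group are reported.
    """
    groups = defaultdict(list)
    for ann in announcements:
        key = tuple(ann[f] for f in fields)
        groups[key].append(ann["node_id"])
    return {k: sorted(v) for k, v in groups.items() if 1 < len(v) <= max_group}


# Hypothetical observations: three nodes on defaults, two sharing an odd combo.
announcements = [
    {"node_id": "n1", "features": "a2", "rgb_color": "3399ff"},
    {"node_id": "n2", "features": "a2", "rgb_color": "3399ff"},
    {"node_id": "n3", "features": "a2", "rgb_color": "3399ff"},
    {"node_id": "n4", "features": "8a", "rgb_color": "ff00aa"},
    {"node_id": "n5", "features": "8a", "rgb_color": "ff00aa"},
]

candidates = rare_fingerprints(announcements, fields=("features", "rgb_color"))
print(candidates)
```

A match like this is only a hint that two nodes might share an operator. It becomes meaningful when combined with the other identifiers discussed in this post.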
Channels. Nodes can be uniquely identified by their set of channels. Channels which are open at the same time are obvious correlation points; less obvious is the fact that channel relationships are transitive. For instance, if a node initially opens chA and chB, an analyst can easily identify them as belonging to the same node. Suppose chA isn’t very reliable, so the operator closes that channel and some time later opens chC. The analyst, who has been observing the network, can now associate chC with chA through their shared concurrent channel, chB. If the operator then closes chB and later opens chD, the analyst can link all four channels to the single node thanks to this transitive nature, even though chA and chD were never open at the same time and do not share any concurrent channels.
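The transitive linking in the chA through chD example can be sketched with a small union-find over co-observed channels. This is only an illustration under a simplifying assumption: each "snapshot" is a set of channels the analyst has already decided were open concurrently on the same node. The function name link_channels and the snapshot data are hypothetical.

```python
def link_channels(snapshots):
    """Cluster channels that are connected through chains of co-observation."""
    parent = {}

    def find(ch):
        # Register unseen channels as their own root, then walk to the root.
        parent.setdefault(ch, ch)
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]  # path halving
            ch = parent[ch]
        return ch

    for concurrent in snapshots:
        channels = sorted(concurrent)
        for ch in channels:
            find(ch)
        # Merge every channel in the snapshot into the first one's cluster.
        for other in channels[1:]:
            root_a, root_b = find(channels[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for ch in list(parent):
        clusters.setdefault(find(ch), set()).add(ch)
    return sorted(sorted(members) for members in clusters.values())


# chA/chB overlap, then chB/chC, then chC/chD; chX/chY belong to someone else.
snapshots = [
    {"chA", "chB"},
    {"chB", "chC"},
    {"chC", "chD"},
    {"chX", "chY"},
]

print(link_channels(snapshots))
```

chA and chD never appear in the same snapshot, yet they end up in the same cluster because each overlap contributes one link in the chain.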
On-chain transactions. Each channel a node participates in will have several addresses which may associate back to the operator’s nym. Inputs to the funding transaction and outputs from the commitment transaction are implicitly transitive; there can be some doubt as to the ownership of an output, but there is a known relationship. An analyst monitoring the blockchain activities of a node may be able to use the inputs and outputs to reliably link a newly opened channel to the previously closed channels whose proceeds funded it, even when the channels are associated with different nodes. This is another way in which an analyst might link multiple sub-nyms to a single nym.
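To make the on-chain linkage concrete, here is a minimal sketch that matches the outputs of a channel close against the inputs of later funding transactions. The Channel record, the linked_by_reuse function, and the outpoint strings are all hypothetical, and the sketch only catches direct spends; funds that pass through intermediate transactions would need further chain analysis. As noted above, a match does not prove which channel party owned the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    node: str                             # node the analyst associates with it
    funding_inputs: frozenset             # outpoints spent by the funding tx
    close_outputs: frozenset = frozenset()  # outpoints created when closing


def linked_by_reuse(channels):
    """Find channels funded directly from another channel's close outputs.

    Returns tuples of (closed_channel, opened_channel, closed_node, opened_node).
    """
    producer = {}
    for ch in channels:
        for outpoint in ch.close_outputs:
            producer[outpoint] = ch

    links = []
    for ch in channels:
        for outpoint in sorted(ch.funding_inputs):
            src = producer.get(outpoint)
            if src is not None and src.name != ch.name:
                links.append((src.name, ch.name, src.node, ch.node))
    return links


# ch1 (on nodeA) closes; one of its outputs directly funds ch2 (on nodeB).
channels = [
    Channel("ch1", "nodeA", frozenset({"fund0:1"}),
            frozenset({"close1:0", "close1:1"})),
    Channel("ch2", "nodeB", frozenset({"close1:0"})),
    Channel("ch3", "nodeC", frozenset({"other:0"})),
]

print(linked_by_reuse(channels))
```

Here the link crosses node identities: ch1 and ch2 are attributed to different nodes, but the shared outpoint suggests the same funds, and possibly the same operator, sit behind both.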
Lightning transactions. A major trade-off that operators make by transacting over Lightning Network instead of on-chain is that of transaction privacy. In exchange for the promise of keeping their transactions off of the blockchain, Lightning imposes a higher risk of transaction correlation. If the privacy guarantees that Lightning provides are breached and the sending and receiving nodes are deanonymized, an analyst can use all exposed transactions in an attempt to correlate them to a single nym or operator.