Formance - Funds Traceability in Digital Ledgers: Towards a New Data Model for Fintech, Part II

Earlier, we discussed the concept of promises, and how fintechs should see themselves as being in the business of managing warehouses of promises. Warehousing requires the ability to unambiguously trace depositors’ funds to real bank accounts—that is, to say which assets are backing which liabilities. This ability is especially important when things go wrong.

As it happens, classical double-entry ledgers don’t store sufficient information to permit tracing specific assets to specific liabilities except in the simplest of cases. Let’s look at some specific examples to show why that is, and then let’s explore a modification we can make to classical double-entry ledgers that adds the necessary information without fundamentally altering the way such ledgers work. Unfortunately, these modifications come at the cost of increased implementation complexity and decreased performance, as we’ll see.

Backtracing

Imagine a simple situation: we run a neobank with two user accounts, users:alice and users:bob, and we hold funds at JPMC and Wells Fargo.

First, Alice deposits $100 into their account with us, and we place their money into an account at JP Morgan Chase.
Then, Bob deposits $100 into their account, and we place their money into an account at Wells Fargo.

Our classical double-entry ledger will look like this:

#TXID	Debit	Credit	Amount
0	banks:jpmc		100
0		users:alice	100
1	banks:fargo		100
1		users:bob	100

Now, if we want to determine where each user’s funds are located, we can construct a weighted directed graph over the accounts and transactions. We need to pair each debit with its corresponding credit, which in this case we can do simply by looking at the order of the entries in the respective columns. We create a node for each account, then draw an arrow, weighted by the amount, from each debit to its corresponding credit. Here that gives two edges: banks:jpmc → users:alice and banks:fargo → users:bob, each weighted 100.
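As a rough illustration of the pairing step, here is a minimal Python sketch. The tuple layout `(txid, debit, credit, amount)` and the function name `build_edges` are hypothetical. Pairing legs by their position within a transaction is only sound in simple cases like this one.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

# Hypothetical entry layout mirroring the table: (txid, debit, credit, amount).
# Exactly one of debit/credit is set on each entry; the other is None.
LEDGER: List[Tuple[int, Optional[str], Optional[str], int]] = [
    (0, "banks:jpmc", None, 100),
    (0, None, "users:alice", 100),
    (1, "banks:fargo", None, 100),
    (1, None, "users:bob", 100),
]


def build_edges(ledger):
    """Pair the i-th debit of each transaction with its i-th credit.

    Returns weighted directed edges (debit_account, credit_account, amount).
    """
    debits = defaultdict(list)
    credits = defaultdict(list)
    for txid, debit, credit, amount in ledger:
        if debit is not None:
            debits[txid].append((debit, amount))
        else:
            credits[txid].append((credit, amount))

    edges = []
    for txid in sorted(debits):
        for (src, out_amt), (dst, in_amt) in zip(debits[txid], credits[txid]):
            # Positional pairing only makes sense if the paired legs match.
            if out_amt != in_amt:
                raise ValueError(f"tx {txid}: legs {out_amt} and {in_amt} differ")
            edges.append((src, dst, out_amt))
    return edges


print(build_edges(LEDGER))
# [('banks:jpmc', 'users:alice', 100), ('banks:fargo', 'users:bob', 100)]
```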

We can now use backtracing to answer questions about where funds are located. We start at a user’s account and follow the arrows backward until we land on a bank account. In this case, to find where Alice’s funds are located, we follow the arrows backward to JP Morgan Chase. Question answered.
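The backward walk can be sketched as a simple graph search. This is an illustrative sketch rather than a production routine. It assumes bank accounts are recognizable by a hypothetical `banks:` prefix, and `backtrace` is a name chosen for the example. It returns the set of bank accounts reachable against the arrows. In this scenario that set has exactly one member for each user.

```python
from collections import defaultdict

# Weighted edges (debit -> credit, amount) for the first scenario.
EDGES = [
    ("banks:jpmc", "users:alice", 100),
    ("banks:fargo", "users:bob", 100),
]


def is_bank(account):
    # Assumption: bank accounts are identified by the "banks:" prefix.
    return account.startswith("banks:")


def backtrace(edges, start):
    """Follow arrows backward from `start` and return every bank account reached."""
    incoming = defaultdict(list)  # credited account -> accounts that debited into it
    for src, dst, _amount in edges:
        incoming[dst].append(src)

    banks, stack, seen = set(), [start], {start}
    while stack:
        node = stack.pop()
        if is_bank(node):
            banks.add(node)
            continue
        for parent in incoming[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return banks


print(backtrace(EDGES, "users:alice"))  # {'banks:jpmc'}
```

A result containing exactly one bank means the question has a single answer. A result with several banks is the ambiguity the next scenario runs into.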

More complex backtracing

However, this technique only works under a very narrow and unrealistic set of conditions. Consider a different scenario that highlights one way backtracing can fail. Alice deposits $100 into their account, and we place that in JP Morgan Chase. They then make a second deposit of $100, but this time we place those funds in Wells Fargo. Finally, Alice transfers $100 directly to Bob.

Here is our ledger:

#TXID	Debit	Credit	Amount
0	banks:jpmc		100
0		users:alice	100
1	banks:fargo		100
1		users:alice	100
2	users:alice		100
2		users:bob	100

The resulting graph has three edges to backtrace along: banks:jpmc → users:alice, banks:fargo → users:alice, and users:alice → users:bob.

Now, if we want to know where Bob’s funds are, we run into a problem. There are two ways to backtrace from Bob’s account—one path that ends in JP Morgan Chase, and one that ends in Wells Fargo. We don’t have sufficient additional information to decide which is the correct backtracing path. The assets and liabilities balance, but we don’t know which assets are backing which liabilities.
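The ambiguity can be made concrete by enumerating every backward path instead of stopping at the first one. The sketch below is hypothetical: the function names and the edge list are illustrative, and the edges are derived from the second ledger by positional pairing. It finds two equally valid paths from users:bob. It also confirms that total assets still equal total user liabilities.

```python
from collections import defaultdict

# Edges (debit -> credit, amount) for the second ledger.
EDGES = [
    ("banks:jpmc", "users:alice", 100),
    ("banks:fargo", "users:alice", 100),
    ("users:alice", "users:bob", 100),
]


def backtrace_paths(edges, start):
    """Enumerate every backward path from `start` to a node with no incoming edge."""
    incoming = defaultdict(list)
    for src, dst, _amount in edges:
        incoming[dst].append(src)

    paths = []

    def walk(node, path):
        if not incoming[node]:  # a root: nothing was credited into it
            paths.append(path)
            return
        for parent in incoming[node]:
            if parent not in path:  # skip cycles
                walk(parent, path + [parent])

    walk(start, [start])
    return paths


def net_credit(edges, account):
    """Credits minus debits: the liability balance for a users:* account."""
    credited = sum(a for _s, d, a in edges if d == account)
    debited = sum(a for s, _d, a in edges if s == account)
    return credited - debited


for path in backtrace_paths(EDGES, "users:bob"):
    print(" <- ".join(path))
# users:bob <- users:alice <- banks:jpmc
# users:bob <- users:alice <- banks:fargo

# The books still balance: 200 in assets, 200 in user liabilities.
assets = sum(a for s, _d, a in EDGES if s.startswith("banks:"))
liabilities = net_credit(EDGES, "users:alice") + net_credit(EDGES, "users:bob")
print(assets, liabilities)  # 200 200
```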

Again, this might not sound like a problem if you’re a bank. You are indeed in the business of making promises, and your charter gives you a certain latitude in deciding which assets back which liabilities. But if you’re a fintech, merely transmitting money that you hold temporarily on behalf of your app users, the story is different: you need to be able to say exactly where each user’s money is.

Trees to the rescue?

The difficulty in the above scenario is the branching structure of the graph. But we’re not out of options. An arborescence is a directed graph with a single root node from which there is exactly one directed path to every other node. Equivalently, from any node you can backtrace along a unique path to the root. If we treat each bank account as the root of its own arborescence, we can build graphs that let us backtrace successfully even in more complicated scenarios.
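The defining property is easy to check mechanically, as this minimal sketch shows. It assumes edges are given as `(debit, credit, amount)` tuples, and the function name `arborescence_roots` and the sample edge lists are hypothetical. The check requires that no node have more than one incoming edge and that walking those edges backward never loop. When both conditions hold, every node maps to exactly one root. The sample `TREE` uses virtual-account names of the kind introduced next, while `BRANCHING` is the graph from the previous scenario.

```python
# A hypothetical forest: two trees, rooted at banks:jpmc and banks:fargo.
TREE = [
    ("banks:jpmc", "users:alice#0", 100),
    ("banks:fargo", "users:alice#1", 100),
    ("users:alice#1", "users:bob#0", 100),
]
# The branching graph from the previous scenario.
BRANCHING = [
    ("banks:jpmc", "users:alice", 100),
    ("banks:fargo", "users:alice", 100),
    ("users:alice", "users:bob", 100),
]


def arborescence_roots(edges):
    """Map every node to the root of its arborescence.

    Raises ValueError unless every node has at most one incoming edge and
    walking those edges backward never revisits a node (no cycles).
    """
    parent, nodes = {}, set()
    for src, dst, _amount in edges:
        nodes.update((src, dst))
        if dst in parent:
            raise ValueError(f"{dst} has more than one incoming edge")
        parent[dst] = src

    roots = {}
    for node in nodes:
        seen, cur = set(), node
        while cur in parent:
            if cur in seen:
                raise ValueError(f"cycle through {cur}")
            seen.add(cur)
            cur = parent[cur]
        roots[node] = cur
    return roots


print(arborescence_roots(TREE)["users:bob#0"])  # banks:fargo
try:
    arborescence_roots(BRANCHING)
except ValueError as err:
    print("not an arborescence:", err)
# not an arborescence: users:alice has more than one incoming edge
```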

But as we’ll see, there is a non-trivial tradeoff required to build arborescence structures from our ledger, namely that it forces us to subdivide actual accounts into an ever-growing list of virtual accounts—something that requires a great deal more complexity in our ledgering.

Let’s use the same scenario as above. When Alice deposits their first $100, we place it in JP Morgan Chase, and in the process establish Alice’s first virtual account. On their second deposit, however, we want to place the funds at Wells Fargo, so we have to create a new virtual account to record that these funds are backed by a different bank. So far, so good.

#TXID	Debit	Credit	Amount
0	banks:jpmc		100
0		users:alice#0	100
1	banks:fargo		100
1		users:alice#1	100
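The deposit side of this scheme can be sketched as follows. This is a hypothetical sketch; the class `VirtualLedger` and its methods are illustrative names, not an existing API. As a simplifying assumption, it opens a fresh virtual account on every deposit, which is stricter than the text requires, since the text only calls for a new one when the backing bank differs. The `user_balance` method sums a user's virtual accounts to give their total balance.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VirtualLedger:
    """Hypothetical ledger that splits each user account into virtual accounts."""
    entries: list = field(default_factory=list)  # (txid, debit, credit, amount)
    balances: dict = field(default_factory=lambda: defaultdict(int))
    counters: dict = field(default_factory=lambda: defaultdict(int))
    next_txid: int = 0

    def _open_virtual(self, user):
        """Allocate the next numbered virtual account, e.g. users:alice#0."""
        name = f"{user}#{self.counters[user]}"
        self.counters[user] += 1
        return name

    def deposit(self, user, bank, amount):
        """Record a deposit as a bank debit and a credit to a new virtual account."""
        txid = self.next_txid
        self.next_txid += 1
        vacct = self._open_virtual(user)
        self.entries.append((txid, bank, None, amount))
        self.entries.append((txid, None, vacct, amount))
        self.balances[vacct] += amount
        return vacct

    def user_balance(self, user):
        """A user's balance is the sum of their virtual accounts' balances."""
        prefix = user + "#"
        return sum(b for acct, b in self.balances.items() if acct.startswith(prefix))


ledger = VirtualLedger()
ledger.deposit("users:alice", "banks:jpmc", 100)
ledger.deposit("users:alice", "banks:fargo", 100)
for entry in ledger.entries:
    print(entry)
print(ledger.user_balance("users:alice"))  # 200
```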

Now, Alice wants to transfer $100 to Bob. At this point we need to look through Alice’s virtual accounts and find some combination that covers that amount, while applying any business rules for how we want the resulting assets distributed across our banks. Our ledger now has this appended to it.

#TXID	Debit	Credit	Amount
2	users:alice#1		100
2		users:bob#0	100
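The search for "some combination that covers that amount" can be sketched as a greedy selection. This is a hedged illustration: `select_sources` and its `order_key` parameter are hypothetical. The `order_key` parameter is a stand-in for real business rules, and the sketch allows partially debiting a virtual account. Greedy selection is only one possible policy, and it may split a transfer across more virtual accounts than strictly necessary.

```python
def select_sources(balances, user, amount, order_key=None):
    """Choose which of `user`'s virtual accounts to debit to cover `amount`.

    `balances` maps virtual account -> balance. `order_key` is a hypothetical
    stand-in for business rules: it orders the candidates, and the default
    takes the oldest virtual account (lowest #n) first.
    Returns a list of (virtual_account, amount_to_debit).
    """
    def number(acct):
        return int(acct.rsplit("#", 1)[1])

    prefix = user + "#"
    candidates = [a for a, b in balances.items() if a.startswith(prefix) and b > 0]
    candidates.sort(key=order_key or number)

    plan, remaining = [], amount
    for acct in candidates:
        if remaining == 0:
            break
        take = min(balances[acct], remaining)
        plan.append((acct, take))
        remaining -= take
    if remaining:
        raise ValueError(f"{user} cannot cover {amount}")
    return plan


balances = {"users:alice#0": 100, "users:alice#1": 100}
# Newest-first reproduces the ledger above, which debits users:alice#1.
print(select_sources(balances, "users:alice", 100,
                     order_key=lambda a: -int(a.rsplit("#", 1)[1])))
# [('users:alice#1', 100)]
print(select_sources(balances, "users:alice", 150))
# [('users:alice#0', 100), ('users:alice#1', 50)]
```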

Virtual accounts are annotated with a numeric suffix (#0, #1), and the total balance in a customer’s account is the sum of the balances of their virtual accounts.

The resulting graph is much simpler, however. The customer accounts users:alice and users:bob are no longer technically nodes in the graph; only their virtual accounts are. In fact, we now have two separate trees: one rooted in JP Morgan Chase (banks:jpmc → users:alice#0) and one rooted in Wells Fargo (banks:fargo → users:alice#1 → users:bob#0). This allows us to backtrace Bob’s funds unambiguously to Wells Fargo, because there is a single straight path against the arrows from users:bob#0 to banks:fargo. So, the problem is solved.
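Because every virtual account now has exactly one parent, backtracing no longer needs a search. It reduces to following a single parent pointer until reaching a root. The sketch below is a minimal illustration under that assumption. The helpers `parent_map` and `backtrace` are hypothetical names, and the entries are the ones from the two virtual-account tables above.

```python
from collections import defaultdict

# The virtual-account ledger from the tables above: (txid, debit, credit, amount).
ENTRIES = [
    (0, "banks:jpmc", None, 100),
    (0, None, "users:alice#0", 100),
    (1, "banks:fargo", None, 100),
    (1, None, "users:alice#1", 100),
    (2, "users:alice#1", None, 100),
    (2, None, "users:bob#0", 100),
]


def parent_map(entries):
    """Pair debits with credits per transaction; each credited node gets one parent."""
    debits, credits = defaultdict(list), defaultdict(list)
    for txid, debit, credit, _amount in entries:
        if debit is not None:
            debits[txid].append(debit)
        else:
            credits[txid].append(credit)

    parent = {}
    for txid in debits:
        for src, dst in zip(debits[txid], credits[txid]):
            if dst in parent:
                raise ValueError(f"{dst} already has a parent")
            parent[dst] = src
    return parent


def backtrace(parent, start):
    """Walk the unique path from `start` back to its root (a bank account)."""
    path = [start]
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in path:
            raise ValueError(f"cycle through {nxt}")
        path.append(nxt)
    return path


print(" <- ".join(backtrace(parent_map(ENTRIES), "users:bob#0")))
# users:bob#0 <- users:alice#1 <- banks:fargo
```

Compared with the branching case, each step has exactly one candidate, so the walk's length is simply the depth of the node in its tree.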

But this solution has a cost. Every time we want to debit a customer account, we first need to iterate over their virtual accounts to find sufficient funds. If those funds are spread over multiple virtual accounts, we then need either to match them against the recipient’s virtual accounts in the same tree, so as to avoid the original problem, or to proliferate virtual accounts for the recipient. This complicates the implementation and reduces performance; that is the price of backtraceability.
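To make the cost concrete, here is a hedged sketch of the second option, proliferating virtual accounts for the recipient. The function `transfer` and its arguments are hypothetical, and the $150 amount is purely illustrative. Alice holds $100 in each of her two virtual accounts, so the transfer must draw on both of them. As a result, Bob ends up with two virtual accounts, one in each tree, rather than one.

```python
def transfer(balances, counters, entries, txid, sender, recipient, amount):
    """Move `amount` from `sender` to `recipient` in one transaction.

    Scans the sender's virtual accounts oldest first (the iteration cost the
    text mentions) and opens one new recipient virtual account per source, so
    every credited virtual account keeps a single parent.
    Returns the recipient virtual accounts created.
    """
    prefix = sender + "#"
    sources = sorted(
        (a for a, b in balances.items() if a.startswith(prefix) and b > 0),
        key=lambda a: int(a.rsplit("#", 1)[1]),
    )

    # Plan first, so an insufficient balance leaves the ledger untouched.
    plan, remaining = [], amount
    for src in sources:
        if remaining == 0:
            break
        take = min(balances[src], remaining)
        plan.append((src, take))
        remaining -= take
    if remaining:
        raise ValueError(f"{sender} cannot cover {amount}")

    created = []
    for src, take in plan:
        n = counters.get(recipient, 0)
        counters[recipient] = n + 1
        dst = f"{recipient}#{n}"
        entries.append((txid, src, None, take))
        entries.append((txid, None, dst, take))
        balances[src] -= take
        balances[dst] = balances.get(dst, 0) + take
        created.append(dst)
    return created


balances = {"users:alice#0": 100, "users:alice#1": 100}
counters = {"users:alice": 2}
entries = []
print(transfer(balances, counters, entries, 2, "users:alice", "users:bob", 150))
# ['users:bob#0', 'users:bob#1']
for entry in entries:
    print(entry)
# (2, 'users:alice#0', None, 100)
# (2, None, 'users:bob#0', 100)
# (2, 'users:alice#1', None, 50)
# (2, None, 'users:bob#1', 50)
```

Each such split adds to the number of virtual accounts that later debits have to scan. That growth is where the performance cost described above comes from.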

Conclusion

Classical double-entry ledgers by themselves simply can’t provide sufficient information to permit backtracing balances from depositors to the bank accounts holding the money. We can modify such ledgers to add in the missing information, but only at the expense of increased complexity and decreased performance.

But this is not the only way to provide the semantics of promise warehousing! In our next article, we will look at an entirely different ledgering model that can provide the same precise asset-liability mapping without the engineering tradeoffs.