On Distributed Databases and Distributed Ledgers

Why can’t companies wanting to share business logic and data just install a distributed database? What is the essential difference between a distributed database and a distributed ledger?

Last month, I shared the thinking that led to the design of Corda, which we at R3 will be open sourcing on November 30; and Mike Hearn and I were interviewed by Brian and Meher of Epicenter last week. We’ve been delighted by the response and are looking forward to working with those who seek to build on Corda, help influence its direction or contribute to its development and maturation; there’s a lot of work ahead of us!

But one or two observers have asked a really good question. They asked me: “Aren’t you just reimplementing a distributed database?!”

The question is legitimate: if you strip away the key assumptions underpinning systems like Bitcoin and Ethereum, are you actually left with anything? What is actually different between a distributed ledger platform such as Corda and a traditional distributed database?

The answer lies in the definition I gave in my last blogpost, and it is utterly crucial, since it defines an entirely new category of data management system:

“Distributed ledgers – or decentralised databases – are systems that enable parties who don’t fully trust each other to form and maintain consensus about the existence, status and evolution of a set of shared facts”

“Parties who don’t fully trust each other” is at the heart of this. To see why, let’s compare distributed databases and Corda.

Comparing Corda to a distributed database

In a distributed database, we often have multiple nodes that cooperate to maintain a consistent view for their users. The nodes may cooperate to maintain partitions of the overall dataset or they may cooperate to maintain consistent replicas, but the principle is the same: a group of computers, invariably under the control of a single organisation, cooperate to maintain their state. These nodes trust each other. The trust boundary is between the distributed database system as a whole and its users. Each node in the system trusts the data that it receives from its peers, and nodes are trusted to look after the data they have received from their peers. You can think of the threat model as all the nodes shouting in unison: “it’s us against the world!”

This diagram is a stylised representation of a distributed database:

[Diagram: distributed database]

In a distributed database, nodes cooperate to maintain a consistent view that they present to the outside world; they cooperate to maintain rigorous access control and they validate information they receive from the outside world.

So it’s no surprise that distributed databases are invariably operated by a single entity: the nodes of the system assume the other nodes are “just as diligent” as them: they freely share information with each other and take information from each other on trust. A distributed database operated by mutually distrusting entities is almost a contradiction in terms.
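To make that trust boundary concrete, here is a minimal Python sketch of a replica in a distributed database. The `Replica` class and its method names are hypothetical, invented purely for illustration rather than taken from any real database. The point it tries to show is that input from outside clients is validated, while updates arriving from peer replicas are applied without question.

```python
class Replica:
    """Toy replica in a distributed database (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.peers = []

    def client_write(self, key, value):
        # Input from the outside world crosses the trust boundary, so validate it.
        if not isinstance(key, str) or not key:
            raise ValueError("invalid key")
        if not isinstance(value, int) or value < 0:
            raise ValueError("invalid value")
        self.store[key] = value
        # Replicate to peers, which sit inside the trust boundary.
        for peer in self.peers:
            peer.apply_from_peer(key, value)

    def apply_from_peer(self, key, value):
        # No validation: peers are assumed to be just as diligent as this node.
        self.store[key] = value


a, b = Replica("a"), Replica("b")
a.peers.append(b)
b.peers.append(a)
a.client_write("balance:alice", 100)
```

Under these assumptions, a negative balance is rejected when a client submits it, but the same value would be accepted if a peer sent it via `apply_from_peer`. That is harmless when one organisation runs every node, and dangerous when it does not.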

And, of course, if you have a business problem where you are happy to rely on a central operator to maintain your records – as you sometimes can in finance it should be said – then a distributed database will do just fine: let the central operator run it for you.  But if you need to maintain your own records, in synchrony with your peers, this architecture simply won’t do.

And there are huge numbers of situations where we need to maintain accurate, shared records with our counterparts. Indeed, a vast amount of the cost and inefficiency in today’s financial markets stems from the fact that it has been so difficult to achieve this. Until now.

Corda helps parties collaborate to maintain shared data without fully trusting each other

Corda is designed to allow parties to collaborate with their peers to maintain shared records, without having to trust each other fully. So Corda faces a very different world to a distributed database.

A Corda node cannot assume the data it receives from a peer is valid: the peer is probably operated by a completely different entity and, even if the node knows who that entity is, it’s still extremely prudent to verify the information. Moreover, if a Corda node sends data to another node, it must assume that node might print it all in an advert on the front page of the New York Times.

The trust boundaries – the red curves in the diagram – are drawn in a completely different place!

[Diagram: decentralised database]

In Corda, nodes are operated by different organisations and do NOT trust each other; but the outcome is still a consistent view of data.

To repeat, because this distinction is utterly fundamental: nodes of a distributed database trust each other and collaborate with each other to present a consistent, secure face to the rest of the world. By contrast, Corda nodes cannot trust each other, and so must independently verify data they receive from each other and share only data they are happy to see broadly shared.
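The contrast can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Corda’s API: `Transfer`, `LedgerNode` and the validity rules are hypothetical names and rules chosen for illustration, and a real system would also check signatures and richer contract logic, which are omitted here. The sketch shows a node that re-checks everything a peer sends before recording it, and that discloses only records the recipient is already a party to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Toy shared fact: a movement of value between two named parties."""
    sender: str
    receiver: str
    amount: int


class LedgerNode:
    """Toy node run by one organisation (illustrative; not Corda's API)."""

    def __init__(self, owner):
        self.owner = owner
        self.records = []

    def verify(self, tx):
        # Re-check every rule locally; never rely on the sender having done so.
        return (
            isinstance(tx.amount, int)
            and tx.amount > 0
            and tx.sender != tx.receiver
            and self.owner in (tx.sender, tx.receiver)
        )

    def receive_from_peer(self, tx):
        # The peer is outside this node's trust boundary: verify before recording.
        if not self.verify(tx):
            return False
        self.records.append(tx)
        return True

    def records_to_share_with(self, peer_owner):
        # Disclose only facts the peer is a party to; assume anything sent may leak.
        return [tx for tx in self.records if peer_owner in (tx.sender, tx.receiver)]


bank_a = LedgerNode("BankA")
ok = bank_a.receive_from_peer(Transfer("BankB", "BankA", 50))
bad = bank_a.receive_from_peer(Transfer("BankB", "BankA", -50))
bank_a.receive_from_peer(Transfer("BankC", "BankA", 10))
shared = bank_a.records_to_share_with("BankB")
```

Here the invalid transfer from BankB is refused rather than taken on trust, and BankB is shown only the record it is a party to, not the one between BankA and BankC.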

And so we call Corda a distributed ledger, to distinguish it from distributed databases. A distributed ledger that is designed painstakingly for the needs of commercial entities.

Put more simply: you can’t build the applications we envisage for Corda with traditional database technology. And that’s what makes this new field so exciting.


10 thoughts on “On Distributed Databases and Distributed Ledgers”

  1. @Richard. I am a 25 year old trying to build a company that intends to build similar technology for companies in Jamaica. I would love to extend this conversation about the steps needed to build out this technology with your wisdom in these matters as a guide. If you are willing to assist here is my email address: andre_stephe@hotmail.com

  2. Great Stuff!

    Is there a central authority who decides who can run a node? If anyone can run a node, then how do “valid” nodes protect against Sybil attacks? Do they choose their own trusted nodes, i.e. as in Ripple, or is this where the independent verification comes in?

    Also concerns about timestamping. PoET is great, but running inside of Intel’s SGX must be the definition of uber-centralization. Can different nodes agree on different timestamping methods?

    CORDA is definitely pushing the innovation boundary for the greater good. Looking forward to getting more into it. Especially if there is some way to mash zkSnarks into the mix.

  3. @bitcoina – thanks for the question. Corda is designed to allow parties to record and manage agreements (contracts) that exist between them and so data managed on the platform records who these parties are. ie these contracts are real as in “real world” – not autonomous ethereum-style contracts that act over on-platform cryptocurrency.

    So a deployment of a Corda network requires an identity infrastructure. We provide one (based on certificates signed by a signer that all nodes agree to trust) but other approaches would be supported. We’re not dogmatic here. So Sybil attacks are prevented by explicitly agreeing up front (almost as a definition of the network) who is/are the trusted identity provider(s). i.e. we’re not targeting the “anonymous” use-case.

    On timestamping, it’s up to the parties to agreements to decide amongst themselves who they’re prepared to trust to be authoritative for various facts, one of which is the time. However, note: when you work through the game theory, you need the timestamping authority for any given transaction (TSA) ALSO to be the notary. Now, as we stress in our papers, the notaries can be clustered (including BFT) and there can be multiple notaries on the same network. But to avoid game-playing, you do need the actor who commits to a timestamp (actually a time interval) also to confirm the transaction over which they’re signing to attest the time.

    Re zkSnarks: we do not use them but we have designed Corda so as not to _prevent_ them. In particular, take a look at how we’ve constructed our transaction verification architecture when we release the source (the “verify method”).

  4. Lots of clarity on distributed ledgers and distributed databases... if someone put up an architecture comparison between Corda and Ethereum, it would be even clearer.

  5. Pingback: Layer 2 and settlement | Great Wall of Numbers
  6. Pingback: What’s the difference between a distributed ledger and a blockchain? | Bits on blocks
  7. In a traditional distributed application, the shared set of facts may not extend beyond agreeing on the result of a contract at the other parties’ end. Consider a (fairly) involved business process of applying and activating a SIM card. For the owner of the process (or, even for the applicant), what matters is whether the parties involved in OSS and BSS processed the request or not. For example, a network engineer works towards the ‘fact’ of perhaps, adding the MSISDN into HLR/VLR. A security engineer works towards the ‘fact’ of whether the IMEI is blacklisted or not. These are different facts.

    In case of a DLT (say, Bitcoin), the ‘fact’ is clear: account balances. Were it not for banks, who was to say, double spending wasn’t happening? Every participant in the DLT wanted consensus on one goal: account balance.

    MSISDN: Mobile Station International Subscriber Directory Number.
    HLR/VLR: Home Location Register/Visitor Location Register.
    IMEI: International Mobile Equipment Identity.
