What Slack Can Teach Us About Privacy In Enterprise Blockchains

“Channels” in Hyperledger Fabric don’t work the way you think they do…

Corda and Fabric have very different approaches to delivering privacy. In this post, I compare the models, explain why Corda works the way it does and why I think the Fabric privacy model is flawed. It turns out this can have real-world costly business implications.

But first… let’s set up some intuition.

If you’ve ever used the popular messaging tool, Slack, you might recognise this message…

Slack

If you add a new member to a private channel in Slack, you have two choices: share your entire history or start a fresh, empty, channel. This works for interpersonal comms but it turns out it doesn’t work nearly as well for the “trust but verify” world of enterprise blockchains. 

This message reveals a fundamental truth: if you’ve shared lots of information in private with some people – on Slack or in an email thread, perhaps – then you have to be very careful about adding somebody new to that group, especially if you care about controlling what they can and can’t see. Remember that time you added somebody to a long “reply-to” chain, only to realise there was something at the bottom that you really didn’t want them to see?!  Undo! Undo!! Undo!!!

Famously, there’s no “undo” button on a blockchain, so we have to get things right first time.

In this piece I’ll explain how Fabric’s privacy design turns out to be very similar to Slack channels. However, it also turns out that a model that works superbly for Slack doesn’t work as well in the world of enterprise blockchains for some very common use-cases.

But first, some history.

When we began our architectural journey at R3, we examined a large number of platforms. We concluded that none met our needs and we embarked on the project that culminated in Corda, the industry’s only finance-grade enterprise blockchain platform.

One of the platforms we included in that initial evaluation and rejected as unsuitable for the broad range of needs of sophisticated financial institutions was the first version of the Fabric platform.

But that was then… and a lot of time has passed. It’s always valuable to revisit past decisions in the light of new evidence and, since Fabric has just reached an important milestone, now is a good time to look again.

One of the key changes since 2015 is the introduction of something called “channels”, intended to address the severe privacy shortcomings in the initial design. It turns out that Fabric channels are very similar to an idea we had considered and rejected as too limiting at the start of the design process for Corda.

In this post, I explain why we rejected this design for Corda and what I think some of the key problems will prove to be.

As we know, early blockchain designs sprayed data around the network and everybody received and processed every transaction. This is how Bitcoin and Ethereum work and it is, of course, a fundamental part of how they work.   It’s the right design for those public blockchain platforms.

But that design, which is perfect for those platforms, is just not appropriate for most problems in today’s enterprise world. So the first version of Fabric, which broadcast data globally like many other platforms of the time, had to be extensively redesigned.

The new solution adopted by Fabric is called “channels”. The idea is effectively to let you set up many, many “mini blockchains” – each of which is called a “channel”.  So you and I may share a channel. Perhaps Alice, Bob and Charley share another one. And maybe Alice, me and Ivor share another one.  It’s as if there are many little private blockchains where members of a channel can see everything in that channel but nobody else can.

Simple, right? Elegant, right?

Unfortunately, no.

My biggest worry about the design when we first considered – and rejected – it for Corda is that assets will get stranded.

Imagine you issue a bond to an investor in a private channel between you and them. Remember: the whole point is that it’s private so you wouldn’t want anybody else in that channel. Why would you want them to know about your private deal?

And now you have that channel with your investor, you can use it to engage in some other transactions with them too.  Perhaps you use this bilateral channel to manage some records pertaining to some other deal you’re working on together.  Or maybe they’re also a customer of yours and you want to manage a complex order.  There will invariably be lots of different pieces of information in a channel – different deals, trades, records – all being managed together and commingled.  And this will be repeated across all the other bilateral and multilateral channels in which you participate.

And, because of the nature of the programming and consensus model used by Fabric-style blockchains, all information in a channel inevitably gets commingled with all other information in that channel.  There’s no easy way to extract just some pieces, with history and provenance… a channel is an all or nothing proposition. This is intrinsic to how these sorts of blockchains work and is a reason Corda uses a totally different architecture based on individual “states” representing specific shared facts, each of which can evolve independently.

A good way to think about this problem is as if a channel is like a break-out room at a conference… filled with whiteboards and sticky-notes on the wall. If you’re in the room, you can see and understand everything… but if you were to just take one piece of paper out of the room, it would make no sense to anybody else because they’d need the full history of everything that happened in the room and all the other papers to understand it.

Or, rather, I thought that was a good way to think about it until I sent an earlier draft of this article to some colleagues for review and one of them pointed out that this is precisely what happens when you manage private conversations in Slack!

If you set up a private channel in Slack and then try to add somebody else, they get to see everything that went before.  Or… you have to set up a brand new channel where they get no history, no context, no provenance.  It turns out the Slack app even has a perfect error message that describes this issue:

Slack

Slack knows why a channel architecture is problematic for a distributed ledger

So imagine you want to take a bond you’ve bought from the issuer in that channel and sell it to somebody else.  How would you do it?

  • Well…. you can’t invite them into the channel, because then they’d see all your other private information. A non-starter.  It would be like inviting them into your secret breakout room and hoping they didn’t look at things they weren’t supposed to.
  • And you can’t easily just extract the pieces of history needed to prove the history of that bond, because everything is commingled.
  • And you can’t simply tell them you own the bond. Why would they believe you? The whole point of enterprise blockchains is that each party verifies the information it is given. This is what distinguishes enterprise blockchains from databases, after all.
  • I suppose you could ask the issuer to cancel the issuance in your channel and reissue it in the new buyer’s channel.  But now we’re getting a little bit silly. This would be indistinguishable from simply managing the assets on the books of the issuer. It would defeat the point.

This is not just theoretical: it could have real-world impact.

If it is difficult to move assets between channels with provenance, one has to resort to cumbersome workarounds. Workarounds such as introducing “market makers” who sit between channels and maintain liquidity in both. But this has real costs: additional people to trust, additional fees, additional liquidity needs…

How is Corda different?

As I’ve written in other pieces, we spent a TON of time on Corda’s design: the data model, the fundamental conceptual framework and, critically, our solution to the thorny problem of how to assure privacy whilst allowing parties independently to validate chains of custody and other shared data… the essence of what makes a blockchain a blockchain and not just an expensive distributed database!

Our design addresses the problems in this article head on: data is shared at the level of individual deals or agreements or trades or contracts, with only the transactions needed to verify provenance being shared and no more.  On top of this we layer anonymization and other privacy-enhancing techniques. These techniques build on top of each other. The need to prove provenance never goes away but we do our absolute best only to share the data that is needed to satisfy the recipient.

What’s more, we also built Corda to be able to use Intel’s game-changing SGX technology – without any changes to apps and with Corda’s famous developer-friendly programming model.  So I was delighted that we could announce our partnership with Intel earlier this month.

I’m massively optimistic about the potential of blockchain technology to solve real problems in business. Just make sure you fully understand the pros and cons and different tradeoffs of each before making your selection: as always, one size never fits all.

Post-Script

I should stress that I have a boatload of respect for our friends at IBM – and elsewhere. I think channels are a poor architectural solution but I value immensely the collaboration we have via the Hyperledger Project (where we are both premier members) and beyond.  And I look forward to deepening this collaboration further.   There is more that unites projects such as ours than divides us!

And I should also point out that the Fabric team do know about these issues. For example, see this recent Stack Overflow question:

“How do we enforce privacy while providing tracing of provenance using multiple channels in Hyperledger v1.0?”

The answer is: “At the moment there is no straight forward way of providing provenance across two different channels within Hyperledger [Fabric] 1.0”.

And the answer goes on to reference a design document for a fix. That link is to a very long and complex design document. That tells me that the design problem may be pretty fundamental and can’t be fixed easily. But it’s good news that it is being looked at. We all benefit when platforms develop and evolve and I hope to see significant improvements in this area over time.

 

Update

2017-07-20

IBM’s Dan Selman has taken me to task about this post!  He correctly points out that I didn’t say too much about Corda’s design:

This is because I’ve written about it extensively elsewhere but he’s right: I should have linked.  This video from our Developer Relations team gives a pretty good overview:

And the other videos are pretty good too!

What that video doesn’t say (but should!) is the key point: in real-life scenarios, the dependency tree for any given transaction is invariably a very small subset of the overall set of transactions and so this technique (lazy on-demand provision of just the directly-required dependency tree and no more) gives us the optimal balance.  It is enabled by the transaction design, where each transaction specifically specifies which previous state objects (“shared facts”, if you like) are being superseded.   Put another way: we explicitly declare which parts of the shared state are being updated (actually, replaced) and so we know precisely which proof needs to be provided by one party to another.

In the Slack analogy, it would be equivalent to being able to automatically “lift out” just the pieces of a shared conversation that were directly relevant to the new person you wanted to add to the chat, without also showing them parts of the shared conversation that they have no need to see.

A Simple Explanation of Enterprise Blockchains for Cryptocurrency Experts

And why R3’s Open-Source Corda platform is the one to watch…

We’re doing some really interesting engineering at R3 right now… We have Java running in Intel SGX… We’re hacking a JVM to make it deterministic… We’ve proved you can suspend threads of execution to a database and bring them back to life across restarts as if nothing happened… (We even got emojis to display cleanly on multiple different terminals….)

“It’s just so nice to finally see people doing software engineering in this space” (unsolicited comment at a recent conference I attended)

But why are we doing all this work? After all, public blockchains are red hot right now: prices reaching new highs, “Initial Coin Offerings” showing no sign of slowing and new innovations announced daily. Why is a strange firm called R3, which recently raised $107m to complete the build out of our open source Corda platform, heading off in what seems like a different direction? Aren’t we just building over-complicated centralized databases? Or solving a problem that nobody has?

To be honest, I thought the public blockchain community didn’t have much interest in our work… until Joel from our developer relations team visited San Francisco to deliver our Corda technical training. He had been nervous: many cryptocurrency ‘maximalists’ are West-Coast based… and he thought he would be in for a hard time.

Anyway… he needn’t have worried. The audience was the best he’d ever presented Corda to: uniformly respectful, engaged, questioning and inquisitive. And it made me realise: I’ve done a terrible job of explaining what we’re up to and why we’re taking the route we’re are.

What problem do enterprise blockchains solve?

I wrote about this in more depth when we first announced Corda but, in short, the story is simple:

  • In the beginning there was Bitcoin…
  • … and it was a revelation.
  • Not only because, for the first time ever, we had a censorship-resistant, confiscation-proof, scarce, digital bearer asset…
  • … but because the architecture that Satoshi built to give us this amazing gift taught us something we didn’t previously know. It taught us that:
  • It is now possible to build systems that are operated by multiple parties, none of whom fully trust each other, that nevertheless come into and remain in consensus as to the nature and evolution of a set of shared facts.
  • In Bitcoin, the set of shared facts are: how many bitcoins have been mined and what conditions govern how they can be spent?
  • Newer platforms, such as Ethereum, build on these ideas and expand the set of facts over which we’re coming into consensus; in Ethereum’s case: what is the state of a shared world computer?

As we know, Satoshi set a very high bar for Bitcoin. It works when you don’t know who most of the participants are and it lets miners come and go at will without anybody even knowing who they are.  Like I’ve long said, Bitcoin is a work of genius.

But a key point to stress is that those of us building enterprise blockchain platforms aren’t trying to build a better Bitcoin or even build a better Ethereum. (Why bother? They already exist!) Instead, the thing that interests us is the sentence above that I wrote in italics:

It is now possible to build systems that are operated by multiple parties, none of whom fully trust each other, that nevertheless come into and remain in consensus as to the nature and evolution of a set of shared facts.

This is tantalizing… because it suggests we could completely transform the economics and structure of entire industries. Not by introducing a new currency or decentralized governance model (that’s already being built out by the public blockchains, after all).  But by also massively improving the efficiency of what already exists.

If we knew that all parties were in sync, we could accelerate securities settlement, optimize supply chains, liberate assets stranded in one silo for productive use elsewhere and more.  Anywhere where trade is hampered because of inconsistent systems could be in scope for improvement.  In short, we now have at our hands a new approach to solving one of the trickiest problems in transaction processing: the reconciliation problem.

If we could be sure that one firm’s IT systems were in perfect sync with their counterparts’ systems, it is mindblowing to imagine how much error, risk, duplication, complexity and cost could be eliminated… and how many hitherto impossible transactions became feasible.

In other words: quite separate to the well-understood revolution ushered in by the advent of Bitcoin, an entirely different field also got a massive kickstart. Two revolutions for the price of one.

It doesn’t sound as exciting as a new world currency, to be sure. But it could, in its own way, be utterly transformational. And note: the solution I’m talking about is not the same as a distributed database. And it’s not the same as a fully centralized solution. And it’s not another cryptocurrency or public blockchain. It is something new entirely.

In short, there’s a reason why enterprise blockchain firms like ours look, talk and act differently to cryptocurrency firms and communities: we’re building on some of the same technology, but solving different problems. Problems like managing trade finance relationships, confirming trades and issuing and trading Commercial Paper.

But don’t fall into the trap of thinking all enterprise blockchains are the same. Because they’re not. The Corda introductory whitepaper and Corda technical whitepaper go into this in more depth. But I know you’re busy. So let’s look at just three aspects.

Architecture

First, architecture.  We set ourselves the challenge of building an enterprise blockchain that met the needs for the most demanding clients in business: the financial services industry. It’s why it’s the only enterprise blockchain designed to support interoperable business networks: we want ecosystems to be able to transact and to trade. And it’s why we built a platform that could easily talk to existing business applications and which businesses could actually deploy.

That’s why Corda runs on the Java platform, stores its data, immediately queryable, in a relational database and moves data using message queues.  This facilitates integration, keeps operations people happy in big companies and massively simplifies the design.  Mike Hearn has spoken about this at length.  It sounds so simple – so traditional – yet it’s unbelievably valuable. We scratch our heads every day asking ourselves why everybody else in the enterprise blockchain world left this opportunity wide open for Corda to take!

Contrast that with Fabric, part of the Hyperledger Project: written in Go, it uses a gossip network to spray data indiscriminately around the network, with a cumbersome “channels” abstraction bolted on to fix that problem. And rather than default to a SQL database, it offers a strange choice between a key-value store or a NoSQL document database.

JPMorgan’s Ethereum-fork, Quorum, is still a work in progress so it’s hard to comment but it has a talented team behind it. So I’ll just note in passing that the problem that Ethereum solves so well in the public blockchain world is very different to the problem that large institutions have. So it’s not immediately obvious why you’d use the solution to one as the foundation of the solution to the other.

Privacy

Secondly, privacy.  Corda’s design, from day one, was based on an atomic, need-to-know privacy model that enables multiple different business networks and transaction types to co-exist and interoperate at the same time. Truly an interoperable network of networks is being built out.

Fabric’s design is very different. It has had at least two entirely different privacy designs in its short life and the latest, “channels”, suffers from all the same problems as other coarse-grained approaches to privacy: you end up with lots of mini-global-broadcast blockchains that don’t talk to each other and in which assets will get stranded; the opposite of the vision to which we’re building. You might get away with this design in some simple cases but will then come unstuck when you try to extend the solution.

Quorum’s approach is more innovative but I think it still fundamentally suffers from this problem.

Corda is built for the future

Finally, Corda is built for the future. When we designed Corda, we looked at where the broader industry was going and tried to anticipate those trends.  That’s why Corda is designed to work naturally and seamlessly with Intel’s SGX security technology. It’s why you can write your apps in any JVM language and why we chose to write the base platform in Kotlin, one of the most exciting new languages around – a decision which was vindicated when Google made it a first-class language for its Android platform – and it’s why Corda is jam-packed with cool little developer-friendly features that other platforms simply don’t have, such as our support for reactive programming and support for continuations, which we can automatically persist across JVM restarts.

To be fair to IBM, Fabric is also built for the future. If you assume the future runs on a Mainframe 🙂

Learning More about Corda

Corda is currently in public beta, we’ve just shipped our latest milestone release and you can download our one-click live DemoBench tool to get started in no time.

Or if you’d prefer to look at or contribute to the code, head over to our GitHub repo: https://github.com/corda/corda.

The team does its work in public… join the conversation at slack.corda.net!

 

Finally: a note on terminology. In this post I used Blockchain and Distributed Ledger interchangeably.  I tried for a long time to retain engineering purity (“Corda has chains of transactions but it doesn’t batch them into blocks so we probably shouldn’t call it a blockchain!”) But the reality is that the market uses the term Blockchain to describe all distributed ledger technologies, including ours. So I’m not going to fight it any more…!

 

Free advice can be valuable… but only if you take it

If a client tells you your solution doesn’t solve their problem, it may not be the problem that needs to change…

I often argue for the importance of blockchain and distributed ledger technology by using the following chain of logic:

  • Bitcoin’s architecture solved the problem of censorship-resistant digital cash
  • But few, if any, financial firms are interested in censorship-resistant digital cash
  • So why are they looking at this technology?
  • Because some principles underpinning Bitcoin’s architecture – shared ledgers, for example – could be relevant to problems that banks face.

Sure, a blockchain or a replicated shared ledger could indeed be useful to banks. Perhaps it could reduce the need for reconciliation between firms if they all ran off a single ledger, for example. But this says nothing about whether blockchains are the optimal solution to any particular problem in banking.  That still has to be argued, of course.

Recall: the bitcoin architecture was a solution to a very specific, very carefully framed problem – how to transmit value without the risk of censorship. Just because the underlying architecture could be used to solve some pressing problems in banking doesn’t mean it’s the best way to do so. Indeed, although the interlocking aspects of the Bitcoin solution are in some ways quite elegant, there are also some compromises. After all, it is an engineering solution to a set of very specific constraints and so it has to be demonstrated that it’s the right solution when the constraints are different.

Lee Braine, of Barclays Investment Bank CTO Office, made an important contribution to this debate when he spoke at London Blockchain Conference 2015 recently.  The video is now available and I urge anybody working in this space to watch it and to internalise its message.

[vimeo 137190236]

We all too often “talk past each other” in the distributed ledger world and we are quick to assume the other person just “doesn’t get it”.  I can assure you that Lee does get it and it would be a brave startup in this space that chooses to disregard what he said. He’s giving us free advice! Take it!

Like I say, watch the video for yourselves.

I think another way to capture the chain of logic in the video is as follows:

  • Assume the ongoing interest in the application of blockchain technology continues
  • Assume further that some banks identify some compelling business opportunities in deploying a cryptographically-secure shared ledger between themselves.
  • What is the probability that a derivative of Bitcoin or Ethereum or any other current platform will be the best solution to that specific problem?
  • Given that none of them were invented to solve that problem, surely it’s quite low, right?

So we could find ourselves in the situation that bitcoin and blockchain technology have catalysed an orgy of activity, that this activity has identified countless high-quality business problems and yet none of those opportunities are best addressed with the technology that triggered the excitement in the first place!

The theme of this blog is “free advice” and the free advice I’m taking from Lee’s comments includes:

First, we shouldn’t get enamoured by a particular implementation of a technology. Sure: if you have an implementation then you may have bought a place at the table.  But don’t make the mistake of assuming that if the business problem doesn’t fit the technology then it’s the business problem that needs to change!

Secondly, if you’re working in a financial institution, be careful to distinguish between the principles embodied in these technologies. Shared ledgers? Yes. That seems to be at the heart of this domain. Indiscriminate replication? Perhaps. Cryptographically-secured access down to the “row” level? Probably. And so on.

Thirdly, consider the complexity of banks’ existing IT environments. An idealised, “wouldn’t the world be perfect if…” solution is no use to anybody if it requires the whole world to move at once and/or if there is no credible migration path.  This points to a need to listen to the incumbents when they object.  Furthermore, consider the non-functional requirements which are simply a given in this space.

Fourthly, if we assume that today’s current hyperactivity will lead to a new understanding of the possibilities for banks but don’t assume that today’s blockchain platforms (permissioned or permissionless) are the (whole) answer, then surely we’re back in the land of engineering, architecture and hard work? Perhaps this means that the combination of persistence, data models, APIs, consensus, identity and other components that we need won’t all come from one firm.  So a common language, some common vision and an ability to collaborate may become critical. Where is your distinct differentiation? Where would you fit in an overall stack?