What Slack Can Teach Us About Privacy In Enterprise Blockchains

“Channels” in Hyperledger Fabric don’t work the way you think they do…

Corda and Fabric have very different approaches to delivering privacy. In this post, I compare the models, explain why Corda works the way it does and explain why I think the Fabric privacy model is flawed. It turns out this can have costly real-world business implications.

But first… let’s set up some intuition.

If you’ve ever used the popular messaging tool, Slack, you might recognise this message…

[Image: Slack message]

If you add a new member to a private channel in Slack, you have two choices: share your entire history or start a fresh, empty channel. This works for interpersonal comms but it turns out it doesn’t work nearly as well for the “trust but verify” world of enterprise blockchains.

This message reveals a fundamental truth: if you’ve shared lots of information in private with some people – on Slack or in an email thread, perhaps – then you have to be very careful about adding somebody new to that group, especially if you care about controlling what they can and can’t see. Remember that time you added somebody to a long “reply-to” chain, only to realise there was something at the bottom that you really didn’t want them to see?!  Undo! Undo!! Undo!!!

Famously, there’s no “undo” button on a blockchain, so we have to get things right first time.

In this piece I’ll explain how Fabric’s privacy design turns out to be very similar to Slack channels. However, it also turns out that a model that works superbly for Slack doesn’t work as well in the world of enterprise blockchains for some very common use-cases.

But first, some history.

When we began our architectural journey at R3, we examined a large number of platforms. We concluded that none met our needs and we embarked on the project that culminated in Corda, the industry’s only finance-grade enterprise blockchain platform.

One of the platforms we included in that initial evaluation and rejected as unsuitable for the broad range of needs of sophisticated financial institutions was the first version of the Fabric platform.

But that was then… and a lot of time has passed. It’s always valuable to revisit past decisions in the light of new evidence and, since Fabric has just reached an important milestone, now is a good time to look again.

One of the key changes since 2015 is the introduction of something called “channels”, intended to address the severe privacy shortcomings in the initial design. It turns out that Fabric channels are very similar to an idea we had considered and rejected as too limiting at the start of the design process for Corda.

In this post, I explain why we rejected this design for Corda and what I think some of the key problems will prove to be.

As we know, early blockchain designs sprayed data around the network and everybody received and processed every transaction. This is how Bitcoin and Ethereum work and it is, of course, a fundamental part of how they work.   It’s the right design for those public blockchain platforms.

But that design, which is perfect for those platforms, is just not appropriate for most problems in today’s enterprise world. So the first version of Fabric, which broadcast data globally like many other platforms of the time, had to be extensively redesigned.

The new solution adopted by Fabric is called “channels”. The idea is effectively to let you set up many, many “mini blockchains” – each of which is called a “channel”.  So you and I may share a channel. Perhaps Alice, Bob and Charley share another one. And maybe Alice, me and Ivor share another one.  It’s as if there are many little private blockchains where members of a channel can see everything in that channel but nobody else can.
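To make the visibility rule concrete, here is a minimal Python sketch of the "many little private blockchains" idea. It is a toy model only: the `Channel` class, its methods and the party names are hypothetical illustrations and have nothing to do with Fabric's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Channel:
    """A toy 'mini blockchain': one shared, append-only log per member set."""

    members: set[str]
    log: list[str] = field(default_factory=list)

    def append(self, author: str, record: str) -> None:
        # Only members may write to the channel.
        if author not in self.members:
            raise PermissionError(f"{author} is not a member of this channel")
        self.log.append(record)

    def visible_to(self, party: str) -> list[str]:
        # Members see the whole log; everybody else sees nothing.
        return list(self.log) if party in self.members else []


you_and_me = Channel({"You", "Me"})
abc = Channel({"Alice", "Bob", "Charley"})

you_and_me.append("Me", "bond issued to You")
abc.append("Alice", "trade between Alice and Bob")

print(you_and_me.visible_to("You"))    # the full log of that channel
print(you_and_me.visible_to("Alice"))  # an empty list: Alice is not a member
```

Note that visibility here is decided per channel, never per record. That is the property the rest of this post is concerned with.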

Simple, right? Elegant, right?

Unfortunately, no.

My biggest worry about the design, when we first considered it for Corda and rejected it, was that assets would get stranded.

Imagine you issue a bond to an investor in a private channel between you and them. Remember: the whole point is that it’s private so you wouldn’t want anybody else in that channel. Why would you want them to know about your private deal?

And now you have that channel with your investor, you can use it to engage in some other transactions with them too.  Perhaps you use this bilateral channel to manage some records pertaining to some other deal you’re working on together.  Or maybe they’re also a customer of yours and you want to manage a complex order.  There will invariably be lots of different pieces of information in a channel – different deals, trades, records – all being managed together and commingled.  And this will be repeated across all the other bilateral and multilateral channels in which you participate.

And, because of the nature of the programming and consensus model used by Fabric-style blockchains, all information in a channel inevitably gets commingled with all other information in that channel.  There’s no easy way to extract just some pieces, with history and provenance… a channel is an all or nothing proposition. This is intrinsic to how these sorts of blockchains work and is a reason Corda uses a totally different architecture based on individual “states” representing specific shared facts, each of which can evolve independently.

A good way to think about this problem is as if a channel is like a break-out room at a conference… filled with whiteboards and sticky-notes on the wall. If you’re in the room, you can see and understand everything… but if you were to just take one piece of paper out of the room, it would make no sense to anybody else because they’d need the full history of everything that happened in the room and all the other papers to understand it.

Or, rather, I thought that was a good way to think about it until I sent an earlier draft of this article to some colleagues for review and one of them pointed out that this is precisely what happens when you manage private conversations in Slack!

If you set up a private channel in Slack and then try to add somebody else, they get to see everything that went before.  Or… you have to set up a brand new channel where they get no history, no context, no provenance.  It turns out the Slack app even has a perfect error message that describes this issue:

[Image: Slack error message. Caption: Slack knows why a channel architecture is problematic for a distributed ledger]

So imagine you want to take a bond you’ve bought from the issuer in that channel and sell it to somebody else.  How would you do it?

  • Well… you can’t invite them into the channel, because then they’d see all your other private information. A non-starter. It would be like inviting them into your secret breakout room and hoping they didn’t look at things they weren’t supposed to.
  • And you can’t easily just extract the pieces of history needed to prove the history of that bond, because everything is commingled.
  • And you can’t simply tell them you own the bond. Why would they believe you? The whole point of enterprise blockchains is that each party verifies the information it is given. This is what distinguishes enterprise blockchains from databases, after all.
  • I suppose you could ask the issuer to cancel the issuance in your channel and reissue it in the new buyer’s channel.  But now we’re getting a little bit silly. This would be indistinguishable from simply managing the assets on the books of the issuer. It would defeat the point.
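The first option in the list above can be sketched in a few lines of Python. The channel contents and the deal tags below are invented for illustration. In a real channel the records would not be so neatly tagged, which is exactly the commingling problem, so treat the tags as a generous assumption.

```python
# (deal, record) pairs commingled in one bilateral channel (hypothetical data)
channel_log = [
    ("bond-1", "issuer issues bond-1 to investor"),
    ("order-7", "investor places order 7"),
    ("bond-1", "coupon paid on bond-1"),
    ("deal-X", "draft terms for deal X"),
    ("order-7", "order 7 amended"),
]


def needed_for(log, asset):
    """Records a buyer actually needs in order to check the asset's history."""
    return [record for deal, record in log if deal == asset]


def exposed_by_invitation(log, asset):
    """Records a newly invited member would see that are unrelated to `asset`."""
    return [record for deal, record in log if deal != asset]


needed = needed_for(channel_log, "bond-1")
leaked = exposed_by_invitation(channel_log, "bond-1")
print(f"{len(needed)} records needed, {len(leaked)} unrelated records exposed")
```

Inviting the buyer is all or nothing: they receive both lists, not just the first one.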

This is not just theoretical: it could have real-world impact.

If it is difficult to move assets between channels with provenance, one has to resort to cumbersome workarounds, such as introducing “market makers” who sit between channels and maintain liquidity in both. But this has real costs: additional people to trust, additional fees, additional liquidity needs…

How is Corda different?

As I’ve written in other pieces, we spent a TON of time on Corda’s design: the data model, the fundamental conceptual framework and, critically, our solution to the thorny problem of how to assure privacy whilst allowing parties independently to validate chains of custody and other shared data… the essence of what makes a blockchain a blockchain and not just an expensive distributed database!

Our design addresses the problems in this article head on: data is shared at the level of individual deals or agreements or trades or contracts, with only the transactions needed to verify provenance being shared and no more.  On top of this we layer anonymization and other privacy-enhancing techniques. These techniques build on top of each other. The need to prove provenance never goes away but we do our absolute best only to share the data that is needed to satisfy the recipient.

What’s more, we also built Corda to be able to use Intel’s game-changing SGX technology – without any changes to apps and with Corda’s famous developer-friendly programming model.  So I was delighted that we could announce our partnership with Intel earlier this month.

I’m massively optimistic about the potential of blockchain technology to solve real problems in business. Just make sure you fully understand the pros and cons and different tradeoffs of each before making your selection: as always, one size never fits all.

Post-Script

I should stress that I have a boatload of respect for our friends at IBM – and elsewhere. I think channels are a poor architectural solution but I value immensely the collaboration we have via the Hyperledger Project (where we are both premier members) and beyond.  And I look forward to deepening this collaboration further.   There is more that unites projects such as ours than divides us!

And I should also point out that the Fabric team do know about these issues. For example, see this recent Stack Overflow question:

“How do we enforce privacy while providing tracing of provenance using multiple channels in Hyperledger v1.0?”

The answer is: “At the moment there is no straight forward way of providing provenance across two different channels within Hyperledger [Fabric] 1.0”.

And the answer goes on to reference a design document for a fix. That link is to a very long and complex design document. That tells me that the design problem may be pretty fundamental and can’t be fixed easily. But it’s good news that it is being looked at. We all benefit when platforms develop and evolve and I hope to see significant improvements in this area over time.

 

Update

2017-07-20

IBM’s Dan Selman has taken me to task about this post!  He correctly points out that I didn’t say too much about Corda’s design:

This is because I’ve written about it extensively elsewhere but he’s right: I should have linked.  This video from our Developer Relations team gives a pretty good overview:

And the other videos are pretty good too!

What that video doesn’t say (but should!) is the key point: in real-life scenarios, the dependency tree for any given transaction is invariably a very small subset of the overall set of transactions and so this technique (lazy on-demand provision of just the directly-required dependency tree and no more) gives us the optimal balance. It is enabled by the transaction design, where each transaction names exactly which previous state objects (“shared facts”, if you like) are being superseded. Put another way: we explicitly declare which parts of the shared state are being updated (actually, replaced) and so we know precisely which proof needs to be provided by one party to another.
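A minimal Python sketch of that idea follows, assuming a simplified model in which each transaction lists the outputs of earlier transactions that it consumes. The `Tx` type, the `dependency_tree` function and the sample ledger are hypothetical and are not Corda's API. The point is only that walking the declared inputs yields exactly the transactions a counterparty needs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    tx_id: str
    # Each input is (tx_id, output index) of a state being consumed.
    inputs: tuple[tuple[str, int], ...] = ()
    outputs: tuple[str, ...] = ()


def dependency_tree(tx_id: str, ledger: dict[str, Tx]) -> set[str]:
    """Return the ids of every transaction needed to check `tx_id`'s provenance."""
    needed: set[str] = set()
    stack = [tx_id]
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        # Follow only the states this transaction explicitly consumes.
        stack.extend(parent_id for parent_id, _ in ledger[current].inputs)
    return needed


ledger = {
    "issue-bond": Tx("issue-bond", outputs=("bond owned by Investor",)),
    "issue-cash": Tx("issue-cash", outputs=("cash owned by Buyer",)),
    "order-7": Tx("order-7", outputs=("order 7 between Issuer and Investor",)),
    "amend-7": Tx("amend-7", inputs=(("order-7", 0),), outputs=("order 7 amended",)),
    "sell-bond": Tx(
        "sell-bond",
        inputs=(("issue-bond", 0), ("issue-cash", 0)),
        outputs=("bond owned by Buyer", "cash owned by Investor"),
    ),
}

print(sorted(dependency_tree("sell-bond", ledger)))
# ['issue-bond', 'issue-cash', 'sell-bond']
```

The order and its amendment never appear in the result, even though they involve the same parties, because the bond sale does not consume any state they produced.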

In the Slack analogy, it would be equivalent to being able to automatically “lift out” just the pieces of a shared conversation that were directly relevant to the new person you wanted to add to the chat, without also showing them parts of the shared conversation that they have no need to see.

16 thoughts on “What Slack Can Teach Us About Privacy In Enterprise Blockchains”

  1. Hi Richard, I have worked a bit on Fabric v1. I get your point about the possibility of stranded assets in Fabric channels. This can hit you when you have multiple assets being stored and channels are designed as per participants. However, I think this issue comes down to the design of Fabric channels for your use case. If the channels are designed in a way that there is a channel for each asset similar to states in Corda, there is no reason why one can’t add future participants to the channel without compromising on the privacy of data. The owner of the asset initially creates the channel for that asset and asks other participants (using off-chain messaging) to join the channel at the appropriate time. The new participant will see only the asset and its history relevant to that channel. In this case, you don’t need to move assets around with provenance across different channels. Let me know if you think this is still an issue.

  2. Hi Praveen – thanks for the comment! Are you suggesting a channel for each issued asset? eg one for every chunk of a bond and one for every unit of a currency? I guess that could work at a very small scale… but it would spiral out of control when the number of subdivisions grew, no? And – perhaps the biggest problem: what happens if you want to exchange assets? eg pay some cash in exchange for a bond. In the model you propose, the cash and the instance of the bond would be on different channels… so now you have the problem of atomically coordinating action between channels… which I don’t think works, does it?

  3. Hi Richard, Yes I was referring to a channel for each asset or combination of assets which would need to be shared among the relevant parties. The granularity of the channel depends on the current and foreseen requirements for sharing with participants. I am coming from reinsurance world, so I was thinking about assets like Risk, Claim, Premium which are not all that granular but will have multiple participants on it. In this case, claims are typically associated with an undersigned risk so those claim assets could go together with the placed risk in the same channel. Agreed, a trading transaction which involves exchanging assets and transactions both ways could be a problem using different channels.

  4. A channel in Fabric 1.0 acts like a separate blockchain from the protocol standpoint, that is, each channel orders transactions within. Multiple channels in Fabric act like a collection of multiple blockchains. Each channel can model one state or more. When multiple channels use the same Fabric-ordering service (as is the case, but this would not be necessary for the goals of intra-channel consistency), then Fabric establishes consensus on the order of events across channels. In this sense Fabric produces “one blockchain”. However, currently Fabric does not have built-in means to express inter-channel dependencies.

    As far as I understand Corda it contains multiple notaries, and each ensures uniqueness (= ordering) of transactions. Each notary produces an independent order for the state it governs (= blockchain). In this way, Corda is not “a blockchain” but a collection of blockchains, just like the multiple channels of Fabric. In contrast to Fabric 1.0 Corda does currently not appear to have built-in protocols to establish consensus across notaries, that is, to provide one truthful record of everything; instead there are many records, one for each “state”. The Corda design argues that achieving such consensus is also not necessary. However, Corda does provide the means to express cryptographically verifiable dependencies across notaries (= inter-blockchain).

    It seems there are more underlying similarities between the two technologies than what the somewhat opinionated blog mentions.

  5. *Correction*: When multiple channels use the same Fabric-ordering service (as is the case, but this would not be necessary), then Fabric establishes consensus on the order of events across channels.
    =>
    Consensus on event order across channels *can be* provided, depending on the implementation of the ordering service (it is not exposed with the Kafka-orderer in Fabric).

  6. Hi Richard, thank you for this interesting comparison. I would love to see more of these, especially Ripple might be an interesting candidate since they also do not have a blockchain or channels. AFAIK they still broadcast every transaction to all participants of their network, so how do they solve privacy?

  7. Nice one Richard. Good read. Lack of standard support for typical enterprise privacy needs remains an Achilles heel for many enterprise blockchain use cases. In general we try to avoid on-chain encryption, and as a result frequently end up exploring custom messaging approaches to support confidentiality needs.

    One observation re Corda: “…data is shared at the level of individual deals or agreements or trades or contracts…”. This seems somewhat analogous to creating a separate channel per deal using Fabric v1 (assuming the cross chain provenance problem is solvable).

    The requirement I encounter most frequently at the moment is actually a little more granular. Let’s say a deal involves 4 parties (because it’s actually intended to be a series of transactions strung together); what I need is:
    – Parties 1 and 2 to see all deal data XYZ
    – Party 3 to see subset Y
    – Party 4 to see subset Y and subset Z
    – Party 5 to see no data, nor the identities of the transacting parties
    – All parties need access to sufficient provenance data to validate data they see.

    Once we have this, the requirement evolves yet further, “party 3 should only see their data set once step 2 in the process is reached”…etc. In fact, the direction of travel seems to be that control over who gets/sees what will ultimately need to be determinable ‘per field, per state change’.

    Interested in the extent to which you are looking at this kind of (sub-deal) visibility modelling, and the necessary channel/messaging capabilities that will support it. That same hyperledger design doc you referenced also includes a couple of relevant sections on sub-channels which hint at a similar direction of travel. It is conceivable to me that each chain/channel could be reduced to a single field state change hanging off a (somewhat public?) master event chain – probably hashes only – against which consensus and mathematical integrity are validated, but not much else.

  8. This is a great read Richard. We’ve been researching many DLT solutions, including Fabric and Corda, and had similar concerns to those you point out. In fact, while many of these solutions call themselves “private blockchains”, it is surprising how little privacy they offer to users.
    The Slack analogy is particularly nice. It has been difficult for me to explain my concerns with the privacy models of some of these solutions, and this is a fantastic way to do that.

    However, I found your explanation on Corda’s approach a bit lacking. I feel like it suffers from many of the same problems as Fabric’s. Privacy is extremely important for us, so I thought I’d raise these concerns, hoping you can perhaps alleviate some of them.

    I guess the key point is, that just like in Fabric and in other systems, in order for a peer to verify an incoming transaction, it must be privy to all of its history. How did the inputs get there? Are they all valid? Were any of them double-spent?
    Fabric’s solution is to expose the history of all transactions involving an asset to all users. Corda is indeed more granular, and with the help of its UTXO model it limits the knowledge of peers to only the parts of history they care about: whenever a peer receives a transaction, it receives with it only the relevant history that is required to verify the new state.

    It is interesting that you say in the addendum to this post that “in real-life scenarios, the dependency tree for any given transaction is invariably a very small subset of the overall set of transactions”, while in the video you linked you say that it’s difficult to reason about how much data is revealed to peers when they receive a transaction, as its history is rather arbitrary. I think it remains to be seen that a dependency tree is a small subset of transactions, and even if it was, a motivated party that wishes to learn more about a competitor’s data, could try to strategically ask for multiple payments to try and retrieve more data.

    The key problem with this is that there’s no way for Alice to enforce the secrecy of her data. If she sends a transaction to another user (or institution) Bob, there is no way for her to know when Bob might *legitimately* have to share her data (including where Alice got the asset in the first place, which doesn’t involve Bob at all!). This will fully depend on his future dealings. Therefore, a user that is truly concerned about privacy, will have to assume that *all data is exposed*.

    The attached video on consensus also raises some interesting points, with regards to privacy:

    * It suggests signing key randomization and claims this gives pretty good privacy. However as we know from the public blockchain world, while avoiding address reuse helps, it is nowhere near enough to maintain privacy, and there are many ways keys can be linked together during normal use.

    * It suggests state reissuance to break links in a chain of states, which could work, but you wisely pointed out in the article itself that “This would be indistinguishable from simply managing the assets on the books of the issuer. It would defeat the point.” 😉

    * Finally, it points to work on supporting Intel SGX to allow history verification to be done in a secure enclave, thus hiding it from the peer. It’s an interesting approach, but would be easy for Fabric to implement as well, and of course adds a considerable amount of trust in the chip and the attestation servers. And if parties are OK with that, I would argue there’s a lot more work that can be offloaded to the secure enclave for creating consensus, perhaps removing the need for many of Corda’s current features.

    I’m also wondering about Corda’s notaries (that create what you call Uniqueness Consensus). First of all, some notaries can be verifying notaries which means they have to be privy to all of the history of their assets. But even non-verifying notaries will get quite a lot of metadata for the complete history of an asset. In fact, if I’m not missing anything it seems like the only data they’ll miss out on is the amounts transacted. It also seems like whenever a peer receives a transaction and verifies its history, it will have to ping the notary for uniqueness *of the entire history*, which might help a sophisticated notary learn even more data.

    I’d love to hear your thoughts on this. It seems to me that while Corda does have some improvements on Fabric’s model, it still has the same intrinsic problems, and doesn’t allow users to assume privacy. I think there’s a lot more to learn from the public blockchain world, where the extreme transparency requires some intriguing solutions: Blockstream and Chain did some great work with Confidential Assets that hide both asset type and amounts, while allowing *anyone* to verify integrity. Combining this with mixing (a-la CoinJoin) and one-time addresses (or stealth addresses) can give pretty strong privacy guarantees. And Zcash uses Zero Knowledge proofs not only in their own public chain, but also in their ZSL solution that’s being implemented into JP Morgan’s Quorum – and also allows complete privacy to all participants while allowing them all to verify the integrity of the chain.

  9. @cczurich – thanks for the thoughtful comments. A couple of corrections/clarifications. First, a notary cluster is responsible solely for attesting as to whether any given state object has been consumed or not. There is no attempt (or need) to provide an ordering across states. This is provided by transactions: if you care that two states be consumed/produced at the same time, consume/create them in the same transaction. Secondly, we explicitly do not attempt to coordinate _between_ notary clusters. Instead, we take the far simpler approach of insisting that all states being consumed by a transaction be controlled by the same notary. If they are not, you first move them from one notary to the other as part of a previous transaction. This means we don’t need to worry about complex two-phase commit protocols, etc… and it’s the core of how assets issued by different parties, etc., can nevertheless be combined in new and interesting transactions. Thanks again for the comment.
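The two rules in the reply above (attest only to uniqueness, and require every consumed state to belong to the same notary) can be sketched as a toy Python model. The `Notary` class and its `notarise` method are hypothetical names for illustration and are not Corda's API. Moving a state from one notary to another is not modelled.

```python
class Notary:
    """Toy uniqueness service: remembers which state references have been consumed."""

    def __init__(self, name):
        self.name = name
        self.consumed = {}  # state reference -> id of the consuming transaction

    def notarise(self, tx_id, inputs):
        """`inputs` is a list of (state_ref, notary_name) pairs consumed by `tx_id`."""
        # Rule 1: every consumed state must be governed by this notary.
        if any(notary != self.name for _, notary in inputs):
            raise ValueError("all input states must be governed by this notary")
        # Rule 2: no input may already have been consumed by a different transaction.
        conflicts = [ref for ref, _ in inputs if self.consumed.get(ref, tx_id) != tx_id]
        if conflicts:
            raise ValueError(f"already consumed: {conflicts}")
        # All checks passed, so mark every input as consumed in one step.
        for ref, _ in inputs:
            self.consumed[ref] = tx_id
        return f"{self.name} attests {tx_id}"


notary = Notary("NotaryA")
print(notary.notarise("tx1", [("bond#0", "NotaryA"), ("cash#0", "NotaryA")]))

try:
    notary.notarise("tx2", [("bond#0", "NotaryA")])  # attempted double spend
except ValueError as err:
    print(err)
```

Because both inputs of `tx1` are checked before either is marked as consumed, the two states are consumed together or not at all, without any coordination protocol between notaries.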

  10. @supertyler Yes! The Corda “flow framework” is specifically designed for this! It lets you specify precisely to whom data should flow, when and in what way. https://docs.corda.net/key-concepts-flows.html

    But we need to be careful on the partial visibility AND verification side of things. The only way you can do that safely is if you know for sure that the validity of a transaction does not and cannot depend on the nonvisible pieces of data.

  11. @udiWertheimer – thanks for the comments. Quite a lot packed in there! So some quick comments:

    1) The “real-life scenarios” comment. Yes, it deserves more detail. I’ll write more up when I get a chance but our experience to date has indeed been that the tradeoff works very well for us.

    2) Non-validating notaries see far less than you imply. To perform their “uniqueness attestation” they need to see just the identities of the input objects and a Merkle proof that they do indeed exist in the transaction with the claimed ID.

    3) No need to ping a notary to verify the entire history: the notary signatures are verified as part of resolving and verifying the transaction histories.

    4) ZKP – Corda’s transaction verification approach is designed to allow zero knowledge proof support to be added. Not ruled out at all.

    5) SGX – I wouldn’t be so sure that this would be as easy to add in a way that doesn’t require fundamental redesigns for other architectures.
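For readers unfamiliar with the Merkle proof mentioned in point 2, here is a generic Python sketch of a Merkle inclusion proof. It is not Corda's actual transaction format: the component strings, the SHA-256 choice, the duplication of the last node on odd levels and the absence of leaf/node domain separation are all simplifying assumptions.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root hash over a non-empty list of byte strings."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels (assumption)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, each tagged with whether the sibling is on the left."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from one leaf plus its sibling hashes."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


components = [
    b"input: bond#0",
    b"input: cash#0",
    b"output: bond owned by Buyer",
    b"amount: 100",
]
root = merkle_root(components)      # plays the role of the transaction ID
proof = merkle_proof(components, 0)

print(verify(b"input: bond#0", proof, root))  # True
print(verify(b"input: bond#9", proof, root))  # False
```

The verifier is handed one component and two sibling hashes. It learns that the input belongs to the transaction with that root without seeing the outputs or the amount.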

  12. Pingback: 3 things Digital & Blockchain I learned this week. Stuff worth knowing from The Bankers’ Plumber | 3C Advisory
  13. Richard,

    My name is Heather. I’m the producer for a new show called Curious, and we are trying to complete the lineup for our upcoming season on blockchain/cryptocurrency.

    We are a big fan of your work and your contributions to the body of knowledge surrounding blockchain technology. Based on your expertise and passion, we feel you would be an outstanding addition to the show.

    I would be grateful for the opportunity to provide you with more information in the hopes of having you join us for a brief interview. If you are interested, please contact me at .

    Kindest regards,
    Heather

  14. Pingback: R3’s Richard Brown: IBM should adopt Corda – International Business Times – The Blockchain Daily Reporter and News Digest
  15. This blog seems to have as much to do with business as it does technology. A quiet challenge to the Hyperledger team would have had more to do with technology. Based on the post before mine it is quite an effective marketing technique.
