Clocks and Watchtowers: March 31, 2019 Snippets

As always, thanks for reading. Want Snippets delivered to your inbox, a whole day earlier? Subscribe here.

This week’s theme: if the Lightning Network can bypass the costly Proof of Work that keeps Bitcoin safe, then how does it stay safe itself? Plus a deep dive into Datacoral and what makes them special.

Welcome back to our Snippets series on the Lightning Network — what is it, why does it matter, and how does it work? Last week we covered the general idea of user-established payment channels that sit on top of an already established cryptocurrency like Bitcoin. But one thing we didn’t talk about is security: how can we guarantee the same degree of robustness we like about the main Bitcoin network, if we’re dispensing with the expensive but necessary Proof of Work mechanism that makes Bitcoin so secure? (If you want a catch-up reminder on what all this means, check out this explainer I wrote, just for you.)

One of the most common questions I hear when I explain Bitcoin to people is actually a very smart one: “It sounds like the big problem with peer-to-peer currency boils down to the fact that you can’t prevent people double-spending their own coins. Why can’t you just resolve the question by saying, ‘whichever transaction happened first is the real one, and anything else gets ignored’?” This is a good question! The answer is that it’s very hard to conclusively establish what “first” means on a P2P network. There are all kinds of ways by which clever bad actors could mess with timing by telling one version of information to one node and another version to other nodes, leaving it totally unclear what “first” really means. So relying on temporal order does not pass the Byzantine Fault Tolerance test: if bad actors can abuse it, then it fails the standard of “you can trust messages even if you don’t trust the people sending them.”

So how does Bitcoin solve this? If you’re well-steeped in how the Bitcoin protocol works, you’ll hopefully appreciate that in one sense, the Bitcoin network is acting like a clock with a second hand that ticks in discrete chunks. Instead of treating time as continuous, it treats time as something that starts and stops, synchronizing with each block. Each time someone successfully solves the proof of work challenge and mines a block, it’s like time freezes; we get to all look together to make sure the block is valid; once we agree and commit to begin the next block, time starts back up again. As a bad actor, you can’t play time shenanigans because a) all of the transactions in a block are handled simultaneously, and b) any block buried in the past will be out of reach, because the nature of the proof of work challenge establishes a “time gap” between the blocks. It makes the record of the past (which has to exist in the present) actually behave like the past, in that it’s effectively out of reach to anyone in the present; we can observe it, but we can’t change it.
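To make the “out of reach” idea concrete, here is a minimal toy sketch in Python. It is not Bitcoin’s actual block format: the field names, the payload strings, and the two-character difficulty prefix are all assumptions chosen so the example runs instantly. What it does show is the general mechanism: each block commits to the hash of the block before it, so changing anything in the past breaks every link that comes after.

```python
import hashlib

# Toy difficulty (an assumption for illustration): a valid block hash must
# start with this prefix. Real Bitcoin difficulty is vastly higher.
DIFFICULTY_PREFIX = "00"
GENESIS_PREV = "0" * 64


def block_hash(prev_hash: str, payload: str, nonce: int) -> str:
    """Hash the previous block's hash together with this block's contents."""
    data = f"{prev_hash}|{payload}|{nonce}".encode()
    return hashlib.sha256(data).hexdigest()


def mine(prev_hash: str, payload: str) -> dict:
    """Search for a nonce that makes the block hash meet the toy difficulty."""
    nonce = 0
    while not block_hash(prev_hash, payload, nonce).startswith(DIFFICULTY_PREFIX):
        nonce += 1
    return {
        "prev": prev_hash,
        "payload": payload,
        "nonce": nonce,
        "hash": block_hash(prev_hash, payload, nonce),
    }


def build_chain(payloads: list) -> list:
    """Mine one block per payload, each linked to the one before it."""
    chain, prev = [], GENESIS_PREV
    for payload in payloads:
        block = mine(prev, payload)
        chain.append(block)
        prev = block["hash"]
    return chain


def is_valid(chain: list) -> bool:
    """Check every link and every proof of work, from the first block on."""
    prev = GENESIS_PREV
    for block in chain:
        recomputed = block_hash(block["prev"], block["payload"], block["nonce"])
        if block["prev"] != prev:
            return False
        if recomputed != block["hash"] or not recomputed.startswith(DIFFICULTY_PREFIX):
            return False
        prev = block["hash"]
    return True


chain = build_chain(["alice pays bob 1", "bob pays carol 1", "carol pays dave 1"])
print(is_valid(chain))  # True

# Rewriting the oldest block, even with freshly redone proof of work,
# leaves the next block pointing at a hash that no longer exists.
chain[0] = mine(GENESIS_PREV, "alice pays mallory 1")
print(is_valid(chain))  # False
```

To make the tampered history valid again, the cheater would have to redo the work for every later block too, while the rest of the network keeps adding new ones. That is the “time gap” in practice.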

What about in a Lightning channel? In a Lightning channel between two people, there’s no blockchain. There’s no proof of work. There’s no mechanism to “entomb” time that happened in the past, because it’s just a ledger between two people. It’s pretty straightforward to keep track of time when it’s just one balance being passed back and forth between two parties. The problem arises when it comes time to close the channel: what’s to stop either one of us from posting a past balance, one that has since been overwritten, to our illegal benefit? If you broadcast some past state to the main blockchain, and then it gets settled on the main network irreversibly, then you’ve successfully cheated me out of money.

With regular Bitcoin, this would be analogous to holding up a past block and claiming, “this is the present balance”. But no one will believe you, because it won’t be the longest chain. The only balance that anybody cares about is the balance that’s stated on the longest chain of blocks. In a Lightning channel, there is no such mechanism to guard against time shenanigans. The information inside a channel is private; the main network can’t go back and check. Furthermore, the whole point of settling all of these transactions off-chain was so that the main network doesn’t have to safeguard against fraud in every single transaction. So what can we do?

So there’s one straightforward way to guard against this that will work most of the time. Stamp every transaction with a unique hash signature that is like a kind of “serial number”, where each serial number can be irreversibly linked to all previous serial numbers generated in the transaction history. (Cryptographically, there are many ways to do this: one would be to keep hashing the same serial number over and over again in order to generate new ones. If you give me the serial number of transaction 100, and then provide me with a second serial number in question (say number 97), I can establish to a sufficiently high probability whether or not that second serial number belongs to the transaction history in 1 through 99. Conversely, I can establish the opposite as well: whether or not Serial Number 100 is a valid descendant of Serial Number 97.) Any time somebody closes the Lightning Channel, the transaction is frozen for a period of time, giving the other party a chance to review and make sure that the serial number of the final balance being posted is genuinely the last one that was ever generated, and not something from the past. If the serial number is illegitimate, then impose a harsh penalty: give the offended party the opportunity to create a new transaction that pays them the entire Bitcoin balance from the channel. In other words: if you try to cheat by posting an earlier balance, you can easily get caught (so long as the other person is watching), and if you get caught you’ll lose the entire balance. Not worth it.
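Here is a minimal Python sketch of the “keep hashing the same serial number” idea from the parenthetical above. It illustrates that simplified scheme only; the real Lightning protocol handles outdated states with a different construction. The function names, the seed value, and the `max_steps` limit are all assumptions made up for this example.

```python
import hashlib
from typing import List, Optional


def next_serial(serial: bytes) -> bytes:
    """Derive the next serial number by hashing the current one."""
    return hashlib.sha256(serial).digest()


def serial_history(seed: bytes, count: int) -> List[bytes]:
    """Generate serial numbers 1..count, each the hash of the one before."""
    serials = []
    current = seed
    for _ in range(count):
        current = next_serial(current)
        serials.append(current)
    return serials


def steps_between(older: bytes, newer: bytes, max_steps: int = 1000) -> Optional[int]:
    """Return how many hashes lead from `older` to `newer`, or None if no
    link is found within max_steps (a hypothetical limit for this sketch)."""
    current = older
    for steps in range(1, max_steps + 1):
        current = next_serial(current)
        if current == newer:
            return steps
    return None


def is_outdated(posted: bytes, latest: bytes, max_steps: int = 1000) -> bool:
    """A posted serial is outdated if the latest serial descends from it."""
    return steps_between(posted, latest, max_steps) is not None


history = serial_history(b"hypothetical channel seed", 100)
serial_97, serial_100 = history[96], history[99]

print(steps_between(serial_97, serial_100))   # 3
print(steps_between(serial_100, serial_97))   # None
print(is_outdated(serial_97, serial_100))     # True: 97 is a past state
print(is_outdated(serial_100, serial_100))    # False: 100 is the latest
```

During the freeze period, the other party only has to run a check like `is_outdated` on whatever serial number was posted. If it comes back true, they have grounds to claim the penalty.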

Unlike on the Bitcoin main network, where conclusively establishing an order to the transactions is effectively impossible (the ‘clock problem’ from earlier), with two people this isn’t as hard. So our solution should work most of the time. But can anybody spot one big problem? The problem is that you have to be online all the time in order to guard against being cheated. Otherwise, your trading partner could wait until you’re offline for any reason, then quickly post an illegitimate balance and steal your money while you’re not looking. Some people, by the nature of where their Lightning node is located, may be online 99% or more of the time. But that’s not something we can count on for everybody. So, as is, this isn’t a very safe system design.

How about we incentivize other people to keep watch for us? Paying for someone to professionally keep watch for fraudulent transactions sounds like something I’d be willing to pay for, as a Lightning user! You can imagine a digital node called a “Watchtower” (which is what they’re called on the Lightning Network) whose job is solely to monitor all balances that are getting cleared on the main Blockchain for any signs of misbehaviour. If they ever catch something, have them be authorized to execute the penalty transaction on behalf of the injured party, sending them the BTC balance while keeping some percentage as a bounty fee. (Alternately, you could have a bounty-less business model by simply paying for watchfulness-as-a-service. Plenty of ways to pay for this.)

The challenge, though, is how do we do this in a way that doesn’t a) massively slow everything back down to the speed of the regular blockchain, by having every transaction get checked by others every time, and also b) reveal the contents of your Lightning Channels, which are private? The answer lies in our serial numbers. If I am a Lightning Network user, I can subscribe to a Watchtower service and then, every time I make a transaction in my private channel, send the Watchtower a copy of the serial number for safekeeping. This way, the Watchtower has a way of knowing whether a transaction is up-to-date or not without knowing what the transactions actually say. Their job becomes a lot more straightforward: scan the serial number of every Lightning settlement on the main Bitcoin network, and look for any that match serial numbers in their archive which aren’t the most up-to-date serial number for that channel. Each serial number isn’t very big: it’s tweet-sized, so storing hundreds of millions or billions of them is totally achievable even for a small operation. From time to time, a Watchtower can also clear out old serial numbers — whenever a Lightning channel gets closed successfully, it can flush out all of that channel’s old serial numbers and delete them. Furthermore, this is a great way of making decentralized security scalable: not every Watchtower needs to watch every transaction, or every node!
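The Watchtower’s bookkeeping, as described above, can be sketched in a few lines of Python. This is a toy model, not the interface of any real Watchtower software: the class, its method names, and the channel identifiers are all hypothetical. Serial numbers are plain strings here so the flow is easy to follow.

```python
from typing import Dict, Optional


class Watchtower:
    """Toy archive of serial numbers. It never sees channel balances."""

    def __init__(self) -> None:
        self._latest: Dict[str, str] = {}    # channel id -> newest serial
        self._outdated: Dict[str, str] = {}  # superseded serial -> channel id

    def register_update(self, channel_id: str, serial: str) -> None:
        """A subscriber reports the serial number of their newest state."""
        previous = self._latest.get(channel_id)
        if previous is not None and previous != serial:
            self._outdated[previous] = channel_id
        self._latest[channel_id] = serial

    def check_settlement(self, serial: str) -> Optional[str]:
        """Return the channel id if a settled serial is a superseded one."""
        return self._outdated.get(serial)

    def channel_closed(self, channel_id: str) -> None:
        """Flush everything stored for a channel that closed successfully."""
        self._latest.pop(channel_id, None)
        stale = [s for s, c in self._outdated.items() if c == channel_id]
        for s in stale:
            del self._outdated[s]


tower = Watchtower()
for serial in ["serial-1", "serial-2", "serial-3"]:
    tower.register_update("channel-A", serial)

print(tower.check_settlement("serial-3"))  # None: the latest state is fine
print(tower.check_settlement("serial-1"))  # channel-A: someone posted a past state
print(tower.check_settlement("serial-9"))  # None: not a channel we watch

tower.channel_closed("channel-A")
print(tower.check_settlement("serial-1"))  # None: archive was flushed
```

Using a dictionary keyed by serial number keeps each settlement check to a single lookup, which is part of why watching a large number of channels stays cheap.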

Ultimately, the Lightning Network deals with the problem of “time cheating” in a very different way from the main Bitcoin network. The main Bitcoin network does so in a way that makes old blocks and old balances irrelevant: anyone can attempt to broadcast old balances if they want to, but no one will care. This is a very robust way to solve the problem: don’t police it; make it irrelevant through system design. The Lightning Network doesn’t have that luxury — not without bringing Bitcoin’s expensive proof of work process to every channel, which would defeat the whole point of the process. So instead, we have to rely on something that’s less ironclad: relying on policing and watchfulness rather than system design is certainly less preferable, but that doesn’t mean it can’t work. The trick will be in making sure that damage can get contained, like in the watertight compartments of a ship’s hull: if something goes wrong on one Channel for whatever reason, don’t let it affect others.

Next week, we’ll talk about how Lightning channels aren’t just for direct transactions — they can also get linked together into a mesh P2P network for payment routing.

In this week’s widely read discussion topic (and one that’s topical, with Lyft’s IPO taking place a few days ago), Jatin Shridhar wrote a quick note about the brutal math of stock options, and why an exit has to be really, really big for startup employees to see a meaningful difference in their lives:

Working for a startup makes increasingly less sense | Jatin Shridhar

Hacker News thread on Working for a startup makes increasingly less sense

As you might imagine, the crux of the problem (and of the misunderstanding between employees’ expectations and the realities of their option cash value) lies not in what is a “fair” percentage of ownership, but rather in the problem of dilution. Startups these days are hungry beasts. They consume a lot of venture capital: today’s environment is so winner-take-all that funds like SoftBank (and even your regular mid to late stage VC funds) are more than happy to shower these businesses with as much money as it takes to win. The people who lose out? Unquestionably employees, as their options get diluted down to almost nothing. In this particular case, the scenario described was an employee who joined a startup as the 10th engineer; upon reaching a $200 million acquisition, the employee’s options netted out to around $15,000. Not exactly the life-changing amount of money you were hoping for, and certainly nowhere near enough to contribute to that real holy grail of startup winning: a down payment on a house in the Bay Area. Plenty of people were quick to blame the company for stiffing him on options, but the reality is that most of the time, the biggest difference maker isn’t the option package, it’s the dilution that comes afterwards. Hopefully more startup employees learn to navigate this math in the coming years.
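For a rough feel of how dilution compounds, here is a small Python sketch. Every number in it is hypothetical (the starting stake, the number of rounds, the dilution per round), and it does not try to reproduce the $15,000 figure from the article. It also ignores liquidation preferences, strike prices beyond a flat cost, and taxes, all of which can shrink the payout further.

```python
from typing import List


def diluted_ownership(initial_fraction: float, round_dilutions: List[float]) -> float:
    """Apply each funding round's dilution to an initial ownership fraction."""
    fraction = initial_fraction
    for dilution in round_dilutions:
        fraction *= (1.0 - dilution)
    return fraction


def option_payout(exit_value: float, ownership: float, strike_cost: float = 0.0) -> float:
    """Gross payout at exit, minus the cost of exercising; never below zero."""
    return max(exit_value * ownership - strike_cost, 0.0)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
initial_stake = 0.001             # 0.10% of the company at the time of the grant
rounds = [0.20, 0.20, 0.20, 0.20]  # four rounds, each selling 20% of the company
exit_value = 200_000_000           # a $200 million acquisition

final_stake = diluted_ownership(initial_stake, rounds)
print(f"Stake after dilution: {final_stake:.4%}")
print(f"Payout without dilution: ${option_payout(exit_value, initial_stake):,.0f}")
print(f"Payout with dilution:    ${option_payout(exit_value, final_stake):,.0f}")
```

With these made-up inputs, four rounds cut the stake to about 41% of what it was, and that is before anything else gets subtracted.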

Some big stories unfolding that influence our ability to monitor and protect our air:

EPA panel seeks to bring back fired scientists for clean air review | Sean Reilly, E&E News

Air pollution science under siege at US environment agency | Jeff Tollefson, Nature

Stories from the film industry:

An oral history of 10 Things I Hate About You | Ilana Kaplan, NYT

I’m not a lawyer, I’m an agent | David Simon

How The Matrix built a bulletproof legacy | Brian Raftery, Wired

And elsewhere in media:

How India conquered YouTube | Snigdha Poonam, FT

Nine reasons why Disney+ will succeed (and why four criticisms are overhyped) | Matthew Ball, REDEF

The long, complicated and extremely frustrating history of Medium, 2012-present | Laura Hazard Owen, Nieman Lab

A fascinating aspect of the ETF business I’d never really thought about before: their obscure tax treatment that makes the math work.

ETF Heartbeats | Matt Levine, Bloomberg

This ETF tax dodge is Wall Street’s ‘dirty little secret’ | Zachary R. Mider, Rachel Evans, Carolina Wilson & Christopher Cannon, Bloomberg

NYC takes a big step towards pricing road use:

The era of consequence-free driving in cities is nearing an end | Andrew J Hawkins, The Verge

Congestion pricing could generate billions of dollars, but now the suburbs want a piece | Winnie Hu, NYT

New York City congestion pricing hits speed bump over who gets exemptions | Henry Goldman, Transport Topics

Other reading from around the Internet:

How the world’s biggest brewer killed the craft beer buzz | Dave Infante

New drugs that unleash the immune system on cancers may backfire, fuelling tumour growth | Jocelyn Kaiser, Science

Hackers hijacked Asus software updates to install backdoors on thousands of computers | Kim Zetter, Motherboard

History disappeared when Myspace lost 12 years of music, and it will happen again | Damon Krukowski, Pitchfork

And just for fun, a semi-serious proposal that, at least to me, has merit:

Give the Nobel Prize in literature to Dril | Tom Whyman, The Outline

Also just for fun (it’s a fun week!), this guy’s a hero:

Man stole $122 million from Google and Facebook by sending them random bills, which they dutifully paid | Cory Doctorow

In this week’s news and notes from the Social Capital family, we’re going to do a bit of a deep dive into one company in particular that I’ve heard a lot of questions about recently — Datacoral. People who actually understand what they do are off-the-hook excited about what they’re doing, but a lot of other non-technical people or non-data people don’t really get it. So we’re going to take a little dive into learning what it is that they do, and why it’s such a gargantuan challenge they’re poised to overcome.

Imagine, for a minute, a world where there was no Excel. All of the individual things that you can do in Excel still exist, of course: math still exists; the idea of spreadsheets still exists as a holdout from the analog years; tables still exist; data still exists. All of these things exist in their own individual forms; there’s just no Excel.

In this Excel-less world, let’s say you run a small business, or maybe a big business. A lot of your day job consists of things like: 1) making lots of lists of numbers and names and categories of things, and 2) applying lots of rules to those lists of things, in a way that makes some sort of business logic sense to you. You probably do this on a continuous basis. Every time you want to build a table to keep track of an inventory item, or maybe use a formula to calculate a risk or profit, or automate some sort of recurring number crunching, you’d need to figure out how to build a solution that does this for you. So you’d look at what tools you have available, and in each case, you’d try to do the best you can to select the right tool for the job. Seems pretty reasonable!

The problem is, before too long, the layers and layers of tools and processes and artifacts you’re using begin to accumulate complexity. You start to collect a “rolling hairball” of data that gets progressively more tangled and more horrible to deal with. The answer to any given problem you have will likely present itself in the form of “build a new tool” or “hire more people” or “increase your budget”. This works, but it gives you diminishing returns and increasing frustration every time. Now what would happen if I showed you Excel? There’s a decent chance you might not immediately appreciate how important this is for you. Sure, you could probably spot fairly quickly some ways that you could use Excel to fix some of your problems. But what’s less obvious isn’t what Excel fixes for you; it’s what Excel enables you to do.

The difference between before-Excel versus after-Excel is that beforehand, every time we wanted to use data, we had to engineer a solution for it. (We even have a name for this: “Data Engineering”. It is as painful as it sounds.) Data shouldn’t be something you have to engineer; data should be something you can program. This is much more than a semantic difference. Data engineering is reductive, laborious and rigid; it takes specialists; it’s thick but brittle, and engineers fight a continuous battle against time, complexity and chaos. Data programming, on the other hand, is lightweight, exploratory, and optimistic. It lets generalists try new things, in a way that just works, and lets them explore new adjacent possibilities and new opportunities that can’t be uncovered by data engineers: the costs are just too high.

Excel is a perfect illustration of how Data Programming can be made easy: once you understand what a sheet is (which can be many different things, but they’re all sheets) and how to use Excel Syntax (which can do lots of different things, but all ultimately operate on and within your sheets), then you have a flexible, powerful, abstract and user-friendly environment where you can try so many new things and get so much more out of your data than you ever could before. This is why people sometimes say “The most popular programming language in the world is Excel.”

So why are we talking about this? If you’re a large company, especially a large software company, you actually face a pretty similar problem to our small business owner with no Excel — except at titanically larger scale. The Data Hairball problem is a very real problem. There are an enormous number of different ways that we collect, analyze, and manage our data: ingestion tools and services, job orchestration systems, data warehouses, query engines, and all kinds of processes and services we use to make our data work for us. We tend to pick the best one for the job each time to the best of our ability; this usually makes sense locally, but it demands that the user have a very high level of expertise from there on out. The more engineered your data stack, the fewer people will be able to use it effectively.

Why can’t we simply train more people on how the data stack works? The problem is that data in a modern business doesn’t exist as stocks of data, like a table in a database that we can easily picture; it’s much more commonly found as flows of data, which are a much bigger challenge to work with. Flows of data moving through a business are coded as data pipelines that require a very deep technical background to even begin to understand. As the business logic and the everyday demands of the user change, these pipelines can become very brittle and threaten to rupture, which, again, puts technical proficiency in even greater demand and restricts who can effectively make use of the data inside the business.

To truly fix this problem, we need to rearrange the way that data works inside businesses. We need to do it in a way that makes working with data easier, more powerful, more flexible, and more opportunistic. We need less data engineering, and more data programming. What we need is Datacoral.

Datacoral: innovations to unleash the power of data | Raghu Murthy

The best way to understand how revolutionary Datacoral can be is to go back to our analogy about Excel. Excel is a powerful tool, but it ultimately boils down to those two important things that everybody understands how to use: sheets and syntax. We need to recreate the equivalent of what Excel did for the ordinary data-cruncher, but for huge internet companies with massively complicated data challenges that are typically arranged in terms of flows rather than in stocks. Excel gave us sheets and syntax; Datacoral gives us slices and a new kind of coding language: Data Programming Language, or DPL. It’s an SQL-like language that lets data programmers easily manipulate and process streams of data without having to understand or ever put at risk the underlying data flows, or infrastructure; just like how messing with an Excel spreadsheet doesn’t put you at risk of breaking Excel itself.

Data Programming Language, like a supercharged version of SQL, lets data professionals manage and manipulate end-to-end data flows without having to understand the underlying systems. The key to making it all work is the set of standardized microservice functions called Slices that Datacoral has now built out for all sorts of functions and integrations you can think of. There are three general types of slices: “Collect” slices (make data available), “Organize” slices (consistently transform data into any way you want, without breaking anything) and “Harness” slices (publish data to production databases or third party applications that anyone can use). For companies who are beginning to set up their business logic and have to come to terms with the role that data will play in their day-to-day operations, being able to design around Datacoral is like being able to set up your business operations with an understanding that Excel will be available to you. You just design it smarter from day one.

Where did this come from? As you may know, Datacoral’s founder and CEO Raghu Murthy was head of data infrastructure at Facebook — one of the ultimate examples of businesses that needed to learn how to handle titanic flows of data at an unprecedented scale, and with no road map to consult. Facebook isn’t the type of business that accumulates data in any sort of discrete or predictable manner: it comes in a torrent, and the infrastructure engineers who’re building its supporting technology can’t afford to mess it up. Raghu and his team, having built these systems at a scale hardly anyone in the world will ever need to grow beyond, have figured out how to do this. So why not make it available to everyone, for trivially easy effort?

With their function-oriented, event-driven mindset around data, Datacoral is a perfect candidate to piggyback on the wave of businesses starting out or making the switch to serverless computing. This, too, is more than just a semantic or technical detail: it goes to the very core of how people think about what it is that businesses do, how they run, and what “operations” means in a world where capital and equipment are increasingly rented by the hour rather than purchased and amortized. For many of these new business owners and entrepreneurs, we’re leaving the world of managing stocks of things almost entirely behind, and entering a new world where managing flows is the most important thing to master. This is Datacoral’s world. And if that customer sounds like you, then get in touch with Datacoral right now; you won’t regret it.

Data shouldn’t be something you have to engineer. It should be something you program. Now, with Datacoral, you can. If you think that’s you, sign up for a demo here, or email them at hello@datacoral.co. If you’re interested in joining the Datacoral team, they’re looking for a few select strong engineers — head here to find out more.

Have a great week,

Alex & the team from Social Capital


Clocks and Watchtowers: March 31, 2019 Snippets was originally published in Social Capital on Medium, where people are continuing the conversation by highlighting and responding to this story.