The Crazy Security Behind the Birth of Zcash, the Inside Story

“How would you feel about donating your phone to science?”

When Zooko Wilcox posed this question to me in October, what I heard was: Can I take your phone and hand it over to a hacker to riffle through its contents and sniff all over your data like a pervert who’s just opened the top drawer of a lady’s dresser?

At least, that’s how it felt.

“I think I’d rather donate my body,” I said.

What Wilcox really wanted to do with my phone was to run forensic analysis on it in the hopes of determining whether someone was using it to spy on us. Wilcox is the CEO of a company called Zcash, which designed and recently launched a new privacy-preserving digital currency of the same name. On the weekend he asked for my phone, we were both sitting with a two-man documentary film crew in a hotel room stuffed with computer equipment and surveillance cameras.

A secret ceremony was underway. Before the company could release the source code of its digital currency and turn the crank on the engine, a series of cryptographic computations needed to be completed and added to the protocol. But for complex reasons, Wilcox had to prevent the calculations from ever being seen. If they were, it could completely compromise the security of the currency he had built.

Over the course of the two-day event, everything went pretty much as planned. Everyone and everything did just what they were supposed to do, except for my cellphone, which in the middle of the event exhibited behaviors that made no sense at all and which planted suspicions that it had been used in a targeted attack against the currency.

The story of Zcash has already been roughly sketched by me and others. The currency launched 28 October onto the high seas of the cryptocurrency ecosystem with a strong wind of hype pushing violently at its sails. On the first morning that Zcash existed, it was trading on cryptocurrency exchanges for over US $4000 per coin. By the next day, the first round of frenzied feeding had subsided and the price was already below $1000. Now, a month later, you’ll be lucky if you can get $100 for a single Zcash coin. Even in the bubble-and-burst landscape of cryptocurrency trading, these fluctuations are completely insane.

Some hype was certainly warranted. The vast majority of digital currencies out there are cheap Bitcoin imitations. But the same cannot be said of Zcash. The project, which was three years in the making and which combines the cutting-edge research of cryptographers and computer scientists at multiple top universities, confronts Bitcoin’s privacy problems head-on, introducing an optional layer of encryption that veils the identifying marks of a transaction: who sent it, how much was sent, who received it. In Bitcoin, all of this data is out in the open for anyone to see.

However, with digital currencies, everything is a trade-off, and the improvement in privacy that Zcash brings comes with a risk, one that has gotten much less attention since the currency launched. Obscuring data on the blockchain inevitably complicates the process of verifying the validity of transactions, which in Bitcoin is a simple matter of tracking coins on a public ledger. In Zcash, verifying transactions requires some seriously experimental computation, mathematical proofs called zk-SNARKs that are so hot off the presses that they’ve never been used anywhere else. In order to set up the zk-SNARKs in the Zcash protocol, a human being must create a pair of mathematically linked cryptographic keys. One of the keys is essential to ensuring the proper functioning of the currency, while the other one—and here’s the big risk—can be used to counterfeit new coins.

If it’s not immediately clear how this works, you’re in good company. The number of people who really understand zk-SNARKs, and therefore the Zcash protocol, is probably small enough that you could feed them all with one Thanksgiving turkey. The important thing to get is that, given the current state of cryptographic research, it’s impossible to create a private, reliable version of Zcash without also simultaneously creating the tools for plundering it. Let’s call those tools the bad key.

Prior to launching Zcash, the developers who invented it had to create the bad key, use it to make a set of mathematical parameters for the zk-SNARKs (the good key), then dispose of the bad key before any nefarious individual could get hold of it. And they had to do it all in a way that was both secret enough to be secure yet public enough that anyone who wanted to use Zcash felt well-assured of the technology’s integrity.

The Zcash developers, whose work is funded by over $2 million raised from private investors in the Zcash Company, chose a strategy that relied heavily on the secrecy part of this equation. Nearly everything about the ceremony—where and when it would be held, who would be involved, what software would be used—was kept from the public until a blog post about it was published this afternoon.

Instead of building real-time transparency into the ceremony design, the Zcash team opted to meticulously document the event and save all artifacts that remained after the bad key was destroyed. This evidence is now available for analysis to prove the process went as it was described.

As an extra measure, they decided to invite a journalist to bear witness—me.

Two weeks before the ceremony, I got a vague invite on Signal, an encrypted messaging app, from Wilcox without any specifics about what to expect. A week later he told me where I would have to go. And a week after that—two days before the ceremony—I was told when to arrive. On 21 October, I walked into a coffee shop in Boulder, Colorado, where I met up with Wilcox and a documentary filmmaker who had been hired to get the whole thing on tape. From there we headed to a computer shop in Denver to buy a bunch of equipment and then returned to a hotel in Boulder, where I stayed for the next three days.

The headquarters in Boulder was one of five “immobile” stations, all of which were participating in the ceremony from different cities across the planet. One mobile station was doing its part while making a mad dash across British Columbia. The generation of the keys was decentralized such that each station would be responsible for creating only a fragment of the bad key. For the ceremony, a cryptographic algorithm was custom-designed to create a full version of the zk-SNARK parameters while keeping the pieces of the bad key segregated, a process that took two days of relaying data back and forth among the six stations.

I’ll hazard an analogy in order to explain more generally how this works: Let’s say you have a recipe and you want to use it to make a single cake that is going to feed everyone in the world and that’s the only cake that anyone is allowed to eat, ever. You have to have a recipe to bake the cake, but you also have to make sure no one can ever make it again. So you split the recipe up into six parts and you design a baking process that allows each participant to add their ingredients and mix them into the batter without the others (or anyone else) seeing what they’re up to. After pulling the cake out of the oven, you burn all the pieces of the recipe.

In this analogy, the recipe is the bad key; the cake is the zk-SNARK parameters; and the person hiding the ingredients and doing all of the mixing is a cryptographic algorithm.

Zooko Wilcox, Zcash CEO, with a DVD containing part of a record of the cryptographic ceremony. Photo: Morgen Peck

The way this looks in practice is that each station has a computer storing a fragment of the secret. That computer can’t connect to the Internet, has been stripped of its hard drive, and runs off a custom-built operating system. The secret never moves off the computer, but it is used in a series of calculations that are then copied to write-once DVDs and carried to a separate, networked computer that shares the results with the rest of the stations. Each station builds off the results of the station before it in a computational round robin until the process is complete and the software finally spits out a product.

The benefit of dividing up the work in this way is that no one participant can compromise the ceremony. Each fragment of the bad key is worthless unless it is combined with all the others. It cannot even be brought into existence unless all members of the ceremony collude or an attacker successfully compromises all six of the participating stations.
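
To get a feel for the structure, here is a deliberately simplified Python toy of a round-robin key generation. It is not the ceremony’s actual protocol—the real thing operates on the elliptic-curve parameters used by the zk-SNARKs and is considerably more involved—but it captures the essential idea: the trapdoor is the combination of per-station secrets, and nothing short of all six fragments reconstructs it.

```python
import secrets

# Toy parameters: an integer group, purely for illustration. The real
# ceremony worked over elliptic-curve groups, not integers modulo a prime.
P = 2**127 - 1   # a Mersenne prime (hypothetical choice for this toy)
G = 3            # base element

def station_contribute(incoming: int) -> tuple[int, int]:
    """One station's turn: fold a locally generated secret into the parameter.

    Only the updated public value leaves the offline machine (on a
    write-once DVD); the secret fragment stays behind and is destroyed.
    """
    fragment = secrets.randbelow(P - 2) + 1
    return pow(incoming, fragment, P), fragment

# Round robin across six stations, each building on the previous result.
param = G
fragments = []
for _ in range(6):
    param, frag = station_contribute(param)
    fragments.append(frag)   # kept here only so the toy can check itself

# The "bad key" is the combination of all six fragments. Reconstructing it
# requires every fragment, i.e., collusion of (or attacks on) all stations.
trapdoor = 1
for frag in fragments:
    trapdoor = trapdoor * frag % (P - 1)   # exponents combine modulo the group order
assert param == pow(G, trapdoor, P)
```

Even in this toy, learning five of the six fragments still leaves an attacker facing a discrete-logarithm problem for the sixth; that all-or-nothing property is the point of splitting the work.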

As an observer, there was very little I could do to verify the security of the events as they unfolded in front of me. I don’t have the advanced cryptography coursework that would be necessary to audit the software that Wilcox and the other station operators were running. And even if I did, the code had not yet been made available for public review. My role, as I saw it, was simply to be present and make sure the people involved did all the things that they would later tell people they did. I can bear witness to the fact that the computer storing the key fragment was bought new, that the wireless card and hard drive were removed, that while I was watching no attacker sneaked into the hotel room to mess with the equipment, that all of the DVDs were correctly labeled, and that the RAM chips that stored the key fragment were smashed and burned in a fire pit after the ceremony.

I can testify that nothing strange happened. Until it did.

During the ceremony most of the station operators were talking with each other on a Google Hangout. On the evening of the first day, after getting up from a bit of a rest, Wilcox wandered over to the laptop that was running the Google Hangout and began chatting with Peter Van Valkenburgh, a station operator located in Washington D.C. We noticed an echo of the audio coming from across the room and started looking for its source.

The whole place was filled with gadgets. Four security cameras had been hoisted onto poles and aimed at the offline computer to provide 24-hour surveillance in the event of a ninja attack. Another digital camera on a tripod was capturing a wide-angle shot of the room. Both Wilcox and I were geared up with wireless mics. And another mic was secured to the laptop running the Google Hangout.

I went over to a monitor that was set up to display the security footage between the two hotel beds, and at first I thought that was it. Then I looked down at one of the beds and saw my phone lying there. When I picked it up, I immediately realized that the audio was blaring out of its speaker.

“Morgen, why is your phone playing the audio from our Google Hangout?” asked Wilcox, bemused, curious, and slightly alarmed.

Why indeed. It was especially strange because I had not knowingly connected to the Google Hangout at all during the ceremony. Furthermore, footage of Wilcox’s computer screen shows that I wasn’t listed as a participant.

So, how was my phone accessing the audio?

Without wasting any time, Wilcox began experimenting. While continuing to talk to Van Valkenburgh, he muted the microphone on his Google Hangout session and then turned it back on. When he did that, my phone only picked up Van Valkenburgh’s audio.

Stranger still, when Wilcox re-enabled his hangout microphone, his voice came through my phone with a slight lag—maybe 100-200 milliseconds—indicating that my phone was picking it up from somewhere outside the room, perhaps from a Google Hangout server.

We could hear Van Valkenburgh a bit too well. Photo: Morgen Peck

Just as we started to examine my phone, looking at the programs that were running and a few suspicious text messages that I had received a couple of days before the ceremony, the echo abruptly stopped. We quickly put it into airplane mode, hoping to preserve whatever evidence remained.

After much negotiating, I surrendered my phone (an archaic Android that was ripe for the hacking) to Wilcox. He has since passed it off to a hacker in San Francisco. Those efforts have produced no evidence about what caused my phone to turn on me, and it’s now on its way to a professional security firm for further analysis.

Unless we find evidence of malware on my phone, the question of how it may have impacted the ceremony is completely hypothetical. Assuming my phone was hacked, who would want to break into the Zcash ceremony? And if an attacker did have full control over my phone, which was powered on and present until the moment it started misbehaving, what could that person do with it?

For answers, I traveled up to Columbia University to the lab of Eran Tromer, a computer scientist at the Zcash company who co-invented its cryptographic protocol. Tromer is at Columbia for a year as a visiting researcher, but his home base is the Tel Aviv University School of Computer Science, where he is a member of the faculty and the director of the Laboratory for Experimental Information Security (LEISec) at the Check Point Institute for Information Security.

A big part of Tromer’s work at LEISec involves investigating side channel attacks. The idea behind side channel attacks is that you don’t have to have direct access to a computer’s data in order to spy on it. Often, you can piece together some idea of what a computer is doing by examining what’s going on with the physical components. What frequencies are humming across the metal capacitors in a laptop? How much power is it pulling from the wall? How is the voltage fluctuating? The patterns in these signals can leak information about a software program’s operation, which, when you’re running a program that you want to keep secret, can be a problem.

“My research is about what happens to good, sound, cryptographic schemes when they reach the real world and are implemented on computing platforms that are faulty and leaky at the levels of software and hardware,” says Tromer.

In his lab at Columbia, Tromer opened his laptop and ran a demonstration program that executes several different computations in a loop. He told me to put my ear down close to where the fan was blowing out hot air from the computer’s innards. I leaned over, listened carefully and heard the computer whine ever so slightly over and over.

Eran Tromer, an expert in side-channel attacks. Photo: Amit Shaal

“What you’re hearing is a capacitor in the power supply, striving to maintain constant voltage to the CPU. Different computations done on the CPU have different power draw, which changes the mechanical forces on the capacitor plates. This causes vibrations, which in turn are transmitted by the air as sound waves that we can capture from afar,” he says.

Tromer started investigating this phenomenon, called “coil whine,” for himself about ten years ago. “I was in a quiet hotel room at a conference. I was working on my laptop and it was making these annoying whining noises whenever I ran some computation. And I thought, let’s see what happens if the computation is actually cryptographic calculation involving a secret key, and how the key affects the emitted noise.”

Tromer and his colleagues spent the next decade trying to use acoustic leakage from computer hardware components to spy on cryptographic algorithms. In 2014, they demonstrated a successful attack in which they were able to steal a decryption key from a laptop by recording and analyzing the sounds it made as it ran RSA decryption software. With a high-tech parabolic microphone, they were able to steal the secret from ten meters away. They were even able to pull off the same attack using the internal microphone on a mobile phone, provided that the device was snuggled up close to the computer.

However, for various reasons Tromer doesn’t think anyone could have used the same strategy with my phone. For one thing, the coil whine in modern computers occurs at higher frequencies than those he demonstrated—in a range that is typically outside what a mobile phone, which is designed for the lower frequencies of the human voice, can detect.

“It seems extremely unlikely that there would be exploitable signals that can be captured by a commodity phone, placed in a random orientation several feet away from a modern computer,” he says. “It is not completely unthinkable. There might be some extremely lucky combination. But it would be a very long shot, and at a high risk of detection, for an adversary to even try this, especially since the ceremony setup gave them very little time to tailor attacks to the specific hardware and software setting.”

Moreover, the attacks that Tromer has demonstrated are not passive. In order to collect a useful signal, you have to amplify it by sending challenges to the software that you are attacking. The challenges force the software to repeat computations. In order to do this, you have to know and have studied the code that the computer is running.

The software that was running during the Zcash key generation ceremony was all custom built specifically for that occasion and was intentionally kept from the public until the ceremony was over. The choice to do this was controversial, and the approach strays from that of other similar ceremonies. (For example, the DNSSEC ceremony, which generates the digital signatures that secure top-level domain names, is conducted far more transparently and is publicly audited in real time.)

Before flying to Colorado, I contacted Bryan Ford, a computer science professor who directs the Decentralized and Distributed Systems Laboratory at the École Polytechnique Fédérale de Lausanne in Switzerland. He was troubled by the decision to keep the details of the Zcash ceremony secret. In a series of Twitter direct messages he told me:

“I understand the crypto principles that the parameter-generation is supposed to be based on well enough to know that nothing *should* need to be kept secret other than the critical secret parts of the parameter keys that eventually get combined to produce the final public parameters. If they think the ceremony needs to be kept secret, then...something’s wrong.”

By keeping the details of the ceremony software secret, the Zcash team limited their security audit to just a handful of people inside the company, but they may also have made it more difficult for an attacker to make the kinds of preparations that would be necessary to mount a successful side channel attack.

Even if someone did get a look at the source code in advance, Wilcox says it wouldn’t be the end of the world because secrecy was not the primary defense. According to him, one of the best aspects of the ceremony design was the use of multiple parties. It wouldn’t be enough to pull recordings off the computer in Colorado. An attacker would have to successfully record a side channel at each station. And because Wilcox left many of the security details up to the personal discretion of each station operator, the craftwork that would go into designing six unique side channel attacks would cost a huge amount in both time and money.

At one of the stations it may even have been impossible. Peter Todd, one of the ceremony participants, ran all of his computations on a laptop encased in a tinfoil-lined cardboard box, while driving across Canada. He then burned his compute node to a crisp with a propane torch. “It was my goal to outdo every other station in Canadian cypherpunk glory,” says Todd, who also happens to be one of Zcash’s most outspoken critics.

If someone did attempt a side channel attack with the strategies Tromer has demonstrated in his lab, then there would likely be evidence of it in the trove of forensic artifacts that the ceremony produced. Among those items are all of the write-once DVDs that provide a record (authenticated by cryptographic hashes) of what computations were being relayed between the stations in the ceremony. Tromer’s techniques require direct interaction with the software and those manipulations would make their way onto the public record.
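
Verifying such a record later is straightforward: recompute a hash of the artifact and compare it with what was published. Here is a minimal sketch, with SHA-256 as an illustrative choice and a placeholder file name and digest rather than real ceremony artifacts:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file (say, a disc image of one of the ceremony DVDs) through SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage: both the file name and the published digest are placeholders.
published_digest = "0123abcd..."  # whatever digest the station operator published
if sha256_of("station3_round2.iso") == published_digest:
    print("local copy matches the published record")
else:
    print("mismatch: this copy is not the artifact that was published")
```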

At no point did the incident with my phone stop the ceremony. Nor did Wilcox seem terribly concerned that it posed a serious threat. “We have super great security. I’m not worried about preventing some kind of attack. But I’m very interested in figuring it out, or experimenting, or extracting more evidence,” said Wilcox. “They’re very far from winning. So far from winning.”

And I’m curious too. Right now my phone is somewhere, I know not where, awaiting its strip down. Even if it wasn’t used to topple a privacy-guaranteeing digital currency—which, judging from everything I’ve learned, would have been a technological miracle—it’s still quite likely that someone was on it listening to me. Who? Why? For how long? If anything, this experience has deepened my respect for the people who are trying to make it easier to keep our private information private. And at the very least, I’ve learned a lesson: when you get invited to a super-secret cryptography ceremony, leave your phone at home.

This computer rendering depicts the pattern on a photonic chip that the author and his colleagues have devised for performing neural-network calculations using light.

Think of the many tasks to which computers are being applied that in the not-so-distant past required human intuition. Computers routinely identify objects in images, transcribe speech, translate between languages, diagnose medical conditions, play complex games, and drive cars.

The technique that has empowered these stunning developments is called deep learning, a term that refers to mathematical models known as artificial neural networks. Deep learning is a subfield of machine learning, a branch of computer science based on fitting complex models to data.

While machine learning has been around a long time, deep learning has taken on a life of its own lately. The reason for that has mostly to do with the increasing amounts of computing power that have become widely available—along with the burgeoning quantities of data that can be easily harvested and used to train neural networks.

The amount of computing power at people's fingertips started growing in leaps and bounds at the turn of the millennium, when graphics processing units (GPUs) began to be harnessed for nongraphical calculations, a trend that has become increasingly pervasive over the past decade. But the computing demands of deep learning have been rising even faster. This dynamic has spurred engineers to develop electronic hardware accelerators specifically targeted to deep learning, Google's Tensor Processing Unit (TPU) being a prime example.

Here, I will describe a very different approach to this problem—using optical processors to carry out neural-network calculations with photons instead of electrons. To understand how optics can serve here, you need to know a little bit about how computers currently carry out neural-network calculations. So bear with me as I outline what goes on under the hood.

Almost invariably, artificial neurons are constructed using special software running on digital electronic computers of some sort. That software provides a given neuron with multiple inputs and one output. The state of each neuron depends on the weighted sum of its inputs, to which a nonlinear function, called an activation function, is applied. The result, the output of this neuron, then becomes an input for various other neurons.

For computational efficiency, these neurons are grouped into layers, with neurons connected only to neurons in adjacent layers. The benefit of arranging things that way, as opposed to allowing connections between any two neurons, is that it allows certain mathematical tricks of linear algebra to be used to speed the calculations.
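
As a concrete, if much simplified, sketch of what that software computes for a single layer, here it is in Python with NumPy, using a ReLU activation as one illustrative choice:

```python
import numpy as np

def dense_layer(inputs: np.ndarray, weights: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """One fully connected layer: a weighted sum per neuron, then a nonlinearity.

    inputs:  activations from the previous layer, shape (n_in,)
    weights: one row of weights per neuron in this layer, shape (n_out, n_in)
    biases:  one bias per neuron, shape (n_out,)
    """
    weighted_sums = weights @ inputs + biases     # the linear-algebra step
    return np.maximum(weighted_sums, 0.0)         # ReLU, one common activation function

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # four inputs
W = rng.normal(size=(3, 4))     # a layer of three neurons
b = np.zeros(3)
print(dense_layer(x, W, b))     # these outputs become inputs to the next layer
```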

While they are not the whole story, these linear-algebra calculations are the most computationally demanding part of deep learning, particularly as the size of the network grows. This is true for both training (the process of determining what weights to apply to the inputs for each neuron) and for inference (when the neural network is providing the desired results).

What are these mysterious linear-algebra calculations? They aren't so complicated really. They involve operations on matrices, which are just rectangular arrays of numbers—spreadsheets if you will, minus the descriptive column headers you might find in a typical Excel file.

This is great news because modern computer hardware has been very well optimized for matrix operations, which were the bread and butter of high-performance computing long before deep learning became popular. The relevant matrix calculations for deep learning boil down to a large number of multiply-and-accumulate operations, whereby pairs of numbers are multiplied together and their products are added up.
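
Spelled out in code, that's all there is to it; the sketch below multiplies two matrices the slow, explicit way so that the individual multiply-and-accumulate operations are visible:

```python
import numpy as np

def matmul_explicit(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Matrix multiplication written as explicit multiply-and-accumulate loops."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += A[i, p] * B[p, j]   # one multiply-and-accumulate operation
            C[i, j] = acc
    return C

A = np.arange(6, dtype=float).reshape(2, 3)
B = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(matmul_explicit(A, B), A @ B)   # agrees with the optimized routine
```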

Two beams whose electric fields are proportional to the numbers to be multiplied, x and y, impinge on a beam splitter (blue square). The beams leaving the beam splitter shine on photodetectors (ovals), which provide electrical signals proportional to these electric fields squared. Inverting one photodetector signal and adding it to the other then results in a signal proportional to the product of the two inputs. David Schneider

Over the years, deep learning has required an ever-growing number of these multiply-and-accumulate operations. Consider LeNet, a pioneering deep neural network, designed to do image classification. In 1998 it was shown to outperform other machine-learning techniques for recognizing handwritten letters and numerals. But by 2012 AlexNet, a neural network that crunched through about 1,600 times as many multiply-and-accumulate operations as LeNet, was able to recognize thousands of different types of objects in images.

Advancing from LeNet's initial success to AlexNet required almost 11 doublings of computing performance. During the 14 years that this took, Moore's Law provided much of that increase. The challenge has been to keep this trend going now that Moore's Law is running out of steam. The usual solution is simply to throw more computing resources—along with time, money, and energy—at the problem.
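
The arithmetic behind "almost 11 doublings" is simply

$$\log_2(1600) \approx 10.6,$$

that is, a roughly 1,600-fold increase in operations corresponds to just under 11 doublings.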

As a result, training today's large neural networks often has a significant environmental footprint. One 2019 study found, for example, that training a certain deep neural network for natural-language processing produced five times the CO2 emissions typically associated with driving an automobile over its lifetime.

Improvements in digital electronic computers allowed deep learning to blossom, to be sure. But that doesn't mean that the only way to carry out neural-network calculations is with such machines. Decades ago, when digital computers were still relatively primitive, some engineers tackled difficult calculations using analog computers instead. As digital electronics improved, those analog computers fell by the wayside. But it may be time to pursue that strategy once again, in particular when the analog computations can be done optically.

It has long been known that optical fibers can support much higher data rates than electrical wires. That's why all long-haul communication lines went optical, starting in the late 1970s. Since then, optical data links have replaced copper wires for shorter and shorter spans, all the way down to rack-to-rack communication in data centers. Optical data communication is faster and uses less power. Optical computing promises the same advantages.

But there is a big difference between communicating data and computing with it. And this is where analog optical approaches hit a roadblock. Conventional computers are based on transistors, which are highly nonlinear circuit elements—meaning that their outputs aren't just proportional to their inputs, at least when used for computing. Nonlinearity is what lets transistors switch on and off, allowing them to be fashioned into logic gates. This switching is easy to accomplish with electronics, for which nonlinearities are a dime a dozen. But photons follow Maxwell's equations, which are annoyingly linear, meaning that the output of an optical device is typically proportional to its inputs.

The trick is to use the linearity of optical devices to do the one thing that deep learning relies on most: linear algebra.

To illustrate how that can be done, I'll describe here a photonic device that, when coupled to some simple analog electronics, can multiply two matrices together. Such multiplication combines the rows of one matrix with the columns of the other. More precisely, it multiplies pairs of numbers from these rows and columns and adds their products together—the multiply-and-accumulate operations I described earlier. My MIT colleagues and I published a paper about how this could be done in 2019. We're working now to build such an optical matrix multiplier.

The basic computing unit in this device is an optical element called a beam splitter. Although its makeup is in fact more complicated, you can think of it as a half-silvered mirror set at a 45-degree angle. If you send a beam of light into it from the side, the beam splitter will allow half that light to pass straight through it, while the other half is reflected from the angled mirror, causing it to bounce off at 90 degrees from the incoming beam.

Now shine a second beam of light, perpendicular to the first, into this beam splitter so that it impinges on the other side of the angled mirror. Half of this second beam will similarly be transmitted and half reflected at 90 degrees. The two output beams will combine with the two outputs from the first beam. So this beam splitter has two inputs and two outputs.

To use this device for matrix multiplication, you generate two light beams with electric-field amplitudes that are proportional to the two numbers you want to multiply. Let's call these field amplitudes x and y. Shine those two beams into the beam splitter, which will combine these two beams. This particular beam splitter does that in a way that will produce two outputs whose electric fields have values of (x + y)/√2 and (x − y)/√2.

In addition to the beam splitter, this analog multiplier requires two simple electronic components—photodetectors—to measure the two output beams. They don't measure the electric-field amplitude of those beams, though. They measure the power of a beam, which is proportional to the square of its electric-field amplitude.

Why is that relation important? To understand that requires some algebra—but nothing beyond what you learned in high school. Recall that when you square (x + y)/√2 you get (x² + 2xy + y²)/2. And when you square (x − y)/√2, you get (x² − 2xy + y²)/2. Subtracting the latter from the former gives 2xy.

Pause now to contemplate the significance of this simple bit of math. It means that if you encode a number as a beam of light of a certain intensity and another number as a beam of another intensity, send them through such a beam splitter, measure the two outputs with photodetectors, and negate one of the resulting electrical signals before summing them together, you will have a signal proportional to the product of your two numbers.
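
A few lines of Python confirm that arithmetic. This is an idealized model—noiseless detectors, every proportionality constant set to one—not a simulation of any real device:

```python
import numpy as np

def optical_multiply(x: float, y: float) -> float:
    """Idealized beam-splitter multiplier: two fields in, two detectors out."""
    field_plus = (x + y) / np.sqrt(2)      # field at one output port
    field_minus = (x - y) / np.sqrt(2)     # field at the other output port
    power_plus = field_plus ** 2           # photodetectors report power, i.e. field squared
    power_minus = field_minus ** 2
    return (power_plus - power_minus) / 2  # (2xy)/2: negate one signal, sum, rescale

assert np.isclose(optical_multiply(3.0, -1.5), 3.0 * -1.5)
```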

Simulations of the integrated Mach-Zehnder interferometer found in Lightmatter's neural-network accelerator show three different conditions whereby light traveling in the two branches of the interferometer undergoes different relative phase shifts (0 degrees in a, 45 degrees in b, and 90 degrees in c). Lightmatter

My description has made it sound as though each of these light beams must be held steady. In fact, you can briefly pulse the light in the two input beams and measure the output pulse. Better yet, you can feed the output signal into a capacitor, which will then accumulate charge for as long as the pulse lasts. Then you can pulse the inputs again for the same duration, this time encoding two new numbers to be multiplied together. Their product adds some more charge to the capacitor. You can repeat this process as many times as you like, each time carrying out another multiply-and-accumulate operation.

Using pulsed light in this way allows you to perform many such operations in rapid-fire sequence. The most energy-intensive part of all this is reading the voltage on that capacitor, which requires an analog-to-digital converter. But you don't have to do that after each pulse—you can wait until the end of a sequence of, say, N pulses. That means that the device can perform N multiply-and-accumulate operations using the same amount of energy to read the answer whether N is small or large. Here, N corresponds to the number of neurons per layer in your neural network, which can easily number in the thousands. So this strategy uses very little energy.
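
In code, the pulse-and-accumulate scheme looks like the following, again as an idealized model (a perfect capacitor, no detector noise, and the same beam-splitter multiplier as in the previous sketch):

```python
import numpy as np

def optical_multiply(x: float, y: float) -> float:
    """Same idealized beam-splitter multiplier as in the previous sketch."""
    return ((x + y) ** 2 - (x - y) ** 2) / 4   # detector signals differenced, rescaled to xy

def pulsed_dot_product(xs: np.ndarray, ys: np.ndarray) -> float:
    """Accumulate one product per pulse pair; read the 'capacitor' once at the end."""
    charge = 0.0
    for x, y in zip(xs, ys):              # each optical pulse pair contributes one term
        charge += optical_multiply(x, y)  # its product adds charge to the capacitor
    return charge                         # one analog-to-digital conversion per N pulses

rng = np.random.default_rng(1)
a, b = rng.normal(size=1000), rng.normal(size=1000)   # N = 1000 terms
assert np.isclose(pulsed_dot_product(a, b), float(a @ b))
```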

Sometimes you can save energy on the input side of things, too. That's because the same value is often used as an input to multiple neurons. Rather than that number being converted into light multiple times—consuming energy each time—it can be transformed just once, and the light beam that is created can be split into many channels. In this way, the energy cost of input conversion is amortized over many operations.

Splitting one beam into many channels requires nothing more complicated than a lens, but lenses can be tricky to put onto a chip. So the device we are developing to perform neural-network calculations optically may well end up being a hybrid that combines highly integrated photonic chips with separate optical elements.

I've outlined here the strategy my colleagues and I have been pursuing, but there are other ways to skin an optical cat. Another promising scheme is based on something called a Mach-Zehnder interferometer, which combines two beam splitters and two fully reflecting mirrors. It, too, can be used to carry out matrix multiplication optically. Two MIT-based startups, Lightmatter and Lightelligence, are developing optical neural-network accelerators based on this approach. Lightmatter has already built a prototype that uses an optical chip it has fabricated. And the company expects to begin selling an optical accelerator board that uses that chip later this year.
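
For the curious, one simple way to model a lossless Mach-Zehnder interferometer is as two 50:50 splitters with a programmable phase shift in one arm. The sign and phase conventions below are my own illustrative choices, not a description of Lightmatter's or Lightelligence's hardware:

```python
import numpy as np

def mzi_transfer(theta: float) -> np.ndarray:
    """Ideal lossless Mach-Zehnder interferometer: splitter, phase shift, splitter."""
    splitter = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # one convention for a 50:50 splitter
    phase = np.diag([np.exp(1j * theta), 1.0])            # programmable phase in one arm
    return splitter @ phase @ splitter

for theta in (0.0, np.pi / 4, np.pi / 2):   # the 0-, 45-, and 90-degree cases
    U = mzi_transfer(theta)
    assert np.allclose(U.conj().T @ U, np.eye(2))          # unitary: no optical power lost
    assert np.isclose(abs(U[0, 0]) ** 2, np.cos(theta / 2) ** 2)
    # The phase setting dials the coupling ratio between the two outputs, which is
    # how a mesh of such devices can be programmed to encode a weight matrix.
```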

Another startup using optics for computing is Optalysys, which hopes to revive a rather old concept. One of the first uses of optical computing back in the 1960s was for the processing of synthetic-aperture radar data. A key part of the challenge was to apply to the measured data a mathematical operation called the Fourier transform. Digital computers of the time struggled with such things. Even now, applying the Fourier transform to large amounts of data can be computationally intensive. But a Fourier transform can be carried out optically with nothing more complicated than a lens, which for some years was how engineers processed synthetic-aperture data. Optalysys hopes to bring this approach up to date and apply it more widely.

There is also a company called Luminous, spun out of Princeton University, which is working to create spiking neural networks based on something it calls a laser neuron. Spiking neural networks more closely mimic how biological neural networks work and, like our own brains, are able to compute using very little energy. Luminous's hardware is still in the early phase of development, but the promise of combining two energy-saving approaches—spiking and optics—is quite exciting.

There are, of course, still many technical challenges to be overcome. One is to improve the accuracy and dynamic range of the analog optical calculations, which are nowhere near as good as what can be achieved with digital electronics. That's because these optical processors suffer from various sources of noise and because the digital-to-analog and analog-to-digital converters used to get the data in and out are of limited accuracy. Indeed, it's difficult to imagine an optical neural network operating with more than 8 to 10 bits of precision. While 8-bit electronic deep-learning hardware exists (the Google TPU is a good example), this industry demands higher precision, especially for neural-network training.

There is also the difficulty of integrating optical components onto a chip. Because those components are tens of micrometers in size, they can't be packed nearly as tightly as transistors, so the required chip area adds up quickly. A 2017 demonstration of this approach by MIT researchers involved a chip that was 1.5 millimeters on a side. Even the biggest chips are no larger than several square centimeters, which places limits on the sizes of matrices that can be processed in parallel this way.

There are many additional questions on the computer-architecture side that photonics researchers tend to sweep under the rug. What's clear though is that, at least theoretically, photonics has the potential to accelerate deep learning by several orders of magnitude.

Based on the technology that's currently available for the various components (optical modulators, detectors, amplifiers, analog-to-digital converters), it's reasonable to think that the energy efficiency of neural-network calculations could be made 1,000 times better than today's electronic processors. Making more aggressive assumptions about emerging optical technology, that factor might be as large as a million. And because electronic processors are power-limited, these improvements in energy efficiency will likely translate into corresponding improvements in speed.

Many of the concepts in analog optical computing are decades old. Some even predate silicon computers. Schemes for optical matrix multiplication, and even for optical neural networks, were first demonstrated in the 1970s. But this approach didn't catch on. Will this time be different? Possibly, for three reasons.

First, deep learning is genuinely useful now, not just an academic curiosity. Second, we can't rely on Moore's Law alone to continue improving electronics. And finally, we have a new technology that was not available to earlier generations: integrated photonics. These factors suggest that optical neural networks will arrive for real this time—and the future of such computations may indeed be photonic.