Application performance demands are now outpacing Moore’s Law, and they’re certainly not slowing down. To keep up, we’re already beginning to rely on specialized processors like GPUs and DSPs for specific use cases. In the future, all data centers will depend on their ability to use multiple types of processors as efficiently as possible. In other words, heterogeneous hardware is on the horizon.
We’re seeing early signs of this progression today; the resulting wave of innovation will resemble the one unleashed by the introduction of the cloud.
To understand the opportunities in fields as vastly different as deep learning and personalized medicine, let me start with a brief history of how we got to where we are today.
A decade ago, nearly all desktop computers had single-core CPUs. This started to change around 2005 or 2006 as dual-core and eventually quad-core CPUs went mainstream. The next step was to add specialized processing capabilities to handle particular tasks, namely through the introduction of Graphics Processing Units (GPUs) and Digital Signal Processors (DSPs). By the end of 2010, nearly all desktop and laptop computers had some sort of integrated CPU-GPU combo.
CPUs run the operating system and perform conventional serial tasks. GPUs, which have been improving rapidly, not only handle graphics but can also perform rapid mathematical computations on large data sets. DSPs introduced entirely new capabilities to computer systems, including the ability to handle real-world audio, video, and measurements.
Even the smartphones we carry in our pockets have multiple processor types, each for carrying out various tasks.
Just a couple of years ago, data centers were only populated by CPUs. As data volumes continued to escalate, the only recourse for data centers and cloud environments was to keep scaling out by adding CPU-based server nodes.
Over the past couple of years, we’ve started to see data centers and cloud-service providers adding GPU-based nodes for applications like rendering, media, and, most recently, deep learning. All the major cloud providers, including Amazon, IBM, and Microsoft, now offer GPU nodes in addition to CPU-based nodes; meanwhile, supercomputers are being built with a mix of CPUs, GPUs, and even DSPs. China is building a supercomputer with just DSPs.
Reduced Instruction Set Computer (RISC) processors, like ARM processors, are finding their way into the marketplace as well. ARM processors are capable of performing many operations at high speed and at a fraction of the energy demand of conventional CPUs.
Field-Programmable Gate Array (FPGA) chips, which were invented over two decades ago, can be configured for special purposes after being installed. The same chip can be configured to do speech recognition at one moment and image processing at another. They used to be employed strictly in embedded systems, but they’re starting to appear in computer systems, and they will greatly enhance the range of specialization in heterogeneous systems. Companies like Microsoft and Baidu are using FPGAs for their search infrastructure in their production systems. FPGAs are making a comeback, as Intel’s recent acquisition of FPGA company Altera and IBM’s partnership with Xilinx show.
At the extreme end are special-purpose processors, or Application-Specific Integrated Circuits (ASICs), that target specific areas like deep learning, pattern matching, and bioinformatics. Examples include the neuromorphic chip from IBM; the brain-inspired chip from the ambitious startup BrainChip; the automata processor from Micron, which essentially implements nondeterministic finite automata (NFAs) in hardware; the bioinformatics chip from the startup Edico; and the neural network chip from Nervana. In a few years, we may even start seeing these in data centers and clouds.
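To make the idea of an NFA in hardware a little more concrete, here is a minimal software sketch of what a nondeterministic finite automaton does. It is only an illustration: the transition table, the state numbers, and the function name nfa_accepts are hypothetical, and nothing here reflects how any vendor's chip is actually built. The point is that a CPU has to walk through the active states one by one for every input symbol, whereas dedicated hardware could, in principle, update all of them at once.

```python
from typing import Dict, Set, Tuple

# Transition table: (current state, input symbol) -> set of possible next states.
Transitions = Dict[Tuple[int, str], Set[int]]


def nfa_accepts(transitions: Transitions, start: int,
                accepting: Set[int], text: str) -> bool:
    """Simulate an NFA by tracking every state it could be in at once."""
    active = {start}
    for symbol in text:
        next_active: Set[int] = set()
        # A CPU visits the active states serially; this inner loop is the
        # work that purpose-built hardware could perform in parallel.
        for state in active:
            next_active |= transitions.get((state, symbol), set())
        active = next_active
        if not active:
            return False  # no possible path left
    return bool(active & accepting)


# Hypothetical example: accept strings over {a, b} whose
# second-to-last symbol is 'a'. State 2 is the accepting state.
EXAMPLE_TRANSITIONS: Transitions = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}

if __name__ == "__main__":
    for word in ["ab", "aab", "ba", "bb"]:
        print(word, nfa_accepts(EXAMPLE_TRANSITIONS, 0, {2}, word))
```

In this sketch the cost per input symbol grows with the number of active states, which is one reason pattern matching is a natural target for specialized silicon.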
As heterogeneous hardware architectures begin to take hold, we’ll see technologies and solutions that once required supercomputing capabilities now running on cloud-based machine instances. The processing capabilities of heterogeneous hardware will speed up the pace of innovation by lowering barriers to entry in compute-intensive fields like deep learning, homomorphic encryption, and genomics.
Artificial Intelligence (AI) and Deep Learning: True AI is closer than we think. Neural network architectures are growing more complex and sophisticated as they try to mimic human speech and intelligence, and we’re seeing deep learning applied all around us, from retail and finance to manufacturing and medicine (drug discovery and personalized medicine). VC dollars are pouring into the industry, startups are emerging left and right, and more industries are adopting deep learning as a competitive advantage. This explosive growth of deep learning will lead to more cloud-service providers offering GPUs specifically for deep learning applications.
Even beyond purpose-built GPUs, there’s the potential for other kinds of architectures altogether in deep learning. Chris Bishop, the managing director of Microsoft Research in Cambridge, predicts that in 2016 we’ll start to see “new silicon architectures that are tuned to the intensive workloads of machine learning, offering a major performance boost over GPUs.”
Homomorphic encryption: Fully homomorphic encryption (FHE) allows computations to be run on encrypted data without decrypting it. With FHE, any program could perform any operation on any encrypted input. This means that different companies with different services could share and interact with fully encrypted data. The obstacle to practical use is that the same computations take roughly a million times longer with FHE. New kinds of architectures are needed.
The heterogeneous cloud of the not-too-distant future will provide the processing capabilities needed to make FHE a viable security solution in the next decade.
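The core idea of computing on encrypted data can be shown with a much simpler scheme than FHE. The sketch below uses textbook RSA with tiny primes, which happens to be homomorphic for multiplication only. It is a toy under stated assumptions: the primes, the exponent, and the helper names (encrypt, decrypt, multiply_encrypted) are chosen purely for illustration, the scheme is not secure, and it is not fully homomorphic. It needs Python 3.8 or later for the modular inverse via pow.

```python
# Toy textbook RSA (no padding, tiny primes): insecure, illustration only.
P, Q = 61, 53                      # assumed small primes
N = P * Q                          # public modulus, 3233
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt an integer smaller than N with the public key."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(ciphertext, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Multiply two ciphertexts; no key and no plaintext is needed here."""
    return (c1 * c2) % N


if __name__ == "__main__":
    c1, c2 = encrypt(6), encrypt(7)
    product_ciphertext = multiply_encrypted(c1, c2)
    # Only the key holder can read the result of the computation.
    print(decrypt(product_ciphertext))  # prints 42
```

A party holding only the ciphertexts can compute the encrypted product without ever seeing 6 or 7. FHE extends this property to arbitrary computations, which is where the heavy processing cost comes from.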
Genomics: The field of genomics has created a revolution all the way from investigating the cause of diseases to treating cancer. The cost of sequencing a genome has dropped from over a billion dollars to just under $1,000 in the past decade. As a result, genomic data is exploding far beyond what we think of as “big data” today. By 2025, between 100 million and 2 billion human genomes are expected to be sequenced, which means we’ll be faced with the challenge of processing exabytes of data (far beyond what YouTube or Twitter deals with). Processing this kind of data needs computational capabilities beyond homogeneous hardware. Architectures like FPGAs will likely play a bigger role in the near future.
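A rough back-of-the-envelope calculation shows how genome counts like these turn into exabytes. The per-genome size below is an assumption made only for illustration (about 100 GB of raw sequencing data per genome); the article itself gives no such figure, and real sizes vary widely with sequencing depth and compression. The function name total_exabytes is likewise hypothetical.

```python
BYTES_PER_EXABYTE = 10**18  # decimal exabyte

# ASSUMPTION for illustration only: ~100 GB of raw data per sequenced genome.
ASSUMED_BYTES_PER_GENOME = 100 * 10**9


def total_exabytes(genomes: int,
                   bytes_per_genome: int = ASSUMED_BYTES_PER_GENOME) -> float:
    """Estimate total data volume in exabytes for a number of genomes."""
    return genomes * bytes_per_genome / BYTES_PER_EXABYTE


if __name__ == "__main__":
    low = total_exabytes(100_000_000)     # 100 million genomes
    high = total_exabytes(2_000_000_000)  # 2 billion genomes
    print(f"{low} to {high} exabytes")    # prints "10.0 to 200.0 exabytes"
```

Changing the assumed per-genome size shifts the totals proportionally, but under any plausible value the result stays in exabyte territory.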
As Disney’s Chris Launey put it more than a year ago at OSCON 2014: “If you have enough fast, you can make your own cheap. If you have fast, you can iterate to the best quality.”
He’s right. Speed matters. The faster computing gets, the cheaper it will become, and the more accessible it will be to the companies – many of them yet to be founded – that will spark the next big breakthroughs in medicine, artificial intelligence, cybersecurity and more. Heterogeneous hardware architectures are the next frontier for this kind of innovation.
I, for one, can’t wait to see what we achieve.