Aug 24 2016
A team of researchers from Princeton University has developed a new computer chip that could increase the performance of the data centers that form the heart of online services, from email to social media.
Data centers are essentially large warehouses filled with computer servers. They power cloud-based services, such as Facebook and Gmail, and store the voluminous content available via the Internet.
Surprisingly, the computer chips at the core of even the largest servers that route and process data rarely differ much from the chips found in smaller servers or everyday personal computers.
The Princeton team says that by designing its chip exclusively for large computing systems, it can significantly increase processing speed while cutting energy requirements.
The chip's design is scalable, meaning the same architecture can grow from a dozen processing units (called cores) to several thousand. Its structure also allows numerous chips to be linked together into a single system containing millions of cores.
The chip is named Piton, after the metal spikes that rock climbers drive into mountainsides to help them ascend.
With Piton, we really sat down and rethought computer architecture in order to build a chip specifically for data centers and the cloud. The chip we've made is among the largest chips ever built in academia and it shows how servers could run far more efficiently and cheaply.
David Wentzlaff, Assistant Professor, Princeton University
Details of the Piton project will be presented by Wentzlaff's graduate student Michael McKeown at Hot Chips, a symposium on high-performance chips held in Cupertino, California, on Tuesday, August 23.
The chip is the result of years of effort by Wentzlaff and his students. Mohammad Shahrad, a graduate student in Wentzlaff's Princeton Parallel Group, said that developing "a physical piece of hardware in an academic setting is a rare and very special opportunity for computer architects."
Since the project's inception in 2013, several Princeton researchers have contributed to it: Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Jonathan Balkind, Alexey Lavrov, Matthew Matl, Xiaohua Liang, and Samuel Payne, who is now at NVIDIA.
The Piton chip was manufactured for the team by IBM. Principal financial assistance for the project came from the National Science Foundation, the Defense Advanced Research Projects Agency, and the Air Force Office of Scientific Research.
The current version of the Piton chip measures 6 mm by 6 mm. It contains more than 460 million transistors, the smallest of which measure 32 nanometers, so tiny that they can be seen only with an electron microscope.
The bulk of these transistors are held in 25 cores, the independent processors that perform the instructions in a computer program.
Most personal computer chips have only four or eight cores. In general, more cores mean faster processing, as long as software can use the hardware's available cores to run operations in parallel.
Computer manufacturers are looking to multi-core chips to extract further performance gains as conventional approaches to improving computer hardware reach their limits.
In the last few years, academic institutions and companies have manufactured chips with many dozens of cores, but Wentzlaff said Piton's easily scalable architecture could enable thousands of cores on a single chip and as many as half a billion cores in a data center.
What we have with Piton is really a prototype for future commercial server systems that could take advantage of a tremendous number of cores to speed up processing.
David Wentzlaff, Assistant Professor, Princeton University
The Piton chip's design focuses on exploiting commonality among programs running simultaneously on the same chip. One technique for doing so is called execution drafting.
It functions similarly to the drafting in bicycle racing, when cyclists conserve energy by following closely behind a lead rider who cuts through the air, producing a slipstream.
At a data center, many users often run programs that rely on similar operations at the processor level. The Piton chip's cores can recognize these instances and execute identical instructions consecutively, so that they flow one after another, like a line of drafting cyclists.
The researchers said this approach can boost energy efficiency by approximately 20% compared to a typical core.
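To make the idea concrete, the following Python sketch is a software analogy of execution drafting, under the assumption that a scheduler can see the next instruction waiting in each user's thread. It is not Piton's actual hardware mechanism, and the names (draft_schedule, threads) are hypothetical.

from collections import defaultdict

def draft_schedule(threads):
    # Group the next pending instruction from each thread so that identical
    # instructions are issued back-to-back: the first thread acts as the
    # "lead" and the rest "draft" behind it, reusing the fetch/decode work.
    buckets = defaultdict(list)
    for tid, instructions in threads.items():
        if instructions:
            buckets[instructions[0]].append(tid)
    schedule = []
    for instruction, tids in buckets.items():
        lead, followers = tids[0], tids[1:]
        schedule.append((instruction, lead, "lead"))
        schedule.extend((instruction, tid, "draft") for tid in followers)
    return schedule

# Two users running the same web service reach the same instruction.
threads = {
    "user_a": ["LOAD r1", "ADD r1, r2"],
    "user_b": ["LOAD r1", "MUL r1, r3"],
}
for instruction, tid, role in draft_schedule(threads):
    print(tid, instruction, role)

In this toy run, both users' LOAD instructions are issued consecutively; that back-to-back flow is the drafting effect the researchers credit with the energy saving.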
A second innovation built into the Piton chip manages when competing programs access computer memory that resides off the chip.
Called a memory traffic shaper, it acts like a traffic cop at a busy intersection, weighing each program's needs and adjusting memory requests, waving them through at the right moments so they do not clog the system. This approach can yield an 18% performance improvement compared with conventional allocation.
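As a rough illustration, the sketch below models a traffic shaper as a weighted scheduler in Python. The real shaper is a hardware unit inside the chip; the program names and weights here are invented for the example.

from collections import deque

def shape_traffic(queues, weights, slots):
    # Interleave pending off-chip memory requests from several programs,
    # serving each roughly in proportion to its weight so that one bursty
    # program cannot monopolize the memory channel.
    credit = {program: 0.0 for program in queues}
    order = []
    for _ in range(slots):
        for program in credit:
            credit[program] += weights[program]
        ready = [p for p in queues if queues[p]]
        if not ready:
            break
        chosen = max(ready, key=lambda p: credit[p])
        order.append((chosen, queues[chosen].popleft()))
        credit[chosen] -= 1.0
    return order

queues = {
    "analytics": deque(["req0", "req1", "req2", "req3"]),
    "web": deque(["reqA", "reqB"]),
}
weights = {"analytics": 0.4, "web": 0.6}  # the web program is latency-sensitive
print(shape_traffic(queues, weights, slots=6))

The output interleaves requests from both programs rather than letting the larger burst go first, which is analogous to the arbitration the traffic shaper performs in hardware.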
The Piton chip also gains efficiency from its management of memory stored on the chip itself. This memory, known as the cache, is the fastest in the computer and is used for frequently accessed data.
In most designs, cache memory is shared across all of the chip's cores, but that approach can backfire when many cores access and modify the same cache.
Piton avoids this problem by assigning areas of the cache, and specific cores, to dedicated applications. The team says this approach can boost efficiency by 29% when applied to a 1,024-core design.
They predict the savings would grow as the system scales to the millions of cores in a data center.
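The following is a simplified Python sketch of that idea, under the assumption that the cache can be split into fixed per-application regions; the class and method names are hypothetical, and the real allocation happens in Piton's hardware.

class PartitionedCache:
    # Divide a fixed number of cache slots into private regions, one per
    # application, so cores running one workload cannot evict another's data.
    def __init__(self, total_slots, apps):
        share = total_slots // len(apps)
        self.regions = {app: {} for app in apps}
        self.capacity = {app: share for app in apps}

    def access(self, app, address, value=None):
        region = self.regions[app]
        if address in region:                  # hit: data already cached for this app
            return region[address]
        if len(region) >= self.capacity[app]:
            region.pop(next(iter(region)))     # evict the oldest entry in this app's region only
        region[address] = value                # miss: fill from off-chip memory
        return value

cache = PartitionedCache(total_slots=8, apps=["database", "video"])
cache.access("database", 0x10, "row42")
cache.access("video", 0x10, "frame7")   # same address, separate region: no interference
print(cache.access("database", 0x10))   # still "row42"

Because each application evicts only entries in its own region, a cache-hungry workload cannot flush another program's frequently used data.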
The researchers said these improvements could be achieved while keeping costs in line with existing manufacturing standards. To speed further development that builds on and extends the Piton architecture, the Princeton team has made its design open source, and therefore available to the public and fellow researchers, at the OpenPiton website.
We're very pleased with all that we've achieved with Piton in an academic setting, where there are far fewer resources than at large, commercial chipmakers. We're also happy to give out our design to the world as open source, which has long been commonplace for software, but is almost never done for hardware.
David Wentzlaff, Assistant Professor, Princeton University