PORTLAND, Ore.—Today, direct-write cache memories are the mainstay of microprocessors, since they lower memory latency in a manner transparent to application programs. However, designers of advanced processors have advocated a switch to software-managed scratchpads and message-passing techniques for next-generation multi-core processors, such as the Cell Broadband Engine Architecture developed by IBM, Toshiba and Sony, which is used for the PlayStation 3.
Unfortunately, software-managed scratchpads and message passing put an additional burden on application programmers and in that sense mark a step backward in microprocessor evolution. Now Semiconductor Research Corp. (SRC) claims to have solved the scaling problem for next-generation processors with up to 512 cores by using hierarchical hardware coherence, a natural evolution of today's multi-level caches that remains transparent to application programs.
"Designers are worrying about latency for future multi-core microprocessors, advocating a move to software coherence using scratchpad memories and message passing," said professor Dan Sorin at Duke University, principal researcher on the project. "But that would require the programmer to manage data movement, which is not the way the industry should go."
Instead, Sorin's SRC-funded study, performed in cooperation with professor Milo Martin from the University of Pennsylvania and professor Mark Hill from the University of Wisconsin, proposes a hierarchical hardware coherence technique that the researchers claim scales as the square root of the number of cores, adding as little as two percent latency for processors with as many as 512 cores. Likewise, traffic, storage and energy consumption all grow very slowly as cores are added, allowing future processors to continue using direct-write caches with hardware coherence that is transparent to application programs.
"These results will change the direction of computer architecture, by assuring designers that cache coherence will not hit the wall," said David Yeh, director of integrated circuit and systems sciences at SRC (Research Triangle, N.C.). "We now know there are ways around the wall. Designers can stop worrying. All the right techniques are available today. You don't need new tricks to be invented, but just need to wisely use the technologies that are already available."
In particular, current direct-write hardware coherence schemes can be evolved to keep traffic, storage, latency, and energy under control as processors scale to more and more cores by using a synergistic combination of shared caches augmented with hierarchical directories and explicit cache eviction notifications. Thus, according to SRC, the roadmap to future massively parallel multi-core processors is clear and unobstructed. Details will be shared in an upcoming issue of the Transactions of the Association for Computing Machinery (ACM).
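To give a feel for why a hierarchy of directories can grow roughly with a root of the core count, the short Python sketch below compares the number of sharer-tracking bits one directory entry would need in a flat design versus a two- or three-level one. It is an illustration only, not the researchers' model: it assumes each directory entry keeps a full bit vector with one bit per child it tracks, that cores are grouped into equal-sized clusters at every level, and it ignores the extra entries that additional levels introduce. The function names `fanout_for` and `sharer_bits_per_entry` are hypothetical.

```python
def fanout_for(cores: int, levels: int) -> int:
    """Smallest per-level fan-out f such that f**levels >= cores.

    Assumption for illustration: every level of the hierarchy groups
    the same number of children (cores or lower-level clusters).
    """
    fanout = 1
    while fanout ** levels < cores:
        fanout += 1
    return fanout


def sharer_bits_per_entry(cores: int, levels: int) -> int:
    """Bits in one directory entry's sharer vector.

    Assumption: a full bit vector, one bit per tracked child. With a
    single level, the entry tracks every core directly; with more
    levels, it tracks only the clusters (or cores) one level below.
    """
    return fanout_for(cores, levels)


if __name__ == "__main__":
    print(f"{'cores':>6} {'flat':>6} {'2-level':>8} {'3-level':>8}")
    for cores in (16, 32, 64, 128, 256, 512):
        flat = sharer_bits_per_entry(cores, 1)
        two = sharer_bits_per_entry(cores, 2)
        three = sharer_bits_per_entry(cores, 3)
        print(f"{cores:>6} {flat:>6} {two:>8} {three:>8}")
```

Under these assumptions, the per-entry sharer vector in a two-level design grows with the square root of the core count rather than linearly, which is the flavor of scaling the study describes; the latency, traffic and energy figures quoted above come from the researchers' own analysis and are not reproduced by this sketch.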
Single-level flat-directory caches (blue) incur unacceptable latency when scaling past 32 cores, but two-level (red) and three-level (green) caches with hierarchical directories can scale to 512 cores with only two to four percent added latency.