CUPERTINO, Calif. – Advanced Micro Devices will describe Jaguar, a low-power x86 core for notebooks, tablets and embedded systems, at Hot Chips here. Jaguar packs four x86 cores into one unit with a large shared L2 cache to compete with both Intel’s Core and Atom chips.
In a separate keynote talk, AMD will announce a follow-on for its HyperTransport processor interconnect. Freedom Fabric aims to link thousands of cores at more than a terabit/second, likely based on technology acquired from SeaMicro.
AMD is expected to try to make Freedom Fabric an industry standard across x86, graphics and ARM cores, competing with the proprietary Quick Path Interconnect on Intel’s CPUs.
Last week, the RapidIO Trade Association said it is trying to get ARM and its SoC partners to adopt its technology as a processor interconnect.
As for the Jaguar core, AMD predicts, based on simulations, that it will deliver more than 10 percent higher frequencies and more than 15 percent more instructions per clock than Bobcat, its current low-power x86 core. Jaguar will appear in 2013 in AMD’s Kabini SoC for low-power notebooks and in Temash, AMD’s first sub-5W SoC, aimed at tablets.
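As a rough illustration of how those two projections combine, the sketch below assumes performance scales as frequency times instructions per clock. That is a simplification that ignores memory stalls, and the function name is hypothetical; it is not an AMD figure. Under that assumption the minimum gains multiply out to roughly 26 percent.

```python
def combined_speedup(freq_gain, ipc_gain):
    """Relative performance, assuming performance = frequency * IPC."""
    return (1 + freq_gain) * (1 + ipc_gain)

# AMD's stated lower bounds: >10% frequency, >15% instructions per clock.
speedup = combined_speedup(0.10, 0.15)
print(f"Combined gain: about {(speedup - 1) * 100:.0f} percent")
```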
The chip sports a re-designed load/store unit and an expanded 128-bit floating point unit. It includes several new instructions to support AES encryption, accelerate media processing and switch big/little endian structures for embedded systems. But the most novel aspect of the new core is its use of four x86 cores in a single unit sharing one L2 cache.
“From a core perspective we will call this a four-core unit that forms the building block of an SoC design,” said Jeff Rupley, an AMD Fellow and chief architect of Jaguar. “It’s possible to fuse off some cores for lower end or lower power designs,” he said.
AMD found that sharing one 1-2 Mbyte L2 cache among the cores saves silicon area compared with using four private caches. It also provides a performance boost when only one or two cores are running single-threaded work, because those cores can then draw on a larger cache.
“Generally the larger cache outweighs the latency” of needing an L2 cache interface, Rupley said. “There could be an app where the latency increase defeats the capacity boost, but across a large swath of apps, there’s a pretty positive uplift,” he said.
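The trade-off Rupley describes can be sketched with a textbook average-access-time model. Every number below (latencies, miss rates, miss penalty) is made up for illustration and does not come from AMD; the function name is likewise hypothetical.

```python
def avg_l2_access_cycles(hit_latency, miss_rate, miss_penalty):
    """Average cycles per L2 access: hit latency plus expected miss cost."""
    return hit_latency + miss_rate * miss_penalty

MISS_PENALTY = 150  # hypothetical cycles to reach DRAM

# Typical case (assumed): the larger shared cache halves the miss rate,
# which more than pays for its longer hit latency.
private = avg_l2_access_cycles(17, 0.30, MISS_PENALTY)
shared = avg_l2_access_cycles(25, 0.15, MISS_PENALTY)

# Counter-case (assumed): an app that already fits in a small cache gains
# little capacity benefit, so the added latency dominates.
private_small = avg_l2_access_cycles(17, 0.05, MISS_PENALTY)
shared_small = avg_l2_access_cycles(25, 0.04, MISS_PENALTY)

print(private, shared, private_small, shared_small)
```

With these assumed inputs the shared cache wins in the first case and loses in the second, matching the "large swath of apps" caveat in the quote.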
One downside to the approach is that all four cores must run at the same dynamic data rate. That means the unit may burn excess power if one task needs a high frequency and other simultaneous jobs do not. The cores also share one bus interface to a memory controller.
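A toy model shows why a shared clock can waste power. It assumes dynamic power per core grows with the cube of frequency (voltage scaling with frequency), and the per-task frequency demands are invented; neither assumption comes from AMD.

```python
def relative_power(freqs_ghz, exponent=3):
    """Sum of per-core dynamic power, assuming power ~ frequency ** exponent."""
    return sum(f ** exponent for f in freqs_ghz)

# Hypothetical workload: one demanding task, three light ones (GHz).
demand = [1.6, 0.8, 0.8, 0.8]

# With a shared clock, every core runs at the highest requested frequency.
shared_clock = [max(demand)] * len(demand)

excess = relative_power(shared_clock) / relative_power(demand)
print(f"Shared clock uses about {excess:.1f}x the power of ideal per-core clocks")
```

The ratio is only as good as the cubic assumption, and real designs can gate idle cores, so this is best read as an upper-bound intuition rather than a measurement.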
On a positive note, AMD enhanced the design so that individual cores can more rapidly enter and exit a deep sleep state. In addition, the L2 data cache is only clocked when an outstanding transaction needs access to the data.