Cache Improvements

The shared L1 instruction cache grew in size with Steamroller, although AMD isn’t telling us by how much. Bulldozer featured a 2-way 64KB L1 instruction cache, with each “core” using one of the ways. This approach gave Bulldozer less cache per core than previous designs, so the increase here makes a lot of sense. AMD claims the larger L1 can reduce i-cache misses by up to 30%. There’s no word on any possible impact to L1 d-cache sizes.

Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, the address and decoded op are both stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, although it is likely smaller. AMD wasn’t willing to disclose how many micro-ops could fit in the queue, other than to say that it’s big enough to get a decent hit rate.
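To make the mechanism concrete, here is a toy software model of an address-tagged micro-op queue. It is only a sketch: the capacity, the oldest-first replacement policy, and the names `MicroOpQueue`, `toy_decode` and `run_loop` are all our own assumptions, since AMD disclosed none of those details.

```python
from collections import OrderedDict


class MicroOpQueue:
    """Toy model of a decoded micro-op queue keyed by fetch address."""

    def __init__(self, capacity):
        # Hypothetical capacity: AMD did not disclose the real size.
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> tuple of decoded micro-ops
        self.hits = 0      # fetches served from the queue (decoders could stay idle)
        self.decodes = 0   # fetches that needed the decode hardware

    def fetch(self, address, decode):
        if address in self.entries:
            self.hits += 1
            return self.entries[address]
        self.decodes += 1
        uops = decode(address)
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            # Assumed policy: evict the oldest entry first.
            self.entries.popitem(last=False)
        return uops


def toy_decode(address):
    """Stand-in for the x86 decoders; returns a fake micro-op tuple."""
    return (f"uop@{address:#x}",)


def run_loop(queue, loop_addresses, iterations):
    """Fetch the same addresses repeatedly, as a tight loop would."""
    for _ in range(iterations):
        for addr in loop_addresses:
            queue.fetch(addr, toy_decode)
    return queue.hits, queue.decodes
```

In this model, a loop that fits in the queue is decoded once and then served from the queue on every later iteration, while a loop larger than the queue never hits at all under the assumed replacement policy. That is presumably why AMD stresses that the queue is big enough to get a decent hit rate.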
 
The L1 to L2 interface has also been improved. Some queues have grown and the logic has been improved.
 
 
Finally on the caching front, Steamroller introduces a dynamically resizable L2 cache. Based on workload and hit rate in the cache, a Steamroller module can choose to resize its L2 cache in increments of one quarter, powering down the unused slices. AMD believes this is a huge power win for mobile client applications such as video decode (not so much for servers), where the CPU only has to wake up for short periods of time to run minor tasks that don’t have large L2 footprints. The L2 cache accounts for a large chunk of AMD’s core leakage, so shutting half or more of it down can definitely help with battery life. The resized cache is no faster (same access latency); it just consumes less power.
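As a rough illustration of the quarter-step resizing, the sketch below picks the smallest number of L2 quarters that still holds a given working set. It makes several assumptions of our own: the real hardware decides from workload and hit rate rather than a known footprint, at least one quarter always stays powered, leakage scales linearly with the powered slices, and the L2 size is whatever you pass in. The function names are hypothetical.

```python
import math

QUARTERS = 4  # the L2 resizes in steps of one quarter


def active_quarters(footprint_kb, l2_total_kb):
    """Smallest number of L2 quarters that still holds the working set.

    Assumption: at least one quarter always stays powered.
    """
    quarter_kb = l2_total_kb / QUARTERS
    needed = math.ceil(footprint_kb / quarter_kb)
    return max(1, min(QUARTERS, needed))


def relative_l2_leakage(quarters_on):
    """L2 leakage relative to a fully powered cache.

    Assumption: leakage scales linearly with the number of powered quarters.
    """
    return quarters_on / QUARTERS
```

With an illustrative 2048KB L2, a 300KB working set would need only one quarter powered, which under the linear assumption cuts L2 leakage to a quarter of its full value. A working set larger than the cache keeps all four quarters on.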
 
Steamroller brings no significant reduction in L2/L3 cache latencies. AMD says it has isolated the reason for the unusually high L3 latency in the Bulldozer architecture, but fixing it isn’t a top priority. Given that most consumers (read: notebooks) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
 

Looking Forward: High Density Libraries

 
This one falls into the reasons-we-bought-ATI column: future AMD CPU architectures will employ higher levels of design automation and new high density cell libraries, both heavily influenced by AMD’s GPU group. Automated place and route is already commonplace in AMD CPU designs, but AMD is going even further with this approach.
 
The methodology comes from AMD’s work in designing graphics cores, and we’ve already seen some of it used in AMD’s ‘cat cores (e.g. Bobcat). As an example, AMD demonstrated a 30% reduction in area and power consumption when these new automated procedures with high density libraries were applied to a 32nm Bulldozer FPU.

The power savings come from not having to route clocks and signals as far, while the area savings are a result of the computer-automated transistor placement/routing and higher density gate/logic libraries.
 
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand-drawn designs. AMD believes the sacrifice is worth it, however, because in power constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15 to 30% energy reduction per operation. AMD equates this with the power savings you’d get from a full process node improvement.
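To put the per-operation figure in perspective, a quick back-of-the-envelope calculation shows how much more work fits into a fixed energy budget (a battery charge, say) when each operation costs less energy. This is our own arithmetic rather than an AMD number, and the function name is hypothetical.

```python
def ops_gain_at_fixed_energy(energy_reduction):
    """Fractional increase in operations per joule.

    `energy_reduction` is the fraction of energy saved per operation,
    e.g. 0.15 for 15%.
    """
    if not 0 <= energy_reduction < 1:
        raise ValueError("energy_reduction must be in [0, 1)")
    return 1 / (1 - energy_reduction) - 1
```

At the two ends of the quoted range, `ops_gain_at_fixed_energy(0.15)` works out to roughly 0.18 and `ops_gain_at_fixed_energy(0.30)` to roughly 0.43. In other words, the same energy budget would cover about 18% to 43% more operations, assuming everything else is held equal.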
 
We won’t see these new libraries and automated designs in Steamroller, but rather its successor in 2014: Excavator.
 

Final Words

 
Steamroller seems like a good evolutionary improvement to AMD’s Bulldozer and Piledriver architectures. While Piledriver focused more on improving power efficiency, Steamroller should make a bigger impact on performance.
 
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.

126 Comments


  • CeriseCogburn - Friday, October 12, 2012 - link

    " AMD will catch up with Intel sooner than most of you thought "
    LOL
    Most think never, so how much sooner is sooner than never ?
  • aesthetics84 - Sunday, May 26, 2013 - link

    So.... PS4 and Xbox One are both releasing with 8 core AMD cpus, "Never" is looking pretty damn close on that horizon, eh bud? Intel fanboys like you are about to be throwing more money away to try and keep up. Be sure to put that Haswell, you'll inevitably get, under water and fill the res with your sweet, sweet fanboy tears.
  • phdchristmas - Sunday, September 9, 2012 - link

    Extreme editions are priced that high because they are the bleeding edge that pave the way for the next generation of chips. Funding for continued research on producing a high production chip of its kind.
  • rarson - Tuesday, September 18, 2012 - link

    No, they're priced that high because demand for them is low. Supply and demand.
  • CeriseCogburn - Friday, October 12, 2012 - link

    Wrong again rarson. Demand for any item can be very high, and drive the lacking supply price higher and higher.
    In this case they are priced high because there is sufficient demand to sustain that top tier price. If the demand was low, the price would drop, YOU AMD FANBOY brainfarter.

    ( do you feel better your self installed idiot version of economics "proved" to you internally that demand for the big top Intel chip is very low ?)
    LOL - so sad.....the emotions of a fanboy farting out uncontrollably, econ dumb oh one we'll call it.

    I'm beginning to understand why you amd freaks have twisted penny pinching frustrated price obsessions, you haven't a clue about the very basics, but your mind is very willing to arrogantly and in error, attempt to "correct others" with amd fanboyism as the leading call in the emotionally fulfilling statements you offer.

    It is like a crazy girl having her period and blurting out her out of control emotions. LOL
    No wonder I told giraradou or whatever miss sensitive's name is to take the midol.
  • rarson - Tuesday, September 18, 2012 - link

    Regardless of what prices are now, they'd be even better with better competition from AMD. It's called "economics."
  • CeriseCogburn - Friday, October 12, 2012 - link

    In this case it's called " amd fanboy fantasy "
  • rocketbuddha - Tuesday, August 28, 2012 - link

    Anand was that a typo or really AMD is going to use TSMC 28nm to manufacture Steamroller based APUs?
  • Paedric - Tuesday, August 28, 2012 - link

    They're supposed to switch from GF to TSMC sometimes soon.
    I guess that's the when, if it hasn't happened already.
  • Anand Lal Shimpi - Tuesday, August 28, 2012 - link

    Er that's my mistake, GF 28nm is correct. Fixed :)

    Take care,
    Anand
