AMD's Steamroller Detailed: 3rd Generation Bulldozer Core
by Anand Lal Shimpi on August 28, 2012 4:39 PM EST
The shared L1 instruction cache grew in size with Steamroller, although AMD isn’t telling us by how much. Bulldozer featured a 2-way 64KB L1 instruction cache, with each “core” using one of the ways. This approach gave Bulldozer less cache per core than previous designs, so the increase here makes a lot of sense. AMD claims the larger L1 can reduce i-cache misses by up to 30%. There’s no word on any possible impact to L1 d-cache sizes.
Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, the address and decoded op are both stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, although it is likely smaller. AMD wasn’t willing to disclose how many micro-ops could fit in the queue, other than to say that it’s big enough to get a decent hit rate.
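To make the lookup concrete, here is a minimal Python sketch of the behavior described above. It is a toy model, not AMD's design: the class name MicroOpQueue, the four-entry capacity, the FIFO eviction and the stand-in toy_decode function are all assumptions made for illustration, since AMD hasn't disclosed the queue's size or replacement policy.

```python
from collections import OrderedDict


class MicroOpQueue:
    """Toy model of a decoded micro-op queue keyed by fetch address."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical size; AMD did not disclose it
        self.entries = OrderedDict()  # fetch address -> tuple of decoded micro-ops
        self.hits = 0
        self.decodes = 0

    def fetch(self, address, decode):
        """Return micro-ops for `address`, running the decoder only on a miss."""
        if address in self.entries:
            self.hits += 1            # hit: the decode hardware could stay powered down
            return self.entries[address]
        self.decodes += 1             # miss: decode, then remember the result
        uops = decode(address)
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry (FIFO is an assumption)
        return uops


def toy_decode(address):
    # Stand-in decoder: pretends every x86 instruction becomes two micro-ops.
    return (f"uop_a@{address:#x}", f"uop_b@{address:#x}")


queue = MicroOpQueue(capacity=4)
loop_body = [0x100, 0x104, 0x108]
for _ in range(10):                   # a tight loop keeps re-fetching the same addresses
    for addr in loop_body:
        queue.fetch(addr, toy_decode)

print(queue.decodes, queue.hits)      # 3 27
```

In this toy loop the decoder runs only three times across 30 fetches, which is the sort of case where serving fetches out of the queue and idling the decoders pays off.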
The L1 to L2 interface has also been improved. Some queues have grown and logic is improved.
Finally, on the caching front, Steamroller introduces a dynamically resizable L2 cache. Based on workload and hit rate in the cache, a Steamroller module can choose to resize its L2 cache in quarter-capacity steps, powering down the unused slices. AMD believes this is a huge power win for mobile client applications such as video decode (not so much for servers), where the CPU only has to wake up for short periods of time to run minor tasks that don’t have large L2 footprints. The L2 cache accounts for a large chunk of AMD’s core leakage, so shutting half or more of it down can definitely help with battery life. The resized cache is no faster (same access latency); it just consumes less power.
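AMD didn't describe the actual resizing policy, but the idea can be sketched as a simple controller that steps the active L2 capacity up or down one quarter at a time based on observed hit rate. Everything specific here is an assumption for illustration: the function name next_active_quarters, the 90% and 99% thresholds, the one-quarter minimum and the sample hit-rate trace.

```python
QUARTERS = 4  # the L2 is resized in quarter-capacity steps


def next_active_quarters(active, hit_rate, grow_below=0.90, shrink_above=0.99):
    """Return how many L2 quarters to keep powered for the next interval.

    Hypothetical policy: grow when the hit rate drops (the working set no
    longer fits), shrink when it is very high (capacity is going unused).
    """
    if not 1 <= active <= QUARTERS:
        raise ValueError("active must be between 1 and 4 quarters")
    if hit_rate < grow_below:
        return min(active + 1, QUARTERS)
    if hit_rate > shrink_above:
        return max(active - 1, 1)     # assume at least one quarter stays on
    return active


active = QUARTERS
history = []
for hit_rate in [0.995, 0.995, 0.97, 0.85, 0.995]:  # made-up per-interval hit rates
    active = next_active_quarters(active, hit_rate)
    history.append(active)

print(history)                        # [3, 2, 2, 3, 2]
```

A light workload like the one in this trace settles at half the cache powered, which lines up with the "half or more" shutdown case mentioned above. Access latency is untouched in this model; only the number of powered slices changes.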
Steamroller brings no significant reduction in L2/L3 cache latencies. AMD says it has isolated the reason for the unusually high L3 latency in the Bulldozer architecture, but fixing it isn’t a top priority. Given that most consumers (read: notebook buyers) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
Looking Forward: High Density Libraries
This one falls into the reasons-we-bought-ATI column: future AMD CPU architectures will employ higher levels of design automation and new high density cell libraries, both heavily influenced by AMD’s GPU group. Automated place and route is already commonplace in AMD CPU designs, but AMD is going even further with this approach.
The methodology comes from AMD’s work in designing graphics cores, and we’ve already seen some of it used in AMD’s ‘cat cores (e.g. Bobcat). As an example, AMD demonstrated a 30% reduction in area and power consumption when these new automated procedures with high density libraries were applied to a 32nm Bulldozer FPU.
The power savings come from not having to route clocks and signals as far, while the area savings are a result of the computer-automated transistor placement/routing and higher density gate/logic libraries.
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand drawn designs. AMD believes the sacrifice is worth it, however, because in power constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15-30% reduction in energy per operation. AMD equates this with the power savings you’d get from a full process node improvement.
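As a back-of-the-envelope check on what that range means, the snippet below converts an energy-per-operation reduction into throughput at a fixed power budget. This is derived arithmetic, not a figure from AMD: it assumes the power budget is held constant and that the whole budget goes to the logic seeing the saving, and the function name is made up for illustration.

```python
def throughput_gain_at_fixed_power(energy_reduction):
    """Operations per second scale as 1 / (energy per op) when power is fixed."""
    return 1.0 / (1.0 - energy_reduction)


gains = {r: throughput_gain_at_fixed_power(r) for r in (0.15, 0.30)}
for reduction, gain in gains.items():
    print(f"{reduction:.0%} less energy/op -> {gain:.2f}x ops per watt")
# 15% less energy/op -> 1.18x ops per watt
# 30% less energy/op -> 1.43x ops per watt
```

Under those assumptions, the claimed savings translate to roughly 1.18x to 1.43x more work per watt.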
We won’t see these new libraries and automated designs in Steamroller, but rather its successor in 2014: Excavator.
Steamroller seems like a good evolutionary improvement to AMD’s Bulldozer and Piledriver architectures. While Piledriver focused more on improving power efficiency, Steamroller should make a bigger impact on performance.
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.
flgt - Tuesday, August 28, 2012
I doubt it was AMD's master plan to give up the juicy profit margins in high performance and enterprise applications. I'm guessing that AMD would kill to have the revenue that Intel is pulling in from that small number of processors. AMD just can't compete, hence the need to fall back to the low margin value business.
CeriseCogburn - Wednesday, August 29, 2012
I had several friends predict the crap that bulldozer is long before it arrived simply by perusing leaks of the architecture.
If my less than certified yet sentient friends can read the writing on the wall concerning the architecture choices long before they actually arrive...what the heck is wrong with amd's design teams?
"Can't compete" is usually a phrase I toss out for its overtly exaggerated usage, but in this case I make an exception.
Somehow amd found some light in the GPU arena concerning the same thing, then their drivers fall flat on their face far too often, ruining the core work.
I certainly hope their new hires can straighten out the mess, but hope for change has not been a well placed bet of late.
brucek2 - Wednesday, August 29, 2012
What forum do you think you're on? Yes, if you want to debate the likely impact to AMD's volume sales, overall adoption of this family of chips, etc. there are many factors that are a lot more significant than what its peak performance is like (even though you left most of those out, and hint, individual consumer preference has a lot less to do with it than it should.)
But this is not motleyfool or another stock discussion site, nor one that really is much interested in "mainstream consumers" in general. This is a site for hardware enthusiasts, and the big question most of us are going to have is, a) is this a chip we might be interested in having in one of our systems, and b) what technologies does it bring to the table that might be interesting as far as overall technical evolution of computing?
In short, the article is correct, the big question for this forum and this audience is how will it stack up against Haswell.
CeriseCogburn - Wednesday, August 29, 2012
The answer is the same in both cases, so you complained, then agreed, unwittingly.
r3fug3 - Wednesday, August 29, 2012
Ivy's OC issues are not from the die shrinkage... It's from the method used to attach the heat shield.
HighTech4US - Tuesday, August 28, 2012
> "Is it cheap, will it do Ebay and can my daughter play the Sims on it?"
> That's all the criteria needed in most cases.
In that case just get a $200 tablet.
The Nexus 7 would do just fine with those criteria.
jabber - Tuesday, August 28, 2012
And there you have the decline of the desktop PC.
Get used to it.
You will all be part of a smaller and smaller club.
swaaye - Tuesday, August 28, 2012
Most people have always bought low end hardware so not much has changed. There are some more options now in tablets but those aren't really a replacement for a notebook/desktop because they have many constraints. My impression is they are used alongside normal computers.
swaaye - Tuesday, August 28, 2012
I should say - alongside or as a supplementary media consumption toy.
CeriseCogburn - Wednesday, August 29, 2012
Yes, and PC sales are rising - with the population.
There is a point though, as overall percentage is of course, and has been, of course, not rising, as more gadgets tending toward mobile use are developed, and that has been occurring for some time now.
Unless the world population becomes completely nomadic 24/7/365, PCs are not going away.