AMD's Steamroller Detailed: 3rd Generation Bulldozer Core
by Anand Lal Shimpi on August 28, 2012 4:39 PM EST
Posted in: CPUs, Bulldozer, AMD, Steamroller
Cache Improvements
The shared L1 instruction cache grew in size with Steamroller, although AMD isn’t telling us by how much. Bulldozer featured a 2-way 64KB L1 instruction cache, with each “core” using one of the ways. This approach gave Bulldozer less cache per core than previous designs, so the increase here makes a lot of sense. AMD claims the larger L1 can reduce i-cache misses by up to 30%. There’s no word on any possible impact to L1 d-cache sizes.
Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, the address and decoded op are both stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, although it is likely smaller. AMD wasn’t willing to disclose how many micro-ops could fit in the queue, other than to say that it’s big enough to get a decent hit rate.
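To make the mechanism concrete, here is a toy Python model of a decoded micro-op queue. It is only a sketch: the capacity, the FIFO eviction policy, and the `fake_decode` helper are all assumptions made for illustration, since AMD has not disclosed the real size or replacement behavior.

```python
from collections import OrderedDict


class MicroOpQueue:
    """Toy model of a decoded micro-op queue (capacity is a made-up number)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # fetch address -> decoded micro-ops
        self.decoder_uses = 0         # how many times the decoders had to run

    def fetch(self, address, decode):
        # Hit: serve the fetch from the queue; the decoders stay idle.
        if address in self.entries:
            return self.entries[address]
        # Miss: run the decoder, then store the address and decoded ops.
        self.decoder_uses += 1
        uops = decode(address)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry (assumed FIFO)
        self.entries[address] = uops
        return uops


def fake_decode(address):
    # Hypothetical stand-in for the x86 decode hardware.
    return [f"uop@{address:#x}"]


queue = MicroOpQueue(capacity=4)
loop_body = [0x100, 0x104, 0x108]
for _ in range(10):              # a small loop executed ten times
    for addr in loop_body:
        queue.fetch(addr, fake_decode)

print(queue.decoder_uses)        # decoders only run on the first pass
```

In this model a loop that fits in the queue touches the decoders only on its first iteration, which is the case where powering down the decode hardware pays off.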
The L1 to L2 interface has also been improved. Some queues have grown and logic is improved.
Finally, on the caching front, Steamroller introduces a dynamically resizable L2 cache. Based on workload and hit rate in the cache, a Steamroller module can choose to resize its L2 cache (powering down the unused slices) in 1/4 increments. AMD believes this is a huge power win for mobile client applications such as video decode (not so much for servers), where the CPU only has to wake up for short periods of time to run minor tasks that don’t have large L2 footprints. The L2 cache accounts for a large chunk of AMD’s core leakage, so shutting half or more of it down can definitely help with battery life. The resized cache is no faster (same access latency); it just consumes less power.
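The quarter-step resizing can be pictured with a short Python sketch. This is not AMD's policy, which has not been described beyond "workload and hit rate". The function below simply assumes the working-set footprint is known and picks the smallest number of quarter slices that covers it. The 2MB L2 size and the `active_quarters` name are assumptions for illustration.

```python
import math


def active_quarters(footprint_kb, l2_size_kb):
    """Smallest number of quarter slices (1 to 4) that covers the working set."""
    quarter_kb = l2_size_kb / 4
    needed = math.ceil(footprint_kb / quarter_kb)
    return min(4, max(1, needed))


L2_SIZE_KB = 2048  # assumed L2 size per module, not confirmed for Steamroller

for footprint in (100, 600, 1500, 4000):
    q = active_quarters(footprint, L2_SIZE_KB)
    print(f"{footprint} KB footprint -> {q}/4 of the L2 powered")
```

A light task such as video decode with a small footprint would land on one or two quarters in this model, which matches the "shutting half or more of it down" case described above.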
Steamroller brings no significant reduction in L2/L3 cache latencies. AMD says it has isolated the reason for the unusually high L3 latency in the Bulldozer architecture, but fixing it isn’t a top priority. Given that most consumers (read: notebooks) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
Looking Forward: High Density Libraries
This one falls into the reasons-we-bought-ATI column: future AMD CPU architectures will employ higher levels of design automation and new high density cell libraries, both heavily influenced by AMD’s GPU group. Automated place and route is already commonplace in AMD CPU designs, but AMD is going even further with this approach.
The methodology comes from AMD’s work in designing graphics cores, and we’ve already seen some of it used in AMD’s ‘cat cores (e.g. Bobcat). As an example, AMD demonstrated a 30% reduction in area and power consumption when these new automated procedures with high density libraries were applied to a 32nm Bulldozer FPU.
The power savings come from not having to route clocks and signals as far, while the area savings are a result of the computer-automated transistor placement/routing and higher density gate/logic libraries.
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand drawn designs. AMD believes the sacrifice is worth it, however, because in power constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15 - 30% energy reduction per operation. AMD equates this with the power savings you’d get from a full process node improvement.
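As a back-of-the-envelope check on what a 15 - 30% energy reduction per operation means, the Python snippet below converts it into work done per joule. The only inputs are AMD's two percentages; the function name is ours, and the calculation assumes nothing else about the chip changes.

```python
def ops_per_joule_gain(energy_reduction):
    """Relative increase in operations per joule when each op costs less energy."""
    return 1.0 / (1.0 - energy_reduction)


for r in (0.15, 0.30):
    gain = ops_per_joule_gain(r)
    print(f"{r:.0%} less energy per op -> {gain:.2f}x work per joule")
```

In other words, under a fixed power budget the same battery or thermal envelope would get roughly 1.18x to 1.43x as much work done, if the claimed range holds.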
We won’t see these new libraries and automated designs in Steamroller, but rather its successor in 2014: Excavator.
Final Words
Steamroller seems like a good evolutionary improvement to AMD’s Bulldozer and Piledriver architectures. While Piledriver focused more on improving power efficiency, Steamroller should make a bigger impact on performance.
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.
126 Comments
fic2 - Wednesday, August 29, 2012 - link
I was thinking that he actually meant Sandy Bridge or possibly Ivy Bridge instead of Haswell... - it is what he should have meant anyway.
jabber - Wednesday, August 29, 2012 - link
..."Shock as AMD chip fails to excite minority tech audience in numerous pointless synthetic benchmarks that no one with a life cares about!"
Elsewhere the rest of the world worries about important stuff and tries to make a living.
meloz - Wednesday, August 29, 2012 - link
>Elsewhere the rest of the world worries about important stuff and tries to make a living.
And they do this by buying Intel CPUs, apparently, because Intel have over 90% marketshare and it is only increasing day by day. It would appear that having an Intel processor does not get in the way of "important stuff" and "making a living", quite the contrary.
Only large scale dumping of CPUs / APUs to OEMs at cost price -and their graphic division- has kept AMD alive these past few months. No hope on the horizon, either.
jabber - Wednesday, August 29, 2012 - link
The follow up story - 'IT Folks constantly fail to understand irony shock!'
Aaron73 - Wednesday, August 29, 2012 - link
"Elsewhere the rest of the world worries about important stuff and tries to make a living."
Apparently not you, as you have posted multiple pointless comments to this article. I happened to notice while on my lunch break.
Sub Zero - Thursday, August 30, 2012 - link
The new AMD architecture is noticeably slower than the last one. My 4 core 965 is faster than the 8150 in many operations - most in fact. It's pathetic and inexcusable.
I'm so disenchanted with AMD that even with all of the integrated video stuff on Intel systems and the extra cost, I am probably going to go with Intel for every purchase in the near future. I'm going to recommend Intel to everyone who asks, especially gamers.
AMD is just not worth putting any money into.
Laststop311 - Thursday, August 30, 2012 - link
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand drawn designs.
-Shame they got lazy and went with an automated design when they could have had faster chips if they designed them by hand.
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.
-28nm really? Intel will be on its 2nd gen of 22nm over a year after 22nm debuts for Intel and AMD still can't match that size. Will Steamroller be enough to go up against Haswell? That's not even a legit question, Haswell is going to obliterate Steamroller in every way imaginable. Haswell will most likely run cooler and use less power while at the same time delivering more performance. The only hope for Steamroller will be the Fusion chips where AMD will actually beat Intel on the i-gpu. I see a good market for Steamroller HTPCs and also Steamroller gaming ultrabooks. If you can CrossFire the i-gpu with another Radeon card it would also make larger 15-17" gaming notebooks attractive. You basically get a free CrossFire setup with a big boost in graphics performance without actually having to have 2 power hungry heat producing GPUs. 7970m (or 8970m if it's out) + highest level Steamroller Fusion i-gpu will produce some pretty smooth sexy graphics without needing all this room and extreme cooling and loud fans that accompany dual GPU laptops (sorry m18x I still love you like my child since I have no children)
hapkiman - Sunday, September 2, 2012 - link
You know I was an AMD fanboy all the way, and waited, and waited....and waited for Bulldozer, expecting a significant step over my Phenom II X6 1090T. I used to talk trash with my Intel buddies, and I stuck with AMD through some hard times. And when Bulldozer finally came out and was basically a letdown - I felt burned. AMD has got to do something with their core business model and their fabrication process. Some of these FX chips they're producing are ok, and may have a niche market, but geez AMD, when you have Intel as your main and only competitor - you've got to step up your A game. And AMD has consistently failed to do that.
Sorry guys, but I gave up and put together an Ivy Bridge rig - and I am amazed at how much smoother and faster it is than my old 1090T rig.
If they don't hit one out of the park soon, I see AMD turning into a second rate company making low-end APUs for OEMs, and of course graphics cards.
PLEASE prove me wrong AMD. Make Steamroller something special.
mikato - Tuesday, September 4, 2012 - link
Hmm, I have a Phenom II X4 965 (and SSD for OS and programs) and my system is completely smooth and almost all everyday tasks are instant, so I'm not sure how you could be seeing that much of a difference. Maybe your OS got bloated up or something?
AVoyeur4U - Friday, September 14, 2012 - link
I have an Intel i7-based notebook provided by my employer, use an HP DL380 G5 (8way 32GB RAM w/ nVidia graphics) as my primary workstation instead, and have a handful of them at home. All of that compute power and I still prefer to use the system which I'm on currently, which is built on an AMD Athlon x64 x2 6400 with an nVidia GeForce 8800 GTS video card and 8GB of RAM.
Being an AMD 'fanboy' combined with how well my home desktop has held up and all the hype leading up to Bulldozer, I've held off on upgrading. Heck, I even took down my servers and have had them stacked in the garage for the last couple of years.
Each missed release date only made it that much more disappointing when Bulldozer finally released and fell far short of expectations. I actually found myself looking forward to the 2nd generation release before the 1st became available. After a few more months of waiting I finally decided to pull the plug on AMD and build an Intel-based system.
Out of pure desperation to be proven wrong by AMD I decided to check and see if there was anything in the pipeline that would persuade me to hold off on clicking the check-out button.
I read 2 other articles prior to this regarding their 3rd generation builds. All 3 the same. Consequently, I sit here asking myself WTF!? Not at AMD (this let down was expected) but at myself. There is no other manufacturer, service provider, or producer that I would tolerate this from, why am I accepting it from AMD?
No more... Going to go check out after posting. By the time I'm ready to purchase again it'll most likely be a choice between nVidia's upcoming CPU line and Intel - no longer hopeful that AMD will be able to become competitive again.
(I'm well aware that AMD is quite competitive, especially when it comes to price/performance; however, they are so far behind in terms of currency. They don't support the latest technologies/components and most likely won't within the next 3 years - I don't plan to upgrade again within that time so they're out.)