AMD's Steamroller Detailed: 3rd Generation Bulldozer Core
by Anand Lal Shimpi on August 28, 2012 4:39 PM EST
The shared L1 instruction cache grew in size with Steamroller, although AMD isn’t telling us by how much. Bulldozer featured a 2-way 64KB L1 instruction cache, with each “core” using one of the ways. This approach gave Bulldozer less cache per core than previous designs, so the increase here makes a lot of sense. AMD claims the larger L1 can reduce i-cache misses by up to 30%. There’s no word on any possible impact to L1 d-cache sizes.
Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, the address and decoded op are both stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, although it is likely smaller. AMD wasn’t willing to disclose how many micro-ops could fit in the queue, other than to say that it’s big enough to get a decent hit rate.
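To make the mechanism concrete, here is a minimal sketch of an address-tagged micro-op queue in Python. It is a toy model rather than AMD's design: the capacity, the oldest-first eviction, and the `toy_decode` stand-in are all assumptions, since AMD has not disclosed how the queue is sized or managed.

```python
from collections import OrderedDict


class MicroOpQueue:
    """Toy model: decoded micro-ops stored alongside their fetch address."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical size; AMD gave no figure
        self.entries = OrderedDict()  # address -> decoded micro-ops
        self.hits = 0
        self.decodes = 0              # how often the decoders had to run

    def fetch(self, address, decode):
        # Hit: serve from the queue; the decode hardware can stay idle.
        if address in self.entries:
            self.hits += 1
            return self.entries[address]
        # Miss: run the decoder and remember the result with its address.
        self.decodes += 1
        uops = decode(address)
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # assumed policy: drop the oldest
        return uops


def toy_decode(address):
    # Stand-in for the x86 decoders: one made-up micro-op per address.
    return [f"uop@{address:#x}"]


queue = MicroOpQueue(capacity=8)
loop_body = [0x400, 0x404, 0x408, 0x40C]
for _ in range(3):                    # a small loop executed three times
    for addr in loop_body:
        queue.fetch(addr, toy_decode)

print(queue.decodes, queue.hits)
```

In this model, a loop that fits in the queue is decoded once on its first pass, and every later iteration is served without touching the decoders, which is where the power saving would come from.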
The L1 to L2 interface has also been improved. Some queues have grown and the logic has been improved.
Finally, on the caching front, Steamroller introduces a dynamically resizable L2 cache. Based on workload and hit rate in the cache, a Steamroller module can choose to resize its L2 cache (powering down the unused slices) in one-quarter increments. AMD believes this is a huge power win for mobile client applications such as video decode (not so much for servers), where the CPU only has to wake up for short periods of time to run minor tasks that don’t have large L2 footprints. The L2 cache accounts for a large chunk of AMD’s core leakage, so shutting half or more of it down can definitely help with battery life. The resized cache is no faster (same access latency); it just consumes less power.
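As a rough illustration of resizing in quarter steps, the Python sketch below picks the fewest slices that still cover a workload's L2 footprint. It is only a sketch under stated assumptions: AMD's actual policy is driven by workload and hit rate and has not been detailed, the 2048KB cache size is a hypothetical figure, and leakage is assumed to scale linearly with the number of powered slices.

```python
import math


def active_quarters(footprint_kb, l2_kb):
    """Fewest 1/4 slices (1 to 4) that still hold the given L2 footprint."""
    slice_kb = l2_kb / 4
    return min(4, max(1, math.ceil(footprint_kb / slice_kb)))


def relative_l2_leakage(quarters):
    # Assumption: leakage scales linearly with the slices left powered on.
    return quarters / 4


L2_KB = 2048  # hypothetical module L2 size, for illustration only

for footprint_kb in (300, 1100, 5000):
    q = active_quarters(footprint_kb, L2_KB)
    print(footprint_kb, q, relative_l2_leakage(q))
```

Under these assumptions, a light task with a 300KB footprint would leave only one quarter of the cache powered, while a workload larger than the cache keeps all four slices on.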
Steamroller brings no significant reduction in L2/L3 cache latencies. AMD says it has isolated the reason for the unusually high L3 latency in the Bulldozer architecture, but fixing it isn’t a top priority. Given that most consumers (read: notebooks) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
Looking Forward: High Density Libraries
This one falls into the reasons-we-bought-ATI column: future AMD CPU architectures will employ higher levels of design automation and new high density cell libraries, both heavily influenced by AMD’s GPU group. Automated place and route is already commonplace in AMD CPU designs, but AMD is going even further with this approach.
The methodology comes from AMD’s work in designing graphics cores, and we’ve already seen some of it used in AMD’s ‘cat cores (e.g. Bobcat). As an example, AMD demonstrated a 30% reduction in area and power consumption when these new automated procedures with high density libraries were applied to a 32nm Bulldozer FPU.
The power savings come from not having to route clocks and signals as far, while the area savings are a result of the computer-automated transistor placement/routing and higher density gate/logic libraries.
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand-drawn designs. AMD believes the sacrifice is worth it, however, because in power-constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15-30% energy reduction per operation. AMD equates this with the power savings you’d get from a full process node improvement.
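To put the energy-per-operation figure in perspective, here is a small back-of-the-envelope calculation in Python. It assumes the power budget is the only limit on throughput, so operations per second equal power divided by energy per operation; real parts are also bounded by frequency and thermals, so treat the numbers as an upper bound rather than a prediction.

```python
def throughput_gain(energy_reduction):
    """Extra operations per second at a fixed power budget.

    ops/s = power / energy_per_op, so cutting energy per op by a
    fraction r scales throughput by 1 / (1 - r).
    """
    return 1.0 / (1.0 - energy_reduction) - 1.0


for reduction in (0.15, 0.30):
    gain = throughput_gain(reduction)
    print(f"{reduction:.0%} less energy per op -> {gain:.1%} more ops per second")
```

Under that assumption, the 15-30% range works out to roughly 18% to 43% more work within the same power envelope.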
We won’t see these new libraries and automated designs in Steamroller, but rather in its successor in 2014: Excavator.
Steamroller seems like a good evolutionary improvement to AMD’s Bulldozer and Piledriver architectures. While Piledriver focused more on improving power efficiency, Steamroller should make a bigger impact on performance.
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.
StevoLincolnite - Wednesday, August 29, 2012
The desktop isn't going anywhere, nor is it shrinking; the sales rate is merely slowing down as everyone has one.
Netbooks hit the same wall a couple years ago, tablets and phones will hit the wall in due time.
The netbook didn't kill off the laptop, the laptop didn't kill off the desktop; they all complement each other.
We have been in a post-PC world since early 2000 and in that time the sales of PCs have tripled.
Conficio - Wednesday, August 29, 2012
I'd like to see you bid frequently (daily or more) and successfully on eBay items on a Nexus 7.
Not to mention the crowd of people that actually sell something on eBay. Have fun uploading multiple pictures and typing longer descriptions for an eBay item on a tablet.
Hrel - Tuesday, August 28, 2012
Let me know when AMD releases a new CPU that is at least 100% faster than their last CPU. Cause that's the only time I'll consider them even being an option again. Honestly AMD, add SMT. The performance gain/watt is amazing. You can still have more cores, but have SMT too.
Taft12 - Tuesday, August 28, 2012
Define faster.
CeriseCogburn - Wednesday, August 29, 2012
Something that doesn't get renamed "crapdozer". LOL
nicamarvin - Thursday, August 30, 2012
Ivy is only 5% faster than Sandy. Let me know when Intel releases a new CPU that's at least 100% faster than their last CPU.
Lepton87 - Tuesday, August 28, 2012
Not by a long shot. All we can expect is for this steaming pile of shitty engineering to be competitive with Nehalem. Still worse ST performance but better MT performance. There's only so much you can do with polishing a turd.
CeriseCogburn - Wednesday, August 29, 2012
But what if it becomes a petrified turd from being around so long and getting buried all the time?
Then it seems it could be a really hard, polished up.... legendary find?
nicamarvin - Friday, August 31, 2012
Good thing these are processors and not turds, and they can and will be polished.
Belard - Tuesday, August 28, 2012
AMD as a whole needs to streamline their entire consumer line. Steamroller sounds good in every way, but we need to see it. By the time it comes out, I'll be chugging along with my Intel i5 CPU... my Core2Quad is actually holding up pretty damn good.
Many of my AMD friends and clients have gone Intel already. But I have no problems building an AMD system as long as it provides good performance for the price... which is something the FX DID not come close to doing. There is simply NO way I can recommend any FX CPU to anyone... The A-series for low-end is fine. Windows 8 is another thing to mess things up; hopefully Windows 7 will be available for us IT/small tech people to continue building and selling systems.
The problems (as I see them) with the AMD mess should hopefully be cleared up by 2013. Currently, AMD has 3 different sockets on the market. It's confusing as to what chip goes with which chipset etc etc. Socket A+ needs to die. The CPUs need to be like the Core i-series: ALL of them should have a GPU built in, one THAT can be used as a co-processor whenever it isn't used for graphics. It would simplify the SKUs.
Socket FM1 is dead... Socket FM2 is currently shipping only from OEMs (HP, etc). But the bone-head thing is that FM2 is not at all compatible with FM1, yet current FM2 motherboards use the EXACT same AMD north bridge! WTF?! FM2 doesn't support PCIe 3.0. And according to the LAST AMD roadmap I've seen, AMD won't have a PCIe 3.0 chipset until 2014? Hey, doesn't AMD sell PCIe 3.0 video cards? Yep... and you can't use them on an AMD powered computer... how stupid. FM1/2 chipsets are more advanced than AM3 as they have native USB 3.0 support.
AMD needs to get their butt into gear. There should only be FM2, a NEW chipset in 2013 with PCIe 3.0 support. The new Steamroller CPUs should have a whole new brand name and model number. "FX" has been poisoned. AMD ruined the name of FX from the past.
How about Athlon III X4-3400 (quad core @ 3.4GHz)?
I hope AMD does well... I'm not counting on it... but they may not be as stupid as Microsoft.