AMD Revises Bulldozer Transistor Count: 1.2B, not 2B
by Anand Lal Shimpi on December 2, 2011 2:36 AM EST
This is a bit unusual. I got an email from AMD PR this week asking me to correct the Bulldozer transistor count in our Sandy Bridge E review. The incorrect number, provided to me (and other reviewers) by AMD PR around 3 months ago, was 2 billion transistors. The actual transistor count for Bulldozer is apparently 1.2 billion transistors. I don't have an explanation as to why the original number was wrong, just that the new number has been triple checked by my contact and is indeed right. The total die area for a 4-module/8-core Bulldozer remains correct at 315mm2.
CPU Specification Comparison
CPU | Manufacturing Process | Cores | Transistor Count | Die Size
AMD Bulldozer 8C | 32nm | 8 | 1.2B | 315mm2
AMD Thuban 6C | 45nm | 6 | 904M | 346mm2
AMD Deneb 4C | 45nm | 4 | 758M | 258mm2
Intel Gulftown 6C | 32nm | 6 | 1.17B | 240mm2
Intel Sandy Bridge E (6C) | 32nm | 6 | 2.27B | 435mm2
Intel Nehalem/Bloomfield 4C | 45nm | 4 | 731M | 263mm2
Intel Sandy Bridge 4C | 32nm | 4 | 995M | 216mm2
Intel Lynnfield 4C | 45nm | 4 | 774M | 296mm2
Intel Clarkdale 2C | 32nm | 2 | 384M | 81mm2
Intel Sandy Bridge 2C (GT1) | 32nm | 2 | 504M | 131mm2
Intel Sandy Bridge 2C (GT2) | 32nm | 2 | 624M | 149mm2
Despite the downward revision in Bulldozer's transistor count by 800M, AMD's first high-end 32nm processor still boasts a higher transistor density than any of its 45nm predecessors (as you'd expect).
Transistor density depends on more than just process technology. The design of the chip itself including details like the balance between logic, cache and IO transistors can have a major impact on how compact the die ends up being. Higher transistor densities are generally more desirable to a manufacturer (fewer defects per die, more die per wafer, lower costs), but from the end user's perspective the overall price/performance (and power?) ratio is what ultimately matters.
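As a rough illustration of the density comparison, the figures in the table can be divided out directly. The sketch below is not AnandTech's own calculation: the dictionary name `CHIPS` and the function `density` are hypothetical, and the only inputs are the transistor counts and die sizes listed in the table.

```python
# Transistor counts (in millions) and die areas (in mm^2), copied from the
# specification table. The names CHIPS and density are illustrative only.
CHIPS = {
    "AMD Bulldozer 8C": (1200, 315),
    "AMD Thuban 6C": (904, 346),
    "AMD Deneb 4C": (758, 258),
    "Intel Gulftown 6C": (1170, 240),
    "Intel Sandy Bridge E (6C)": (2270, 435),
    "Intel Nehalem/Bloomfield 4C": (731, 263),
    "Intel Sandy Bridge 4C": (995, 216),
    "Intel Lynnfield 4C": (774, 296),
    "Intel Clarkdale 2C": (384, 81),
    "Intel Sandy Bridge 2C (GT1)": (504, 131),
    "Intel Sandy Bridge 2C (GT2)": (624, 149),
}


def density(name):
    """Return millions of transistors per mm^2 for the named chip."""
    transistors_m, area_mm2 = CHIPS[name]
    return transistors_m / area_mm2


if __name__ == "__main__":
    # List the chips from densest to least dense.
    for name in sorted(CHIPS, key=density, reverse=True):
        print(f"{name:30s} {density(name):5.2f} M/mm^2")
```

With the revised 1.2B figure, Bulldozer works out to roughly 3.8M transistors per mm2, against about 2.6M for Thuban and 2.9M for Deneb.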
43 Comments
Conficio - Friday, December 2, 2011 - link
Unfortunately the transistor count is not mentioned in the table for Llano.
However, Llano, a 4C part built on the old core type, shows double the density of Bulldozer in the density graph?
I'd think if AMD is capable of such a dense design and it is advantageous, they'd use it for their flagship processor.
In other words, can you add the Llano numbers to the first table and verify that the density is correct?
Thanks!
Marc HFR - Friday, December 2, 2011 - link
1.45B for 228mm2
But the 1.45B seems way too high
For example, an Athlon II X2 CPU is 234 million transistors, and a Redwood GPU is 627 million. 234x2 + 627 = 1.095 billion, and in this number we get a double IMC etc...
tipoo - Friday, December 2, 2011 - link
Probably due to the on-die GPU portion of Llano, since GPUs have so much redundant hardware it's easier to make them nice and dense.
Evleos - Friday, December 2, 2011 - link
How could anyone believe that it was 2.4 billion?
http://en.wikipedia.org/wiki/List_of_future_AMD_mi...
The_Countess - Friday, December 2, 2011 - link
2 x 1.2 = 2.4 billion for the dual-die server parts?
I can easily see how that could lead to confusion.
chromatix - Friday, December 2, 2011 - link
Okay, what we know is that cache (and DRAM) are extremely transistor-dense, GPU compute area is fairly dense, and CPU compute area is much less dense (because it doesn't make regular patterns). Crossbar switches and other routing stuff are perhaps the least dense of all - it's all wires.
As a rough estimate, caches require 64 transistors per byte, hence 64 million transistors per megabyte - so Deneb's 8MB total makes 512 million transistors just in the cache, Bulldozer doubles that to 16MB and 1024 million transistors for cache.
Subtracting the appropriate cache sizes from the original Deneb and Bulldozer figures left Bulldozer with twice the transistor count per core - not per module, per *core* - of Deneb. With no performance improvement per clock per core to show for it, I thought that was a really strange result.
Subtracting 800 million transistors from Bulldozer makes that comparison much more interesting. Deneb gets 246M over four cores, giving 61.5M transistors per core. Bulldozer gets only about 200M transistors over four *modules*, making on average 50M transistors per module, 25M transistors per core.
So somehow, Bulldozer's modules are actually more efficient in transistor count than Deneb's, despite the longer pipeline and containing two threads! A slight reduction in IPC per core is therefore entirely justified.
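The subtraction in this comment can be reproduced in a few lines of Python. This is only a sketch of the commenter's back-of-the-envelope method: the 64M-transistors-per-MB figure is the rough estimate quoted above rather than a measured value, the totals come from the article, and the helper name `logic_transistors` is hypothetical.

```python
# Rough rule of thumb from the comment: 64 transistors per byte of cache,
# treated as 64 million transistors per megabyte. This is an estimate.
TRANSISTORS_PER_MB_M = 64


def logic_transistors(total_m, cache_mb):
    """Millions of transistors left after subtracting the estimated cache."""
    return total_m - cache_mb * TRANSISTORS_PER_MB_M


# Totals in millions, cache sizes in MB, as quoted in the article and comment.
deneb_logic = logic_transistors(758, 8)            # four cores
bulldozer_old_logic = logic_transistors(2000, 16)  # original 2B figure
bulldozer_new_logic = logic_transistors(1200, 16)  # revised 1.2B figure

deneb_per_core = deneb_logic / 4
bulldozer_old_per_core = bulldozer_old_logic / 8
bulldozer_new_per_module = bulldozer_new_logic / 4
bulldozer_new_per_core = bulldozer_new_logic / 8

if __name__ == "__main__":
    print("Deneb per core:", deneb_per_core)
    print("Bulldozer (2B) per core:", bulldozer_old_per_core)
    print("Bulldozer (1.2B) per module:", bulldozer_new_per_module)
    print("Bulldozer (1.2B) per core:", bulldozer_new_per_core)
```

Taken literally, the subtraction leaves 176M non-cache transistors for the revised Bulldozer figure, which the comment rounds up to about 200M, so the per-module and per-core results come out slightly below the 50M and 25M quoted.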
Marc HFR - Friday, December 2, 2011 - link
A Bulldozer module (including L2 cache) is 213 million transistors according to AMD at the 2011 International Solid-State Circuits Conference.
That's 85 million excluding L2 cache according to your data (64 million transistors per megabyte of L2). It's much more than 50M ...
twhittet - Friday, December 2, 2011 - link
A 40% reduction in transistor count makes perfect sense, because it's about 40% slower than I thought it should have been.
I remembered looking at the charts at the beginning, and wondering how the hell it was slower, clock for clock, than Thuban, with more than twice the amount of transistors.
dew111 - Friday, December 2, 2011 - link
I'm somewhat relieved at this news as well. It doesn't change Bulldozer's performance, but it sure makes it look better for future variants to increase performance and power efficiency. If AMD can't beat Intel with 2x the transistor count, they would be in huge trouble. Luckily, with 1.33x the transistor count, they can trounce Intel in many multithreaded workloads. This makes a lot more sense, as it's what the architecture was designed to do. Bulldozer was meant to add more 'cores' with fewer transistors, and it appears with the real transistor count they have achieved this.
Aone - Friday, December 2, 2011 - link
AMD should have corrected the transistor count of ONE module, which with 2MB L2 has 213M transistors, because if we do the calculation, 213M (transistors per module) * 4 = 852M transistors. 1200M - 852M = 348M.
Is it possible that 348M transistors could serve 8MB L3 plus uncore parts?
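One way to frame that question is to set the leftover budget against the 64M-per-MB cache estimate from chromatix's comment earlier in the thread. The Python sketch below only combines numbers quoted in the thread; the per-MB figure is a rough rule of thumb, not AMD data, and the function names are hypothetical.

```python
TOTAL_M = 1200   # revised die total, in millions of transistors
MODULE_M = 213   # per module including 2MB L2, as quoted from ISSCC 2011
MODULES = 4
L3_MB = 8


def remaining_budget(total_m=TOTAL_M, module_m=MODULE_M, modules=MODULES):
    """Millions of transistors left for L3 and uncore after the modules."""
    return total_m - module_m * modules


def l3_estimate(l3_mb=L3_MB, per_mb_m=64):
    """Estimated L3 transistors (millions); per_mb_m is an assumed rule of thumb."""
    return l3_mb * per_mb_m


budget = remaining_budget()
l3_needed = l3_estimate()
# Highest per-MB cost at which the L3 alone would still fit in the budget.
max_per_mb = budget / L3_MB

if __name__ == "__main__":
    print("Left for L3 + uncore (M):", budget)
    print("L3 at 64M per MB (M):", l3_needed)
    print("Break-even per-MB cost (M):", max_per_mb)
```

Under these assumptions the 348M left over is less than the 512M the rule of thumb assigns to 8MB of L3, so either the per-MB estimate is too high or one of the quoted figures does not apply to the shipping die. The numbers alone cannot say which.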