The basic building block of Bulldozer is the dual-core module, pictured below. AMD wanted better performance than simple SMT (à la Hyper-Threading) would allow, but without resorting to the full duplication of resources found in a traditional dual-core CPU. The result is a duplication of integer execution resources and L1 data caches, but a sharing of the front end and FPU. AMD still refers to this module as being dual-core, although it's a departure from the more traditional definition of the word. In the early days of multi-core x86 processors, dual-core designs were simply two single-core processors stuck on the same package. Today we still see simple duplication of identical cores in a single processor, but moving forward it's likely that we'll see more heterogeneous multi-core systems. AMD's Bulldozer architecture may be unusual, but it challenges the conventional definition of a core in a way that we're probably going to face one way or another in the not-too-distant future.


A four-module, eight-core Bulldozer
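To make the shared-versus-duplicated split concrete, here is a minimal Python sketch that models a module as a pair of cores and tallies per-chip resources for a four-module part. The class and function names are hypothetical, cores are assumed to be numbered consecutively within each module, and the resource lists simply restate the description above, with the L2 cache counted as a per-module resource.

```python
from dataclasses import dataclass

# Hypothetical model: integer execution resources and L1 data caches are
# duplicated per core, while the front end, FPU and L2 cache are shared by
# the two cores of a module.
DUPLICATED_PER_CORE = ("integer execution", "L1 data cache")
SHARED_PER_MODULE = ("front end", "FPU", "L2 cache")


@dataclass(frozen=True)
class Module:
    module_id: int
    cores_per_module: int = 2

    def core_ids(self) -> list[int]:
        """Logical core numbers in this module, assuming consecutive numbering."""
        first = self.module_id * self.cores_per_module
        return list(range(first, first + self.cores_per_module))


def resource_counts(num_modules: int, cores_per_module: int = 2) -> dict[str, int]:
    """Count each resource across the chip: duplicated ones scale with
    the number of cores, shared ones with the number of modules."""
    counts = {name: num_modules * cores_per_module for name in DUPLICATED_PER_CORE}
    counts.update({name: num_modules for name in SHARED_PER_MODULE})
    return counts


# A four-module, eight-core chip such as the FX-8150
chip = [Module(m) for m in range(4)]
print([m.core_ids() for m in chip])  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(resource_counts(4)["FPU"])     # 4
```

Eight integer cores backed by only four front ends and four FPUs is why AMD's "dual-core" label departs from the traditional definition.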

The bigger issue with Bulldozer isn't one of core semantics, but rather how threads get scheduled on those cores. Ideally, threads with shared data sets would get scheduled on the same module, while threads that share no data would be scheduled on separate modules. The former allows more efficient use of a module's L2 cache, while the latter guarantees each thread has access to all of a module's resources when there's no tangible benefit to sharing.
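The ideal policy can be sketched as a placement function. This is a hypothetical illustration, not how any shipping OS works: it assumes the scheduler already knows which data set each thread touches (the `threads` mapping below), which is exactly the information an in-order scheduler ignores. Threads sharing a data set are paired onto an empty module; the remaining threads go to the least-loaded module, so unrelated threads spread across modules first.

```python
from collections import defaultdict


def data_aware_placement(threads: dict[str, str], num_modules: int,
                         cores_per_module: int = 2) -> dict[str, int]:
    """Return a mapping of thread name -> module index.

    `threads` maps each (hypothetical) thread name to a label for the data
    set it works on. If there are more threads than cores, the extras pile
    onto the least-loaded module.
    """
    load = [0] * num_modules  # threads assigned to each module
    placement: dict[str, int] = {}

    # Group threads by the data set they share.
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data_set in threads.items():
        groups[data_set].append(name)

    # Pass 1: co-locate threads that share data on an empty module,
    # so they can make efficient use of that module's L2 cache.
    singles: list[str] = []
    for members in groups.values():
        while len(members) >= cores_per_module and 0 in load:
            module = load.index(0)
            for _ in range(cores_per_module):
                placement[members.pop(0)] = module
            load[module] = cores_per_module
        singles.extend(members)

    # Pass 2: everything else goes to the least-loaded module, so threads
    # that share nothing get a module to themselves while any are free.
    for name in singles:
        module = min(range(num_modules), key=lambda m: load[m])
        placement[name] = module
        load[module] += 1
    return placement


example = {"A1": "x", "A2": "x", "B": "y", "C": "z"}
print(data_aware_placement(example, num_modules=4))
# {'A1': 0, 'A2': 0, 'B': 1, 'C': 2}
```

A1 and A2 share data set `x` and end up on one module; B and C share nothing and each get a module of their own.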

This ideal scenario isn't how threads are scheduled on Bulldozer today. Instead of intelligent core/module scheduling based on the memory addresses touched by a thread, Windows 7 currently just schedules threads on Bulldozer in order. Starting from core 0 and going up to core 7 on an eight-core FX-8150, Windows 7 will schedule two threads on the first module, then move to the next module, and so on. If the threads happen to be working on the same data, then Windows 7's scheduling approach makes sense. If the threads are working on different data sets, however, Windows 7's current treatment of Bulldozer is suboptimal.
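Here is a simplified model of that in-order behavior for an FX-8150 with four modules of two cores each. It is an illustrative assumption, not Windows 7's actual scheduler logic (which also weighs priorities, affinity and idle states), and the function name is hypothetical.

```python
def in_order_placement(num_threads: int, num_modules: int = 4,
                       cores_per_module: int = 2) -> list[tuple[int, int]]:
    """Simplified model of core-order scheduling: thread i goes to core i
    (wrapping around if there are more threads than cores). Returns
    (core, module) pairs, where module = core // cores_per_module."""
    num_cores = num_modules * cores_per_module
    placements = []
    for t in range(num_threads):
        core = t % num_cores
        placements.append((core, core // cores_per_module))
    return placements


four = in_order_placement(4)
print(four)                                 # [(0, 0), (1, 0), (2, 1), (3, 1)]
print(len({module for _, module in four}))  # 2 (modules 2 and 3 sit idle)
```

With four threads, both cores of modules 0 and 1 are busy while modules 2 and 3 do nothing, which is the poor case when those four threads share no data.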

AMD and Microsoft have been working on a patch to Windows 7 that improves scheduling behavior on Bulldozer. The result is two hotfixes that should both be installed on Bulldozer systems. Both hotfixes require Windows 7 SP1; they will refuse to install on a pre-SP1 installation.

The first update simply tells Windows 7 to schedule all threads on empty modules first, then on shared cores. The second hotfix increases Windows 7's core parking latency if there are threads that need scheduling. There's a performance penalty you pay to sleep/wake a module, so if there are threads waiting to be scheduled, they'll have a better chance to be scheduled on an unused module after this update.
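The first hotfix's modules-first default can be modeled as a fixed core visiting order. This is a hypothetical sketch that assumes consecutive core numbering within each module (cores 0 and 1 on module 0, and so on); it ignores the second hotfix's core parking change and everything else the real scheduler considers.

```python
def modules_first_placement(num_threads: int, num_modules: int = 4,
                            cores_per_module: int = 2) -> list[tuple[int, int]]:
    """Simplified model of a modules-first policy: use the first core of
    every module before doubling up on any module's second core.
    Returns (core, module) pairs."""
    # Core visiting order: slot 0 of each module, then slot 1 of each module.
    # For 4 modules x 2 cores this is [0, 2, 4, 6, 1, 3, 5, 7].
    order = [m * cores_per_module + slot
             for slot in range(cores_per_module)
             for m in range(num_modules)]
    placements = []
    for t in range(num_threads):
        core = order[t % len(order)]
        placements.append((core, core // cores_per_module))
    return placements


print(modules_first_placement(4))      # [(0, 0), (2, 1), (4, 2), (6, 3)]
print(modules_first_placement(6)[4:])  # [(1, 0), (3, 1)]
```

Four threads now land on four different modules, and only the fifth thread onward shares a module. The trade-off is that two threads working on the same data are also split across modules instead of sharing one module's L2.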

Note that neither hotfix enables truly optimal scheduling on Bulldozer. Rather than being thread-aware and scheduling dependent threads on the same module and independent threads across separate modules, the updates simply move to a better default course of scheduling on modules first. This should improve performance in most cases, but there's a chance that some workloads will see a performance reduction. AMD tells me that it's still working with OS vendors (read: Microsoft) to better optimize for Bulldozer. If I had to guess, I'd say that we may see the next big step forward with Windows 8.

AMD was pretty honest when it described the performance gains FX owners can expect to see from this update. In its own blog post on the topic, AMD tells users to expect a 1-2% gain on average across most applications. Without any big promises, I wasn't expecting the Bulldozer vs. Sandy Bridge standings to change post-update, but I wanted to run some tests just to be sure.

The Test

Motherboard: ASUS P8Z68-V Pro (Intel Z68) / ASUS Crosshair V Formula (AMD 990FX)
Hard Disk: Intel X25-M SSD (80GB) / Crucial RealSSD C300
Memory: 2 x 4GB G.Skill Ripjaws X DDR3-1600 9-9-9-20
Video Card: ATI Radeon HD 5870 (Windows 7)
Video Drivers: AMD Catalyst 11.10 Beta (Windows 7)
Desktop Resolution: 1920 x 1200
OS: Windows 7 x64 SP1 w/ BD Hotfixes
79 Comments

  • Conficio - Saturday, January 28, 2012

    This kind of problem (more intelligent schedulers for a new architecture) cries out for open source to be the experimentation and proving ground.

    So I'd like to know: what is the scheduling behavior of Linux and BSD (and by extension Mac OS X, once Apple does use the architecture)? Does AMD have any experience? Do they work with any universities to find optimal algorithms for this new architecture?

    If such new architectures, where the "core" concept blurs, become more common in the future, there is surely some research that can shed some light on this topic. Does anybody know?
  • just4U - Monday, January 30, 2012

    I had to figure out what to do for our secondary system. Eventually I decided on the FX6100, which I picked up for $145... so a little bit more than a midrange i3. The board I got was an Asus M5A EVO for $110. That system is pretty fast... and I think overall a little better than what I'd have gotten out of an i3. I am on an i5 2500K every day... and before that an i7 920 setup.

    These new FX processors are not what some reviewers make them out to be. They're actually pretty good for their price range. Are they going to win any awards? Likely not... but most of you are not going to be pulling your teeth out and whining about them being slow, because they're not.
  • Mugur - Monday, January 30, 2012

    I wonder whether the fact that the memory controller runs at 2-2.2 GHz instead of at the full CPU speed as on Intel (the uncore part), and the higher cache latency on AMD, also make FX CPUs uncompetitive in single-threaded tasks?

    Regarding memory speed, I remember that high-speed DDR3 is required for the AMD APU line, not the FX line...

    I recently changed my gaming machine from a Phenom II X2 BE 3.3 GHz (run at 3.8 GHz) to a Core i3 2120, and although some tasks in Windows seem a little slower (like browsing with a lot of pages open; I have no idea why, maybe the amount of cache the PII had compared with the i3?), gaming (Battlefield 3) improved a lot.

    I wanted to go the FX route, but simply looking at some gaming benchmarks made me go Intel (plus I found a cheap Z68 Gigabyte board; people don't take into account that sometimes a good Z68 board is twice the price of a good AMD board). The other components were a 60 GB SSD / 500 GB HDD, 8 GB DDR3-1600 and a Radeon 6870.

    What I wanted to point out is that AMD does not compete properly on price with the FX line. A Core i3 2120 is about the same price as an FX 4100, and a Core i5 2500K has a lower price than an FX 8150. Only a high-end Z68 board is (much) more expensive than an 8xx/9xx AMD AM3+ board...
  • Scali - Monday, January 30, 2012

    We've seen linux kernel patches for Bulldozer that have about as much effect:
    http://openbenchmarking.org/result/1110200-AR-BULL...

    Let's just blame AMD, shall we?
  • richaron - Monday, January 30, 2012

    I've seen Linux benchmarks, before any patch, which show the 8150 performing much better relative to the 2500K. Check the Phoronix benchmarks (unfortunately there's no 2600K in them).

    Let's just blame Microsoft, shall we?
  • Scali - Tuesday, January 31, 2012

    Well no.
    The point here is that kernel patches don't really have much of an effect on Bulldozer performance. There's no 'magic bullet'.

    If the 8150 performs better compared to the 2500k in linux, that is a different story.
    How does performance in linux compare to Windows?
    It could also be that linux is just slower on the 2500k than Windows is (which I'm quite sure to be the case).
    So no, I'll just blame AMD. The Bulldozer module architecture just doesn't work.
  • superccs - Monday, January 30, 2012

    How come this was not developed in association with Microsoft BEFORE release? That's like releasing a new GPU without having firmware that runs the new feature set.

    I sure hope the recent house cleaning at AMD got rid of some of these upper level jack@sses responsible.
  • wingless - Monday, January 30, 2012

    The 4100 and 6100 look like they could gain a bit from this patch as well. Will there be a test with them included?
  • just4U - Monday, January 30, 2012

    I don't see either of them in the bench results. Did Anand even review them? I don't think they did.
  • Trailmixxx - Monday, January 30, 2012

    Has dropped from over by a factor of 5 on my system after applying the hotfixes. Can someone else confirm this? And possibly on an Intel system as well?
