Integer Crunching Power

Each core has two integer execution units (EX0 and EX1) and two AGUs (Address Generation Units). For comparison, the K10 core inside Magny-Cours and Istanbul had three ports, each feeding a “fully featured ALU + AGU” pair. AMD marketing cleverly drew four pipeline blocks inside the Bulldozer integer core, but those PowerPoint blocks cannot hide the fact that each Bulldozer integer core has fewer execution resources.

In practice, AG0 and AG1 are little more than limited-capability assistants to EX0 and EX1. The software optimization guide for AMD family 15h processors lists only a few instructions (page 248 in the January 2012 version) that can be processed by the AG0 and AG1 execution units, and each time it adds the remark "First op to AG0 | AG1, Second to EX0 | EX1". The AG0 and AG1 execution units reduce the latency of the CALL and LEA instructions, but the maximum throughput of each integer core inside the Bulldozer module is only two integer instructions per clock cycle. Only when a fused branch enters EX0 while another integer instruction enters EX1 do we get a slightly higher throughput of three integer instructions.
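The peak figures above can be turned into a rough back-of-the-envelope calculation. The sketch below is only an illustration: the function name and the `fused_branch_fraction` parameter are hypothetical, and it assumes every cycle is either a plain cycle (two instructions) or a cycle in which a fused branch goes to EX0 and another instruction to EX1 (three instructions).

```python
def peak_integer_throughput(fused_branch_fraction: float) -> float:
    """Average peak integer instructions per cycle for one Bulldozer integer core.

    fused_branch_fraction is a hypothetical input: the share of cycles in which
    EX0 takes a fused branch while EX1 takes another integer instruction.
    """
    if not 0.0 <= fused_branch_fraction <= 1.0:
        raise ValueError("fused_branch_fraction must be between 0 and 1")

    plain_cycle = 2   # EX0 + EX1, one integer instruction each
    fused_cycle = 3   # fused branch in EX0 counts as two instructions, plus one in EX1

    return (plain_cycle * (1.0 - fused_branch_fraction)
            + fused_cycle * fused_branch_fraction)


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.25):
        print(f"{fraction:.2f} -> {peak_integer_throughput(fraction):.2f} instructions/cycle")
```

Under these assumptions the peak only approaches three instructions per cycle if nearly every cycle contains a fused branch, which is why the article treats two per cycle as the practical ceiling.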

So the Bulldozer integer core can execute one fewer integer instruction per cycle (2 vs 3). That doesn’t mean that the Bulldozer integer core is 1/3 slower, however. The integer core of Bulldozer is smaller but also more flexible. The dedicated per-lane 8-entry schedulers are gone, and a much larger 40-entry scheduler has replaced them. This means that Bulldozer should be better at extracting ILP (Instruction Level Parallelism) out of code that has low IPC (Instructions Per Clock).
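A toy model makes the "2 vs 3 does not mean 1/3 slower" point concrete. It is a sketch under strong assumptions, not a performance predictor: it treats sustained IPC as the smaller of the parallelism the code exposes and the core's peak width, and it ignores scheduler size, cache misses, branch mispredictions, and clock speed. The names `sustained_ipc`, `relative_slowdown`, and `available_ilp` are made up for this illustration.

```python
K10_PEAK = 3        # integer instructions per cycle, per the text
BULLDOZER_PEAK = 2  # integer instructions per cycle, per the text


def sustained_ipc(available_ilp: float, peak_width: int) -> float:
    """Toy assumption: a core sustains min(parallelism in the code, peak width)."""
    if available_ilp <= 0:
        raise ValueError("available_ilp must be positive")
    return min(available_ilp, float(peak_width))


def relative_slowdown(available_ilp: float) -> float:
    """Fraction by which the narrower core trails the wider one in this toy model."""
    k10 = sustained_ipc(available_ilp, K10_PEAK)
    bulldozer = sustained_ipc(available_ilp, BULLDOZER_PEAK)
    return 1.0 - bulldozer / k10


if __name__ == "__main__":
    for ilp in (1.0, 1.5, 2.0, 2.5, 3.0):
        print(f"ILP {ilp:.1f}: slowdown {relative_slowdown(ilp):.0%}")
```

In this model the full 1/3 penalty shows up only when the code could keep three execution ports busy every cycle; for code with an IPC below two, the narrower core loses nothing from its lower peak.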

In some integer-intensive applications, the somewhat lower maximum throughput of integer instructions might slow things down. That is the not-very-useful "it depends" answer, so let's clarify: what kind of applications are we talking about?

84 Comments

  • Taft12 - Wednesday, May 30, 2012 - link

    Johan, this is the best article I've read on Anandtech in quite some time, even better than Jarred, Ryan and Anand have come up with lately.

    The level of analysis goes far, far beyond just what the benchmarks show.

    Bravo!
  • JohanAnandtech - Thursday, May 31, 2012 - link

    Great! Good to read there are still people who like this kind of analysis!

    :-)
  • ct760ster - Wednesday, May 30, 2012 - link

    It would be interesting if they could run the aforementioned benchmarks on an OS with a customizable kernel such as GNU/Linux, since code optimization is not possible with most proprietary benchmarks.
  • alpha754293 - Wednesday, May 30, 2012 - link

    What about the lacklustre FPU performance?

    The FP unit has to be shared between two integer cores and, as far as I know, it cannot run two FP threads at the same time, so for a lot of HPC/computationally heavy workloads Bulldozer takes a HUGE performance hit. (That holds almost regardless of everything else. The rest counts too, but remember that CPUs are glorified calculators: when you take out one lane of the highway and two-lane traffic is squeezed down to one lane, it's bound to get slower.)
  • The_Countess - Wednesday, May 30, 2012 - link

    except the FP CAN run 2 threads at the same time.
    only for the as yet pretty much unused 256bit instructions does it need the whole FP unit per clock.

    in fact the FP can run 2 threads of 128bit, or 4 even of 64bit.
    and a single CPU can use 2x128bit or both can use 1x128.
    intel and AMD previously had only 1x128bit capability per core.
    so there is no regression in FP performance per core. it's just much more flexible.
  • Zoomer - Wednesday, May 30, 2012 - link

    FPU throughput is much less relevant nowadays, as many FP-intensive HPC computations have already been ported to GPUs. Yes, there may be workloads that are FP-heavy and branchy, not easily parallelizable, or otherwise unsuitable for GPUs, but such beasts are few and far between. I can't think of any, to be honest.
  • Iger - Wednesday, May 30, 2012 - link

    Thanks a lot, that was a very interesting read!
  • Rael - Wednesday, May 30, 2012 - link

    AMD should fire its entire marketing department, because these guys are accustomed to lying in every announcement they make. The performance gains are multiplied by five or ten, and the per-core advancement, which is close to zero, is presented as 'significant'.
    I don't believe these announcements anymore.
  • jabber - Wednesday, May 30, 2012 - link

    What, the whole of the AMD Marketing team?

    That's Tim the caretaker and Trisha on the front desk, isn't it?

    I thought AMD's marketing budget was around $42.
  • kyuu - Wednesday, May 30, 2012 - link

    Oh hai. You must be new to the human race. Marketing and "stretching the truth" have been synonymous since... forever. AMD is hardly exceptional in this regard. Stop believing anything any marketing department sells you, period.
