Understanding Connectivity: Some on the APU, External Chipset Optional

Users keeping tabs on CPU development will have seen the shift over the last ten years toward moving the traditional ‘northbridge’ onto the main CPU die. The northbridge was typically the connectivity hub, allowing the CPU to communicate with PCIe devices, DRAM and the chipset (or southbridge); moving it onto the CPU silicon gave better latency, better power characteristics, and reduced the complexity of the motherboard, all for a little extra die area. Typically when we say ‘CPU’ in the context of a modern PC build, this is the image we have, with the CPU containing cores and possibly graphics (a combination AMD calls an APU).

Typically the CPU/APU has limited connectivity: video outputs (if an integrated GPU is present), a PCIe root complex for the main PCIe lanes, and an additional pathway to the chipset to enable extra input/output functionality. The chipset uses a one-to-many philosophy, whereby the total bandwidth between the CPU and chipset may be lower than the total bandwidth of all the functionality coming out of the chipset; this is typically managed as required using FIFO buffers. The best analogy for this is that a motorway is not 50 million lanes wide, because not all cars use it at the same time. You only need a few lanes to cater for all but the busiest circumstances.
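
As a rough back-of-the-envelope example of that one-to-many idea, here is a short Python sketch. The device mix is hypothetical, and the per-interface figures are theoretical single-direction peaks rather than measured numbers:

```python
# Approximate single-direction peak throughput, in GB/s
PCIE3_PER_LANE = 0.985   # 8 GT/s with 128b/130b encoding
SATA_6G = 0.6            # 6 Gb/s with 8b/10b encoding
USB3_5G = 0.5            # 5 Gb/s with 8b/10b encoding

uplink = 4 * PCIE3_PER_LANE   # a PCIe 3.0 x4 CPU-to-chipset link

# Hypothetical device mix hanging off the chipset
downstream = 4 * SATA_6G + 6 * USB3_5G + 2 * PCIE3_PER_LANE

print(f"Uplink:     {uplink:.2f} GB/s")
print(f"Downstream: {downstream:.2f} GB/s if every port ran flat out")
print(f"Oversubscription: {downstream / uplink:.1f}x")
```

In practice the ports are rarely all saturated at once, which is exactly why the narrower uplink works.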

If the CPU also has the chipset/southbridge built in, either in the silicon or as a multi-chip package, we typically call this an ‘SoC’, or system on chip, as the one unit has all the connectivity needed to fully enable its use. Add on some slots, some power delivery and firmware, then away you go.

Bristol Ridge’s ‘SoC’ Configuration

What AMD is doing with Bristol Ridge is a half-way house between a SoC and a fully external chipset. Some of the connectivity, such as SATA ports, PCIe storage, and PCIe lanes beyond the standard GPU lanes, is built into the processor. These fall under the features of the processor and, for the current launch, form a fixed set. The APU also has an additional link to a chipset that can provide more features; however, the use of the chipset is optional.

Here’s a block diagram to help explain:

On the APU we have two channels of DDR4, supporting two DIMMs per channel. For the major PCIe devices, we have a PCIe 3.0 x8 port, and this does not support bifurcation (or splitting) into any x4, x2 or x1 combination. It’s a solitary x8 link suitable for a PCIe x8 slot (we’ll discuss what else can be done with this later). The APU communicates with the optional chipset over a PCIe 3.0 x4 link, and we’ve confirmed with AMD that this is a simple PCIe interface. The rest of the APU provides four USB 3.0 ports, two SATA 6 Gbps ports, and two PCIe 3.0 x1 ports. The latter also support NVMe, and can act as two PCIe 3.0 x1 storage ports or be combined into a single PCIe 3.0 x2.
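
To summarise the block diagram in one place, here is the APU’s native connectivity as described above, written out as a plain data structure. The field names are my own shorthand, not an AMD specification:

```python
# Bristol Ridge on-die connectivity, as described in the text above
bristol_ridge_apu = {
    "memory":       {"type": "DDR4", "channels": 2, "dimms_per_channel": 2},
    "gpu_pcie":     {"gen": 3, "lanes": 8, "bifurcation": None},  # single x8, no x4/x2/x1 split
    "chipset_link": {"gen": 3, "lanes": 4},                       # plain PCIe 3.0 x4, chipset optional
    "usb3_ports":   4,
    "sata_6gbps":   2,
    "storage_pcie": {"gen": 3, "ports": 2, "lanes_each": 1,       # NVMe-capable
                     "combinable_to_x2": True},
}
```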

It Looks Like an x16

Now, if you look at the layout, try counting how many PCIe lanes are split across all the features. We’ve seen a USB 3.0 hub support four ports of USB 3.0 from a single lane of PCIe 3.0 before, and there are plenty of controllers out there that split a PCIe 3.0 x1 into two SATA ports. So play the adding game: x8 + x4 + x1 + x1 + x1 + x1 = x16. This suggests the Bristol Ridge APU actually has sixteen PCIe 3.0 lanes, but that AMD has decided to forcibly split some of them using internal hubs and controllers.
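
Counting it out explicitly, and assuming (as above) that the four USB 3.0 ports sit behind roughly one lane’s worth of fabric and the two SATA ports behind another:

```python
# The "adding game" from the text: count lane-equivalents
lane_equivalents = {
    "main PCIe 3.0 slot":  8,
    "chipset uplink":      4,
    "USB 3.0 block (~x1)": 1,
    "SATA block (~x1)":    1,
    "storage PCIe x1 (a)": 1,
    "storage PCIe x1 (b)": 1,
}
print(sum(lane_equivalents.values()))  # 16 - consistent with one x16 root complex carved up internally
```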

It’s an interesting tactic because it means that systems can be built without a discrete chipset, or the four chipset lanes can be used for other features. However, it precludes a full PCIe 3.0 x16 link for a full-bandwidth PCIe co-processor. Bear in mind that if there were a PCIe 3.0 x16 link, there would be no lanes left over for a chipset or other IO such as SATA ports, and therefore no physical storage.

The x16 total theory is also somewhat backed up by the lack of bifurcation on the x8 link. Historically, a PCIe root complex in a consumer platform that supports x16 can be bifurcated down to x8/x4/x4, and anything beyond that requires additional PCIe switches to support more than three devices. It would seem that AMD has taken the final x4 link and added an on-die PCIe switch to provide those ports, feeding standard PCIe-to-USB/SATA controllers. I would hazard a guess that what AMD has done is more integrated and complicated than this, in order to keep die area low.
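
For illustration, the bifurcation limits described above could be written as a toy check like the following. The allowed splits are the usual consumer-platform options; the function is an illustration only, not a description of any real firmware interface:

```python
# Lane splits a typical consumer x16 root complex can provide natively
ALLOWED_SPLITS = {(16,), (8, 8), (8, 4, 4)}

def can_bifurcate(requested):
    """True if a consumer x16 root complex can natively provide this lane split."""
    return tuple(requested) in ALLOWED_SPLITS

print(can_bifurcate([8, 4, 4]))     # True
print(can_bifurcate([4, 4, 4, 4]))  # False - anything finer needs a PCIe switch
```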

PCIe is Fun with Switches: PLX, Thunderbolt, 10GigE, the Kitchen Sink

Another point about the x8 link is that it can be combined with an external PCIe switch. In my discussions with AMD, they suggested a switch that splits the x8 into dual x4 interfaces, which could leverage fast PCIe storage while keeping the onboard graphics for any GPU duties. At the other end of the scale, an x8-to-x32 PCIe switch could afford two large x16 links, although large GPU CrossFire is not one of the main aims for the platform.
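
A quick sketch of the two switch arrangements mentioned: whatever the downstream fan-out, simultaneous traffic is still capped by the x8 upstream link. Figures are theoretical single-direction peaks:

```python
PCIE3_PER_LANE = 0.985  # GB/s per lane, one direction

def describe_switch(upstream, downstream_ports):
    # A PCIe switch can offer more downstream lanes than its upstream link,
    # but total simultaneous throughput is limited by the upstream width.
    cap = upstream * PCIE3_PER_LANE
    ports = " + ".join(f"x{w}" for w in downstream_ports)
    total = sum(downstream_ports) * PCIE3_PER_LANE
    return (f"x{upstream} up -> {ports} down: "
            f"{total:.1f} GB/s of ports behind a {cap:.1f} GB/s uplink")

print(describe_switch(8, [4, 4]))    # e.g. two PCIe 3.0 x4 NVMe drives, iGPU handles display
print(describe_switch(8, [16, 16]))  # an x8-to-x32 switch feeding two physical x16 slots
```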

Here’s a crazy mockup I thought of, using a $100 PCIe switch. I doubt this would come to market.


Ian plays a crazy game of PCIe Lego

The joy of PCIe and switches is that it becomes a mix-and-match game - there’s also the PCIe 3.0 x4 link to the chipset. This can be used for non-chipset duties, such as anything that takes PCIe 3.0 x4, like a fast SSD, or potentially Thunderbolt 3. We discussed TB3 support, via Intel’s Alpine Ridge controller, and we were told that the AM4 platform is currently being validated for systems supporting AMD XConnect, which will require Thunderbolt support. AMD did state that they are not willing to speculate on TB3 use, and from my perspective this is because the external GPU feature is what AMD is counting on as the primary draw for TB3-enabled systems (particularly for OEMs). I suspect the traditional motherboard manufacturers will offer wilder designs; ASRock in particular likes to throw some spaghetti at the wall to see what sticks.

Comments

  • patrickjp93 - Tuesday, September 27, 2016 - link

    Yeah, no. Even the 5775C has thoroughly beaten everything AMD has so far offered, even if narrowly. Skylake GT3e increased that gap, and GT4e dug a canyon between Intel and AMD. This is why I sincerely doubt AMD's HPC APUs will get much traction.
  • MrCommunistGen - Tuesday, September 27, 2016 - link

    Skylake's GT4e really is in a whole new league of iGPU performance (at a significant cost) - but to nitpick, GT3e on Skylake very likely underperforms GT3e on Broadwell.

    Broadwell GT3e (specifically Iris Pro 6200) is 48EUs at up to 1150MHz backed by 128MB eDRAM on a 65W TDP quad core.
    Skylake GT3e (specifically Iris Graphics 550) is 48EUs at up to 1100MHz backed by 64MB eDRAM on a 28W dual core. (I'm not counting the listing for the "Server" GT3e which is listed as P555 with 128MB)

    Skylake has microarchitecture improvements to both the EUs and to the x86 cores. It also supports DDR4 and Gen9 includes delta compression - so the iGPU undoubtedly has more bandwidth available.

    That said, I really think that the smaller eDRAM, lower TDP, and lower max clock speed will make SKL GT3e slower than BDW GT3e.

    Due to the rearrangement of SKUs and iGPUs in SKL, I think this is all natural. Intel is trying to *significantly* increase the iGPU perf available to their 15/28W dual-core CPUs with this iteration of GT3e; however, this allows for the potential that this architecture's GT3e underperforms the previous one.
  • Danvelopment - Saturday, September 24, 2016 - link

    Something seems a bit funny about the IGPs of the A8 and A6. The A8 lists an R7 with 384 SPs and a lower clock than the R5 in the A6, which also has 384 SPs.

    Should the A6 be R7 or 256 SPs? And if it's correct, how does the IGP compare between the two? And what makes the A8 faster?
  • Danvelopment - Saturday, September 24, 2016 - link

    Also, any chance in getting a couple R5 performance indicators?
  • MrCommunistGen - Tuesday, September 27, 2016 - link

    I noticed that too. I've seen tables and slides from various sites and they all list those specs. The only thing I can conclude is that since the A6 has half the CPU resources, maybe AMD just assumes that for most tasks you'll end up CPU-bound to a large enough degree that even though the iGPU is technically faster, performance will be lower.

    Or... the alternative is that there's a typo of some sort in the original materials published by AMD.
  • yannigr2 - Saturday, September 24, 2016 - link

    Oh, really nice article. Thanks for this.
    Well, the first Zen processors will probably cost more than the A12. Then, when Zen-based APUs are ready to come to market, we will see smaller Zen models as well.
  • ET - Saturday, September 24, 2016 - link

    Thanks for the writeup, Ian. I even learned some new things that I haven't yet read elsewhere. I do hope that AnandTech can get its hands on a chip for reviews.
  • NeatOman - Sunday, September 25, 2016 - link

    Marginal improvement IMO over last-gen APUs. Looks interesting, but an i3 will still "feel" faster as its single-core power is still greater, although I almost always build AMD APUs for small offices because Microcenter has an instant $40 rebate when you buy in store. BUT.. ZEN SoC with HBM might be a game changer (to quote 2014), and rumor is Apple is eyeballing it for the next MacBook Pro. And AMD has a good track record on Unix-based OSes, which might be why the Mac Pro uses AMD... I'm going down a rabbit hole. SOO! A12... ehh.. no significant change, but I'm waiting for ZEN.
  • utroz - Sunday, September 25, 2016 - link

    Ian Cutress on the first page you say: "Using the updated 28nm process from TSMC, AMD was able to tweak the microarchitecture and allow full on APUs for desktops using a similar design."
    Problem is that Carrizo and Bristol Ridge are actually made at Global Foundries on a 28nm process.
  • TheinsanegamerN - Monday, September 26, 2016 - link

    "Back at Computex, AMD announced the thing to fill that hole: Bristol Ridge (and Stoney Ridge by extension). This is what AMD will call and market it's '7th Generation' of APUs, and will be built on TSMC's 28nm Bulk CMOS process."

    From AnandTech's look at Carrizo in July. Everything else I've found doesn't say GF or TSMC, just 28nm.

    So far, evidence is pointing at TSMC, not GF.
