At the AMD Zen microarchitecture announcement event yesterday, the lid was lifted on some of the details of AMD’s server platform. The 32-core CPU, codename Naples, will feature simultaneous multithreading similar to the desktop platform we wrote about earlier, allowing for 64 threads per processor. Thus, in a dual socket system, up to 128 threads will be available. These development systems are currently in the hands of select AMD partners for qualification and development.

AMD was clear that we should expect to hear more over the coming months (SuperComputing 2016 is in November 2016, International SuperComputing is in June 2017), with a current schedule to start providing servers in Q2 2017.

Analysing AMD’s 2P Motherboard

AMD showed off a dual socket development motherboard, with two large AMD sockets, each using eight-phase power and flanked by eight DDR4 memory slots.

AMD did not state whether the CPUs support quad-channel memory at two DIMMs per channel (2DPC) or eight-channel memory, and there is nothing written on the motherboard to indicate which is the case. Typically the second DIMM slot in a 2DPC environment is a different color, which would suggest that this is an eight-channel design; however, that is not always the case, as some motherboard designs use the same color for both slots anyway.

However, it is worth noting that each bank of four memory slots on either side of each CPU has four chokes and four heatsinks (probably covering VRMs) in two sets. Typically we see one per channel (or one per solution), so the fact that each socket seems to have eight VRMs for the memory also leans towards the eight-channel idea. To top it off, each socket has a black EPS 12V connector, which is isolated and clearly for CPU power, but also a transparent EPS 12V and a transparent 6-pin PCIe connector. These transparent connectors are not as isolated, so they are not there for a low-power implementation, but each socket has one of each attached, perhaps suggesting that the memory interfaces are powered independently of the CPU. More memory channels require more power, but four-channel interfaces have been done and dusted before via the single EPS 12V, so requiring even more power raises questions. I have had word in my ear that this may be a result of support for future high-power memory, such as NVDIMMs, although I have not been able to confirm this.

Edit: In retrospect, the transparent EPS 12V could be an 8-pin PCIe connector, but the amount of extra power it can provide still seems excessive.
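
As a back-of-the-envelope illustration of why the channel count matters, here is a minimal Python sketch comparing the two layouts that would each explain eight DIMM slots per socket. DDR4-2400 is purely an assumed speed grade for the example; supported memory speeds for Naples have not been announced.

```python
# Theoretical peak DRAM bandwidth for the two possible slot layouts.
# DDR4-2400 is an assumption for illustration only.

def ddr4_peak_bandwidth_gbs(channels: int, mt_per_s: int = 2400) -> float:
    """Peak bandwidth in GB/s: transfers/s x 8 bytes per 64-bit channel."""
    return channels * mt_per_s * 1e6 * 8 / 1e9

quad_channel_2dpc = ddr4_peak_bandwidth_gbs(channels=4)   # 4 channels x 2 DIMMs per channel
eight_channel_1dpc = ddr4_peak_bandwidth_gbs(channels=8)  # 8 channels x 1 DIMM per channel

print(f"4-channel DDR4-2400: {quad_channel_2dpc:.1f} GB/s")   # ~76.8 GB/s
print(f"8-channel DDR4-2400: {eight_channel_1dpc:.1f} GB/s")  # ~153.6 GB/s
```

Doubling the channel count doubles the peak bandwidth (and the number of memory PHYs drawing power), which is part of why the extra power connectors around each socket are a plausible hint.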

Unfortunately, we could not remove the heatsinks to see the CPUs or the socket, but chances are this demo system would not have had CPUs installed in the first place. Doing some basic math based on the length of a DDR4 module, our calculations put the socket area (as delineated by the white line beyond the socket) at 7.46 cm x 11.877 cm, giving an area of 88.59 cm2. By comparison, the heatsink has an active fin floor plan area of 62.6 cm2 based on what we can measure. Unfortunately this gives us no indication of package area or die area, both of which would be more exciting numbers to have.
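
For the curious, the scaling approach is simple enough to sketch in a few lines of Python: the standard 133.35 mm length of a DDR4 DIMM acts as a ruler in the photograph, and everything else is measured relative to it. The pixel counts in the sketch are hypothetical placeholders rather than our actual measurements.

```python
# Sketch of the photo-scaling math. The DDR4 DIMM length is the known reference;
# the pixel counts below are hypothetical placeholders, not our actual figures.

DDR4_DIMM_LENGTH_MM = 133.35

def mm_per_pixel(dimm_length_px: float) -> float:
    """Derive the photo's scale from the known DIMM length."""
    return DDR4_DIMM_LENGTH_MM / dimm_length_px

def rect_area_cm2(width_px: float, height_px: float, scale: float) -> float:
    """Convert a pixel-space rectangle to cm2 using the derived mm-per-pixel scale."""
    return (width_px * scale / 10.0) * (height_px * scale / 10.0)

scale = mm_per_pixel(dimm_length_px=800.0)         # placeholder pixel count
socket_area = rect_area_cm2(447.6, 712.6, scale)   # ~7.46 cm x 11.88 cm
print(f"Socket area: {socket_area:.1f} cm2")       # ~88.6 cm2
```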

Putting the CPUs, memory and sockets aside, the motherboard has a number of features worth pointing out. There is no obvious chipset or southbridge in play here. Where we would normally expect a chipset, we have a Xilinx Spartan FPGA without a heatsink, although I doubt this is acting as the chipset, given that there is an 'FPGA Button' right above it; it is most likely there to aid some of the debugging elements on the system.

Further to this, the storage options for the motherboard are all located on the left hand side (as seen), right next to one of the CPUs. Eight SATA-style ports are here, all in blue, which usually indicates that they hang off the same host controller, and text on the motherboard states 'ALL SATA CONNS CONNECTED TO P1', which indicates that processor P1 (counting left to right in the main image, actually the second processor) has direct control.

Other typical IO on the rear panel, such as the 10/100 network port (for management) and the USB 3.0 ports, sits next to the second processor, which might indicate that this processor has IO control over these parts of the system. However, the onboard management controller, an ASPEED AST2500 with access to Elpida memory, is nearer the PCIe slots and the Xilinx FPGA.

The lack of an obvious chipset, and the location of the SATA ports, would point to Naples having the southbridge integrated on die, creating an SoC rather than a pure CPU. Bringing this on die, at 14nm FinFET, allows those functions to sit on a lower power process (historically chipsets are built on a larger lithography node than the CPU), as well as adjustments in bandwidth and utility, although at the expense of modularity and die area. If Naples has an integrated chipset, it makes some of the findings on the AM4 platform we saw at the show very interesting. Either that, or the FPGA is actually there to let the developers change southbridge operation on the fly (or chipsets are actually becoming more like FPGAs, which is more plausible as chipsets move to PCIe switch mechanisms).

There are a lot of headers and jumpers on board which won't be of much interest to anyone outside of platform testing, but the PCIe layout deserves a look. On this board we have four full-length PCIe slots below one of the CPUs, and careful inspection of the pins confirms that each slot is x16 electrical.

However, the highlighted box gives some insight into the PCIe lane allocation. The text says:

“Slot 3 has X15 PCIe lanes if MGMT PCIe Connected
Slot 3 has X16 PCIe lanes if MGMT PCIe Disconnected”

This would indicate that slot three can run a full x16 lanes for data, meaning in effect there are 64 lanes of PCIe bandwidth across the slots. That's about as far as we can determine here – we have seen motherboards in the past that take PCIe lanes from both CPUs, so at best we can say that in this configuration a Naples CPU provides between 32 and 64 lanes in a dual processor system. The board traces, as far as we were able to examine them, did not make this clear, especially as this is a multi-layer motherboard (qualification samples are typically over-engineered anyway). There is an outside chance that the integrated southbridge/IO is able to supply a combined x16 of PCIe lanes, however there is no obvious way to determine if this is the case (and it is not something we have seen historically).
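
To put 64 lanes in context, here is a quick sketch of the peak bandwidth on offer, assuming PCIe 3.0 (AMD has not explicitly confirmed the PCIe generation for Naples):

```python
# Peak bandwidth if all four slots really are x16 electrical, assuming PCIe 3.0
# (the PCIe generation is our assumption, not confirmed by AMD).

PCIE3_GBS_PER_LANE = 0.985  # ~985 MB/s per lane per direction after 128b/130b encoding

def slot_bandwidth_gbs(lanes: int) -> float:
    return lanes * PCIE3_GBS_PER_LANE

print(f"Per x16 slot: {slot_bandwidth_gbs(16):.1f} GB/s each way")        # ~15.8 GB/s
print(f"All four slots: {4 * slot_bandwidth_gbs(16):.1f} GB/s each way")  # ~63 GB/s
```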

AM4 Desktop Motherboards

Elsewhere on display for Zen, we also saw some of the internal AM4 motherboards in the base units at the event.

These were not typical motherboard manufacturer boards from the usual names like ASUS or GIGABYTE, and were very clearly internal-use products. We weren't able to open up the cases to see the boards better, but on closer inspection we noticed a number of things.

First, there were two different models of motherboard on show, both ATX but varying a little in functionality. One of the boards had twelve SATA ports, some of which were in very odd locations and colors, but we were unable to determine whether any additional controllers were on board.

Second, each of the boards had video outputs. This is because the AM4 platform has to cater for both Bristol Ridge and Summit Ridge, with the former being an APU with integrated graphics and the updated Excavator v2 core design. On one of the motherboards we saw two HDMI outputs and a DisplayPort output, suggesting a full three-display digital output pipeline for Bristol Ridge.

The motherboards were running 2x8GB of Micron memory at DDR4-2400. As for CPU coolers, AMD was using both its 125W Wraith cooler and the new 95W near-silent cooler across the four or five systems on display. This pegs these engineering samples at a top end of those TDPs, but if recent APU and FX product announcements are anything to go by, AMD is happy to put a 125W cooler on a 95W CPU, or a 95W cooler on a 65W CPU, if required.

I will say one thing that has me a little confused. AMD has been very quiet on the chipset support for AM4, and on what IO the southbridge will have on the new platform (and whether that changes depending on if a Bristol Ridge or Summit Ridge CPU is installed at the time). On the server platform, we concluded above that the chipset is likely integrated into the CPU – if that is true on the consumer platform as well, then I would point to the chipset-looking device on these motherboards and start asking questions. Typically the chipset on a motherboard is cooled by a passive heatsink, but these chips had low z-height fans on them and were running at quite a rate. I wonder if they were set up like this so that engineers using the motherboards have more space to plug in testing tools, or if it is for another purpose entirely. As expected, AMD said to expect more information closer to launch.

Wrap Up

To anyone who says motherboards are boring: I think AMD has given away a number of potential aspects of the platform merely by showing this pair of products for server and desktop. Sure, they answer some questions and cause a lot more of my hair to fall out trying to answer the new questions that arise, but at this point it means we can start to build a fuller understanding of what is going on beyond the CPU.

As for server-based Zen: depending on PCIe counts and memory support, along with the cache hierarchy we discussed in the previous piece, the prospect of Naples playing an active role in the enterprise seems very real. Unfortunately, it is still a year away from launch. There are lots of questions about how the server parts will differ, and how the 32 cores on the SKUs that were discussed will be arranged in order to shuffle memory around at a reasonable rate – one of the problems with large core count parts is being able to feed the beast. AMD even used that term in its presentation, meaning it is clearly a topic the company believes it has addressed.
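
As a final back-of-the-envelope example of why 'feeding the beast' matters, the per-core bandwidth works out roughly as follows (with the same unconfirmed DDR4-2400 assumption as earlier):

```python
# Per-core DRAM bandwidth under the two memory layouts discussed earlier,
# assuming (unconfirmed) DDR4-2400 and the announced 32 cores per socket.

CORES_PER_SOCKET = 32

def gbs_per_core(channels: int, mt_per_s: int = 2400) -> float:
    peak_gbs = channels * mt_per_s * 1e6 * 8 / 1e9  # 8 bytes per channel per transfer
    return peak_gbs / CORES_PER_SOCKET

print(f"4 channels: {gbs_per_core(4):.1f} GB/s per core")  # ~2.4 GB/s
print(f"8 channels: {gbs_per_core(8):.1f} GB/s per core")  # ~4.8 GB/s
```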

Comments

  • jjj - Friday, August 19, 2016 - link

    Correction: Broadwell-E at 3GHz max power not in Blender so in Prime95.
  • Kevin G - Friday, August 19, 2016 - link

    With 8 channels of DDR4, bandwidth would scale based upon the number of DIMMs in a system. Fully populated with fast memory, there would be enough for a lowend/midrange GPU. It wouldn't surprise me if AMD leveraged this socket for a HPC part based around their GPU architecture.
  • smilingcrow - Friday, August 19, 2016 - link

    Haswell:
    E5-2628 V3 85W 2.5 – 3GHz, Max Turbo @8 Core = 2.8GHz ~$700 (OEM only)
    E5-2667 V3 135W 3.2 – 3.6GHz, Max Turbo @8 Core = 3.4GHz, $2,057
    Broadwell:
    E5-2620 V4 85W 2.1 – 3GHz, Max Turbo @8 Core = 2.3GHz $417
    E5-2667 V4 135W 3.2 – 3.6GHz, Max Turbo @8 Core = 3.5GHz, $2,057

    Note: Broadwell 8 core is underwhelming because the focus is more on 10+ cores where they impress much more.
    So a Zen with a max boost of 3GHz @ 8 cores at 95W and seemingly decent IPC would be an amazing comeback if the price is right and turbo for 2 or 4 cores is around 3.5GHz or more.
    Those expecting 3.5GHz with all 16 threads under full load are being optimistic, even at 125W.
    Looking at pricing versus a Xeon, a Zen 8-core @ 3GHz is probably close to an E5-2630 V4 85W (10 cores @ 2.4GHz max), which is $667.
    Of course the AMD motherboards should be a lot cheaper than X99 boards; those desktop boards take the Xeon chips as well as i7 Broadwell-E.
  • jjj - Sunday, August 21, 2016 - link

    On pricing, Broadwell-E is 246.3 mm2 I believe, while Skylake GT2 4C is half that and some 40% of that die is the GPU. Broadwell-E's die is aimed at servers, and it ends up much bigger than it could be if aimed at consumers.
    Size wise, an 8-core Zen should be closer to Skylake, if AMD focused on density and of course adjusted for process. We don't know how big the core is, and if the southbridge is integrated that adds some area, but AMD should be able to have both reasonable pricing and good margins. Pricing it high in consumer, when they can easily do better, doesn't serve AMD's interests or ours.
    Normality would be many cores without a GPU for folks that use a discrete GPU, and fewer cores plus a GPU in APUs for people that don't need discrete. With Zen, AMD can offer that normality. Intel doesn't have a many-core die aimed at consumers and, if Zen performs, they'll need one, since Broadwell-E at sane prices would be uncomfortable for their financials.
    If Zen performs, AMD can harm both Skylake and Broadwell-E in systems with a discrete GPU: hit Skylake by offering more cores and Broadwell-E by offering much, much better prices.

    I even hope (but don't expect) that they do an 8-core notebook SKU, paired with a discrete GPU. There is no reason not to have more than 4 cores in a notebook; if they can find the right balance between perf and power, why not. In notebook workstations Intel is pushing 4-core Xeons, and high-end gaming is a growing notebook segment, so why not address those segments this way. The ASPs would be nice for AMD, and if they reach a reasonable perf/power balance it would be a major marketing asset.
  • Gadgety - Friday, August 19, 2016 - link

    So, stuff seems to be happening. Something like this, four of the Vega HBM2 version of the SSG Pro Duos with 1TB M2 SSD each for a $50,000 desktop. Will it be 4K VR capable? Yes, but too heavy to carry in your back pack or "I stumbled and fell backwards on my PC during the VR gaming."
  • MrSpadge - Friday, August 19, 2016 - link

    That's the price you have to pay to haul the big guns!
  • R3MF - Friday, August 19, 2016 - link

    32c/64t

    Does that mean AMD is planning an MCM for server SKUs using four Zen SoCs, each of which is composed of two four-core modules? Sounds complicated!
  • R3MF - Friday, August 19, 2016 - link

    That would suggest dual-channel memory access for each of the four Zen SoCs on the MCM, hence eight memory channels in aggregate.
  • milli - Friday, August 19, 2016 - link

    I think we can safely assume that it will be an MCM, especially considering the new fast interconnect developed for Zen.
  • BMNify - Monday, August 29, 2016 - link

    It's most likely they just took the ARM CCN network interconnect rather than doing a new custom interconnect – remember, they delayed the drop-in Cortex SoC, not scrapped it...
