In 2019, Intel announced its Cascade Lake family of enterprise processors, and sitting at the top of the stack was the Cascade Lake-AP family: a quartet of parts that changed Intel’s paradigm for high-end processors. This hardware uses two of Intel’s large 28-core silicon dies in the same package, providing a weakly linked dual-processor system in a single package, built to appear to the system as a single processor with up to 56 cores and 12 memory channels at up to a 400 W TDP. Despite not providing pricing, Intel has been keen to promote the Xeon 9200 as its extreme performance platform up against AMD’s 64-core EPYC "Rome" offering. We saw a number of Xeon 9200 systems on display at the Supercomputing 2019 show, and the discussions we had were interesting in their own right.

For readers who aren’t familiar with the Xeon 9200 series, or Intel’s enterprise product portfolio: the Cascade Lake Xeon Scalable family offers a variety of products, based on three sizes of die. The smallest die, LCC, goes up to 10 cores. The middle die, HCC, offers up to 18 cores. The largest die, XCC, provides up to 28 cores. This means that anything with 10 cores or fewer could be LCC, HCC, or XCC, but a 24-core product can only be XCC. The product lines are split by socket compatibility: Xeon Platinum 8000 supports up to eight sockets, Xeon Gold 6000 supports up to four sockets, Xeon Silver 4000 supports two sockets, and Xeon Bronze 3000 is single socket only. Within each bracket there is a range of core counts; however, the highest core counts are typically found only in Xeon Platinum, while Xeon Bronze only goes up to six cores.
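
To keep the segmentation straight, here is a minimal sketch in Python of the die and socket tiers described above; the names and structure are purely illustrative, not an official Intel taxonomy:

```python
# Illustrative summary of the Cascade Lake segmentation figures above.

MAX_CORES_PER_DIE = {"LCC": 10, "HCC": 18, "XCC": 28}
MAX_SOCKETS = {"Platinum 8000": 8, "Gold 6000": 4, "Silver 4000": 2, "Bronze 3000": 1}

def possible_dies(core_count: int) -> list:
    """Return which die sizes could underpin a product with this core count."""
    return [die for die, max_cores in MAX_CORES_PER_DIE.items() if core_count <= max_cores]

print(possible_dies(10))  # ['LCC', 'HCC', 'XCC']: a 10-core part could use any die
print(possible_dies(24))  # ['XCC']: a 24-core part can only come from the largest die
```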

The Xeon Platinum 9200 series, on the other hand, is something slightly different. Rather than a single die in the package, there are two: specifically, two XCC-sized silicon dies, connected through the package. As a result, the Xeon 9200 CPUs can offer up to 56 cores per package, with double the memory channels of a regular Xeon processor. Because an individual package has two silicon dies on it, this hardware is limited to dual-socket configurations, which act pretty much identically to a traditional quad-socket setup.

There are some features unique to this family compared to Intel’s other Xeons: TDP starts at 250 W for the 32-core parts, going up to 400 W for the full-fat 56-core part. Each of these processors supports AVX-512, and they are the key high-performance processors showcased with Intel’s DL Boost AI acceleration software stack. On a per-core basis, frequency peaks at 3.8 GHz, with a base frequency of 2.6 GHz. When channeling 400 W of power through one package, across 56 cores, the per-core frequency isn’t going to be at the top of the stack, but the idea is that with sufficient parallelism, a user can get dual-socket-like performance from only a single socket.

The other feature that differentiates this hardware from other Xeons is that it is BGA-only. This means the processors are soldered onto the motherboard and cannot be changed or removed by users. As a result, when a server is sold with one of these processors, it has a fixed processor configuration. Consequently, rather than requiring all of its partners and resellers to build their own motherboards and server systems, Intel is manufacturing reference design systems for its partners to resell. Aside from custom installations, say a supercomputer, there is no way to deviate from the Intel reference design platform.

When dealing with these high performance processors, Intel states that single and dual processor configurations at 250 W each can be air cooled. Any system that uses 350 W parts or above, whether single or dual socket, requires liquid cooling.

On top of all this, it is worth noting that, unlike its regular socketed Xeon platform, Intel does not publicly disclose per-processor pricing for its Xeon 9200 series. People in this field who work with this type of hardware will point out that the ‘Intel sticker price’ is almost useless for big customers – the major cloud partners and hyperscalers are likely to be paying a fraction of the price for the hardware, given that they buy servers on the order of 1,000s to 100,000s.

Nonetheless, one of the criticisms leveled at Intel is that this leaves the Xeon 9200 processors kind of ‘floating’: people analyzing the hardware holistically have no way to gauge performance per dollar. Given that Intel’s 28-core 205 W Platinum 8280 has a list price of $10,009, a processor using two of these dies on a single package in a specialist BGA system is likely to have a list price that runs double, if not more. Intel doesn’t state whether we should use ~$20,000 for performance-per-dollar comparisons, or something closer to ~$35,000+, given how different it is from Intel’s regular Xeon product line. If in doubt, use the latter, or push Intel to actually put dollar amounts on its products.

On the show floor at Supercomputing, we expected a high concentration of Intel partners and resellers with Xeon 9200 systems on display. As Intel’s highest performance x86 hardware, we would typically expect it to get a sizable amount of floor space, coverage, and partner support. Some of this was borne out – a number of Intel’s key partners did indeed have the hardware on display, usually one of Intel’s reference 2U half-width blades, and the smaller the reseller, the more prominently it sat at the front of the booth. If you were lucky, there would also be a dummy packaged CPU with a logo on top.

Interested to find out the state of play of the Xeon 9200 family, as a member of the press I asked as many of Intel’s partners as I could about the 9200 systems in front of them: which processor versions they were stocking, whether the hardware had garnered any interest, how customers were approaching the platform, and how many units they expected to sell.

All the smaller resellers said pretty much the same thing: if they stocked any, it would likely be the air-cooled systems. They all talked about people coming to the booth and being interested in learning about the hardware, but none of their customers were ultimately willing to put money down for one, even though the platform was announced half a year ago. One particular reseller, when asked if they expected to sell even one unit, said ‘no’.

One vendor did actually say something rather interesting. Colfax, a reseller of OEM systems and a major consultant to a number of companies in the industry with custom software stacks, is going to be selling the servers directly on its website. Better still, it will allow customers on the web to use its configurator to price up a system before going any further with the purchase. At the time of writing, this configurator is not yet online, but when it is, it should give us some indication of the pricing differences between the different Xeon 9200 CPUs (if Intel hasn’t formally disclosed list prices by that time).

One of the large OEMs was very clear that it doesn’t plan on stocking or reselling Xeon 9200 systems. Instead, the customers for whom the Xeon 9200 might reasonably be relevant are requesting quad-socket platforms and blades. A true quad-socket system will offer more total memory than a Xeon 9200 system, can potentially support Optane, and uses socketed processors that can be adjusted and configured easily. Not only that, but the system is easier to cool.


An example quad-socket system on display at Supermicro

Ultimately, Intel’s Xeon 9200 processors are trying to solve one specific issue for certain customers: density. With the right configuration and cooling, a customer can fit two 56-core CPUs into a 1U half-width node – 224 threads per node, or 448 threads of AVX-512 compute per full 1U of rack space – with liquid cooling. The number of customers that are density-constrained to that degree seems to be very low, and those on the boundary are telling the resellers that they’d prefer a more configurable, slightly lower-density setup that doesn’t require a liquid cooling infrastructure.
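
As a sanity check on those numbers, the thread math works out as follows – a minimal sketch, assuming two half-width nodes sit side by side in each 1U of rack space:

```python
# Back-of-the-envelope density math for the Xeon 9200's headline configuration.
cores_per_cpu = 56
cpus_per_node = 2        # two BGA packages per half-width node
threads_per_core = 2     # Hyper-Threading
nodes_per_u = 2          # assumption: two half-width nodes share each 1U

threads_per_node = cores_per_cpu * cpus_per_node * threads_per_core
threads_per_u = threads_per_node * nodes_per_u
print(threads_per_node, threads_per_u)  # 224 448
```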

One of Intel’s resellers gave us some insight into its contracts. This particular company deals with a number of university supercomputing contracts – those that work with separate research grants and add to their compute power over time, perhaps spending $20k-$250k per year to build out systems that might also be hosted off-site. These customers aren’t interested in the Xeon 9200. Even bigger customers, spending $1-2m a year, aren’t looking at the Xeon 9200 either. Customers that want lots of cores, and don’t have a specialized Intel-only software stack, might even look at AMD’s high-core-count offerings, which are easier to cool and offer more memory and more I/O.

Speaking to that point, one of the OEMs that provides reference designs for several key supercomputers mentioned that even though it has a strong Intel business, its AMD business is booming, especially for high-performance computing. Its customers see the per-core cost and the overall system cost as big factors, and they have no desire to touch the Xeon 9200 family.

With all this being said, the Xeon 9200 did have a lot of presence at Supercomputing. A number of these companies stated that they put it front and center of their booths as a hook – to get people (and customers) talking, and then to discuss their actual needs. But ultimately, the best hardware for the people approaching them was something else.

So Who Exactly is the Xeon 9200 For?

Intel does have two key wins with the Xeon 9200 hardware: on the TOP500 list of the most powerful supercomputers, there are two new entrants using it.

At #40 is the Lise system, installed at HLRN in Berlin, an Atos Bull cluster using Xeon 9242 (48-core) CPUs, no co-processor accelerators, and an Intel Omni-Path interconnect. This system has a theoretical peak throughput of 7.6 PetaFLOPs and a total of 103,680 cores. This reduces down to 2160 actual Xeon 9242 processors, which at their highest density would be 1080U of rack space, or 26 racks (likely more in practice, accounting for power, thermals, cooling, storage nodes, networking nodes, and so on). At 350 W TDP each, CPU power alone would be 0.756 MW, and the list puts total system power at 1.258 MW.

At #69 is the CTS-1 MAGMA cluster at Lawrence Livermore National Laboratory. This is a Penguin Computing Relion cluster, also using Xeon 9242 (48-core) CPUs with no co-processor accelerators and an Intel Omni-Path interconnect. This system has a peak throughput of 4.6 PetaFLOPs and a total of 62,400 cores, which reduces down to 1300 actual Xeon 9242 processors.
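
The core counts on the TOP500 list reduce to CPU, node, and power figures with straightforward division. Here is a quick sketch using the same assumptions as the Lise estimate above (two CPUs per node, one node per U, 42U racks, 350 W per CPU); the MAGMA rack and power figures are our own extrapolation under those assumptions:

```python
import math

def breakdown(total_cores, cores_per_cpu=48, cpus_per_node=2, tdp_w=350, rack_u=42):
    """Reduce a TOP500 core count to CPUs, nodes, racks, and CPU-only power (MW)."""
    cpus = total_cores // cores_per_cpu
    nodes = cpus // cpus_per_node
    racks = math.ceil(nodes / rack_u)
    return cpus, nodes, racks, cpus * tdp_w / 1e6

print(breakdown(103_680))  # Lise:  (2160, 1080, 26, 0.756)
print(breakdown(62_400))   # MAGMA: (1300, 650, 16, 0.455)
```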

From #70 all the way down to #500, there are no other Xeon 9200 systems on the list. For just these two supercomputers, we’re looking at a total of 3460 CPU packages, or 6920 of the XCC dies. Doing some math – knowing the size of the XCC die (694 mm²), how many dies fit on a single 300 mm wafer (72), and assuming the yield of Intel’s 14++ process is somewhere from 65% to 85% – we’re looking at a total of 110-150 wafers.
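
That wafer estimate follows directly from those three numbers; a minimal sketch, treating the 65-85% yield range purely as our assumption:

```python
import math

dies_needed = 3460 * 2       # two XCC dies per Xeon 9200 package
candidates_per_wafer = 72    # 694 mm^2 dies on a 300 mm wafer

for yield_rate in (0.85, 0.65):
    wafers = math.ceil(dies_needed / (candidates_per_wafer * yield_rate))
    print(f"yield {yield_rate:.0%}: ~{wafers} wafers")
# yield 85%: ~114 wafers
# yield 65%: ~148 wafers, hence the 110-150 wafer range above
```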

We could compare this to one of the big supercomputers that uses regular Xeons built on XCC, such as #3 Frontera, which uses Xeon 8280 28-core processors and has a total of 448,448 cores, or ~16,000 CPUs / XCC dies. Even a smaller XCC-based supercomputer, like #42, which uses 20-core Xeon 6248 processors and has a total of 88,400 cores, still amounts to 4420 dies. Summing up all the XCC systems shows that the dies going into Xeon 9200 hardware are a tiny fraction of total XCC output, and even if wafers were made specifically for this hardware, it would again be a tiny amount.

Beyond these two systems, it is hard to gauge exactly how widespread Xeon 9200 adoption is. Based on our conversations at the Supercomputing show, partners seemed to be both amused and bemused at the prospect of selling and supporting the platform for anyone who didn’t have a sizable budget to build a fresh supercomputer. In some instances, Intel’s partners stated that going for the lower-core-count 32-core chips didn’t make sense over the standard 28-core parts: despite the four extra cores, the BGA aspect of the system meant less flexibility, a higher cooling requirement, and, for single-CPU blades, a worse die-to-die bandwidth configuration than a standard socketed 2P system.

Intel's Reply

In advance of publishing this article, we briefed Intel on its contents, and thought it only fair to give the company a chance to address the criticisms directly. We asked Intel for the latest official line on its Xeon 9200 series. As part of that response, Intel also supplied commentary from one of its partners that has deployed Xeon 9200-based systems.

From Carolyn Henry, Senior Director of Strategic Marketing of the Intel Data Platforms Group:

"High performance computing is one of the most demanding compute and memory bandwidth workloads and requires some of the most advanced technologies. We introduced the Xeon Platinum 9200 to address these workloads, and the insatiable performance demands of our HPC customers. Intel has over 20 years of delivering leadership product to our HPC customers, as is evident by the number of Intel-based Top500 systems, and we are committed to continuing to delivering HPC leadership through solutions like the Xeon Platinum 9200 and the S9200WK server system product family.

Customers deploying the Intel Xeon Platinum 9200 achieve higher node performance, which in turn drives lower TCO as fewer nodes are required for a fixed performance level. Fewer nodes drive lower node acquisition cost, lower fabric, switching, and cabling cost for highly optimized rack-level deployment.”

Also, from William Wu, VP of Hardware Products at Penguin Computing, which deploys Xeon 9200 solutions:

“Artificial intelligence is permeating across industries and Penguin Computing data science customers require HPC clusters that are designed and built to address the new and demanding workloads brought by the convergence of AI and HPC. The Intel Xeon Platinum 9200 processors deliver breakthrough levels of performance and have enabled us to design, build and deliver groundbreaking solutions for our customers. Together, we are delivering a converged platform that allows AI to accelerate and speed up HPC modeling, as well as manage HPC and AI workloads more effectively."

Beyond this generation, it has been assumed that Intel will continue its dual-die ‘AP’ strategy into Cooper Lake Xeons in 2020, Ice Lake Xeons in 2020, and perhaps even Sapphire Rapids Xeons in 2021. By contrast, AMD’s chiplet strategy has clearly helped the company compete, by offering >1000 mm² of silicon in a single package. Intel needs a chiplet strategy of its own, and based on our discussions at the show, aside from a few select customers, gluing together two XCC dies doesn’t seem to be the right path. Intel needs to be at the forefront of driving performance and innovation, as we recently saw with its Tremont Atom microarchitecture disclosure and its willingness to try new things like dual decoder groups.

Exact Performance

For our readers who keep their ears to the ground on enterprise performance, it would have been hard to miss that Intel recently pushed out performance numbers for its Xeon 9200 in the form of a Medium blog post (rather than, say, on Intel’s own website). Some of Intel’s benchmarking setups comparing AMD hardware to the Xeon 9200 were quickly called into question, as were the software stacks used. Intel responded, though not with a mea culpa as such – it admitted a typo, but stated that it didn’t need to use the latest software, as it yielded the same performance.

Of course, we often discourage our readers from reading too deeply into first-party benchmarks. As much as we would like these companies to provide fair and balanced testing, even if they did, the results would still have to be taken with a pinch of salt. That’s where third-party testing comes in.

To that end, we’ve been in discussions with Intel to procure access to a Xeon 9200 system; however, Intel is only offering a Linux system, and we do the bulk of our testing under Windows. Intel’s reasons for only providing us Linux system access are varied, but mostly revolve around the fact that the customers this product is aimed at typically aren’t Windows-focused (Windows isn’t often used on very high core count systems), which makes Intel rather hesitant to help with any non-Linux testing, as it wants the Xeon 9200 seen in the best possible light. Ultimately, we would love to have a best-vs-best shootout – 2x Xeon 9282 against 2x AMD EPYC 7H12 – but as this issue isn’t likely to go away any time soon, we’ll need to refine our Linux test suite before we can do that.

Comments

  • schujj07 - Thursday, January 2, 2020 - link

    "Still i want to remember that right now Intel is the absolute dominator of 32 core class SKUs, better suited to manage hard tasks with a large memory footprint"
    Intel outsells AMD in server CPUs, that everyone knows. However, performance-wise the CPUs trade blows with each other at equal core counts. The Epyc 7601, 32c/64t, was slower than Intel's 8180, 28c/56t, in most applications. Now the 7402, 24c/48t, is equal in performance to the 8268, 24c/48t, but the 7502, 32c/64t, is far faster than the 8280, 28c/56t, except in the small subset of applications that use AVX512. Both AMD CPUs have about a 15% base clock speed disadvantage, so the only way to make up for that loss of speed is by being faster per clock.
  • Korguz - Thursday, January 2, 2020 - link

    " i have only said the core IPC of Skylake Xeon is better than Zen 2 " to bad it DOESN'T have better IPC then zen 2 clock for clock, its SLOWER, it ONLY gets its performance, because it is clocked higher. when will you understand this ??

    " Anyway your post is too long and with few relevance." like your posts are just intel irrelevance??
  • AshlayW - Saturday, January 18, 2020 - link

    You are mistaken. Intel has a better memory controller in terms of raw latency, due to Zen 2's chiplet design. That's it. Zen 2 makes up for it with a bigger L3 cache. Core-wise, Zen 2 is wider, has a bigger uOp cache, a better branch predictor, better cache and memory parallelism, etc. Intel's only advantage in core performance is clock speed. Jeez, after reading your comments it's clear to me you're utterly delusional and/or financially invested in Intel stock. You have zero credibility and only seem to promote Intel on every article, whilst attacking AMD products.

    It must really hurt you that Intel's Skylake architecture is getting so utterly beaten in essentially all metrics, yes?
  • martinw - Thursday, January 2, 2020 - link

    SPEC is not really the gold standard – it is a benchmark that Intel heavily optimizes for using its own compiler. Few companies use the Intel compiler in the real world, so results based on Intel-reported SPEC aren't really that representative of real-world performance.
  • Korguz - Thursday, January 2, 2020 - link

    martinw, gondalf doesn't care, he only cares about putting Intel in a good light... at all costs.
  • AshlayW - Saturday, January 18, 2020 - link

    I'm going to feature your comments on my website; I'm always looking to expose misinformation spreaders, especially those that zealously support disgusting, anti-competitive, and anti-consumer companies like Intel.
  • AshlayW - Saturday, January 18, 2020 - link

    Your misinformation spreading is disgusting. Stop it. The large L3 is part of the Zen 2 architecture. Overall IPC is higher than Skylake's, all included.

    Furthermore, your single threaded argument is grasping at straws and completely moot for almost all HPC workloads.

    Stop embarrassing yourself, hon.
  • Ghan - Thursday, January 2, 2020 - link

    Not to mention that AMD is providing this level of performance in a much more manageable power envelope due to their more efficient process node. And it's also worth noting that AMD's platform as a whole is much more capable. Where the Cascade Lake-AP line only has PCIe 3.0 (and a mere 40 lanes at that), AMD's 7H12 brings 128 lanes of PCIe 4.0 to the table.
  • prisonerX - Thursday, January 2, 2020 - link

    You seem to think that Epyc processors are for playing (single threaded) games. Four cores should be enough for anyone, right!? Well, at least Intel insisted that for a decade, until AMD ate its lunch. Now dumb Intel shills like you are trying to push that fiction along with the 20% difference lie. Shill harder, rube.
  • R7 - Thursday, January 2, 2020 - link

    EPYC Rome has all but closed the ST gap. Plus, Intel has been hit hard by their security issues, which also means ST perf loss. Plus the TCO argument: you can get a 64c/128t Rome that uses 250-280W and is in mass production.

    Whereas Intel's Xeon 9200 seems to be made-to-order only, has 8c/16t less, is BGA not LGA, and uses 400W, all while costing ~3 times as much as AMD's part.

    Even if this mythical unicorn exists somewhere and has higher performance than AMD's system, the TCO argument blows it out of the water. For the same money you can get two of AMD's top-end EPYC Rome CPUs and have leftover for a 32c/64t one too. Plus an overall faster and more scalable platform that is easier to deploy, maintain, and upgrade.

    If I were running servers, the only way I would consider Intel is if they gave away their CPUs at half the price. Even that would be a hard sell for me.
