Comments Locked

65 Comments

  • andrewaggb - Tuesday, March 3, 2020 - link

    It would definitely be interesting to see if it's actually faster than an EPYC 7742 in practical workloads. I don't believe there is currently any demand for ARM-based virtual machines, docker hosts, etc., and I feel like that's unlikely to change in the near future. They mention databases, transcoding, storage, web apps, search, and AI inference, so that's a pretty big market of services.
  • wanderer66 - Tuesday, March 3, 2020 - link

    Taken across the broad swath of cloud-scale, edge compute (5G and beyond will require far more compute at a cell site, for example), and the myriad other applications where consistent performance at scale is required rather than peak performance, this will find traction in the right workloads.
  • yeeeeman - Tuesday, March 3, 2020 - link

    They don't develop products unless they have done a market study and found out these will sell.
  • Ratman6161 - Tuesday, March 3, 2020 - link

    Companies develop products that they think will sell. But whether they actually sell is another matter. Sometimes even being a better product (not saying this is... I have no idea) isn't enough.
    I still remember the days of VHS vs Betamax tapes. I knew people at the time who argued this very aggressively, but in the end it didn't matter which was better. They were both usable and the one with the bigger market share won out. People were more interested in having just one kind of tape with the widest available library of movies. No one really cared by that point which one was better, as either one could do the job.
  • Foeketijn - Tuesday, March 3, 2020 - link

    Note, the one on which porn0 was sold and copied won. It was that simple.
  • ArcadeEngineer - Tuesday, March 3, 2020 - link

    That's an urban myth, Beta mostly died before prerecorded tapes were even a thing.
  • Father Time - Tuesday, March 3, 2020 - link

    You guys are all completely wrong.

    VHS could record 4 hours in the US version (an entire football match)
    VHS could be set to record multiple shows on a timer to take advantage of that run time

    Betamax was a product by engineers, for engineers, and it didn't solve the problems the market demanded.

    This is the story, watch and be enlightened: https://youtu.be/FyKRubB5N60
  • wlee15 - Tuesday, March 3, 2020 - link

    Betamax may have failed as a consumer product, but its professional cousin, the Betacam line, went on to dominate the TV industry for about two decades.
  • ProDigit - Wednesday, March 4, 2020 - link

    I would buy one, provided I had the money for it.
    It would be nice to run a GPU server from this.
  • Reflex - Wednesday, March 4, 2020 - link

    There are far more failed products than successful ones.
  • Cogman - Tuesday, March 3, 2020 - link

    What I see with this is very high core counts for cheap. Where that makes a lot of sense (IMO) is the case of FaaS and ultra-lightweight services.

    There's plenty of market for super cheap CPUs that can run a lot of things in parallel. Cloud providers are swimming with those sorts of opportunities. (see aws t3.nano instances or aws lambda)
  • eek2121 - Tuesday, March 3, 2020 - link

    Doesn't matter if it's low IPC.
  • braxster - Wednesday, March 4, 2020 - link

    What makes you think that the IPC is low?
  • ProDigit - Wednesday, March 4, 2020 - link

    PCIe 4.0, 80 cores at 3 GHz, 100+ PCIe lanes,
    Nothing about it sounds cheap to me.
  • name99 - Tuesday, March 3, 2020 - link

    Every day there’s a new Graviton success story.
    Today’s: https://docs.keydb.dev/blog/2020/03/02/blog-post/
    And those successes transfer to Ampere.
  • eek2121 - Tuesday, March 3, 2020 - link

    It won't be. That's the issue. Read between the marketing lines. ARM, when scaled up, has equal perf/watt to anything x86 has. x86 can scale down to ARM levels as well. However, neither Intel nor AMD has the desire to tackle insanely low-margin, high-volume products.

    The giveaways here are the numbers they do provide:

    "These include 2.23x the performance on SPEC2017_int rate over a single 28-core Intel Xeon Platinum 8280, and 1.04x over a single 64-core AMD EPYC 7742"

    25% more cores (forget the threads for a moment) and you only get 4% more performance?

    I try to have at least a bit of faith in benchmarks provided by the manufacturer, but the numbers they provided Anandtech are off the rails.
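
    A back-of-the-envelope way to put that objection in per-core terms, using only the 1.04x figure quoted above (the per-core breakdown is my own arithmetic, not a published number):

```python
# Per-core throughput implied by Ampere's published 1.04x figure
# (80-core Altra vs 64-core EPYC 7742); back-of-the-envelope only.
altra_score = 1.04   # relative SPEC2017_int rate, per Ampere's slide
epyc_score = 1.00    # EPYC 7742 baseline

altra_per_core = altra_score / 80
epyc_per_core = epyc_score / 64

# Altra delivers about 17% less throughput per core in this comparison
ratio = altra_per_core / epyc_per_core   # ~0.832
```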
  • The_Assimilator - Wednesday, March 4, 2020 - link

    Bro, you're SOOO wrong, Arm on servers is happening any day now, just like the year of the Linux desktop. /s
  • CiccioB - Wednesday, March 4, 2020 - link

    <quote>x86 can scale down to ARM levels as well.</quote>
    Experience does not support this statement: Intel's Atom chips, created precisely to go against ARM chips in mobile systems (where power consumption is factor no. 1), did not manage to reach the same perf/watt numbers despite being produced on a more advanced process node.

    You may be impressed by the fact that a single core of this chip is about 1/8 the size of the Zen core it was tested against. The Zen core may perform much better in many tasks that this chip can't even attempt, but the net result for the work this chip was designed for is that it equals an EPYC, and it is probably much better at AI tasks, where AMD has still not added a single dedicated instruction to take care of that part of the market.
  • HStewart - Wednesday, March 4, 2020 - link

    Yes, I would agree that actual workloads would be interesting, especially ones with more complex operations than synthetic benchmarks - whether the x86 processor in this test is AMD or Intel doesn't matter, that is a different story.

    Maybe for some limited server workloads this may be a good option, but for workstations it would not be, and I seriously doubt it would be good at running virtual x86 machines.
  • ikjadoon - Tuesday, March 3, 2020 - link

    God, the dizzying number of "A" names.

    1) Amazon bought Annapurna (silicon maker) & released ARM's N1-based CPUs = Graviton2 CPUs

    2) Ampere (silicon maker) bought Applied Micro (silicon maker) & released ARM's N1-based CPUs = Altra CPUs
  • valinor89 - Tuesday, March 3, 2020 - link

    Guess it is an advantage to be at the top of the list on the stock exchanges. If nothing else, more people will see AAPL than XRX, to give an example.
  • shabby - Tuesday, March 3, 2020 - link

    Yup just like those pesky aaaa1 towing places in the yellow pages...
  • s.yu - Tuesday, March 3, 2020 - link

    Which is exactly how Amazon got its name.
  • eastcoast_pete - Tuesday, March 3, 2020 - link

    I guess none of them wants to lead the "B" list (:
  • Drumsticks - Tuesday, March 3, 2020 - link

    We've seen so many of these ARM Server designs pop up with lots of great implications that never follow through. Here's hoping Ampere has some long term success.
  • wanderer66 - Tuesday, March 3, 2020 - link

    They're having more success than you might expect, but most of it is in high-scale datacenters today, in competitive designs that are not public.
  • Threska - Tuesday, March 3, 2020 - link

    Better chance where one has total control of the software and the hardware. Especially true when the bulk is open-source.
  • wanderer66 - Thursday, March 5, 2020 - link

    More or less correct: these have the most success where the economies of scale and control of usage model works. Think AWS, Google Cloud, Microsoft Azure, etc, where hundreds of millions of cores of all types are deployed yearly.
  • hescominsoon - Tuesday, March 3, 2020 - link

    I am curious about power consumption. Rarely can you throw together 80 cores with decent performance without using a good amount of power.
  • hescominsoon - Tuesday, March 3, 2020 - link

    OK, found it: a 210W TDP. Now let's see how it compares to similar-TDP CPUs.. :)
  • Ian Cutress - Tuesday, March 3, 2020 - link

    It says this in the article?
  • webdoctors - Tuesday, March 3, 2020 - link

    So much text and slides - why not just a few lines saying how it compares in perf to an AMD or Intel or Apple CPU running SPEC2K6 or some common benchmark?

    The SPECrate 2017 int per rack is sooooo vague; do a per-core count to make it easy to understand.
  • Wilco1 - Tuesday, March 3, 2020 - link

    Did you not read the article? It says it beats AMDs fastest 64-core Rome by 4% on SPECINT_rate 2017. Or https://www.anandtech.com/Gallery/Album/7519#15

    This is an impressive result given it does it with 80 threads rather than 128. Also, like Graviton 2, it uses a fraction of the cache and silicon area of Rome to achieve this performance.
  • name99 - Tuesday, March 3, 2020 - link

    It’s not impressive, it’s exactly as expected.
    Their cores are more or less the same IPC as AMD, running at more or less the same frequency (~3GHz). SMT is worth a quarter of a core, not a whole core.
    So 64x1.25=80 .. as expected.
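
    The arithmetic above sketched out (the 1.25x SMT uplift is the commenter's assumption, not a measured figure):

```python
# Throughput-equivalence model from the comment above: each SMT-enabled
# core counts as 1.25 single-threaded cores (the quarter-core assumption).
def smt_equivalent_cores(physical_cores: int, smt_uplift: float = 0.25) -> float:
    """Single-thread-equivalent core count for an SMT-enabled part."""
    return physical_cores * (1 + smt_uplift)

# 64 SMT cores (EPYC 7742) ~ 80 non-SMT cores (Altra), as expected
equivalent = smt_equivalent_cores(64)   # 80.0
```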
  • Wilco1 - Tuesday, March 3, 2020 - link

    Sure - but it's still impressive. We're not talking about a low-end chip here, we're talking about a startup taking a standard Arm core and beating the fastest x86 server chip!
  • CiccioB - Wednesday, March 4, 2020 - link

    It's not impressive that a startup made a chip that is as powerful as the latest, most advanced, biggest (x86) core-count chip ever made?
    Well, if so, AMD's EPYC is not impressive either, as a startup could glue together 80 cores instead of only 64 without going around talking about how good, versatile, powerful (and power hungry) their interconnect bus is.

    So we'll wait for the next piece of impressive silicon to positively comment on innovation and technology, which will probably come when Intel manages to get out their MCM chips with their new interconnect buses.
  • name99 - Wednesday, March 4, 2020 - link

    I guess it's impressive if you've had your head in the sand for the past few years!

    I'm not saying ARM is cheating, or this chip sucks, or whatever, I'm saying that this is exactly what people like me expected!
    Ever since Apple started their relentless annual core improvements, followed by ARM always lagging about 2.5 to 3 years behind, it was obvious that this was going to happen.

    People like me were talking about ARM making it big in servers in 2020 five years ago. And it's happening, pretty much on the schedule expected, pretty much playing out as expected, pretty much attacking x86 on the fronts we expected. When you calculate the trajectory then the rocket follows it, it's nice to see that your calculations were correct, but impressive is not the right word.
  • deltaFx2 - Wednesday, March 4, 2020 - link

    You may want to read the STH article on how that 4% number is calculated: https://www.servethehome.com/ampere-altra-80-arm-c...

    They seem to have locked the CPU at 3.3 GHz (3.0 being their max published turbo, so that's single core turbo, at a tdp of 210W). The AMD part has a single core turbo of 3.4 GHz and a base of 2.25GHz, so in these tests, it's running in the 2.8GHz range (approx, guesstimate).

    But wait, there's more. They didn't actually measure the spec int rate score on their competitors, they just derated the published base score on aocc and icc by ~17% and 25% respectively (exact numbers in STH).

    Even with all this fudging, let's say the SPEC scores for EPYC and Ampere are the same. So, with 25% more cores, they achieve the same perf as EPYC. SMT yield is ~20-30% on a single-core system, so if you turn off SMT on a fully loaded system, say you lose 25% (memory b/w effects mean that SMT doesn't always help SPECrate). So at best, a 3.3GHz locked ARM Neoverse N1 equals a ~2.8GHz AMD EPYC, i.e. AMD still has higher IPC.
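
    A numerical sketch of that argument (the derate factors and the ~2.8 GHz effective EPYC clock are the commenter's estimates, not measurements; the published score is a placeholder):

```python
# Sketch of the derating-and-normalization argument above. All factors
# are the commenter's estimates, not measured values.
def derate(published_score: float, derate_pct: float) -> float:
    """Scale a vendor-published SPECrate result down by a derate factor."""
    return published_score * (1 - derate_pct / 100)

def per_core_per_ghz(score: float, cores: int, ghz: float) -> float:
    """Normalize a throughput score to per-core, per-GHz terms."""
    return score / (cores * ghz)

icc_published = 100.0                      # placeholder published score
gcc_estimate = derate(icc_published, 25)   # ICC result derated ~25% -> 75.0

# Assume (generously) the final derated scores come out equal:
SCORE = 100.0
altra = per_core_per_ghz(SCORE, cores=80, ghz=3.3)  # locked clock
epyc = per_core_per_ghz(SCORE, cores=64, ghz=2.8)   # estimated effective clock

# EPYC comes out ahead per core per GHz, i.e. higher IPC as claimed
higher_ipc = epyc > altra
```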

    I recall Mike Fillipo/ARM saying the cortex A76 should max out at 3.0GHz. Clever physical design and binning might get 3.3GHz, but that chip is operating in the inefficient part of the VF curve, i.e. power consumption will be horrible. Lets see if they actually put out a 3.3GHz all-core-turbo part and see what the power is. I doubt it will be good.
  • Wilco1 - Wednesday, March 4, 2020 - link

    I have read that article already of course. Direct measurement using identical compiler and options is preferable when possible, but if not, derating is an accepted practice in the industry. GCC is being optimized and the soon to be released GCC10 already shows significantly higher performance on SPEC, so derating may not be needed for much longer!

    Having a 3.3GHz bin does not seem unusual or impossibly power hungry. It should remain below the 300+W power EPYC 7742 draws at wall running integer code according to Phoronix.

    The AnandTech article about Rome showed it can run at least one benchmark at 3.2GHz with 128 threads, so I think you're underestimating Rome's average frequency. I don't think there is any data for SMT gains in SPEC2017, so it would be interesting to see results. My guess is that the N1 has higher overall IPC but throughput scales less due to the much smaller L3 cache.

    Whichever way you put it, a small startup showing server performance on par with Rome using a fraction of the silicon area and L3 cache is incredibly impressive.
  • deltaFx2 - Thursday, March 5, 2020 - link

    So wall power draw includes things like the PSU, voltage regulator, DRAM, etc. - stuff outside of the CPU, that is. Ampere will have the same things, so TDP is a reasonable comparison point for both until we get real numbers off a real system. Remember that once you go past a certain point, you need to raise voltage a lot to get a little freq increase. ARM designs for a lower fmax, which makes them efficient at lower frequencies but inefficient at high frequency; AMD/Intel likely need lower voltage to hit 3.3 GHz because their fmax is much higher, but are inefficient at lower frequencies. Point being, I doubt Arm can be power efficient past 3 GHz. Indeed it should not be playing that game, as it gives up its biggest strength.

    As for the derate, measurements are easier. Both systems are easily available. It's no excuse... where did that derating factor come from? Is there an industry standard conversion factor? Of course not.

    At least for AMD, single-threaded IPC for a system running only one process may be less than N1 in some cases, as there is only 16MB of L3 per CCX, so N1 has more cache. Nobody runs a server like that though. Intel has a large unified cache, so perhaps it may come out ahead. We need real silicon to know.

    This startup was AMCC not that long ago and they have simply cobbled together someone else's IP. It doesn't take that much technical chops, certainly not like doing your own. AWS is already doing the same thing. If MS wants, it can do the same. So what's the big deal here?
  • Wilco1 - Thursday, March 5, 2020 - link

    Well clearly it no longer takes billions and 5-10 years to design a competitive server. Today one can license a server core from Arm, "cobble" something together on a budget and in 1-2 years beat the fastest servers on the market.

    "Cobbled" or not, AMD and Intel just lost the hyperscaler market. A big deal or not?
  • deltaFx2 - Thursday, March 5, 2020 - link

    'Beat' is a very loose way of putting it. It's hardly a beat when your overclocked part (10% over your advertised single-core turbo) barely keeps up with a crippled competitor's score. Moreover, this is a paper launch; Intel and AMD systems are available today. The fact that there needed to be so much dishonesty in metrics suggests to me that the x86 market is safe for this Neoverse generation.

    You are being naive if you believe that just being able to tape something out = success. Servers need volume. Volume aids binning, brings costs way down so as to recover NRE costs, etc. AMD's chiplet design serves exactly that purpose: yields, yes, but mainly volume into the desktop market allows them to harvest the best server chiplets. Which in turn means they can always undercut on price. List price for Intel/AMD is meaningless because nobody actually pays that. Graviton at least has a captive market. An OEM like Ampere has to make money.

    x86 has not yet lost the hyperscaler market. Not even close. Arm is now at a point where it doesn't suck anymore. That's not the same as saying it offers a superior solution, because it doesn't just yet.
  • name99 - Tuesday, March 3, 2020 - link

    WHY?
    This SoC is designed for a particular job: massive throughput of mostly integer tasks. Benchmark it for that. It’s irrelevant how well or badly it handles single threaded or FP.

    If you really want, it’s mostly an A76, so look up the Kirin 980 numbers. Bottom line, it’s about 50% to 65% of an A13 single threaded, depending on the exact task.
  • Foeketijn - Tuesday, March 3, 2020 - link

    Thanks for making this kind of content!
  • Scipio Africanus - Tuesday, March 3, 2020 - link

    STH has a pretty good breakdown of the announcement. https://www.servethehome.com/ampere-altra-80-arm-c...
  • eastcoast_pete - Tuesday, March 3, 2020 - link

    Thanks Ian! Any word on which companies will deploy these? With its own in-house design, I'd guess Amazon is out. Would be nice to know who the launch customers/partners are.
  • grrrgrrr - Tuesday, March 3, 2020 - link

    I'm in for one. Run bots for mobile games.
  • Sivar - Tuesday, March 3, 2020 - link

    Did anyone else's mind briefly focus elsewhere with mention of the word, "Ampere"?
    I am normally more interested in server tech than GPU tech, but my 970 grows ever longer in the tooth.
  • Santoval - Tuesday, March 3, 2020 - link

    "Ampere didn’t provide similar numbers for SPEC2017_fp, because the company states that the SoC has been developed with INT workloads in mind."
    Translation : "Very low floating point performance".
  • Threska - Tuesday, March 3, 2020 - link

    Machine Learning.
  • The_Assimilator - Wednesday, March 4, 2020 - link

    Their "Processor Complex" slide specifically calls out fp16 for ML, yet the manufacturer doesn't provide any FP performance numbers. Translation: it's bad. Really, really bad. "I wouldn't use it even if I found it dumpster diving" bad.

    The wannabe Arm server vendors had an open window of opportunity when Intel was the only game in server town and set their pricing to exorbitant levels of greed as a consequence. Then AMD launched Zen, the cost of server x86 CPUs dropped back down to acceptable levels, and the window slammed firmly shut.

    We'll probably continue to see a few more in-house Arm server CPU designs from the cloud providers for their bottom-of-the-barrel pricing tiers, but nobody else is going to waste their money on putting trash Arm chips in your server when they can get a real x86 CPU that can actually do real work like floating point operations and AVX. I give Ampere another two years before the realities of the market relegate it to the annals of history; at best they can hope to be purchased by someone like Microsoft or Oracle.
  • CiccioB - Wednesday, March 4, 2020 - link

    You may be surprised by the number of jobs that do NOT require floating point or AVX calculations to produce a useful result.

    This chip was designed for cloud and edge work, not for scientific simulations.
    Wanting numbers for tasks it was not designed for is stupid.
    Believing that the bulk of datacenter work is scientific simulation is stupid too.

    However cheap AMD server chips are, they are not cheap or performant enough for certain jobs.
  • Wilco1 - Wednesday, March 4, 2020 - link

    Neoverse N1 has incredibly high FP performance for its tiny size, so I would not be surprised if someone will use these as HPC nodes, for transcoding or in renderfarms etc. Even if it doesn't beat Rome on FP, it should be very close. Higher density, better perf/Watt and perf/dollar than AMD are the major selling points.
  • CiccioB - Thursday, March 5, 2020 - link

    It depends on whether you can distribute the work efficiently across parallel cores, which means how fast internal communication between cores is and how good memory access is for each of them.
    As said, having lots of FPUs does not mean you are good at scientific simulations. You just score well at single-core FP synthetic benchmarks.
  • bananaforscale - Wednesday, March 4, 2020 - link

    Am I imagining things or are they using superheroes as CPU names?
  • littlemoule - Wednesday, March 4, 2020 - link

    failed product with no software support. enough said.
  • GreenReaper - Sunday, March 8, 2020 - link

    I was running on an ARM server over four years ago; it has plenty of support for server operations:
    https://inkbunny.net/j/202221

    Whether it makes sense for hosts to deploy is another matter. The momentum here is with AMD, based on what I've seen recently with the cloud products at Oracle, Google, etc.
  • abufrejoval - Thursday, March 5, 2020 - link

    For me the main question remains open: Why is this chip economically viable?

    From what I understand we get performance on a similar scale as a 64-core Rome, but AMD jumped through hoops to break up the silicon into chiplets for many good reasons.

    Yet this seems very much a monolithic chip with plenty of I/O that won't scale well to 7nm: It can't be all that small, yet unless it is quite a bit smaller than the equivalent x86, why does it stand a chance?

    Is it perhaps a much better energy proportional compute performance curve under partial and bursty loads?

    Trading LLC SRAM area for cores seems pretty significant, too: It hints at workloads with predictable latencies, trading spin-up delay against peak performance.

    I am very hopeful a deeper dive into that will come, but until then I am quite puzzled.
  • Wilco1 - Thursday, March 5, 2020 - link

    A Neoverse N1 core with 1MB L2 is ~2.5x as dense as a Zen 2 core with 512KB L2. You can fit all 80 cores plus 32MB L3 in 2 chiplets! The similar Graviton 2 is roughly estimated as 340mm^2. That's less than one-third of the die area of Rome...
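
    Putting rough numbers on that comparison (the ~340 mm^2 Graviton 2 estimate is from the comment above; the Zen 2 chiplet and IO-die sizes are commonly cited figures, not official measurements):

```python
# Die-area comparison using rough public estimates, not measured values.
GRAVITON2_MM2 = 340    # estimated N1-based die, as quoted above
ZEN2_CCD_MM2 = 74      # commonly cited Zen 2 compute chiplet size
ROME_IOD_MM2 = 416     # commonly cited 14nm server IO die size

rome_total = 8 * ZEN2_CCD_MM2 + ROME_IOD_MM2   # ~1008 mm^2 over 9 dies
ratio = GRAVITON2_MM2 / rome_total             # ~0.34 -> roughly one-third
```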
  • chipoutsider - Saturday, March 7, 2020 - link

    I am interested in some information about the back-end implementation of Altra. How large is the area of the chip (die)? How many and what kind of IP did it use?
  • nukunukoo - Saturday, March 7, 2020 - link

    That's impressive and all- but can it do Crysis?
