CPU Tests: Simulation

Simulation and Science have a lot of overlap in the benchmarking world; for this distinction we separate them mostly on the utility of the resulting data. The benchmarks that fall under Science have a distinct use for the data they output, whereas the tests in our Simulation section act more like synthetics, although at some level they are still trying to simulate a given environment.

DigiCortex v1.35: Link

DigiCortex is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation, similar to a small slug.

The output is given as a fraction of real-time capability, so anything above a value of one means the system is suitable for real-time work. The benchmark offers a 'no firing synapse' mode, which in essence tests DRAM and bus speed; we instead use the firing mode, which adds CPU work with every firing.

The software originally shipped with a benchmark that recorded the first few cycles and output a result. On fast multi-threaded processors this made the benchmark last less than a few seconds, while slow dual-core processors could be running for almost an hour. There is also the issue of DigiCortex starting with a base neuron/synapse map in ‘off mode’, giving an artificially high result in the first few cycles because none of the nodes are yet active. We found that performance settles into a steady state after a while (when the model is actively in use), so we asked the author to add a ‘warm-up’ phase and to report the benchmark as the average over a second sample period.

For our test, we give the benchmark 20000 cycles to warm up and then take the data over the next 10000 cycles for the test – on a modern processor this takes 30 seconds and 150 seconds respectively. This is then repeated a minimum of 10 times, with the first three results rejected. Results are shown as a multiple of real-time calculation.
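The protocol above (warm up, sample, repeat, reject the first runs, average) can be sketched as follows. This is a minimal illustration, not DigiCortex's own code: `run_cycles` is a hypothetical callable standing in for driving the simulator for N cycles, returning (simulated time, wall-clock time).

```python
import statistics

def digicortex_score(run_cycles, warmup=20000, sample=10000,
                     repeats=10, reject=3):
    """Hypothetical harness mirroring the test protocol described above.

    run_cycles(n) is assumed to drive the simulator for n cycles and
    return (simulated_seconds, wall_clock_seconds).
    """
    results = []
    for _ in range(repeats):
        run_cycles(warmup)                        # warm-up phase, discarded
        sim_s, wall_s = run_cycles(sample)        # sampled phase
        results.append(sim_s / wall_s)            # >1.0 = faster than real time
    # reject the first few runs, average the rest
    return statistics.mean(results[reject:])
```

A score of 1.0 means the processor keeps up with real time exactly; higher is faster.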

(3-1) DigiCortex 1.35 (32k Neuron, 1.8B Synapse)

This test prefers monolithic silicon with proportionally high memory bandwidth, which somewhat equalizes the results here. The top result in our benchmark database is actually a single-chiplet Ryzen.

Dwarf Fortress 0.44.12: Link

Another long-standing request for our benchmark suite has been Dwarf Fortress, a popular management/roguelike indie video game, first launched in 2006, still regularly updated today, and aiming for a Steam launch sometime in the future.

Emulating the ASCII interfaces of old, this title is a rather complex beast, which can generate environments subject to millennia of rule, famous faces, peasants, and key historical figures and events. The further you get into the game, depending on the size of the world, the slower it becomes as it has to simulate more famous people, more world events, and the natural way that humanoid creatures take over an environment. Like some kind of virus.

For our test we’re using DFMark. DFMark is a benchmark built by vorsgren on the Bay12Forums that gives two different modes built on DFHack: world generation and embark. These tests can be configured, but range anywhere from 3 minutes to several hours. After analyzing the test, we ended up going for three different world generation sizes:

  • Small, a 65x65 world with 250 years, 10 civilizations and 4 megabeasts
  • Medium, a 129x129 world with 550 years, 10 civilizations and 4 megabeasts
  • Large, a 257x257 world with 550 years, 40 civilizations and 10 megabeasts

DFMark outputs the time to run any given test, so this is what we use for the output. We loop the small test as many times as possible in 10 minutes, the medium test as many times as possible in 30 minutes, and the large test as many times as possible in an hour.
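The looping scheme above (repeat a test until a wall-clock budget is spent, recording each run's duration) can be sketched like this. `run_test` is a hypothetical placeholder for invoking one DFMark world generation:

```python
import time

def loop_for(budget_seconds, run_test):
    """Run a test repeatedly until the time budget is spent.

    Returns the list of per-run durations, matching DFMark's own
    per-run time output. The last run may finish past the deadline.
    """
    durations = []
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        start = time.monotonic()
        run_test()
        durations.append(time.monotonic() - start)
    return durations
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline immune to system clock adjustments during long runs.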

(3-2a) Dwarf Fortress 0.44.12 World Gen 65x65, 250 Yr

(3-2b) Dwarf Fortress 0.44.12 World Gen 129x129, 550 Yr

(3-2c) Dwarf Fortress 0.44.12 World Gen 257x257, 550 Yr

Dwarf Fortress is mainly single-thread limited, hence the 64-core models at the back end of the queue. The TR parts are still a good bit faster than the EPYC.

Dolphin v5.0 Emulation: Link

Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in seconds, where the Wii itself scores 1051 seconds.
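Since results are reported in seconds against a fixed 1051-second baseline for real Wii hardware, converting a score into a "times faster than a Wii" multiple is a one-line calculation. A small sketch (the function name is ours, not Dolphin's):

```python
WII_BASELINE_S = 1051  # a real Wii completes the render test in 1051 seconds

def dolphin_speedup(result_seconds):
    """Express a Dolphin render-test time as a multiple of real Wii speed."""
    return WII_BASELINE_S / result_seconds

# e.g. a CPU finishing the test in 300 s emulates the scene
# roughly 3.5x faster than the actual console
```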

(3-3) Dolphin 5.0 Render Test

Similarly here, single thread performance matters.

Comments
  • Fellovv - Tuesday, February 9, 2021 - link

    Agreed— picked up a p620 with 16c for $2500, could have gotten it for lower from Lenovo if they didn’t have weeks of lead time. Ian- you may see Lenovo discounts all the crazy prices about 50% all year, and sometimes there are Honey coupons to knock off hundreds more.
    I have read that the 16c 2-CCX 3955WX may only get 4-channel RAM, not the full 8. I may be able to confirm in the near future. Gracias for the fine and thorough review. My only request is to ensure the TR 3990 is included in every graph – it was MIA or AWOL in several. I went with the TR Pro for the RAM and PCIe 4 lanes. Seeing the results confirms it was a good choice for me. Can’t wait for the Zen 3!
  • realbabilu - Tuesday, February 9, 2021 - link

    Nice 👍 about MKL. How about BLIS and OpenBLAS? Did they suffer the high multi-core problem?
  • MonkeyMan73 - Wednesday, February 10, 2021 - link

    AMD has the performance crown in most scenarios, but it comes at an extremely high price point. It might not be worth this kind of money even for the most extreme power user. Maybe get a dual core Xeon? Might be cheaper.

    BTW, your last pic of this review is definitely not an OPPO Reno 2 :)
  • MonkeyMan73 - Wednesday, February 10, 2021 - link

    Apologies, not a dual core Xeon, that will not cut it; I meant a dual socket Xeon setup.
  • Oxford Guy - Wednesday, February 10, 2021 - link

    The worst aspect of the price-to-performance is that it’s using outdated tech rather than Zen 3.
  • MonkeyMan73 - Sunday, February 28, 2021 - link

    Correct, there is always some sort of trade-off.
  • Greg13 - Wednesday, February 10, 2021 - link

    I feel like you guys really need to get some more memory intensive workloads to test. So often in these Threadripper / Threadripper Pro / EPYC reviews, the consumer CPU (5950X in this case) is often faster or not far behind even on highly multithreaded applications. I do some pretty large thermal fluid system simulations in Simscape, whereby once a system is designed I use an optimisation algorithm to find the optimal operating parameters of the system. This involves running multiple simulations of the same model in parallel using the Matlab Parallel Computing Toolbox along with their Global Optimisation Toolbox. Last year I bought a 3950X and 128GB ram to do this, but as far as I can tell it is massively memory bandwidth limited. It's also memory capacity limited too... Each simulation uses around 10GB ram, so I generally only run 12 parallel workers to keep within the 128GB of ram. However, in terms of throughput I see barely any change when dropping down to 8 parallel workers, suggesting, I think, that with 12 workers it's massively memory bandwidth limited. This also seems to be the case in terms of the CPU power: even with 12 workers going, the CPU power reported is pretty low, which leads me to think it's waiting for data from memory?

    I assume that this would be better with Threadripper or even better with Threadripper Pro, with their double and quadruple memory bandwidth. However I don't have the funds to buy a selection of kit and test it to see if the extra cost is worth it. It would be good if you guys could add some more memory intensive tests to the suite (ideally for me some parallel Simscape simulations!) to show the benefit these extra memory channels (and capacity) offer.
  • Shmee - Wednesday, February 10, 2021 - link

    Yeah I would wait for Zen 3 TR for sure. That said, this would only make sense as X570 has limited IO. It would be great to have a nice 16 core TR that had great OC capability and ST performance, was great in games, and did not have the IO limitation of X570. I really don't need all the cores, mainly I care about gaming, but the current gaming platforms just don't have the SATA and m.2 ports I would like. Extra memory bandwidth is also nice.
  • eastcoast_pete - Wednesday, February 10, 2021 - link

    Thanks Ian! I really wanted one, until I saw the system price (: But, for what these proTRs can do, a price many are willing and able to pay.
    Also, as it almost always comes up in discussions of AMD vs Intel workstation processors: could you write a backgrounder on what AVX is/is used for, and how open or open source extensions like AVX512 really are? My understanding is that much of this is proprietary to Intel, but are those AVX512 extensions available to AMD, or do they have to engineer around it?
  • kgardas - Wednesday, February 10, 2021 - link

    AVX-512 is an instruction set invented and implemented by Intel. It is currently available in Tiger Lake laptops and Xeon W desktops, plus of course server Xeons. The previous generation was AVX2, and the generation before that was AVX. AVX came with Intel's Sandy Bridge cores 9 years ago IIRC; AVX2 with Haswell.
    Due to various reasons, IIRC AMD and Intel cross-licensed their instruction sets years ago; Intel needed AMD's AMD64 to compete. Not sure if future extensions are also part of the deal, but I would guess so, since AMD has since implemented both AVX and AVX2. Currently AMD sees no big pressure from Intel, hence I guess it is not motivated enough to implement AVX-512. Once it is, I guess we will see AMD chips with AVX-512 too.
