CPU Tests: Simulation

Simulation and Science have a lot of overlap in the benchmarking world; for this distinction, however, we separate the two segments based mostly on the utility of the resulting data. The benchmarks that fall under Science have a distinct use for the data they output. The benchmarks in our Simulation section act more like synthetics, but at some level they are still trying to simulate a given environment.

DigiCortex v1.35: link

DigiCortex is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation, similar to a small slug.

The result is given as a fraction of real-time: anything above a value of one means the system can simulate in real time. The benchmark offers a 'no firing synapse' mode, which in essence measures DRAM and bus speed, however we take the firing mode, which adds CPU work with every firing.

The software originally shipped with a benchmark that recorded the first few cycles and output a result. While this made the benchmark last only a few seconds on fast multi-threaded processors, slow dual-core processors could be running for almost an hour. There is also the issue of DigiCortex starting with a base neuron/synapse map in 'off mode', giving a high result in the first few cycles as none of the nodes are currently active. We found that performance settles into a steady state after a while (when the model is actively in use), so we asked the author to allow for a 'warm-up' phase and for the benchmark to report the average over a second sample period.

For our test, we give the benchmark 20000 cycles to warm up and then take the data over the next 10000 cycles for the test – on a modern processor these take 30 seconds and 150 seconds respectively. This is then repeated a minimum of 10 times, with the first three results rejected. Results are shown as a multiple of real-time calculation.
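The scoring scheme above can be sketched as follows. This is an illustrative harness, not DigiCortex's own code; the function names are assumptions, and `run_digicortex_once` is a stand-in for one full warm-up-plus-sample pass.

```python
# Hypothetical sketch of the scoring scheme described above: repeat the
# benchmark at least ten times, reject the first three runs, and report
# the mean multiple of real-time over the remaining runs.
from statistics import mean

def run_digicortex_once() -> float:
    """Stand-in for one benchmark pass: returns the simulation speed as a
    multiple of real-time (>1.0 means the system keeps up in real time)."""
    return 1.23  # placeholder result for illustration

def digicortex_score(runs: int = 10, rejected: int = 3) -> float:
    results = [run_digicortex_once() for _ in range(runs)]
    return mean(results[rejected:])  # first three runs discarded

print(f"{digicortex_score():.2f}x real-time")
```

Rejecting the first runs mirrors the warm-up concern described above: early samples are biased high while the model is still mostly inactive.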

(3-1) DigiCortex 1.35 (32k Neuron, 1.8B Synapse)

This test prefers monolithic silicon with proportionally lots of memory bandwidth, which means results are somewhat equalized here. The top result in our benchmark database is actually a single-chiplet Ryzen.

Dwarf Fortress 0.44.12: Link

Another long-standing request for our benchmark suite has been Dwarf Fortress, a popular management/roguelike indie video game, first launched in 2006 and still regularly updated today, aiming for a Steam launch sometime in the future.

Emulating the ASCII interfaces of old, this title is a rather complex beast, which can generate environments subject to millennia of rule, famous faces, peasants, and key historical figures and events. The further you get into the game, depending on the size of the world, the slower it becomes as it has to simulate more famous people, more world events, and the natural way that humanoid creatures take over an environment. Like some kind of virus.

For our test we’re using DFMark. DFMark is a benchmark built by vorsgren on the Bay12Forums that gives two different modes built on DFHack: world generation and embark. These tests can be configured, but range anywhere from 3 minutes to several hours. After analyzing the test, we ended up going for three different world generation sizes:

  • Small, a 65x65 world with 250 years, 10 civilizations and 4 megabeasts
  • Medium, a 129x129 world with 550 years, 10 civilizations and 4 megabeasts
  • Large, a 257x257 world with 550 years, 40 civilizations and 10 megabeasts

DFMark outputs the time to run any given test, so this is what we use for the output. We loop the small test as many times as possible in 10 minutes, the medium test as many times as possible in 30 minutes, and the large test as many times as possible in an hour.
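The time-budgeted looping described above can be sketched as below. This is not vorsgren's DFMark itself, just a minimal illustration of the scheme; `run_small_worldgen` in the usage comment is a hypothetical callable.

```python
# Minimal sketch of a fixed-wall-clock-budget benchmark loop: rerun a test
# for as long as the budget allows and record every completion time.
import time

def loop_for_budget(test, budget_seconds: float) -> list[float]:
    """Run `test` repeatedly until the time budget is exhausted (always at
    least once). Returns the individual completion times in seconds."""
    times = []
    start = time.monotonic()
    while not times or time.monotonic() - start < budget_seconds:
        t0 = time.monotonic()
        test()
        times.append(time.monotonic() - t0)
    return times

# e.g. small world gen for 10 minutes, medium for 30, large for 60:
# small_times = loop_for_budget(run_small_worldgen, 10 * 60)
```

Using a time budget rather than a fixed iteration count keeps total runtime predictable whether the test takes 3 minutes on a fast chip or far longer on a slow one.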

(3-2a) Dwarf Fortress 0.44.12 World Gen 65x65, 250 Yr

(3-2b) Dwarf Fortress 0.44.12 World Gen 129x129, 550 Yr

(3-2c) Dwarf Fortress 0.44.12 World Gen 257x257, 550 Yr

Dwarf Fortress is mainly single-thread limited, hence the 64-core models at the back of the queue. The TR parts are still a good bit faster than the EPYC.

Dolphin v5.0 Emulation: Link

Emulators are often bound by single-thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single-core task using most aspects of a CPU. Results are given in seconds, where the Wii itself scores 1051 seconds.
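Since the score is a completion time and the real console's time is given, converting a result into a speedup over actual Wii hardware is a simple ratio. A quick illustrative helper (the function name is ours, not part of the benchmark):

```python
# Illustrative only: converts a Dolphin render-test time (in seconds) into
# a speedup factor over the real Wii's reference time of 1051 seconds.
WII_SECONDS = 1051.0

def speedup_vs_wii(cpu_seconds: float) -> float:
    """Lower times are better; e.g. a 100-second run is ~10.5x a real Wii."""
    return WII_SECONDS / cpu_seconds

print(f"{speedup_vs_wii(100.0):.1f}x faster than a real Wii")
```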

(3-3) Dolphin 5.0 Render Test

Similarly here, single thread performance matters.

118 Comments

  • Oxford Guy - Friday, February 12, 2021 - link

    Bulldozer was indeed particularly awful. Abstract names like Xeon are generally less annoying than misapplied real-world names.
  • Qasar - Friday, February 12, 2021 - link

    too bad bulldozer and netburst were code names for the architecture, and not marketing names like xeon and threadripper.
  • GeoffreyA - Saturday, February 13, 2021 - link

    You're right, but even FX and Phenom were in poorer taste than Athlon, which was sheer gold, I'd say. Is Threadripper good or bad as a name? What I say, or anyone else here says, doesn't matter. Only from a survey can we get a better picture, and even there it's a reflection of popular opinion, a blunt instrument, often misled by the times.

    Is there a standard of excellence, a mode of naming so tuned to the genius of the language that it never changes? It's evident to everyone that "Interstellar" sounds better than "Invisible Invaders from Outer Space," but we could be wrong and time only can decide the matter. If, in 500 years, people still get a shiver when they hear Interstellar, we'll know that Nolan named his film right.

    Back to the topic. I think the spirit of Oxford Guy's comment was: TR and Epyc aren't that good names (which I partly agree with). Whether it inspires confidence in professionals is a different matter. A professional might be an expert in their field but it doesn't mean they're an expert on good names (and I'm not claiming I am one either). It matters little: if the target demographic buys, AMD's bank account smiles. But it's a fair question to ask, apart from sales, is a name good or bad? Which were the best? Does it sound beautiful? Names, in themselves, are pleasurable to many people.
  • jospoortvliet - Saturday, February 13, 2021 - link

    Names should have a few properties if they are to be good.
    Easy to pronounce (cross-culturally!)
    Easy to remember (distinctive)
    Not (too) silly/funny
    Bonus: have some (clever) relation to the actual product.

    Threadripper certainly earns the bonus but might arguably perhaps maybe lose out on the third 'silly' point. However, in that regard I would argue it makes a difference how well the product fulfills that rather ambitious title, and as we all know the answer is "very well". Now if Threadripper were a mediocre product, not at all living up to its name, I'd judge differently, but as it stands I would say it is a brilliant name.
  • GeoffreyA - Saturday, February 13, 2021 - link

    Good breakdown that, to test names against. Simplicity, too, wins the day.
  • GeoffreyA - Saturday, February 13, 2021 - link

    "Bulldozer was indeed particularly awful"

    One of the worst. AMD's place names were good in the K8 era, and the painter ones are doing a good job too.
  • danjw - Saturday, February 13, 2021 - link

    You may not be aware of this, but Threadripper actually comes from the 80's fashion fad of ripped clothing. ;-)
  • GeoffreyA - Saturday, February 13, 2021 - link

    Well, I like it even more then, being a fan of the 80s.
  • Hulk - Tuesday, February 9, 2021 - link

    Is the difference in output quality strictly due to rounding/numerical errors when using GPU vs CPU or are there differences in the computational algorithms that calculate the numbers?
  • Kjella - Tuesday, February 9, 2021 - link

    Not in terms of color accuracy, not a problem making 10/12 bit non-linear color from 32 bit linear - even 16 bit is probably near perfect. But for physics engines, ray tracing etc. errors can compound a lot - imagine a beam of light hitting a reflective surface where fractions of a degree means the light bounces in a completely different direction. Or you're modelling a long chain of events that cause compounding errors, or the sum of a million small effects or whatever. But it can also just be that the algorithms are a bit "lazy" and expect everything to fit because they got 64 bits to play with. I doubt that much needs ~1.0000000001 precision, much less ~1.0000000000000000001.
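The compounding-error effect Kjella describes is easy to demonstrate with a quick sketch (ours, purely illustrative): accumulate the same small step thousands of times at single and double precision and compare how far each sum drifts from the exact answer.

```python
# Emulate 32-bit float accumulation using Python's 64-bit floats plus a
# round-trip through the IEEE 754 single-precision format, then compare
# drift against plain double-precision accumulation.
import struct

def round_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def accumulate(step: float, steps: int, single_precision: bool) -> float:
    total = 0.0
    if single_precision:
        step = round_f32(step)
    for _ in range(steps):
        total += step
        if single_precision:
            total = round_f32(total)  # re-round after every addition
    return total

# Add 0.1 ten thousand times; the exact answer is 1000.0.
drift32 = abs(accumulate(0.1, 10_000, single_precision=True) - 1000.0)
drift64 = abs(accumulate(0.1, 10_000, single_precision=False) - 1000.0)
# The single-precision sum drifts orders of magnitude further off target.
```

The per-step rounding error is tiny in both cases, but it is applied thousands of times, which is exactly why long chains of events (bounce after bounce of a ray, step after step of a physics integrator) magnify precision differences.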
