The CPU Overload 2020 Suite

Our new CPU tests cover a number of key areas: web tests using our fixed, un-updateable build of Chromium; opening tricky PDFs; emulation; brain simulation; AI; 2D-image-to-3D-model conversion; rendering (ray tracing, modeling); encoding (compression, AES, video/HEVC); office-based tests; and our legacy tests (throwbacks from another generation of code, but interesting to compare). Over the next few pages we’ll give a high-level overview of each test.

However, as mentioned in passing on the previous page, we run a number of registry edit commands again at the start of the benchmark suite to ensure that various system features are switched off. This includes disabling Cortana, GameDVR, Windows Error Reporting, and updates, disabling Windows Defender as much as possible, re-applying our power options, and removing OneDrive, in case it has sprouted wings again.
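For readers who want to replicate this, a minimal sketch of the idea is below, using Python’s winreg module. The specific keys shown (AllowCortana, AppCaptureEnabled, and the Error Reporting Disabled flag) are the commonly documented policy values rather than a dump of our exact script, so treat them as an assumption to verify against your Windows build.

```python
# A minimal sketch, assuming the commonly documented policy keys (verify
# against your Windows build). Must run elevated for the HKLM writes.
import winreg

EDITS = [
    # (hive, subkey, value name, DWORD data)
    (winreg.HKEY_LOCAL_MACHINE,
     r"SOFTWARE\Policies\Microsoft\Windows\Windows Search",
     "AllowCortana", 0),                      # disable Cortana
    (winreg.HKEY_CURRENT_USER,
     r"SOFTWARE\Microsoft\Windows\CurrentVersion\GameDVR",
     "AppCaptureEnabled", 0),                 # disable GameDVR capture
    (winreg.HKEY_LOCAL_MACHINE,
     r"SOFTWARE\Microsoft\Windows\Windows Error Reporting",
     "Disabled", 1),                          # disable Windows Error Reporting
]

for hive, subkey, name, data in EDITS:
    with winreg.CreateKeyEx(hive, subkey, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, data)
```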

A number of these tests were requested by our readers, and we’ve split them into a few more categories than usual, as readers have been asking for tests focused on their specific workloads. A recent run on a Core i5-10600K, for the CPU tests alone, took around 20 hours to complete.

Power

  • Peak Power (y-Cruncher using latest AVX)
  • Per-Core Loading Power using POV-Ray
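As a rough illustration of the per-core loading method, the sketch below steps the renderer from one worker thread up to the full core count, pinning it to the first n logical cores each time, while an external meter logs package power. POV-Ray’s +WTn thread-count switch is real, but the binary name and scene file here are placeholders, not our exact harness.

```python
# A concept sketch of per-core loading (binary name and scene file are
# placeholders): run POV-Ray with 1..N worker threads, pinned to the
# first n logical cores, while an external meter logs package power.
import subprocess
import psutil

N = psutil.cpu_count(logical=True)
for n in range(1, N + 1):
    proc = subprocess.Popen(["povray", f"+WT{n}", "benchmark.pov"])
    # Pin after launch; the first few milliseconds run unpinned.
    psutil.Process(proc.pid).cpu_affinity(list(range(n)))
    proc.wait()
```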

Office

  • Agisoft Photoscan 1.3: 2D to 3D Conversion
  • Application Loading Time: GIMP 2.10.18 from a fresh install
  • Compile Testing (WIP)

Science

  • 3D Particle Movement v2.1 (Non-AVX + AVX2/AVX512)
  • y-Cruncher 0.78.9506 (Optimized Binary Splitting Compute for mathematical constants)
  • NAMD 2.13: Nanoscale Molecular Dynamics on ApoA1 protein
  • AI Benchmark 0.1.2 using TensorFlow (unoptimized for Windows)

Simulation

  • Digicortex 1.35: Brain simulation (neuron and synapse activity)
  • Dwarf Fortress 0.44.12: Fantasy world creation and time passage
  • Dolphin 5.0: Ray tracing render test run within the Wii emulator

Rendering

  • Blender 2.83 LTS: Popular rendering program, using PartyTug frame render
  • Corona 1.3: Ray Tracing Benchmark
  • Crysis CPU-Only: Can it run Crysis? What, on just the CPU at 1080p? Sure
  • POV-Ray 3.7.1: Another Ray Tracing Test
  • V-Ray: Another popular renderer
  • CineBench R20: Cinema4D Rendering engine

Encoding

  • HandBrake 1.3.2: Popular transcoding tool
  • 7-Zip: Open source compression software
  • AES Encryption: Instruction-accelerated (AES-NI) encryption
  • WinRAR 5.90: Popular compression tool

Legacy

  • CineBench R10
  • CineBench R11.5
  • CineBench R15
  • 3DPM v1: Naïve version of 3DPM v2.1 with no acceleration
  • x264 HD 3.0: Vintage transcoding benchmark

Web

  • Kraken 1.1: Deprecated web test with no successor
  • Octane 2.0: More comprehensive test (but also deprecated with no successor)
  • Speedometer 2: To-do-list web test built on different JavaScript frameworks

Synthetic

  • Geekbench 4
  • AIDA Memory Bandwidth
  • Linux OpenSSL Speed (rsa2048 sign/verify, sha256, md5) – see the sketch after this list
  • LinX 0.9.5 LINPACK
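For reference, these numbers come straight from the stock `openssl speed` tool; a minimal sketch of driving it from a script is below, assuming `openssl` is on the PATH and that grabbing only the final summary line per target is good enough.

```python
# A minimal sketch (assuming 'openssl' is on the PATH) of driving the
# same speed tests from a script. 'openssl speed' prints its summary
# table to stdout; the last line per target carries the headline number.
import subprocess

for target in ("rsa2048", "sha256", "md5"):
    result = subprocess.run(
        ["openssl", "speed", "-seconds", "10", target],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.splitlines()[-1])
```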

SPEC (Estimated)

  • SPEC2006 rate-1T
  • SPEC2017 rate-1T
  • SPEC2017 rate-nT

It should be noted that, due to the terms of the SPEC license, our results have to be labelled ‘estimated’ because they are not vetted directly by the SPEC consortium. The benchmarks are still run in full and produce results, but those results must carry the ‘estimated’ label.

Others

  • A full x86 instruction throughput/latency analysis
  • Core-to-Core Latency
  • Cache-to-DRAM Latency
  • Frequency Ramping (a concept sketch follows below)
  • A y-cruncher ‘sprint’ to see how 0.78.9506 scales with increasing digit counts

Some of these tests also have AIDA power wrappers around them, to provide insight into how power is reported over the course of the test.
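On the frequency ramp test mentioned above: our real measurement polls the hardware at high resolution, but the concept can be sketched purely in software. The idea below is an interpreter-level proxy, not our actual tool: start a spin loop from idle and watch throughput per millisecond window; the point where throughput jumps approximates the idle-to-boost ramp time.

```python
# A concept sketch of frequency ramping: from idle, log spin-loop
# throughput in 1 ms windows. As the core ramps from its idle clock to
# boost, iterations-per-window rise; the knee approximates ramp time.
import time

time.sleep(2.0)  # let the core drop back to its idle frequency

samples = []
t_end = time.perf_counter() + 0.5          # sample for ~500 ms
while time.perf_counter() < t_end:
    window_end = time.perf_counter() + 0.001
    iters = 0
    while time.perf_counter() < window_end:
        iters += 1                          # fixed-cost busy work
    samples.append(iters)

baseline = samples[0]
for i, iters in enumerate(samples):
    if iters > 1.5 * baseline:              # crude 'ramped' threshold
        print(f"Throughput rose ~{iters / baseline:.1f}x after ~{i} ms")
        break
else:
    print("No clear ramp detected (core may already be at speed)")
```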

2020 CPU Gaming (GPU) Benchmarks

For our new set of CPU gaming tests, we wanted to think big. A lot of users in the ecosystem prioritize gaming above all else, especially when it comes to choosing a CPU. If there is a chance to save $50 and put it toward a better graphics card at no loss in CPU performance, that is the route most gamers would prefer to tread. The angle here is tough, though: different games have different requirements and stress a system in different ways, and different graphics cards react differently to a given game’s code flow. Users then also have different resolutions and different perceptions of what feels 'normal'. This all amounts to more degrees of freedom than we could hope to test in a lifetime, only for the data to become irrelevant in a few months when a new game or new GPU comes into the mix. Just for good measure, add in DirectX 12 titles that make it easier for a game to use more CPU cores to enhance fidelity.

When it comes to the gaming tests, some of the same rules apply as for the CPU tests. If we can get standalone versions of tests, perfect – even better if they will never update, because that gives us a consistent codebase to work with. However, given the nature of Steam, Origin, or the Epic Games Store, a consistent codebase is not always possible. So for our gaming tests, where we could find offline DRM-free variants (such as those from GOG), we used those instead. Otherwise we rely on Steam for the most part, because it is the only storefront that offers an external API letting us check whether an account is online – and thus allows a single account to be used across multiple systems. Scaling out automation is difficult when multiple accounts are involved, so as we aim for fewer than 10 systems running simultaneously, one account is enough.
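The online check itself is just a call to Steam’s public Web API. A minimal sketch is below; the API key and SteamID are hypothetical placeholders, but ISteamUser/GetPlayerSummaries and its personastate field are the documented way to see whether an account is signed in.

```python
# A minimal sketch: the API key and SteamID below are placeholders, but
# ISteamUser/GetPlayerSummaries and its 'personastate' field are the
# documented way to check whether an account is signed in.
import requests

STEAM_API_KEY = "YOUR_KEY_HERE"     # hypothetical placeholder
STEAM_ID = "76561197960287930"      # example 64-bit SteamID

def account_is_online(key: str, steam_id: str) -> bool:
    url = "https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/"
    resp = requests.get(url, params={"key": key, "steamids": steam_id}, timeout=10)
    resp.raise_for_status()
    players = resp.json()["response"]["players"]
    # personastate 0 = offline; anything else means signed in somewhere.
    return bool(players) and players[0]["personastate"] != 0

if account_is_online(STEAM_API_KEY, STEAM_ID):
    print("Account busy elsewhere - wait before launching this test bed.")
```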

I could speak for a few days about the gripes of automating gaming benchmarks – the titles that do it well compared to the ones that show no consideration for anyone who wants to run an in-game benchmark repeatedly. There’s also the discussion of in-game benchmarks vs. native benchmarks, which I’ve had many times with colleagues and peers, and which I might go into in depth sometime. But I have thrown benchmark titles out for the stupidest things: updates that add *new* splash screens are why I’ve cut games like AoTS and Civ6 in the past. Or Ubisoft games that offer benchmark modes but do not output results files. Or games that write HTML reports that need to be pruned for the correct data, rather than a simple text file. Or, shall we go into games whose settings are not simple ini files but are embedded in the registry?! Total War gets thrown out for not allowing key presses in its menus, and then tripping cheat detection when you try to emulate mouse movements. I have, on multiple occasions, spent a day of work trying to code for a game that just doesn’t want to cooperate – as a result, it gets thrown out of our benchmark suite.
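As an example of the HTML-pruning gripe, here is the kind of throwaway scraper such games force on us. The report name and table layout are hypothetical, but the pattern (regex out the table cells, find the average-FPS row) is representative.

```python
# Purely illustrative: the report name and table layout are hypothetical.
# Regex out the <td> cells, then find the row whose label mentions
# average FPS and take the adjacent cell as its value.
import re
from pathlib import Path

html = Path("benchmark_report.html").read_text(encoding="utf-8")
cells = [c.strip() for c in re.findall(r"<td[^>]*>(.*?)</td>", html, flags=re.S)]
for label, value in zip(cells, cells[1:]):
    if "average" in label.lower() and "fps" in label.lower():
        print("Average FPS:", value)
        break
```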

In the past, we’ve tackled the GPU benchmark set in several different ways: one GPU run through multiple games at one resolution, or multiple GPUs run through a few games at one resolution, and then, as the automation progressed into something better, multiple GPUs through a few games at several resolutions. However, based on feedback, running the best GPU we can get hold of through over a dozen games at several resolutions seems to be the best bet.

Normally securing GPUs for this testing is difficult, as we need several identical models for concurrent testing, and very rarely is a GPU manufacturer, or one of its OEM partners, happy to hand me 3-4+ of the latest and greatest. On that front, over the years, I have to thank ECS for sending us four GTX 580s in 2012, MSI for three GTX 770 Lightnings in 2014, Sapphire for multiple RX 480s and R9 Fury X cards in 2016, and, for our last test suite, MSI for three GTX 1080 Gaming cards in 2018.

For our testing on the 2020 suite, we have secured three RTX 2080 Ti GPUs direct from NVIDIA. These GPUs are well supported, with mature drivers and per-title optimizations, and given how rare our suite updates are, we are thankful to have the high-end hardware. (It’s worth noting we won’t be updating to whatever RTX 3080 variant eventually comes out for a while yet.)

On the topic of resolutions, this has been hit and miss for us in the past. Some users want to see the lowest resolution and lowest fidelity options, such as 480p Ultra Low, because this puts the most strain on the CPU. In the past we have found this unrealistic for most use cases, and even if it gives the best shot at separating results, the point where you actually become GPU limited might sit at a higher resolution. In our last test suite, we ran from 720p Ultra Low up through 1080p Medium, 1440p High, and 4K Ultra. However, our most vocal readers hated it, because by 1080p Medium we were already GPU limited for the most part.

So to that end, the benchmarks this time around attempt to follow this basic pattern where possible:

  1. Lowest Resolution with lowest scaling, Lowest Settings
  2. 2560x1440 with the lowest settings (1080p where not possible)
  3. 3840x2160 with the lowest settings
  4. 1920x1080 at the maximum settings

Point (1) should give the ultimate CPU-limited scenario. We should see that limitation lift as we move up through (2) 1440p and (3) 4K, with 4K Low still being quite strenuous in some titles.

Point (4) is essentially our ‘real world’ test. The RTX 2080 Ti is overkill for 1080p Maximum, and we’ll see that most modern CPUs pull well over 60 FPS average in this scenario.

What will be interesting is that for some titles, 4K Low is more compute heavy than 1080p Maximum, and for other titles that relationship is reversed.

So we have the following benchmarks as part of our script, automated to the point of a one-button run, with the results popping out approximately 10 hours later per GPU. Also listed are the resolutions and settings used.

Offline Games

  1. Chernobylite, 360p Low, 1440p Low, 4K Low, 1080p Max
  2. Civilization 6, 480p Low, 1440p Low, 4K Low, 1080p Max
  3. Deus Ex: Mankind Divided, 600p Low, 1440p Low, 4K Low, 1080p Max
  4. Final Fantasy XIV, 768p Min, 1440p Min, 4K Min, 1080p Max
  5. Final Fantasy XV, 720p Standard, 1080p Standard, 4K Standard, 8K Standard
  6. World of Tanks enCore, 768p Min, 1080p Standard, 1080p Max, 4K Max

Online Games

  1. Borderlands 3, 360p VLow, 1440p VLow, 4K VLow, 1080p Badass
  2. F1 2019, 768p ULow, 1440p ULow, 4K ULow, 1080p Ultra
  3. Far Cry 5, 720p Low, 1440p Low, 4K Low, 1080p Ultra*
  4. Gears Tactics, 720p Low, 4K Low, 8K Low, 1080p Ultra
  5. Grand Theft Auto 5, 720p Low, 1440p Low, 4K Low, 1080p Max
  6. Red Dead Redemption 2, 384p Min, 1440p Min, 4K Min, 1080p Max
  7. Strange Brigade DX12, 720p Low, 1440p Low, 4K Low, 1080p Ultra
  8. Strange Brigade Vulkan, 720p Low, 1440p Low, 4K Low, 1080p Ultra

For each of the games in our testing, we capture frame times where we can (the two where we cannot are Chernobylite and FFXIV). At each resolution/settings combination, we run each game for as many loops as fit within a set time limit (often 10 minutes per resolution). Results are then reported as average frame rates and 95th percentiles.
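As a short worked example of that reduction (the frame-time numbers here are made up): the average frame rate is total frames over total time, and the 95th-percentile frame time maps to the frame rate that 95% of frames meet or exceed.

```python
# Hypothetical per-frame times in milliseconds for one benchmark loop.
import statistics

frame_times_ms = [16.2, 16.8, 15.9, 33.1, 16.4, 17.0, 16.1, 41.7, 16.5, 16.3]

# Average FPS: total frames divided by total elapsed time.
avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

# 95th-percentile frame time; 95% of frames render at least this fast.
p95_ms = statistics.quantiles(frame_times_ms, n=100)[94]
p95_fps = 1000.0 / p95_ms

print(f"Average: {avg_fps:.1f} FPS, 95th percentile: {p95_fps:.1f} FPS")
```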

Some of the games are still being evaluated for usefulness and may eventually be dropped – Far Cry 5 has taken more time than I care to admit to get working. Some of these titles require the exact CPU/GPU combination to be listed in the settings file, otherwise the settings file is discarded, which gets increasingly frustrating.

*Update 7/20: I recently found that Far Cry 5 has additional requirements regarding monitor resolution support. If the settings file requests a resolution that the game cannot detect on the test bed’s monitor, it defaults to 1080p. My test beds contain two brands of 4K monitor – Dell UP2415Qs and cheap 27-inch TN displays, in a 50:50 split. For whatever reason, FC5 does not like any resolution changes on the Dell monitors. I can adjust the resolution scale (0.5x-2.0x) and quality for this game, but I only found this out on 7/20, which means we have to rerun chips for this data.

If there are any game developers out there involved with any of the benchmarks above, please get in touch at ian@anandtech.com. I have a list of requests to make benchmarking your title easier!

The other angle is DRM, and some titles have limits of 5 systems per day. This may limit our testing in some cases; in other cases it is solvable.
