The CPU Overload 2020 Suite

Our new CPU tests cover a number of key areas: web tests using our fixed, un-updateable build of Chromium; opening tricky PDFs; emulation; brain simulation; AI; 2D-image-to-3D-model conversion; rendering (ray tracing, modeling); encoding (compression, AES, video/HEVC); office-based tests; and our legacy tests (throwbacks from another generation of code, but interesting to compare). Over the next few pages we’ll give a high-level overview of each test.

However, as mentioned in passing on the previous page, we run a number of registry edit commands again at the start of the benchmark suite to ensure that various system features are switched off. This includes disabling Cortana, GameDVR, Windows Error Reporting, and updates, disabling Windows Defender as much as possible, re-applying our power options, and removing OneDrive, in case it has sprouted wings again.
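For readers who want to replicate this, a minimal sketch of the idea is below, using Python’s winreg module. The specific keys shown (AllowCortana, AppCaptureEnabled, and the Error Reporting Disabled flag) are the commonly documented policy values rather than a dump of our exact script, so treat them as an assumption to verify against your Windows build.

```python
# A minimal sketch, assuming the commonly documented policy keys (verify
# against your Windows build). Must run elevated for the HKLM writes.
import winreg

EDITS = [
    # (hive, subkey, value name, DWORD data)
    (winreg.HKEY_LOCAL_MACHINE,
     r"SOFTWARE\Policies\Microsoft\Windows\Windows Search",
     "AllowCortana", 0),                      # disable Cortana
    (winreg.HKEY_CURRENT_USER,
     r"SOFTWARE\Microsoft\Windows\CurrentVersion\GameDVR",
     "AppCaptureEnabled", 0),                 # disable GameDVR capture
    (winreg.HKEY_LOCAL_MACHINE,
     r"SOFTWARE\Microsoft\Windows\Windows Error Reporting",
     "Disabled", 1),                          # disable Windows Error Reporting
]

for hive, subkey, name, data in EDITS:
    with winreg.CreateKeyEx(hive, subkey, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, data)
```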

A number of these tests were requested by our readers, and we’ve split them into a few more categories than usual, as readers have been asking for tests focused on their specific workloads. A recent run on a Core i5-10600K, for the CPU tests alone, took around 20 hours to complete.

Power

  • Peak Power (y-Cruncher using latest AVX)
  • Per-Core Loading Power using POV-Ray
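As a rough illustration of the per-core loading method, the sketch below steps the renderer from one worker thread up to the full core count, pinning it to the first n logical cores each time, while an external meter logs package power. POV-Ray’s +WTn thread-count switch is real, but the binary name and scene file here are placeholders, not our exact harness.

```python
# A concept sketch of per-core loading (binary name and scene file are
# placeholders): run POV-Ray with 1..N worker threads, pinned to the
# first n logical cores, while an external meter logs package power.
import subprocess
import psutil

N = psutil.cpu_count(logical=True)
for n in range(1, N + 1):
    proc = subprocess.Popen(["povray", f"+WT{n}", "benchmark.pov"])
    # Pin after launch; the first few milliseconds run unpinned.
    psutil.Process(proc.pid).cpu_affinity(list(range(n)))
    proc.wait()
```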

Office

  • Agisoft Photoscan 1.3: 2D to 3D Conversion
  • Application Loading Time: GIMP 2.10.18 from a fresh install
  • Compile Testing (WIP)

Science

  • 3D Particle Movement v2.1 (Non-AVX + AVX2/AVX512)
  • y-Cruncher 0.78.9506 (Optimized Binary Splitting Compute for mathematical constants)
  • NAMD 2.13: Nanoscale Molecular Dynamics on ApoA1 protein
  • AI Benchmark 0.1.2 using TensorFlow (unoptimized for Windows)

Simulation

  • Digicortex 1.35: Brain simulation (neuron and synapse activity)
  • Dwarf Fortress 0.44.12: Fantasy world creation and time passage
  • Dolphin 5.0: Ray tracing render test run within the Wii emulator

Rendering

  • Blender 2.83 LTS: Popular rendering program, using PartyTug frame render
  • Corona 1.3: Ray Tracing Benchmark
  • Crysis CPU-Only: Can it run Crysis? What, on just the CPU at 1080p? Sure
  • POV-Ray 3.7.1: Another Ray Tracing Test
  • V-Ray: Another popular renderer
  • CineBench R20: Cinema4D Rendering engine

Encoding

  • HandBrake 1.3.2: Popular transcoding tool
  • 7-Zip: Open source compression software
  • AES Encryption: Instruction-accelerated (AES-NI) encryption
  • WinRAR 5.90: Popular compression tool

Legacy

  • CineBench R10
  • CineBench R11.5
  • CineBench R15
  • 3DPM v1: Naïve version of 3DPM v2.1 with no acceleration
  • x264 HD 3.0: Vintage transcoding benchmark

Web

  • Kraken 1.1: Deprecated web test with no successor
  • Octane 2.0: More comprehensive test (but also deprecated with no successor)
  • Speedometer 2: To-do-list web test built on different JavaScript frameworks

Synthetic

  • Geekbench 4
  • AIDA Memory Bandwidth
  • Linux OpenSSL Speed (rsa2048 sign/verify, sha256, md5) – see the sketch after this list
  • LinX 0.9.5 LINPACK
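For reference, these numbers come straight from the stock `openssl speed` tool; a minimal sketch of driving it from a script is below, assuming `openssl` is on the PATH and that grabbing only the final summary line per target is good enough.

```python
# A minimal sketch (assuming 'openssl' is on the PATH) of driving the
# same speed tests from a script. 'openssl speed' prints its summary
# table to stdout; the last line per target carries the headline number.
import subprocess

for target in ("rsa2048", "sha256", "md5"):
    result = subprocess.run(
        ["openssl", "speed", "-seconds", "10", target],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.splitlines()[-1])
```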

SPEC (Estimated)

  • SPEC2006 rate-1T
  • SPEC2017 rate-1T
  • SPEC2017 rate-nT

It should be noted that, due to the terms of the SPEC license, our results have to be labelled ‘estimated’ because they are not vetted directly by the SPEC consortium. The benchmarks are still run in full and produce results, but those results must carry the ‘estimated’ label.

Others

  • A full x86 instruction throughput/latency analysis
  • Core-to-Core Latency
  • Cache-to-DRAM Latency
  • Frequency Ramping (a concept sketch follows below)
  • A y-cruncher ‘sprint’ to see how 0.78.9506 scales with increasing digit counts

Some of these tests also have AIDA power wrappers around them, to provide insight into how power is reported over the course of the test.
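On the frequency ramp test mentioned above: our real measurement polls the hardware at high resolution, but the concept can be sketched purely in software. The idea below is an interpreter-level proxy, not our actual tool: start a spin loop from idle and watch throughput per millisecond window; the point where throughput jumps approximates the idle-to-boost ramp time.

```python
# A concept sketch of frequency ramping: from idle, log spin-loop
# throughput in 1 ms windows. As the core ramps from its idle clock to
# boost, iterations-per-window rise; the knee approximates ramp time.
import time

time.sleep(2.0)  # let the core drop back to its idle frequency

samples = []
t_end = time.perf_counter() + 0.5          # sample for ~500 ms
while time.perf_counter() < t_end:
    window_end = time.perf_counter() + 0.001
    iters = 0
    while time.perf_counter() < window_end:
        iters += 1                          # fixed-cost busy work
    samples.append(iters)

baseline = samples[0]
for i, iters in enumerate(samples):
    if iters > 1.5 * baseline:              # crude 'ramped' threshold
        print(f"Throughput rose ~{iters / baseline:.1f}x after ~{i} ms")
        break
else:
    print("No clear ramp detected (core may already be at speed)")
```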

2020 CPU Gaming (GPU) Benchmarks

For our new set of CPU gaming tests, we wanted to think big. A lot of users in the ecosystem prioritize gaming above all else, especially when it comes to choosing a CPU. If there is a chance to save $50 and put it toward a better graphics card at no loss in CPU performance, that is the route most gamers would prefer to tread. The angle here is tough, though: different games have different requirements and stress a system in different ways, and different graphics cards react differently to a given game’s code flow. Users then also have different resolutions and different perceptions of what feels 'normal'. This all amounts to more degrees of freedom than we could hope to test in a lifetime, only for the data to become irrelevant in a few months when a new game or new GPU comes into the mix. Just for good measure, add in DirectX 12 titles that make it easier for a game to use more CPU cores to enhance fidelity.

When it comes to the gaming tests, some of the same rules apply as for the CPU tests. If we can get standalone versions of tests, perfect – even better if they will never update, because that gives us a consistent codebase to work with. However, given the nature of Steam, Origin, or the Epic Games Store, a consistent codebase is not always possible. So for our gaming tests, where we could find offline DRM-free variants (such as those from GOG), we used those instead. Otherwise we rely on Steam for the most part, because it is the only storefront that offers an external API letting us check whether an account is online – and thus allows a single account to be used across multiple systems. Scaling out automation is difficult when multiple accounts are involved, so as we aim for fewer than 10 systems running simultaneously, one account is enough.
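The online check itself is just a call to Steam’s public Web API. A minimal sketch is below; the API key and SteamID are hypothetical placeholders, but ISteamUser/GetPlayerSummaries and its personastate field are the documented way to see whether an account is signed in.

```python
# A minimal sketch: the API key and SteamID below are placeholders, but
# ISteamUser/GetPlayerSummaries and its 'personastate' field are the
# documented way to check whether an account is signed in.
import requests

STEAM_API_KEY = "YOUR_KEY_HERE"     # hypothetical placeholder
STEAM_ID = "76561197960287930"      # example 64-bit SteamID

def account_is_online(key: str, steam_id: str) -> bool:
    url = "https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/"
    resp = requests.get(url, params={"key": key, "steamids": steam_id}, timeout=10)
    resp.raise_for_status()
    players = resp.json()["response"]["players"]
    # personastate 0 = offline; anything else means signed in somewhere.
    return bool(players) and players[0]["personastate"] != 0

if account_is_online(STEAM_API_KEY, STEAM_ID):
    print("Account busy elsewhere - wait before launching this test bed.")
```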

I could speak for a few days about the gripes of automating gaming benchmarks – the titles that do it well compared to the ones that show no consideration for anyone who wants to run an in-game benchmark repeatedly. There’s also the discussion of in-game benchmarks vs. native benchmarks, which I’ve had many times with colleagues and peers, and which I might go into in depth sometime. But I have thrown benchmark titles out for the stupidest things: updates that add *new* splash screens are why I’ve cut games like AoTS and Civ6 in the past. Or Ubisoft games that offer benchmark modes but do not output results files. Or games that write HTML reports that need to be pruned for the correct data, rather than a simple text file. Or, shall we go into games whose settings are not simple ini files but are embedded in the registry?! Total War gets thrown out for not allowing key presses in its menus, and then tripping cheat detection when you try to emulate mouse movements. I have, on multiple occasions, spent a day of work trying to code for a game that just doesn’t want to cooperate – as a result, it gets thrown out of our benchmark suite.
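As an example of the HTML-pruning gripe, here is the kind of throwaway scraper such games force on us. The report name and table layout are hypothetical, but the pattern (regex out the table cells, find the average-FPS row) is representative.

```python
# Purely illustrative: the report name and table layout are hypothetical.
# Regex out the <td> cells, then find the row whose label mentions
# average FPS and take the adjacent cell as its value.
import re
from pathlib import Path

html = Path("benchmark_report.html").read_text(encoding="utf-8")
cells = [c.strip() for c in re.findall(r"<td[^>]*>(.*?)</td>", html, flags=re.S)]
for label, value in zip(cells, cells[1:]):
    if "average" in label.lower() and "fps" in label.lower():
        print("Average FPS:", value)
        break
```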

In the past, we’ve tackled the GPU benchmark set in several different ways: one GPU run through multiple games at one resolution, or multiple GPUs run through a few games at one resolution, and then, as the automation progressed into something better, multiple GPUs through a few games at several resolutions. However, based on feedback, running the best GPU we can get hold of through over a dozen games at several resolutions seems to be the best bet.

Normally securing GPUs for this testing is difficult, as we need several identical models for concurrent testing, and very rarely is a GPU manufacturer, or one of its OEM partners, happy to hand me 3-4+ of the latest and greatest. On that front, over the years, I have to thank ECS for sending us four GTX 580s in 2012, MSI for three GTX 770 Lightnings in 2014, Sapphire for multiple RX 480s and R9 Fury X cards in 2016, and, for our last test suite, MSI for three GTX 1080 Gaming cards in 2018.

For our testing on the 2020 suite, we have secured three RTX 2080 Ti GPUs direct from NVIDIA. These GPUs are well supported, with mature drivers and per-title optimizations, and given how rare our suite updates are, we are thankful to have the high-end hardware. (It’s worth noting we won’t be updating to whatever RTX 3080 variant eventually comes out for a while yet.)

On the topic of resolutions, this has been hit and miss for us in the past. Some users want to see the lowest resolution and lowest fidelity options, such as 480p Ultra Low, because this puts the most strain on the CPU. In the past we have found this unrealistic for most use cases, and even if it gives the best shot at separating results, the point where you actually become GPU limited might sit at a higher resolution. In our last test suite, we ran from 720p Ultra Low up through 1080p Medium, 1440p High, and 4K Ultra. However, our most vocal readers hated it, because by 1080p Medium we were already GPU limited for the most part.

So to that end, the benchmarks this time around attempt to follow this basic pattern where possible:

  1. Lowest Resolution with lowest scaling, Lowest Settings
  2. 2560x1440 with the lowest settings (1080p where not possible)
  3. 3840x2160 with the lowest settings
  4. 1920x1080 at the maximum settings

Point (1) should give the ultimate CPU-limited scenario. We should see that limitation lift as we move up through (2) 1440p and (3) 4K, with 4K Low still being quite strenuous in some titles.

Point (4) is essentially our ‘real world’ test. The RTX 2080 Ti is overkill for 1080p Maximum, and we’ll see that most modern CPUs pull well over 60 FPS average in this scenario.

What will be interesting is that for some titles, 4K Low is more compute heavy than 1080p Maximum, and for other titles that relationship is reversed.

So we have the following benchmarks as part of our script, automated to the point of a one-button run, with the results popping out approximately 10 hours later per GPU. Also listed are the resolutions and settings used.

Offline Games

  1. Chernobylite, 360p Low, 1440p Low, 4K Low, 1080p Max
  2. Civilization 6, 480p Low, 1440p Low, 4K Low, 1080p Max
  3. Deus Ex: Mankind Divided, 600p Low, 1440p Low, 4K Low, 1080p Max
  4. Final Fantasy XIV, 768p Min, 1440p Min, 4K Min, 1080p Max
  5. Final Fantasy XV, 720p Standard, 1080p Standard, 4K Standard, 8K Standard
  6. World of Tanks enCore, 768p Min, 1080p Standard, 1080p Max, 4K Max

Online Games

  1. Borderlands 3, 360p VLow, 1440p VLow, 4K VLow, 1080p Badass
  2. F1 2019, 768p ULow, 1440p ULow, 4K ULow, 1080p Ultra
  3. Far Cry 5, 720p Low, 1440p Low, 4K Low, 1080p Ultra*
  4. Gears Tactics, 720p Low, 4K Low, 8K Low, 1080p Ultra
  5. Grand Theft Auto 5, 720p Low, 1440p Low, 4K Low, 1080p Max
  6. Red Dead Redemption 2, 384p Min, 1440p Min, 4K Min, 1080p Max
  7. Strange Brigade DX12, 720p Low, 1440p Low, 4K Low, 1080p Ultra
  8. Strange Brigade Vulkan, 720p Low, 1440p Low, 4K Low, 1080p Ultra

For each of the games in our testing, we capture frame times where we can (the two where we cannot are Chernobylite and FFXIV). At each resolution/settings combination, we run each game for as many loops as fit within a set time limit (often 10 minutes per resolution). Results are then reported as average frame rates and 95th percentiles.
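As a short worked example of that reduction (the frame-time numbers here are made up): the average frame rate is total frames over total time, and the 95th-percentile frame time maps to the frame rate that 95% of frames meet or exceed.

```python
# Hypothetical per-frame times in milliseconds for one benchmark loop.
import statistics

frame_times_ms = [16.2, 16.8, 15.9, 33.1, 16.4, 17.0, 16.1, 41.7, 16.5, 16.3]

# Average FPS: total frames divided by total elapsed time.
avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

# 95th-percentile frame time; 95% of frames render at least this fast.
p95_ms = statistics.quantiles(frame_times_ms, n=100)[94]
p95_fps = 1000.0 / p95_ms

print(f"Average: {avg_fps:.1f} FPS, 95th percentile: {p95_fps:.1f} FPS")
```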

Some of the games are still being evaluated for usefulness and may eventually be dropped – Far Cry 5 has taken more time than I care to admit to get working. Some of these titles require the exact CPU/GPU combination to be listed in the settings file, otherwise the settings file is discarded, which gets increasingly frustrating.

*Update 7/20: I recently found that Far Cry 5 has additional requirements regarding monitor resolution support. If the settings file requests a resolution that the game cannot detect on the test bed’s monitor, it defaults to 1080p. My test beds contain two brands of 4K monitor – Dell UP2415Qs and cheap 27-inch TN displays, in a 50:50 split. For whatever reason, FC5 does not like any resolution changes on the Dell monitors. I can adjust the resolution scale (0.5x-2.0x) and quality for this game, but I only found this out on 7/20, which means we have to rerun chips for this data.

If there are any game developers out there involved with any of the benchmarks above, please get in touch at ian@anandtech.com. I have a list of requests to make benchmarking your title easier!

The other angle is DRM, and some titles have limits of 5 systems per day. This may limit our testing in some cases; in other cases it is solvable.
