Today Futuremark is pulling the covers off their new Time Spy benchmark, which is being released for all Windows editions of 3DMark. A showcase of sorts for the last decade or so of 3DMark benchmarks, Time Spy is a modern DirectX 12 benchmark implementing a number of the API's important features. All of this comes together in a demanding test for those who think their GPU hasn’t earned its keep yet.

DirectX 12 support in game engines has been coming along for a few months now. To join the fray, Futuremark has written the Time Spy benchmark on top of a pure DirectX 12 engine, bringing with it features such as asynchronous compute, explicit multi-adapter, and of course multi-threaded/multi-core work submission improvements. All of this comes together in a test that I think is not only visually interesting, but one that also borrows a large number of assets from 3DMark benchmarks past.

For those who have been following the 3DMark franchise for more than a decade, there are portions of the prior benchmarks showcased as shrunken museum exhibits. These exhibits come to life as the titular Time Spy wanders the hall, offering a throwback to past demos. I must admit a bit of fun was had watching to see what I recognized. I personally couldn’t spot anything older than 3DMark05, but I would be interested in hearing about anything I missed.

Unlike many of the benchmarks exhibited in this museum, the entirety of this benchmark takes place in a single environment. Fortunately, the large variety of eye candy present gives a varied backdrop for the tests. To add a bit of story, a crystalline ivy is entangled with the entire museum, and in parts of the exhibit there are deceased figures in orange hazmat suits showing signs of a previous struggle. Meanwhile, the Time Spy examines the museum with a handheld time portal. Through said portal she can view a bright and clean museum and watch bustling air traffic outside. I’ll not spoil the entire brief story here, but the benchmark does a good job of providing both eye candy for newcomers and tributes for the enthusiasts who will spend ample time watching the events unfold.

From a technical perspective, this benchmark is, as you might imagine, designed to be the successor to Fire Strike. The system requirements are higher than ever, and while Fire Strike Ultra could run at 4K, 1440p is enough to bring even the latest cards to their knees with Time Spy.

Under the hood, the engine only makes use of FL 11_0 features, which means it can run on video cards as far back as the GeForce GTX 680 and Radeon HD 7970. At the same time it doesn't use any of the features from the newer feature levels, so while this ensures a consistent test across all cards, it doesn't push the very newest graphics features such as conservative rasterization.
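For readers curious what targeting FL 11_0 means in practice, below is a minimal sketch of device creation against that feature level. This is my own illustration using the public D3D12 API, not Futuremark's engine code, and the function name is hypothetical: the key point is simply that only D3D_FEATURE_LEVEL_11_0 is ever requested, which is why GPUs back to the GTX 680/HD 7970 generation qualify.

```cpp
// A minimal sketch, assuming a standard Windows/D3D12 toolchain; illustrative
// only, not Futuremark's code.
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

bool CreateFeatureLevel110Device(ComPtr<ID3D12Device>& deviceOut)
{
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return false;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        // Request only FL 11_0; higher feature levels (and features such as
        // conservative rasterization) are simply never asked for.
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(),
                                        D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&deviceOut))))
            return true;
    }
    return false;
}
```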

That said, Futuremark has definitely set out to make full use of FL 11_0. The company has published an excellent technical guide for the benchmark, which should go live at the same time as this article, so I won't recap it verbatim. But in brief, everything from asynchronous compute to resource heaps gets used. In the case of async compute, Futuremark is using it to overlap rendering passes, though they do note that "the asynchronous compute workload per frame varies between 10-20%." On the work submission front, they're making full use of multi-threaded command queue submission, noting that every logical core in a system is used to submit work.
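To give a rough idea of the mechanism involved, the sketch below shows the generic DX12 pattern that async compute builds on: a second, compute-only command queue whose work the GPU may execute concurrently with the graphics queue, filling otherwise idle execution units. This is example code of my own, not Time Spy's engine; the helper name and the per-frame details in the comments are assumptions.

```cpp
// A minimal sketch of the generic DX12 async compute setup, assuming a valid
// ID3D12Device; not Time Spy's code, just the API pattern it builds on.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // compute + copy only
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    // Per frame (not shown): command lists are recorded across CPU threads,
    // compute passes are submitted to computeQueue so they can overlap
    // rendering on graphicsQueue, and an ID3D12Fence makes the graphics queue
    // wait only where a compute result is actually consumed.
}
```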

Meanwhile on the multi-GPU front, Time Spy is also mGPU capable. Futuremark is essentially meeting the GPUs half-way here, using DX12 explicit multi-adapter's linked-node mode. Linked-node mode is designed for matching GPUs - so there aren't any Ashes-style wacky heterogeneous configurations supported here - trading off some of the fine-grained power of explicit multi-adapter for the simplicity of matching GPUs and for features that can only be done with matching GPUs, such as cross-node resource sharing. For their mGPU implementation Futuremark is using otherwise common AFR, which for a non-interactive demo should offer the best performance.
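As a rough sketch of what linked-node operation looks like at the API level (again, generic example code rather than Futuremark's implementation), the matching GPUs are exposed as a single logical device, each physical GPU is addressed through a node mask, and AFR amounts to building frame N with the queues and resources of node N modulo the node count.

```cpp
// A minimal sketch of DX12 linked-node (linked display adapter) setup,
// assuming matching GPUs exposed as one logical device; illustrative only.
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

void CreatePerNodeQueues(ID3D12Device* device,
                         std::vector<ComPtr<ID3D12CommandQueue>>& queues)
{
    const UINT nodeCount = device->GetNodeCount(); // 1 on single-GPU systems

    for (UINT node = 0; node < nodeCount; ++node)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;
        desc.NodeMask = 1u << node;                // bind this queue to one GPU

        ComPtr<ID3D12CommandQueue> queue;
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
        queues.push_back(queue);
    }

    // For AFR, frame N is recorded and executed on queues[N % nodeCount];
    // cross-node resource sharing lets the node that owns the swap chain
    // present frames rendered on the other node.
}
```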

3DMark Time Spy Benchmark: 1440p

To take a quick look at the benchmark, we ran the full test on a small number of cards at the default 1440p setting. In our previous testing, AMD’s RX 480 and R9 390 traded blows with each other and with NVIDIA’s GTX 970. Here though, the RX 480 pulls out a small lead over the R9 390, while both open up a slightly larger gap ahead of the GTX 970, only for the GeForce GTX 1070 to appropriately zip past the lot of them.

The graphics tests scale similarly to the overall score in this case, and if these tests were a real game, anything less than the GTX 1070 would provide a poor gameplay experience, with framerates under 30 fps. While we didn’t get any 4K numbers off our test bench, I ran a GTX 1080 in my personal rig (i7-2600K @ 4.2GHz) and saw 4K scores that were about half of my 1440p scores. While this is a synthetic test, the graphical demands this benchmark can place on a system will provide a plenty hefty workload for anyone seeking it out.

Meanwhile, for the Advanced and Professional versions of the benchmark there's an interesting ability to run it with async compute disabled. Since this is one of the only pieces of software out right now that can use async on Pascal GPUs, I went ahead and quickly ran the graphics test on the GTX 1070 and RX 480. It's not an apples-to-apples comparison in that they have very different performance levels, but for now it's the best look we can take at async on Pascal.

3DMark Time Spy Benchmark: Async Compute

Both cards pick up 300-400 points in score. On a relative basis this works out to a 10.8% gain for the RX 480 and a 5.4% gain for the GTX 1070. As always when working with async compute, I should note that the primary performance benefit as implemented in Time Spy comes from concurrency, so everything here depends on a game having additional work to submit and a GPU having execution bubbles to fill.

The new Time Spy test is coming today to Windows users of 3DMark. This walk down memory lane not only puts demands on the latest gaming hardware but also provides another showcase of the benefits DX12 can bring to our games. To anyone who’s found Fire Strike too easy a benchmark, keep an eye out for Time Spy.

Comments

  • Eden-K121D - Thursday, July 14, 2016 - link

    There is something fishy. Are they disguising pre-emption as async compute for NVIDIA cards
  • Eden-K121D - Thursday, July 14, 2016 - link

    and no other GPUs
  • ddriver - Thursday, July 14, 2016 - link

    Like everyone else, they sell out to the highest bidder, and AMD just doesn't have that much to bid.
  • euskalzabe - Thursday, July 14, 2016 - link

    Wait, isn't Nvidia doing async, just via pre-emption? As far as I understand, AMD has proper ACEs so they do async on hardware, whereas Nvidia doesn't have the hardware parts and thus does async via software through pre-emption. In a similar way, AMD doesn't have Pascal's simultaneous multi-projection so they do it via software.

    In the end, they're both doing async in one way or another. Isn't that right?
  • edzieba - Thursday, July 14, 2016 - link

    Kind of. Pre-emption has almost nothing to do with Asynchronous Compute.

    Maxwell, Pascal, and GCN all support Async Compute, but implement it in different ways.
    GCN uses Asynchronous Shaders (and ACEs) with hardware scheduling. But this only works under DX12 and Vulkan when software actually explicitly targets Async Compute. Otherwise, that silicon is left underutilised.
    Maxwell and Pascal perform scheduling at the driver level (GPC partitioned in Maxwell, SM partitioned in Pascal). But because this is done in software, it was already implemented for DX11. This is why Async Compute sees little benefit on Maxwell and Pascal when moving from DX11 to DX12: Async Compute was already being performed.
  • xenol - Thursday, July 14, 2016 - link

    So unless the app specifically asks to use the ACEs, AMD's drivers won't put the instructions through there whereas NVIDIA does all its sorting ahead of time?
  • Yojimbo - Thursday, July 14, 2016 - link

    The ACEs are schedulers, not execution units.
  • Scali - Friday, July 15, 2016 - link

    Well, AMD's drivers did not implement Driver Command Lists (DCLs) in the DX11 API properly.
    The DX11 API allows you to create multiple contexts, which you can use from multiple threads, where each context/thread can create their own DCL.
    On nVidia hardware, we saw up to 2x the performance compared to using a single context (see the earlier 3DMark API overhead test).
    On AMD, this was not implemented, so even if you made multiple threads and contexts, the driver would just serialize it and run it sequentially from a single thread. As a result, you saw 1x performance, regardless of how many threads you used.
    Given this serializing behaviour, it seems that there was no way for AMD to make use of async compute in DX11 either.
    nVidia could do this, but I'm not sure to what extent they actually did. All we can see is that nVidia did get reasonable performance increase from using multiple DX11 contexts, where AMD did not get anything at all. Whether some or all of nVidia's performance increase came from async compute, or some other benefits of multithreading, is difficult to say.
  • Yojimbo - Friday, July 15, 2016 - link

    DirectX 11 has been out for 7 years and has been the mainstay of games development for a long time. How likely is it that AMD missed out, and continues to miss out, on performance using DirectX 11 simply because of poor driver implementation? If that is a feature of the API actually used by games and it makes a significant difference in performance then it's hard to believe AMD would just let it languish. It would be both incompetence on the part of their driver team and strategic mismanagement of resources by their management. Is it not possible that their architecture simply is not amenable to that feature of the API?
  • D. Lister - Friday, July 15, 2016 - link

    Driver-level changes alone cannot mitigate hardware limitations, and despite the added features in the later versions, at its core, GCN has been outdated for quite some time. Consequently, we have been seeing one family after another of GPUs with nearly nonexistent OC headroom, ridiculous power usage and/or temperatures, and a list of open-source "features".

    Sadly, right now, AMD is in a vicious cycle of finance; they need more money to fix these issues, but to make more money they need to fix these issues, hence the inevitable downward slope.
