With the announcement of DirectX 12 features like low-level programming, it appears we're having a revival of the DirectX vs. OpenGL debates—and we can toss AMD's Mantle into the mix in place of Glide (RIP, 3dfx). I was around back in the days of the flame wars between OGL and DX1/2/3 devotees, with id Software's John Carmack and others weighing in on behalf of OGL at the time. As Microsoft continued to add features to DX, and with a healthy dose of marketing muscle, the subject mostly faded away after a few years. Today, the vast majority of Windows games run on DirectX, but with mobile platforms predominantly using variants of OpenGL (smartphones and tablets use a subset called OpenGL ES, the ES standing for "Embedded Systems"), we're seeing a bit of a resurgence in OGL use. There's also increasing support for Linux and OS X, making a cross-platform graphics API even more desirable.

At the Game Developers Conference 2014, in a panel including NVIDIA's Cass Everitt and John McDonald, AMD's Graham Sellers, and Intel's Tim Foley, explanations and demonstrations were given suggesting OpenGL could unlock as much as a 7X to 15X improvement in performance. Even without fine-tuning, they note that in general OpenGL code is around 1.3X faster than DirectX. It almost makes you wonder why we ever settled for DirectX in the first place—particularly considering many developers felt DirectX code was always a bit more complex than OpenGL code. (Short summary: DX was able to push new features into the API and get them working faster than OpenGL in the DX8/9/10/11 days.) Anyway, if you have an interest in graphics programming (or happen to be a game developer), you can find the full set of 130 slides from the presentation on NVIDIA's blog. Not surprisingly, Valve is also promoting OpenGL in various ways; the same link also has a video from Steam Dev Days a couple of weeks back covering the same topic.

The key to unlocking improved performance appears to be pretty straightforward: reducing driver overhead and increasing the number of draw calls. These are both items targeted by AMD's Mantle API, and presumably the low-level DX12 API as well. I suspect the "7-15X improved performance" will be far more than we'll see in most real-world situations (i.e. games), but even a 50-100% performance improvement would be huge. Many of the mainstream laptops I test can hit 30-40 FPS at high-quality 1080p settings, but there are periodic dips into the low 20s or even the teens. Double the frame rates and everything becomes substantially smoother.
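
To make the driver-overhead point a bit more concrete, here is a minimal, illustrative C++/OpenGL sketch (not taken from the GDC slides or McDonald's GitHub samples) contrasting the classic one-draw-call-per-object loop with a single glMultiDrawElementsIndirect submission, one of the techniques the presenters discuss. The function names are hypothetical, and it assumes an OpenGL 4.3+ context with a loader such as GLEW already initialized and all objects sharing one VAO, one vertex/index buffer, and one shader program:

    // Illustrative sketch only: replacing many per-object draw calls with a single
    // glMultiDrawElementsIndirect submission.
    #include <GL/glew.h>
    #include <vector>

    // Per-draw command layout as defined by the OpenGL spec.
    struct DrawElementsIndirectCommand {
        GLuint count;          // number of indices for this object
        GLuint instanceCount;  // usually 1
        GLuint firstIndex;     // offset into the shared index buffer
        GLuint baseVertex;     // offset added to each index
        GLuint baseInstance;   // handy for fetching per-draw data in the shader
    };

    // Classic path: one API call (and its driver validation cost) per object.
    void DrawSceneOneCallPerObject(const std::vector<DrawElementsIndirectCommand>& objects) {
        for (const auto& o : objects) {
            glDrawElementsInstancedBaseVertexBaseInstance(
                GL_TRIANGLES, o.count, GL_UNSIGNED_INT,
                reinterpret_cast<const void*>(o.firstIndex * sizeof(GLuint)),
                o.instanceCount, o.baseVertex, o.baseInstance);
        }
    }

    // Lower-overhead path: upload the same commands once, then submit them all
    // with a single call.
    void DrawSceneMultiDraw(GLuint indirectBuffer,
                            const std::vector<DrawElementsIndirectCommand>& objects) {
        glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
        glBufferData(GL_DRAW_INDIRECT_BUFFER,
                     objects.size() * sizeof(DrawElementsIndirectCommand),
                     objects.data(), GL_DYNAMIC_DRAW);
        glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                    nullptr,                              // offset 0 in the bound buffer
                                    static_cast<GLsizei>(objects.size()),
                                    0);                                   // commands tightly packed
    }

The second function hands the driver a single buffer of draw commands instead of issuing thousands of individual calls, which is where the bulk of the CPU-side savings comes from.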

I won't pretend to have a definitive answer on which API is "best", but just like being locked into a single hardware platform or OS can lead to stagnation, I think it's always good to have alternatives. Obviously there's a lot going on with developing game engines, and sometimes slower code that's easier to use and understand is preferable to fast but difficult code. There's also far more to making a "good" game than graphics, which is a topic unto itself. Regardless, code for some of the testing scenarios provided by John McDonald is available on GitHub if you're interested in checking it out. It should work on Windows and Linux but may require some additional work to get it running on OS X for now.

Source: NVIDIA Blog - GDC 2014

Comments

  • jwcalla - Wednesday, March 26, 2014 - link

    OpenGL 4.x features are required for the performance we're talking about here. The video cards listed in the L4D2 system requirements are seven-year-old cards; they're Shader Model 3.0 equivalent.

    They didn't port Source to OGL. They already had an OGL rendering backend for OS X and slapped a translation layer on top to convert D3D to those OGL calls. That's how they had something up and running so quickly on Linux.

    Valve doesn't even have a pure OGL engine yet.
  • tuxRoller - Wednesday, March 26, 2014 - link

    He's right about the version of GL being used. Intel is only now on 3.3, and Steam has been available for many months now. I think when it was first released, Intel was on 3.0/3.1. This is only speaking about Linux, of course.
  • ET - Wednesday, March 26, 2014 - link

    Regardless of the other less than valid points, you're still comparing to DX9, and that's highly irrelevant to the "OpenGL vs. DX" debate. Yes, DX9 had a very high overhead, but DX has moved forward quite a bit since then.
  • tuxRoller - Wednesday, March 26, 2014 - link

    Except, according to the presentation linked in the article, DX is still noticeably slower than GL.
  • bobvodka - Tuesday, March 25, 2014 - link

    This is all well and good, BUT... unless you are playing on NV hardware, you aren't going to be able to do this.

    At the time of writing AMD's most recent beta drivers are missing at least 3 of the extensions mentioned there (bindless, buffer_storage and shader_parameters) and have been since the 4.4 spec was released some 8 months ago.

    Intel don't support 4.4 either but that's kind of expected in the graphics world.

    So, right now you are stuck writing three PC paths: a 4.4 path; a 4.3 path plus an AMD extension which is 'like' buffer_storage but without bindless/shader parameters (higher CPU cost); and an Intel-compatible path. (A rough runtime check for those extensions is sketched after this comment.)

    And none of it addresses the problem of 'spending all your time on one thread'; games do not consist of one scene with 1,000,000 instanced objects. They consist of lots of scenes, with different render targets and shaders and data; the fact that GL does not allow command lists/buffers to be built on separate threads for dispatch hamstrings things going forward, because that magical 'single thread' doing all the work isn't getting any faster.
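
As an illustrative aside (not part of the comment above), here is a minimal C++ sketch of the kind of runtime extension check that forces those separate render paths, reading "shader_parameters" as GL_ARB_shader_draw_parameters. The enum and function names are hypothetical, and it assumes a current GL context plus a loader such as GLEW or GLAD:

    // Illustrative sketch only: selecting a render path at startup based on
    // which extensions the driver actually reports.
    #include <GL/glew.h>
    #include <string>
    #include <unordered_set>

    enum class RenderPath { Full44, Partial43, Fallback };

    static std::unordered_set<std::string> QueryExtensions() {
        std::unordered_set<std::string> exts;
        GLint count = 0;
        glGetIntegerv(GL_NUM_EXTENSIONS, &count);
        for (GLint i = 0; i < count; ++i) {
            exts.insert(reinterpret_cast<const char*>(glGetStringi(GL_EXTENSIONS, i)));
        }
        return exts;
    }

    RenderPath PickRenderPath() {
        const auto exts = QueryExtensions();
        const bool bindless   = exts.count("GL_ARB_bindless_texture") > 0;
        const bool storage    = exts.count("GL_ARB_buffer_storage") > 0;
        const bool drawParams = exts.count("GL_ARB_shader_draw_parameters") > 0;

        if (bindless && storage && drawParams) {
            return RenderPath::Full44;    // everything needed for the fast path
        }
        if (storage || drawParams) {
            return RenderPath::Partial43; // partial support, higher CPU cost
        }
        return RenderPath::Fallback;      // plain GL 3.x/4.x code path
    }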
  • jwcalla - Wednesday, March 26, 2014 - link

    True, but video game rendering is ultimately a synchronized / serialized process. I'm not saying more threads don't matter, but ultimately all that stuff has to be synchronized and done so very frequently.

    Video games are simply not truly parallel operations.
  • inighthawki - Wednesday, March 26, 2014 - link

    Game rendering is only synchronized that way because no modern API provides a mechanism to do otherwise. OpenGL's multithreading model is basically the same as the one D3D11 introduced, which is multiple contexts. This model required significant amounts of additional work for little or no (and occasionally negative) improvement due to the overhead.

    DX12 looks to be solving this issue with command lists and bundles. They show nearly linear scaling across processors for submitting workloads. And they do so with real-world demos: actual games and benchmarks like 3DMark that have been ported to DX12. (The D3D11 deferred-context model mentioned above is sketched after this comment.)
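
As another illustrative aside (not part of the comment above), here is a bare-bones C++ sketch of the D3D11 "multiple contexts" model being referred to: a worker thread records commands into a deferred context, and the immediate context replays the resulting command list. Error handling is omitted, and the device and immediate context are assumed to exist already:

    // Illustrative sketch only: D3D11 deferred contexts and command lists.
    #include <d3d11.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    ComPtr<ID3D11CommandList> RecordOnWorkerThread(ID3D11Device* device) {
        ComPtr<ID3D11DeviceContext> deferredCtx;
        device->CreateDeferredContext(0, &deferredCtx);

        // ... set state and issue draw calls on deferredCtx here, e.g.
        // deferredCtx->Draw(vertexCount, 0); ...

        ComPtr<ID3D11CommandList> commandList;
        deferredCtx->FinishCommandList(FALSE, &commandList); // bake the recorded work
        return commandList;
    }

    void SubmitOnRenderThread(ID3D11DeviceContext* immediateCtx,
                              ID3D11CommandList* commandList) {
        // Playback still goes through the single immediate context, which is
        // part of why this model often yielded little or no improvement.
        immediateCtx->ExecuteCommandList(commandList, FALSE);
    }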
  • jwcalla - Wednesday, March 26, 2014 - link

    No, game rendering is synchronized because you ultimately have to synchronize the video, audio, input, AI, networking and everything else or you're going to have one messed up experience. It's not like you can just go off and do AI stuff without considering the player's input, or render video frames without considering AI. Just like A/V sync -- it's synchronized. All of that stuff has to eventually be funneled down one pipe for the final presentation.
  • inighthawki - Wednesday, March 26, 2014 - link

    I think there was a misunderstanding. I thought you were referring to the rendering synchronization itself. Once you have all the dependencies needed for rendering, it is possible to split the rendering work nearly equally across cores, but modern game engines do not because none of the existing APIs do it very well.
  • jwcalla - Wednesday, March 26, 2014 - link

    Yeah, you can get a ton more draw calls by splitting the work up across all the cores like that. I think that helps a lot, but even then, the actual frame rendering has to be serialized (you can only render one frame at a time) and in order. It can help in CPU-limited scenarios where the GPU becomes starved (like we see in Mantle).

    The OGL approach presented here is somewhat different and intriguing. Instead of trying for more draw calls, they're using the multidraw concept to bundle more visual updates into a single draw call. So they're trying for fewer draw calls where each call has a bigger punch. In theory this should alleviate pressure on the CPU. I think this approach offers bigger advantages for mobile platforms. (The shader side of that multidraw idea is sketched below.)
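
To close with one more illustrative sketch (again, not from the comment above): the shader side of that multidraw approach, where each sub-draw of a single glMultiDrawElementsIndirect call fetches its own per-object data via gl_DrawIDARB from ARB_shader_draw_parameters. The GLSL is embedded as a C++ string literal, and the buffer name and binding point are assumptions for illustration:

    // Illustrative sketch only: per-draw data lookup inside one multi-draw call.
    static const char* kMultiDrawVertexShader = R"GLSL(
    #version 430
    #extension GL_ARB_shader_draw_parameters : require

    layout(location = 0) in vec3 inPosition;

    // One matrix per sub-draw, uploaded once per frame into a storage buffer.
    layout(std430, binding = 0) buffer PerDrawData {
        mat4 modelViewProj[];
    };

    void main() {
        // gl_DrawIDARB identifies which sub-draw of the multi-draw this vertex
        // belongs to, replacing a per-object uniform update and draw call.
        gl_Position = modelViewProj[gl_DrawIDARB] * vec4(inPosition, 1.0);
    }
    )GLSL";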
