The Polaris Architecture: In Brief

For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.

In their announcement of the architecture this year, AMD laid out a basic overview of which components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs; instead, AMD has combined targeted performance upgrades with a chip-wide energy efficiency upgrade. As a result, Polaris is a mix of old and new, and a good deal more efficient for it.

At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji); in fact, GCN 4’s ISA is identical to GCN 1.2’s. So everything we see here today comes not from broad architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.

Overall, AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to prefetch future instructions, something GCN 1.x could not do. When done correctly, this reduces or eliminates the need for a wave to stall while waiting on an instruction fetch, keeping the CU fed and active more often. Meanwhile, the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.
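As a rough illustration of why prefetching matters, consider this toy model of a wave executing from an instruction buffer. It is not AMD's actual hardware behavior: the fetch latency, buffer size (counted in instructions rather than DWORDs), and issue rate are all invented for the example.

```python
# Toy model: without prefetch, a wave stalls for the full fetch latency
# every time its buffer runs dry; with prefetch, the next fetch overlaps
# execution of already-buffered instructions. Numbers are illustrative.

FETCH_LATENCY = 20   # cycles to fetch one group of instructions (assumed)
BUFFER_SIZE = 16     # instructions buffered per wave (assumed)

def cycles_to_run(n_instructions, prefetch):
    """Count total cycles to execute n_instructions at 1 instruction/cycle."""
    cycles = 0
    buffered = 0
    remaining = n_instructions
    while remaining > 0:
        if buffered == 0:
            # Buffer empty: the wave stalls for the full fetch latency.
            cycles += FETCH_LATENCY
            buffered = BUFFER_SIZE
        run = min(buffered, remaining)
        cycles += run            # execute buffered instructions
        buffered -= run
        remaining -= run
        if prefetch and remaining > 0:
            # Prefetch was issued at the start of execution, so only the
            # latency not hidden behind the 'run' cycles above is paid.
            cycles += max(0, FETCH_LATENCY - run)
            buffered = BUFFER_SIZE
    return cycles
```

Running 64 instructions through this model, the non-prefetching wave pays the full fetch latency four times, while the prefetching wave hides most of it behind execution, which is the effect AMD is describing.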

Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be used, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as claimed by AMD, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
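The idea behind the two cull stages can be sketched in software. This is an illustrative approximation, not AMD's hardware algorithm: the `min_area` threshold is an assumption, and real small-primitive culling tests coverage against actual sample positions rather than raw area.

```python
# Sketch of the two cull cases described above: degenerate (zero-area)
# triangles, and triangles too small to plausibly cover a sample point.
# Purely illustrative; not AMD's actual hardware logic.

def signed_area2(a, b, c):
    """Twice the signed area of triangle abc in screen space (cross product)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def should_cull(tri, min_area=0.5):
    """Return True if the triangle can be discarded before rasterization.

    min_area is an assumed pixel-area threshold; hardware would instead
    check whether the triangle misses every sample position.
    """
    a, b, c = tri
    area = abs(signed_area2(a, b, c)) / 2.0
    if area == 0.0:
        return True   # degenerate: collinear or repeated vertices
    if area < min_area:
        return True   # too small to matter at this sampling rate (assumed)
    return False
```

A collinear triangle like `((0, 0), (1, 1), (2, 2))` is caught by the degenerate case, while a sub-pixel sliver falls to the small-triangle case; anything larger passes through to the rasterizer untouched.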

Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.
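To see why even a small dedicated cache pays off for instancing, consider this toy model. It is purely illustrative, not AMD's design: the capacity, the `mesh_id` key, and the LRU policy are all assumptions made for the example.

```python
# Toy illustration of an index cache for instanced geometry: the same
# index data is re-read once per instance, so even a tiny cache turns
# all but the first read into hits, sparing L2/VRAM traffic.
from collections import OrderedDict

class IndexCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.store = OrderedDict()   # mesh_id -> index data, in LRU order
        self.hits = 0
        self.misses = 0

    def fetch(self, mesh_id, index_data):
        if mesh_id in self.store:
            self.hits += 1
            self.store.move_to_end(mesh_id)   # refresh LRU position
            return self.store[mesh_id]
        self.misses += 1                      # would go out to L2/VRAM
        self.store[mesh_id] = index_data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)    # evict least recently used
        return index_data

cache = IndexCache()
indices = [0, 1, 2, 2, 1, 3]      # small instanced mesh (hypothetical)
for instance in range(100):       # draw 100 instances of the same mesh
    cache.fetch("grass_blade", indices)
```

In this model, 100 instanced draws of the same mesh cost one miss and 99 hits; without the cache, every instance would pay the full cost of fetching its indices from farther away.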

Finally, at the back-end of the GPU, the ROP/L2/memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size, and resulting memory bandwidth needs, of frame buffers and render targets. This compression delivers a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2 color compression by around 17%.
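Delta color compression can be sketched conceptually as follows. This is a simplified model, not AMD's actual scheme or pattern library: the block layout, single-channel pixels, and 4-bit delta width are assumptions made for the example.

```python
# Conceptual sketch of delta color compression: store one anchor pixel
# per block plus small signed deltas for the rest. If every delta fits
# in a few bits the block compresses; otherwise it is stored raw.
# Not AMD's actual algorithm; delta width and block shape are assumed.

def compress_block(pixels, delta_bits=4):
    """Return (anchor, deltas) if all deltas fit in delta_bits signed, else None."""
    anchor = pixels[0]
    lo = -(1 << (delta_bits - 1))        # e.g. -8 for 4-bit deltas
    hi = (1 << (delta_bits - 1)) - 1     # e.g. +7 for 4-bit deltas
    deltas = [p - anchor for p in pixels[1:]]
    if all(lo <= d <= hi for d in deltas):
        return anchor, deltas
    return None                          # incompressible: store uncompressed

def decompress_block(anchor, deltas):
    """Losslessly reconstruct the original block from anchor + deltas."""
    return [anchor] + [anchor + d for d in deltas]
```

A smoothly shaded block like `[128, 130, 127, 129]` compresses to an anchor plus three 4-bit deltas, while a high-contrast block falls back to raw storage, which is why the bandwidth savings depend on how compressible the buffer's contents are.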

Otherwise, we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially run at 8Gbps, and even a bit more than that when overclocking.


  • Flunk - Thursday, June 30, 2016 - link

    The Crossfire reviews I've read have said the GTX 1070 is faster on average than RX 480 Crossfire, maybe you should go read those reviews.
  • Murloc - Tuesday, July 5, 2016 - link

    comparing crossfire/sli to a single gpu is really useless. Multigpu means lots of heat, noise, power consumption, driver and game support issues, and performance that is most certainly not doubled on many games.

    Most people want ONE video card and they're going to get the one with the best bang for buck.
  • R0H1T - Wednesday, June 29, 2016 - link

    For $200 I'll take this over the massive cash grab i.e. FE obviously!
  • Wreckage - Wednesday, June 29, 2016 - link

    Going down with the ship eh? It took AMD 2 years to compete with the 970. I guess we will have to wait until 2018 to see what they have to go against the 1070
  • looncraz - Wednesday, June 29, 2016 - link

    Two years to compete with the 970?

    The 970's only advantage over AMD's similarly priced GPUs was power consumption. That advantage is now gone - and AMD is charging much less for that level of performance.

    The RX480 is a solid GPU for mainstream 1080p gamers - i.e. the majority of the market. In fact, right now, it's the best GPU to buy under $300 by any metric (other than the cooler).

    Better performance, better power consumption, more memory, more affordable, more up-to-date, etc...
  • stereopticon - Wednesday, June 29, 2016 - link

    are you kidding me?! better power consumption?! its about the same as the 970... it used something like 13 less watts while running crysis 3... if the gtx1060 ends up being as good as this card for under 300 while consuming less watts i have no idea what AMD is gonna do. I was hoping for this to have a little more power (more along 980) to go inside my secondary rig.. but we will see how the 1060 performs.

    i still believe this is a good card for the money.. but the hype was definitely far greater than what the actual outcome was...
  • adamaxis - Wednesday, June 29, 2016 - link

    Nvidia measures power consumption by average draw. AMD measures by max.

    These cards are not even remotely equal.
  • dragonsqrrl - Wednesday, June 29, 2016 - link

    "Nvidia measures power consumption by average draw. AMD measures by max."

    That's completely false.
  • CiccioB - Friday, July 1, 2016 - link

    Didn't you know that when using AMD HW the watt meter switches to "maximum mode" while when applying the probes on nvidia HW it switched to "average mode"?

    Ah, ignorance, what a funny thing it is
  • dragonsqrrl - Friday, July 8, 2016 - link

    @CiccioB

    No I didn't, source? Are you suggesting that the presence of AMD or Nvidia hardware in a system has some influence over metering hardware use to measure power consumption? What about total system power consumption from the wall?

    At least in relation to advertised TDP, which is what my original comment was referring to, I know that what adamaxis said about avg and max power consumption is false.
