The Polaris Architecture: In Brief

For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.

In their announcement of the architecture this year, AMD laid out a basic overview of what components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs, but AMD has combined targeted performance upgrades with a chip-wide energy efficiency upgrade. As a result Polaris is a mix of old and new, and a lot more efficient in the process.

At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji), and in fact GCN 4’s ISA is identical to that of GCN 1.2. So everything we see here today comes not from broad architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.

Overall AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to pre-fetch future instructions, something GCN 1.x could not do. When done correctly, this reduces or eliminates the need for a wave to stall while waiting on an instruction fetch, keeping the CU fed and active more often. Meanwhile the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.

Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be used, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as claimed by AMD, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
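To make the idea of discarding triangles concrete, here is a minimal Python sketch of the two kinds of tests described above. This is not AMD's hardware logic, which has not been published in this detail; the function names are hypothetical, and I'm assuming screen-space 2D vertices with a single sample at each pixel center (x + 0.5, y + 0.5) rather than modeling MSAA sample patterns.

```python
import math

def is_degenerate(v0, v1, v2):
    """True when the triangle has zero area in screen space."""
    # Twice the signed area, from the 2D cross product of two edges.
    area2 = (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])
    return area2 == 0

def misses_all_pixel_centers(v0, v1, v2):
    """True when the bounding box contains no pixel center (assumed sample point)."""
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])

    def no_center(lo, hi):
        # The first pixel center at or after lo is ceil(lo - 0.5) + 0.5.
        first = math.ceil(lo - 0.5) + 0.5
        return first > hi

    return no_center(min(xs), max(xs)) or no_center(min(ys), max(ys))

def should_cull(v0, v1, v2):
    """Hypothetical combined test: cull if degenerate or too small to hit a sample."""
    return is_degenerate(v0, v1, v2) or misses_all_pixel_centers(v0, v1, v2)
```

The bounding-box test is conservative: it only rejects triangles that cannot possibly cover a sample, so some thin triangles that also produce no coverage would slip through to the rasterizer. That matches the article's point that nothing visible is ever removed.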

Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.

Finally, at the back-end of the GPU, the ROP/L2/Memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size and resulting memory bandwidth needs of frame buffers and render targets. As a result, color compression delivers a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2 color compression by around 17%.
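For readers unfamiliar with the general idea behind delta compression, the following Python sketch shows one simple form of it: store a block as a base value plus small per-pixel differences, and fall back to raw storage when the differences don't fit. AMD's actual patterns and block formats aren't public, so everything here (the function names, the 8-bit values, the 4-bit deltas) is an illustrative assumption rather than a description of the hardware.

```python
def compress_block(pixels, delta_bits=4):
    """Return (base, deltas) if every delta fits in delta_bits signed bits, else None."""
    base = pixels[0]
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    deltas = [p - base for p in pixels[1:]]
    if all(lo <= d <= hi for d in deltas):
        return base, deltas
    return None  # block is not compressible with this (assumed) pattern

def decompress_block(base, deltas):
    """Rebuild the original pixel values exactly; the scheme is lossless."""
    return [base] + [base + d for d in deltas]

def stored_bits(pixels, value_bits=8, delta_bits=4):
    """Bits needed for the block, ignoring any per-block flag or metadata."""
    if compress_block(pixels, delta_bits) is None:
        return len(pixels) * value_bits
    return value_bits + (len(pixels) - 1) * delta_bits
```

A smooth block such as `[100, 101, 102, 99, 100, 103, 98, 100]` fits the pattern and needs fewer bits, while a noisy block falls back to full size. This is why the bandwidth savings only apply when a buffer is compressible, and why a larger pattern library lets more blocks qualify.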

Otherwise we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially go to 8Gbps, and even a bit more than that when overclocking.
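To put the 8Gbps figure in context, peak memory bandwidth is just the per-pin data rate multiplied by the bus width. The short Python sketch below does that arithmetic; note that the 256-bit bus width is not stated in this section and is my assumption here, based on the RX 480's published specifications.

```python
def memory_bandwidth_gb_s(data_rate_gbps, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin rate (Gbps) times bus width (bits), over 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

# Assumed 256-bit bus at the official 8Gbps data rate.
rx480_bandwidth = memory_bandwidth_gb_s(8, 256)
```

This is a theoretical peak; the effective figure depends on access patterns and on how much of the traffic color compression manages to remove.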


449 Comments


  • basroil - Thursday, June 30, 2016 - link

    "Or both the mobo and the PSU are supplying the same voltage and the power input is combined into a single bus... y'know... preventing the unlikely scenario you describe from ever possibly happening."

    1) The two do NOT have the same voltage. Ideally they do but that's not how things actually work in practice.
    2) The folks at tomshardware did bus level analysis of power draws and put their results into their review. Their tests for various cards will prove to you that power draw can indeed be modified to either PCIe slot or power cable and is not 50-50 like you claim.
    3) Even assuming that your point was valid (which it most certainly is NOT), it wouldn't change the fact that a single card already draws more power from the PCIe slot than allowable by ATX specifications, and that two cards will be far more than the specs allow (double the spec for PCIe3.0)
  • schulmaster - Thursday, June 30, 2016 - link

    Lol. The PSU is the source for all board power AND PCIE Aux. The board design and PSU will negotiate how much 12V power is reliably sourced from the 24pin. A 6pin PCIe aux is rated for an additional 75W, and that limit could be down to the cable itself, let alone the card interface and/or the PSU. Even high-end OC boards have a supplemental molex connector for multi GPU configs to supplement available bus power, which is the burden of the 24pin. It is not outlandish to have concern if a single RX480 is overdrawing from the entire PCIe bus wattage allotted in the spec, especially when the fall back is a PCIe 6 pin already being overdrawn from as well. Tomshardware was literally unwilling to do further multiGPU testing due to the numbers they were physically seeing, not paranoia.
  • pats1111 - Thursday, June 30, 2016 - link

    @binarydissonance: Don't confuse these fanboys with the facts, they're NVIDIA goons, it's a waste of time because they are TROLLS
  • AbbieHoffman - Wednesday, June 29, 2016 - link

    Actually most motherboards support crossfire. There are many that support only crossfire. Because it is cheaper to make crossfire support than SLI.
  • Gigaplex - Thursday, June 30, 2016 - link

    But they don't support the excessive power consumption on the PCIe bus, which is a specification violation.
  • jospoortvliet - Monday, July 4, 2016 - link

    Luckily every motherboard except for cheap ones that are quite old can handle easily 100+ watt over the PCIe port, as any over clocking would need that, too.
  • beck2050 - Thursday, June 30, 2016 - link

    I just laugh when I see people talking about Crossfire
  • fanofanand - Thursday, June 30, 2016 - link

    "when even 2x1080 wouldn't hit 75W"

    Your post is so full of FUD it should be deleted.
  • basroil - Thursday, June 30, 2016 - link

    "Your post is so full of FUD it should be deleted. "

    I'm not responsible for your ignorance. Check tomshardware /reviews/nvidia-geforce-gtx-1080-pascal,4572-10.html and you'll see I'm right
  • fanofanand - Thursday, June 30, 2016 - link

    I checked, you are wrong. Stop spreading FUD, you Nvidiot.
