The AMD Radeon RX 480 Preview: Polaris Makes Its Mainstream Mark
by Ryan Smith on June 29, 2016 9:00 AM EST
The Polaris Architecture: In Brief
For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.
In their announcement of the architecture this year, AMD laid out a basic overview of what components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs, but AMD has combined targeted performance upgrades with a chip-wide energy efficiency upgrade. As a result Polaris is a mix of old and new, and a lot more efficient in the process.
At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji), and in fact GCN 4’s ISA is identical to that of GCN 1.2. So everything we see here today comes not from broad architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.
Overall AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to pre-fetch future instructions, something GCN 1.x could not do. When done correctly, this reduces or eliminates the need for a wave to stall while waiting on an instruction fetch, keeping the CU fed and active more often. Meanwhile the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.
Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be used, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as claimed by AMD, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
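To make the two culling cases concrete, here is a minimal software sketch of the idea. It is not AMD's hardware logic, which is not described here; the function names are hypothetical, and it assumes 2D screen-space vertices with a single sample at each pixel center (no MSAA). A triangle is dropped if it has zero area, or if its bounding box contains no pixel center and so could never produce a fragment.

```python
import math

def signed_area(tri):
    """Signed area of a 2D triangle given as three (x, y) screen-space points."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def is_degenerate(tri):
    """Zero-area triangles (collinear or repeated vertices) can never be drawn."""
    return signed_area(tri) == 0.0

def _span_has_pixel_center(lo, hi):
    # Assumption: pixel centers sit at k + 0.5. Find the first one at or above lo.
    first_center = math.ceil(lo - 0.5) + 0.5
    return first_center <= hi

def misses_every_sample(tri):
    """Conservative test: True if the bounding box contains no pixel center."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return not (_span_has_pixel_center(min(xs), max(xs))
                and _span_has_pixel_center(min(ys), max(ys)))

def should_discard(tri):
    """Cull degenerate triangles and triangles too small to hit any sample."""
    return is_degenerate(tri) or misses_every_sample(tri)
```

The bounding-box test is deliberately conservative: it never discards a triangle that could be visible, but it can let through a thin sliver whose box contains a pixel center that the triangle itself does not cover.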
Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.
Finally, at the back-end of the GPU, the ROP/L2/memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size and resulting memory bandwidth needs of frame buffers and render targets. This compression results in a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2 color compression by around 17%.
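AMD has not published the pattern library itself, so the following is only a generic illustration of why delta encoding saves bandwidth, not a description of Polaris's scheme. The tile layout, the 4-bit delta width, and all function names are assumptions made for the example: one anchor value is stored in full, and the remaining pixels are stored as small signed differences from it when they fit.

```python
def compress_tile(pixels, delta_bits=4):
    """Return (anchor, deltas) if every pixel is close to the first one, else None.

    Assumption: a tile is a flat list of 8-bit channel values, and a tile that
    does not fit the delta range is simply stored uncompressed (None here).
    """
    anchor = pixels[0]
    lo = -(1 << (delta_bits - 1))        # smallest signed delta that fits
    hi = (1 << (delta_bits - 1)) - 1     # largest signed delta that fits
    deltas = [p - anchor for p in pixels[1:]]
    if all(lo <= d <= hi for d in deltas):
        return anchor, deltas
    return None

def decompress_tile(anchor, deltas):
    """Rebuild the original pixel values from the anchor and its deltas."""
    return [anchor] + [anchor + d for d in deltas]

def compressed_size_bits(num_pixels, value_bits=8, delta_bits=4):
    """Bits needed for one full anchor plus a short delta per remaining pixel."""
    return value_bits + (num_pixels - 1) * delta_bits
```

For an 8-pixel tile of smoothly varying values, this sketch stores 36 bits instead of 64, while a tile that alternates between 0 and 255 does not compress at all, which mirrors the "so long as the buffer is compressible" caveat.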
Otherwise we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially go to 8Gbps, and even a bit more than that when overclocking.
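As a quick worked example of what the 8Gbps figure means for peak bandwidth, the sketch below multiplies the per-pin data rate by the bus width. The 256-bit bus width is an assumption taken from the RX 480's published specifications rather than from this section, and the function name is illustrative.

```python
def memory_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin rate (Gbps) times pins, divided by 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

# Assumption: 256-bit memory bus, as on the RX 480.
rx480_peak = memory_bandwidth_gbs(8, 256)
```

Under that assumption, 8Gbps memory works out to 256 GB/s of peak bandwidth, before any effective gain from color compression.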
449 Comments
Yojimbo - Thursday, June 30, 2016 - link
Why wouldn't you use data for NVIDIA's GPUs to try to determine the GTX 1060's performance rather than use data from AMD's GPUs? The experience from the 700 series and the 900 series implies that, assuming that the 1060 has two GPCs (half that of the 1080) it should be about 20% faster than the 970 in DX 11 games and so about 20% faster than the RX 480. Pascal seems to be doing better in DX 12 than Maxwell, so it may end up being close to 20% faster than RX 480 in DX 12 games, too.
dragonsqrrl - Thursday, June 30, 2016 - link
I'm not using AMD's GPUs to determine the performance of the 1060, I'm using the 1070 and 1080. What I was trying to say in my previous comment was that I've assumed roughly 50% 1080 performance (or around the 970) for the 1060. The RX480 leaks prior to launch suggested 390X-like performance, which led me to believe the 1060 would probably perform a step below it. Apparently the leaks were a bit exaggerated, so I now think the 1060 will be more competitive against the RX480 than I did before.
I'm actually curious why your estimate is so different. Am I missing something?
Yojimbo - Thursday, June 30, 2016 - link
OK, first let's look at the 900 series. The GTX 980 was released on September 18, 2014 for $549. It has a 2048:128:64 configuration @ 1126 MHz base clock for 4612 SP throughput. The GTX 970 was released on September 18, 2014 for $329. It has a 1664:104:56 core configuration @ 1050 MHz for 3494 SP throughput. The GTX 960 was released January 22, 2015 for $199. It has a 1024:64:32 core configuration @ 1127 MHz base clock for 2308 SP throughput.
Relative: performance - 980 is 1.32 times 970, price - 980 is 1.67 times 970. performance - 980 is 2 times 960, price - 980 is 2.76 times 960. performance - 970 is 1.51 times 960, price - 970 is 1.65 times 960.
Now the 10 series. GTX 1080 was just recently released and will presumably be available soon for $599. It has a 2560:160:64 configuration @ 1607 MHz for SP throughput of 8228. GTX 1070 was just recently released and will presumably be available soon for $379. It has a 1920:120:64 configuration @ 1506 MHz for SP throughput of 5783. Now the GTX 1060 is rumored to have a 1280:80:48 configuration. It will probably have a clock very close to the 1080 judging by the 900 series clocks. That would give it an SP throughput of 4114. Relative: performance - 980 is 1.42 times 1070, price - 1080 is 1.58 times 1070. speculative: performance - 1080 is 2 times 1060. performance - 1070 is 1.41 times 1060.
Now the GTX 1070 has SP throughput that is 25% more than the GTX 980. It performs 20% to 40% faster than the 980 (in DX11 games. More in DX12 games), averaging more than 30% faster. 4114 SP throughput for the GTX 1060 would give it 18% more than the GTX 970. It should then average about 25% faster in DX11 games, and so more than 20% faster than the RX 480.
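The throughput figures above can be reproduced with a short sketch. It assumes the usual convention of 2 single-precision FLOPs per shader per clock (one fused multiply-add), which matches the numbers quoted; the GTX 1060 entry uses the rumored shader count and the GTX 1080's clock, exactly as the comment speculates, and the names are illustrative.

```python
def sp_gflops(shaders, clock_mhz):
    """Theoretical single-precision GFLOPS: shaders x 2 FLOPs per clock x clock."""
    return shaders * 2 * clock_mhz / 1000

# (shader count, base clock in MHz) as quoted in the comment above.
cards = {
    "GTX 980": (2048, 1126),
    "GTX 970": (1664, 1050),
    "GTX 960": (1024, 1127),
    "GTX 1080": (2560, 1607),
    "GTX 1070": (1920, 1506),
    "GTX 1060 (rumored)": (1280, 1607),  # speculative configuration and clock
}

throughput = {name: sp_gflops(*spec) for name, spec in cards.items()}

# Cross-generation ratios used in the argument.
ratio_1070_vs_980 = throughput["GTX 1070"] / throughput["GTX 980"]
ratio_1060_vs_970 = throughput["GTX 1060 (rumored)"] / throughput["GTX 970"]
```

These are theoretical peaks only; the step from an 18% throughput advantage to a roughly 25% real-world lead rests on the separate observation that the 1070 outperforms the 980 by more than its throughput advantage alone would suggest.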
Now, I know that you were only interested in how I got the performance numbers for the 1060, but I decided to include an argument for pricing as well while I was at it:
The 1070 has a performance/price ratio of 1.11 wrt the 1080. The 970 has a performance/price ratio of 1.27 wrt the 980. The 960 has a performance/price ratio of 1.38 wrt the 980. The 960 has a performance/price ratio of 1.09 wrt the 970. You can see the 1070 is priced a lot closer to the 1080 compared to the 970's price relative to the 980, despite the 970 being closer to the 980 in performance compared with the relative performance of 1070 and 1080. The question is why is this the case? Does it have something to do with the defect density of the 14nm node, or something else? My guess is it has to do with the success of the 970 and the amount of competition from AMD in the space the cards occupy. The 970 was enormously successful, and NVIDIA wants to push up the average selling price of the replacement card if they can, in order to tap into the prior success of the x70 card. Additionally, when the 980 and 970 were released, AMD had more competition for the 970 than for the 980. Therefore the 980 could be priced relatively higher. Now AMD does not really have competition for either the 1080 or the 1070, allowing both those cards to be priced higher. The 1060, however, faces competition. Therefore I think that the expected pricing of the 1060 would be to remain close to the relative price/performance ratio of the 960 wrt the 980, a card with competition compared with a card without much competition, rather than remain closer to the relative price/performance ratio of the 960 wrt the 970, which were both cards with competition. If we divide the price ratio of the 980 to 960 by the performance ratio of the 980 to 960 we get 2.76/2 = 1.38. This represents a conversion factor that will convert relative performance to relative price, under the assumption that the relative price performance ratio of the 980 to 960 also holds for the 1080 to 1060.
Since I speculated that the 1080 will have 2 times the performance of the 1060, the 1080 would then cost 2.76 times the 1060 under these assumptions. Since the 1080 costs $599, the 1060 would be expected to cost about $217.
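The price projection is a two-step calculation, sketched below with the launch prices quoted earlier in the comment. The variable names are illustrative, and the result depends entirely on the comment's assumption that the 980-to-960 price/performance relationship carries over to the 1080 and 1060.

```python
# Launch prices quoted in the comment (USD).
price_980, price_960, price_1080 = 549, 199, 599

# Relative performance of the 980 vs the 960, and the speculated 1080 vs 1060.
perf_ratio_980_960 = 2.0
perf_ratio_1080_1060 = 2.0  # speculative

# Conversion factor from relative performance to relative price.
price_ratio_980_960 = price_980 / price_960
conversion = price_ratio_980_960 / perf_ratio_980_960

# Apply the same factor to the 10 series.
price_ratio_1080_1060 = conversion * perf_ratio_1080_1060
projected_price_1060 = price_1080 / price_ratio_1080_1060
```

Carrying full precision instead of the rounded 2.76 shifts the result by only a few cents, so the estimate of about $217 is unchanged.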
$217 obviously leaves quite a bit of wiggle room for upward pricing pressure on the 1060, such as it falling closer in line with the 1070 price for whatever reason, while still being well below the $300 that many seem to be claiming. But the point is that a $220 GTX 1060 performing 25% faster than the GTX 970 is well within the range of reasonable expectations given the recent historical data of NVIDIA's cards. If anything the GTX 1080 has even less competition than the GTX 980 had, suggesting the conversion factor might actually be greater (but I doubt it. The 1080 can't be found for $599 at the moment, and part of the reason for that is that the 1080 doesn't have any competition. That larger conversion factor is factored into the actual real world prices but probably not the MSRPs). So the RX 480 seems to exert no more pricing pressure on the GTX 1060 than AMD's offerings exerted on the GTX 960 when the GTX 960 was released.
dragonsqrrl - Thursday, June 30, 2016 - link
"Relative: performance - 980 is 1.42 times 1070, price - 1080 is 1.58 times 1070. speculative: performance - 1080 is 2 times 1060. performance - 1070 is 1.41 times 1060."There's something wrong here. If the 1060 is roughly equal to the 980, and the 1080 is 2x the performance of the 1060, the 1080 would also have to be 2x the performance of the 980, which it isn't. I'm not exactly sure where the ratios or logic went wrong, but there's clearly an inconsistency there. The 1080 is about 1.65x the performance of the 980, and about 1.95x the performance of the 970. I'm not using theoretical SP performance, I'm basing this primarily off of real world DX11 performance at 1440p. This is why I assumed it would perform closer to the 970, because it's roughly 50% the performance of the 1080.
Yojimbo - Thursday, June 30, 2016 - link
" If the 1060 is roughly equal to the 980, and the 1080 is 2x the performance of the 1060, the 1080 would also have to be 2x the performance of the 980, which it isn't."There's pretty obviously a typo there. The organization of the information should lead you to know it's a typo. It should read: ""Relative: performance - 1080 is 1.42 times 1070, price - 1080 is 1.58 times 1070. speculative: performance - 1080 is 2 times 1060. performance - 1070 is 1.41 times 1060." Does that clear things up?
" The 1080 is about 1.65x the performance of the 980, and about 1.95x the performance of the 970."
Does that contradict my information? If it does, then show how. It doesn't seem relevant to me, because you don't directly argue against the cross-generational comparison I did make. I established the relative performance of the 10 series to the 900 series by comparing the 1070 to the 980. The 1070 performs on average 30% faster than the 980 in real world games. The relative performance of the cards within their architecture is closely related to their theoretical performance.
"I'm not using theoretical SP performance"
Without considering theoretical performance there's no way whatsoever you can guess the performance of the 1060 because at this point theoretical performance of the 1060 is all we have information for.
sonicmerlin - Friday, July 1, 2016 - link
Yojimbo almost certainly has Aspergers. And yet you read everything he wrote. Jesus
Yojimbo - Friday, July 1, 2016 - link
Why wouldn't you want to read something that's right? Jesus
crimson117 - Wednesday, June 29, 2016 - link
Go away, troll. Don't just post shit like that without backing it up. When can we get downvote buttons on AT comments?
"Wrapping things up then, today’s launch of the Radeon RX 480 leaves AMD in a good position. They have the mainstream market to themselves, and RX 480 is a strong showing for their new Polaris architecture. AMD will have to fend off NVIDIA at some point, but for now they can sit back and enjoy another successful launch."
atlantico - Wednesday, June 29, 2016 - link
He's not wrong, for $200-240 the best GPU on the market is AMD RX480. For "backing that up" check the benchmarks in the article.
crimson117 - Wednesday, June 29, 2016 - link
I was referring to the OP's comment "What a massive F-Up by AMD"