A8: Apple’s First 20nm SoC

As has been customary for every iPhone launch since the company began publicly naming their SoCs, Apple has once again rolled out a new SoC for their latest line of phones. With the launch of the iPhone 6 series Apple is now up to their eighth-generation SoC, the appropriately named A8.

After a period of rapid change with the A6 and A7 SoCs – which introduced Apple’s first custom CPU design (Swift) and the first ARMv8 AArch64 design (Cyclone) respectively – A8 is a more structured and straightforward evolution of Apple’s SoC designs. That is not to say that Apple hasn’t been busy tweaking their designs to extract ever-improved performance and power efficiency, as we’ll see, but our examination of A8 has not uncovered the same kind of radical changes that defined A6 and A7.

The heart and soul of A8 is as always the CPU and GPU. We’ll be taking a look at each of these individually in a moment, but from a high level both of these are evolutions of their predecessors found in A7. Apple’s GPU of choice remains Imagination’s PowerVR, having upgraded from the Series6 based G6430 to Imagination’s newer GX6450 design. Meanwhile Apple continues to develop their own CPUs and A8 packs their latest design, which is an enhanced version of the Cyclone core first introduced in A7.

Stepping away from the GPU and CPU for the moment, the biggest change about A8 is that it’s smaller. As discovered by Chipworks, A8 is being fabricated on TSMC’s new 20nm process, making the iPhone 6 among the first smartphones to be shipped with a 20nm SoC.

This move to 20nm is not unexpected, but nonetheless it is considerable for a couple of reasons. The first is that this means Apple has moved production over to TSMC’s 20nm HKMG Planar process, making this the first time an Apple SoC has been manufactured anywhere but a Samsung fab. There are numerous possible reasons for this – and not every reason needs to be technical – but from a process development standpoint it’s important to note that over the last few generations TSMC has been the leader among contract foundries, being the first to get new processes up and running for volume production.

Apple A8 vs A7 SoCs
Spec | Apple A8 (2014) | Apple A7 (2013)
Manufacturing Process | TSMC 20nm HKMG | Samsung 28nm HKMG
Die Size | 89mm2 | 102mm2
Transistor Count | ~2B | "Over 1B"
CPU | 2 x Apple Enhanced Cyclone ARMv8 64-bit cores | 2 x Apple Cyclone ARMv8 64-bit cores
GPU | IMG PowerVR GX6450 | IMG PowerVR G6430

This move is also quite considerable because it means for the first time Apple is manufacturing their SoCs on a bleeding edge manufacturing process. Prior to this Apple has been slow to utilize new manufacturing processes, only finally utilizing a 28nm process in late 2013 for A7 over a year after 28nm first became available. The fact that we are seeing a 20nm SoC from Apple at a time when almost everyone else is still on 28nm indicates just how much the market has shifted over the last few years, and how Apple’s SoC development is now synchronized with the very edge of semiconductor fabrication technology.

Finally, the switch to 20nm is interesting because after the last couple of generations being so-called “half node” jumps – 45nm to 40nm to 32nm to 28nm – the jump from 28nm to 20nm is a full node jump (note that Apple didn't ever use 40nm, however). This means we are seeing a larger increase in transistor density than in the previous generations, and ideally a larger decrease in power consumption as well.
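To put rough numbers on the node steps listed above, the sketch below treats each node name as a literal feature size and squares the linear shrink to get an ideal area ratio. That is an idealization (node names are labels rather than exact dimensions), and the function names are ours, chosen for illustration.

```python
# Nominal node names (nm) in the order the article lists them.
NODES_NM = [45, 40, 32, 28, 20]


def ideal_area_ratio(old_nm: float, new_nm: float) -> float:
    """Ideal area of an unchanged design after a shrink, as a fraction of its old area.

    Assumes every feature scales linearly with the node name, which real
    processes only approximate.
    """
    linear_shrink = new_nm / old_nm
    return linear_shrink ** 2


def shrink_table(nodes):
    """Return (old_nm, new_nm, ideal_area_ratio) for each consecutive pair of nodes."""
    return [(a, b, ideal_area_ratio(a, b)) for a, b in zip(nodes, nodes[1:])]


if __name__ == "__main__":
    for old, new, ratio in shrink_table(NODES_NM):
        print(f"{old}nm -> {new}nm: ideal area {ratio:.0%} of the original")
```

Under this simple model the 28nm to 20nm step yields the smallest area ratio of the listed steps, which is the sense in which it is the largest single jump.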

In practice TSMC’s 20nm process is going to be a mixed bag; it can offer 30% higher speeds, 1.9x the density, or 25% less power consumption than their 28nm process, but not all three at once. In particular power consumption and speeds will be directly opposed, so any use of higher clock speeds will eat into power consumption improvements. This of course gets murkier once we’re comparing TSMC to Samsung, but the principle of clock speed/power tradeoffs remains the same regardless.

Not accounting for minor differences between TSMC and Samsung, in an ideal case Apple is looking at 51% area scaling (the same design on 20nm can be no smaller than 51% of the die area at 28nm). In reality, nothing ever scales perfectly, so the density gains will depend on the kind of IC structures being laid down (logic, SRAM, etc.). For the complete chip a 60-70% scaling factor is going to be a better approximation, which for Apple means they’ve picked up a lot of room to spend on new functionality and on reducing their overall die size.
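As a back-of-the-envelope check on what those scaling factors mean, the sketch below applies them to A7's 102mm2 die and compares the result with A8's 89mm2. It assumes, hypothetically, that A7 could be ported unchanged to 20nm; the helper names are ours and the output is an estimate, not a measurement.

```python
A7_DIE_MM2 = 102.0  # A7 die size quoted in the article
A8_DIE_MM2 = 89.0   # A8 die size quoted in the article

# Scaling factors discussed in the text: the ideal case, then the
# more realistic whole-chip range of 60-70%.
SCALING_FACTORS = {
    "ideal (20/28)^2": (20 / 28) ** 2,
    "realistic, best case": 0.60,
    "realistic, worst case": 0.70,
}


def shrunk_area(area_mm2: float, scaling: float) -> float:
    """Area of an unchanged design after applying an area scaling factor."""
    return area_mm2 * scaling


def headroom(new_die_mm2: float, shrunk_mm2: float) -> float:
    """Die area left over for new logic beyond a straight shrink of the old design."""
    return new_die_mm2 - shrunk_mm2


if __name__ == "__main__":
    for label, factor in SCALING_FACTORS.items():
        a7_at_20nm = shrunk_area(A7_DIE_MM2, factor)
        extra = headroom(A8_DIE_MM2, a7_at_20nm)
        print(f"{label}: A7 shrinks to ~{a7_at_20nm:.0f}mm2, "
              f"leaving ~{extra:.0f}mm2 of A8 for additions")
```

Even with the pessimistic 70% factor, a straight shrink of A7 would land well under 89mm2, which is consistent with Apple spending part of the gain on new transistors.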

Apple SoC Evolution
SoC | CPU Perf | GPU Perf | Die Size | Transistors | Process
A5 | ~13x | ~20x | 122mm2 | <1B | 45nm
A6 | ~26x | ~34x | 97mm2 | <1B | 32nm
A7 | 40x | 56x | 102mm2 | >1B | 28nm
A8 | 50x | 84x | 89mm2 | ~2B | 20nm

Meanwhile once again this year Apple opened up on die size and transistor counts. A8 weighs in at around 2 billion transistors, as opposed to the “over 1 billion” transistors found on A7. We also have the die size for A8 – 89mm2 – which is some 13% smaller than A7’s 102mm2 die. This makes it clear that Apple has chosen to split their transistor density improvements between adding features/performance and reducing their size, rather than going all-in on either direction.
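The die size and transistor figures can be turned into an average density with a few lines of arithmetic. Note the assumption baked in here: Apple only said "over 1 billion" for A7, so we plug in exactly 1 billion as a lower bound, which makes the resulting density ratio an upper bound rather than a precise figure.

```python
# Figures quoted in the article. A7's transistor count is a lower bound
# ("over 1 billion"), so any A8/A7 density ratio derived from it is an upper bound.
A8 = {"die_mm2": 89.0, "transistors": 2.0e9}
A7 = {"die_mm2": 102.0, "transistors": 1.0e9}


def density_m_per_mm2(chip: dict) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return chip["transistors"] / chip["die_mm2"] / 1e6


def die_shrink_pct(new_mm2: float, old_mm2: float) -> float:
    """How much smaller the new die is, as a percentage of the old die."""
    return (1.0 - new_mm2 / old_mm2) * 100.0


if __name__ == "__main__":
    print(f"A8 density: ~{density_m_per_mm2(A8):.1f}M transistors/mm2")
    print(f"A7 density: at least ~{density_m_per_mm2(A7):.1f}M transistors/mm2")
    print(f"A8 die is ~{die_shrink_pct(A8['die_mm2'], A7['die_mm2']):.0f}% smaller")
```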

In the case of using a bleeding edge node this is generally a good call, as Apple and TSMC will need to deal with the fact that chip yields at 20nm will not be as good as they are on the highly mature 28nm process. With lower chip yields, a smaller die will offset some of those yield losses by reducing the number of manufacturing flaws any given die touches, improving the overall yield.
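The relationship between die size and yield can be illustrated with the simple Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. To be clear, this is a textbook simplification that we are swapping in for illustration, not anything Apple or TSMC has disclosed, and the defect density below is a made-up placeholder.

```python
import math

# HYPOTHETICAL defect density, chosen only for illustration.
# Real figures for TSMC's 20nm process are not public.
ASSUMED_DEFECTS_PER_CM2 = 0.5


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects under a Poisson model."""
    die_area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    for name, area in (("A8-sized die (89mm2)", 89.0),
                       ("A7-sized die (102mm2)", 102.0)):
        y = poisson_yield(area, ASSUMED_DEFECTS_PER_CM2)
        print(f"{name}: modeled yield {y:.1%}")
```

Whatever the real defect density is, the model's shape is the point: at a fixed defect density, the smaller die always comes out ahead.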


A8 With POP RAM Removed

Moving on, looking at A8 we can see that Apple’s memory subsystem design has not significantly changed from A7. Once again Apple has placed an SRAM cache on the chip to service both the CPU and the GPU. Based on an examination of the die and of latency numbers, this L3 SRAM cache remains unchanged from A7 at 4MB. Meanwhile we also find a series of SDRAM interfaces which drive the A8’s package-on-package (POP) based main memory. Based on teardowns from iFixit, Apple is using 1GB of LPDDR3-1600, the same speed grade of LPDDR3 and capacity that they used for the iPhone 5s. iFixit has found both Hynix and Elpida memory in their phones, so Apple is once again using multiple sources for their RAM.

When we start poking at memory bandwidth we find that it is consistently higher than on A7, but only ever so slightly. This points to Apple having worked out further optimizations to make better use of the memory bandwidth they have available, since as we’ve previously determined they’re still using LPDDR3-1600 speeds.
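For context on how much bandwidth is available in the first place, the theoretical peak of a memory interface is its transfer rate multiplied by its bus width. The sketch below assumes a 64-bit interface, which this section does not state, so treat that value as an assumption; the measured figure used for comparison is A8's single-threaded Stream Copy result from this section's Geekbench data.

```python
LPDDR3_1600_TRANSFERS_PER_SEC = 1600e6  # LPDDR3-1600 runs at 1600 MT/s
ASSUMED_BUS_WIDTH_BITS = 64             # ASSUMPTION: not stated in this section


def peak_bandwidth_gb_s(transfers_per_sec: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_sec * bytes_per_transfer / 1e9


def utilization(measured_gb_s: float, peak_gb_s: float) -> float:
    """Fraction of the theoretical peak that a benchmark achieved."""
    return measured_gb_s / peak_gb_s


if __name__ == "__main__":
    peak = peak_bandwidth_gb_s(LPDDR3_1600_TRANSFERS_PER_SEC, ASSUMED_BUS_WIDTH_BITS)
    a8_stream_copy = 9.08  # GB/s, single-threaded Stream Copy on A8
    print(f"Theoretical peak: {peak:.1f} GB/s")
    print(f"Stream Copy reaches {utilization(a8_stream_copy, peak):.0%} of that peak")
```

With the memory speed grade unchanged between A7 and A8, the peak is the same on both, so any measured gain has to come from using that peak more effectively.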

Geekbench 3 Memory Bandwidth Comparison (1 thread)
SoC | Stream Copy | Stream Scale | Stream Add | Stream Triad
Apple A8 1.4GHz | 9.08 GB/s | 5.37 GB/s | 5.76 GB/s | 5.78 GB/s
Apple A7 1.3GHz | 8.34 GB/s | 5.21 GB/s | 5.67 GB/s | 5.69 GB/s
A8 Advantage | 9% | 3% | 2% | 2%

The Stream Copy score ends up being the biggest gain at 9%. Otherwise the rest of the benchmarks only show 2-3% memory bandwidth increases.
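The "A8 Advantage" percentages are simply the ratio of the two results, rounded to the nearest whole percent. A short sketch that reproduces them from the Geekbench figures (the dictionary layout and function name are ours):

```python
# Single-threaded Geekbench 3 Stream results from the article, in GB/s.
A8_GB_S = {"Stream Copy": 9.08, "Stream Scale": 5.37,
           "Stream Add": 5.76, "Stream Triad": 5.78}
A7_GB_S = {"Stream Copy": 8.34, "Stream Scale": 5.21,
           "Stream Add": 5.67, "Stream Triad": 5.69}


def advantage_pct(new: float, old: float) -> int:
    """Percentage gain of `new` over `old`, rounded to the nearest whole percent."""
    return round((new / old - 1.0) * 100.0)


advantages = {test: advantage_pct(A8_GB_S[test], A7_GB_S[test]) for test in A8_GB_S}

if __name__ == "__main__":
    for test, pct in advantages.items():
        print(f"{test}: A8 ahead by {pct}%")
```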

More interesting is memory latency, which shows some unexpected improvements once we get out of the L1 and L2 caches. At both the 1MB – 4MB region of the SRAM and 6MB+ region of main memory, memory latency is consistently lower on A8 versus A7. In both cases we’re looking at latencies about 20ns faster than A7. This identical 20ns gain tells us that Apple is still doing main memory lookups after the L3 lookup fails, and this in turn means the 20ns gain we’re seeing is due to L3 cache optimizations. We have a couple of ideas for how Apple could have improved L3 latency by nearly 20% like this, but at this time, with Apple staying quiet on their architecture as usual, it’s not apparent which of these ideas are the correct ones.
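The inference here can be made concrete with a toy model. If main memory is only accessed after the L3 lookup misses, the L3 time is added to every main memory access, so a faster L3 speeds up both regions by the same amount. If the two lookups were instead started in parallel, a faster L3 would not change main memory latency at all. All latency values below are hypothetical placeholders picked to mirror the roughly 20ns and 20% figures; they are not measurements.

```python
# HYPOTHETICAL latencies in nanoseconds, for illustration only.
A7_L3_NS = 100.0
A8_L3_NS = 80.0   # about 20ns (20%) faster than the assumed A7 value
DRAM_NS = 150.0   # assumed to be the same on both chips


def serial_miss_latency(l3_ns: float, dram_ns: float) -> float:
    """Main memory latency when DRAM is accessed only after the L3 lookup misses."""
    return l3_ns + dram_ns


def parallel_miss_latency(l3_ns: float, dram_ns: float) -> float:
    """Main memory latency when the L3 and DRAM lookups start at the same time."""
    return max(l3_ns, dram_ns)


serial_gain = (serial_miss_latency(A7_L3_NS, DRAM_NS)
               - serial_miss_latency(A8_L3_NS, DRAM_NS))
parallel_gain = (parallel_miss_latency(A7_L3_NS, DRAM_NS)
                 - parallel_miss_latency(A8_L3_NS, DRAM_NS))

if __name__ == "__main__":
    print(f"L3 gain: {A7_L3_NS - A8_L3_NS:.0f}ns")
    print(f"Main memory gain, serial lookups: {serial_gain:.0f}ns")
    print(f"Main memory gain, parallel lookups: {parallel_gain:.0f}ns")
```

Only the serial model reproduces an identical gain in both regions, which matches the measured behavior described above.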

Turning our eyes back to A8 one final time, we find that while a lot of die space is occupied by the CPU, GPU, and SRAM (as we’d expect), there is also quite a bit of space occupied by other blocks Apple has integrated into their design. Without already knowing what you’re looking for these blocks are difficult to identify, but even without being able to do this we have a reasonable idea of what blocks Apple has integrated. Among these we’ll find audio controllers, USB controllers, video encoders/decoders, flash memory controllers, the camera ISP, and of course all kinds of interconnect.

All of these blocks are fixed function hardware (or at best, limited flexibility DSPs), and they are important not only to A8’s functionality but also to its power efficiency. By assigning tasks to dedicated hardware Apple does spend some die space on that hardware, but in return these blocks are more efficient than doing those tasks entirely in software. Hence Apple (and SoC designers in general) have a strong incentive to offload as much work as possible to keep power consumption in check. This move towards more fixed function hardware is part of a general “wheel of reincarnation” cycle that has been a constant in processor design over the years, which sees a continuous shift between fixed function and programmable hardware. SoCs, for the most part, are still going towards fixed function hardware, and this should continue for a while yet.

In any case, while we can’t identify individual blocks on A8 we do know that Apple has added a few features to A8 that are present in some form or another among these blocks. New to A8 is some mix of H.265 (HEVC) hardware, which would be necessary to enable the FaceTime over H.265 functionality that is being introduced on the iPhone 6. Apple’s “desktop class scaler” that is used for handling non-native resolution applications and for down-sampling the internal rendering resolution of the iPhone 6 Plus would also be present here.


531 Comments


  • KosmiclyComic - Tuesday, September 30, 2014 - link

    How hard is it to be this dense and ignorant? Anand just went into amazing detail about the A8, a 20nm cutting edge processor which is leaps and bounds ahead of the competition.

    Stop comparing specs between Android and Apple phones. The OS's make spec comparisons irrelevant. Only benchmarks can show meaningful differences. But then again based on your comments I bet you'd think that a Toyota Sienna with 220HP is faster than a Ducati Monster with 130 HP
  • akdj - Friday, October 3, 2014 - link

    You're a simple man
    Would've been significantly more paper 'to continue loading their profit margins' by keeping the storage sizes the same 16/32/64 & increasing the RAM by double than the NAND by double (16/64/128). So back atcha, as an owner of both and someone that held out on my Note 3 to wait on a 64GB to materialize....it never DID! 32 or go home (of which just over 20 is open and with each successive update my micro SD is becoming 'less relevant' than earlier, slower and 'glitchier' Android builds.

    Google doesn't want OEMs to use offboard storage. So if they're coding their OS to NOT utilize those memory sticks...and the 'limit' is typically 32GB with an infrequent 64GB & NO 128s, who's coveting their pennies more? Cupertino or MtVall? Apple could've easily saved more on the BOM by doubling, even quadrupling RAM and not increasing storage (NAND). But...why? I think it's a generational thing. I've got a 2010 MBA with 2GB of RAM, 128 GB SSD, and my son uses it daily without a hiccup. Apple's ability to MANAGE RAM has shown time and time again it's NOT the bottleneck in performance. Where's my proof? I own the 5s and Note 3. Any and EVERY app in parity is SMOKED by the 5s with a third the RAM, half the cores and half the speed. Photo manipulation in Lightroom or iPhoto, audio production in any of hundreds of DAWS, GarageBand which is free and INCREDIBLY powerful, with the ability to mix out 16 or 32 tracks of instruments and vocals, that's unbelievable. If you're surfing, Apple's smart enough to free enough RAM for your current page allowing other cached processes to continue running efficiently with excellent reliability. A good sized web page is 15-20MB...Not a GB!

    Not to mention some of the sites poor coding and builds on top of antiquated code as well as dynamic content sites, ala Facebook. But with FB and so MANY other sites having 'apps' that don't need to be viewed in a browser. This reminds me of the 90s and the Pentium years. Just add RAM! Lol when we could only utilize about 3GB on 32bit rigs, folks were putting 4 in and SWEARING the difference was night and day
    Night and day differences come from better SoC design, more efficiency and battery life, lighter, faster, more secure, better display characteristics and FAST internal storage are ALL bigger benefits than this tired RAM argument. Again, there's three gigs in my n3. Playing Asphalt 8 you'd swear it's my 5s that's got three and quad cores.

    Nope ...it is the other way around and a sad reality
  • nos024 - Tuesday, September 30, 2014 - link

    Wow, looks like a Galaxy now. But still no SD card and removable battery. Don't think I can ever buy a phone that doesn't have both of those features, android or not.
  • hatty - Tuesday, September 30, 2014 - link

    If you aren't savvy enough to use the cloud, just buy more storage up front. It is more elegant, allows for less complexity (paramount for security, developing applications, and performance), and provides an overall smaller footprint.
  • blackcrayon - Tuesday, September 30, 2014 - link

    Not only that, but Apple is obviously putting some pretty quick storage in these devices. If you need storage get the large, fast, contiguous storage and the rest use the cloud.
  • juicytuna - Tuesday, September 30, 2014 - link

    Yeh, your display white point is not consistent with what others have measured (~7500k @ displaymate, phonearena, various Russian sites). Given your reputation for in-depth technical analysis, it wouldn't surprise me if Apple singled you out for a specially prepared unit.
  • Brandon Chester - Tuesday, September 30, 2014 - link

    Apple has always had significant variance in white point. They would not send a cherry picked sample with a hot pixel right in the middle of the panel.
  • juicytuna - Tuesday, September 30, 2014 - link

    Hot pixels don't show up in graphs and are easily excused as an anomaly.
  • juicytuna - Tuesday, September 30, 2014 - link

    There's very little downside to providing a unit with a hot pixel. A defective unit like that will be replaced under warranty, no questions asked. On the other hand, it is being used by you guys to suppress your 'suspicions' that you have a cherry picked phone. PR is devious by nature and I think you guys might be being a little naive here.

    Why don't you re-run these tests on your personal units and see how much they differ? I think that would be the right thing to do, as after all you admitted you were suspicious.
  • Parhel - Tuesday, September 30, 2014 - link

    All you'd need to do is hold one phone next to another to see if the displays were visually different. If they were, then you retest. If they aren't, then you don't. But the idea that the hot pixel was intentional to "suppress their suspicions" is truly tin-foil hat level ridiculous.
