Intel Unveils Meteor Lake Architecture: Intel 4 Heralds the Disaggregated Future of Mobile CPUs
by Gavin Bonshor on September 19, 2023 11:35 AM EST
During the opening keynote at Intel's Innovation event in San Jose, Chief Executive Officer Pat Gelsinger unveiled a score of details about the upcoming Meteor Lake client platform. Intel's Meteor Lake marks the beginning of a new era for the chipmaker, as they move away from the chaotic Intel 7 node and go into a rollout of their Foveros 3D packaging with EUV lithography for their upcoming client mobile platform. Meteor Lake uses a tiled, disaggregated chiplet architecture for its client-centric processors for the first time, changing the very nature of Intel's consumer chips going forward. And, according to Intel, all of these changes have allowed them to bring some significant advancements to the mobile market.
Intel's first chiplet-based consumer CPU breaks up the common functions of a modern CPU into four individual tiles: compute, graphics, SoC, and an I/O tile. Within the makeup of the compute tile is a new pair of cores, a P-core named Redwood Cove and a new E-core called Crestmont. Both these cores promise IPC gains over their previous counterparts, but perhaps the most interesting inclusion is a new type of E-core embedded directly into the SoC tile, which Intel calls 'Low Power Island.' These new LP E-cores are designed with the idea that light workloads and processes can be taken off the more power-hungry compute tile and offloaded onto a more efficient and lower-powered tile altogether. Other major additions include a first-for-Intel Neural Processing Unit (NPU), which sits within the SoC tile and is designed to bring on-chip AI capabilities for workloads and inferencing, paving the way for the future.
With Meteor Lake, Intel is aiming to put themselves in a more competitive position within the mobile market, with notable improvements to compute core hierarchy, Intel's Xe-LPG Arc-based graphics tile looking to bolster integrated graphics capabilities, and an NPU that adds various AI advantages. Meteor Lake also sets the scene for Intel and modular disaggregation, with Foveros 3D packaging set to become a mainstay of Intel's processor roadmap for the future, with the Intel 4 process making its debut and acting as a stepping stone to what will become Intel's next mainstay node throughout its fabs, Intel 3.
Intel Meteor Lake: Intel 4 Using Foveros 3D Packaging
Intel's Meteor Lake architecture is not just another iteration in the long line of processor advancements; according to the company it's a revolutionary leap forward. Speaking at Intel's Tech Tour 2023 in Penang, Malaysia, Michelle Johnston Holthaus, executive VP and GM of Intel's Client Computing Group (CCG), remarked that Intel had reached an inflection point in its client roadmap. Based on the further details unveiled at the Tech Tour, the architecture is a performance advancement over Intel's current client processors as the company progresses through its '5 nodes in 4 years' roadmap.
Meteor Lake is built upon Intel's disaggregated architecture, enabled by Foveros packaging. This is designed to optimize both performance and energy efficiency. The architecture itself consists of four distinct tiles connected via Intel's Foveros 3D packaging technology. These include the compute tile, which is built on Intel 4, and the graphics tile, which is built on TSMC's N5 node. The other two tiles are the SoC tile, which acts as the central hub through its embedded Network-on-Chip (NoC), and the I/O tile. This is the first time Intel has used a NoC on its client processors, and it is a simplified take on the NoC in its existing Agilex FPGAs. While on Agilex the NoC is divided into different NoC targets and switches within its substructure, on Meteor Lake it connects directly to the I/O fabric through the IoC. The NoC itself is directly connected to the graphics tile, compute tile, and other components within the SoC.
This modular approach allows for a ground-up and scalable power management architecture that supports disaggregation, enabling each tile to function independently. This disaggregated design prioritizes performance by avoiding the bandwidth bottlenecks that areas such as I/O can create on a monolithic design, and it targets enhanced power efficiency. Perhaps the most notable element of disaggregation is that Intel can select specific silicon processes for each tile and isn't limited to one process node. On top of the power efficiency and package area benefits of a tiled architecture, it is cheaper for Intel to manufacture CPUs with fewer masks through EUV, and it also allows Intel to scale out new IP into future products while keeping the same base, which is another cost-saving factor (for Intel).
Compared to mobile Raptor Lake, which was done using Multi-Chip Packaging (MCP), Meteor Lake uses Foveros BGA packaging and offers low-power die-to-die interconnects, which Intel confirmed carry a small power penalty of between 0.15 and 0.3 picojoules (pJ) when communicating from tile to tile. Some of the advantages of Foveros include better customizability through tiling, which allows Intel to manufacture chips and implement specific tiles and IP depending on the grade of the chip, e.g., low-power tiles with more I/O, or high-end tiles with all the latest gadgets and gizmos. With the Intel 7 node not being as viable as they would have hoped, Intel promises higher wafer yield on Intel 4, which uses less wafer space for logic-based silicon.
Power management is done using a scalable power management system that supports the independent functioning of each tile. Coordination between multiple power management controllers (PMC) and system software is designed to be optimized for various workloads. Intel's Meteor Lake architecture also introduces a new scalable fabric to improve energy efficiency and extend bandwidth in areas that have been previously bottlenecked, such as I/O.
Touching more on the power controllers within the Meteor Lake architecture, Intel has integrated independent Power Management Controllers within each of the tiles. Because disaggregation through Foveros requires each tile to be power managed independently, placing PMCs on the NOC, the I/O fabric, and each tile makes power management agnostic to the number of cores on each package.
Intel positions Meteor Lake as a monumental architectural shift rather than a mere incremental update, describing it as the most significant architectural transformation in client processors in four decades. This is because it's Intel's first client processor to be made using chiplets instead of a monolithic design. The architecture is designed to be the cornerstone of Intel's strategy to push PC innovation for the next decade. Looking at some of the finer details of Intel's Meteor Lake architecture, it uses Intel's Foveros packaging technology, which uses 3D chip stacking to tackle the pitfalls of traditional 2D chip layouts.
As we can see from the above disclosure from Intel at Hot Chips 2023, the top and bottom layers have bumps for interconnecting each die together. Using the Foveros FDI packaging offers a low-voltage complementary metal oxide semiconductor (CMOS) interface, which means the power circuitry can be run with lower voltages and, thus, with a lower power envelope. Another benefit of FDI is both synchronous and asynchronous signaling, which means signal transmission can handle fully duplexed data blocks.
Constructing a Meteor Lake SoC starts with a package substrate, which is the foundation for the base tile to sit on, and the base tile uses the Foveros Die Interconnect (FDI). The base tile has a 36µm die-to-die pitch with metal layering and an energy cost of 0.15 to 0.3 pJ at 2 GHz; this may vary depending on voltage, current, and frequency. As the base tile isn't an active chiplet itself, its only function is to serve as a base for all of the various logic and metal layers for the chiplets to be placed upon.
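To put the die-to-die energy figure in perspective, here is a minimal back-of-the-envelope sketch. It assumes the quoted 0.15 to 0.3 pJ is energy per bit transferred, which is how die-to-die interconnect energy is normally expressed but is not spelled out above, and it uses a purely hypothetical traffic level of 100 GB/s between tiles; neither number should be read as an Intel specification.

```python
def link_power_watts(energy_pj_per_bit: float, bandwidth_gbytes_per_s: float) -> float:
    """Interconnect power = energy per bit x bits moved per second."""
    bits_per_second = bandwidth_gbytes_per_s * 1e9 * 8  # GB/s -> bit/s
    return energy_pj_per_bit * 1e-12 * bits_per_second   # pJ -> J


# Hypothetical tile-to-tile traffic, chosen only for illustration.
TRAFFIC_GB_S = 100.0

# Assumption: the article's 0.15 to 0.3 pJ range is per bit transferred.
for pj_per_bit in (0.15, 0.30):
    watts = link_power_watts(pj_per_bit, TRAFFIC_GB_S)
    print(f"{pj_per_bit:.2f} pJ/bit at {TRAFFIC_GB_S:.0f} GB/s -> {watts:.2f} W")
```

Under those assumptions the link itself would cost somewhere between roughly a tenth and a quarter of a watt, which is consistent with Intel describing the tile-to-tile penalty as small.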
The makeup of Intel's Meteor Lake architecture uses four distinct tiles to create a Meteor Lake CPU. This includes a Compute, SoC, GPU, and an I/O tile, all with distinct uses, capabilities, and flexibility in regard to IP. Power management also gets a revamp. With disaggregation through Intel's Foveros packaging technology, each tile in Meteor Lake requires its own power management. Intel's solution is a hierarchical power management system that uses Power Management Controllers on the NoC, IO fabric, and each individual tile.
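The hierarchy described above can be pictured with a toy model. This is only an illustrative sketch, not Intel's firmware or any real interface: the class names, the two power states, and the "light workload" scenario are all hypothetical, and the only details taken from the text are that each tile, the NoC, and the I/O fabric has its own Power Management Controller and that tiles can be managed independently.

```python
from dataclasses import dataclass, field


@dataclass
class PowerManagementController:
    """One PMC; `owner` is the block it manages (a tile or a fabric)."""
    owner: str
    state: str = "active"  # hypothetical states: "active" or "idle"


@dataclass
class Package:
    """Top of the hierarchy: holds one PMC per block and coordinates them."""
    pmcs: dict = field(default_factory=dict)

    def add_block(self, name: str) -> None:
        self.pmcs[name] = PowerManagementController(owner=name)

    def set_state(self, name: str, state: str) -> None:
        # Each block is switched on its own, without touching the others.
        self.pmcs[name].state = state

    def active_blocks(self) -> list:
        return [name for name, pmc in self.pmcs.items() if pmc.state == "active"]


pkg = Package()
for block in ("compute tile", "graphics tile", "SoC tile", "I/O tile", "NoC", "I/O fabric"):
    pkg.add_block(block)

# Hypothetical light workload that fits on the SoC tile's LP E-cores:
# the compute and graphics tiles are idled independently of everything else.
pkg.set_state("compute tile", "idle")
pkg.set_state("graphics tile", "idle")
print(pkg.active_blocks())
```

Because the package object only iterates over whatever blocks were registered, the same logic works regardless of how many tiles or cores a given SKU has, which is the point of a per-tile PMC design.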
Over the next few pages, we'll give an insight into each of the four tiles, what each tile brings to the table, and more about the various technologies driving innovation through Meteor Lake.
107 Comments
FWhitTrampoline - Wednesday, September 20, 2023
I'm more focused on the eGPU usage for OCuLink so I'm not stating that TB4/USB4 connectivity does not have its usage model for your use case. But pure PCIe is lowest latency for eGPU usage and can be easily adopted by more OEMs than just GPD for their handhelds as that OCuLink will work with any makers' GPUs as long as one is using an OCuLink capable eGPU adapter or enclosure.
And ETA Prime has extensively tested OCuLink adapters with plenty of Mini Desktop PCs and even the Steam Deck (M.2 slot is only PCIe 3.0 capable). It's the 64Gbs on any PCIe 4.0/x4 connection (M.2/NVMe or other) that's what's good for eGPUs via OCuLink relative to the current bandwidth of TB4/USB4 40Gbs.
Exotica - Wednesday, September 20, 2023
I’ve seen those videos and the performance advantages for eGPUs. But most of the eGPUs in the market use Alpine Ridge, a chipset known to reserve bandwidth for DP and have less available for PCIe (22 Gbps). Perhaps there may be one or two based on Titan Ridge with slightly more PCIe bandwidth. It’s hard to say how Barlow Ridge will perform in terms of the amount of PCIe bandwidth made available to peripherals. But a 64 Gbps PCIe connection will not saturate the 80 Gbps link so hopefully we can have most of the available 64 Gbps PCIe bandwidth. Another problem with OCuLink is that there’s no power delivery so you need to have a separate wire for power.
So Barlow Ridge TB5 has the potential to be a one cable solution, power up to 240W, PCIe up to 64 Gbps, and it will also tunnel DisplayPort. OCuLink is cool. But Thunderbolt tunnels more capabilities over the wire.
FWhitTrampoline - Wednesday, September 20, 2023
OCuLink is lower latency as was stated in the earlier posts! And TB4/TB# or USB4/USB# will not be able to beat Pure PCIe connectivity for low latency and latency is the bigger factor for gaming workloads. TB tunneling protocol encapsulation of PCIe/Any other Protocol will add latency as the result of having to do the extra encoding/encapsulation and decoding/de-encapsulation steps there and back whereas OCuLink is just unadulterated PCIe passed over an external cable.
More Device makers need to be adding OCuLink capability to their systems as that's simple to do and requires no TB#/USB4-V# controller chip to be hung off of MB PCIe lanes as the OCuLink port is just passing PCIe signals outside of the device. And TB5/USB4-V2 is more than 64Gbs but that will require more PCIe lanes be attached to the respective TB5/USB4-V2 controller and use more overhead to do that whereas if one has the same numbers of PCIe lanes connected via OCuLink then that's always going to be lower overhead with more available/usable bandwidth and lower latency for OCuLink.
Most likely the PCIe lane counts will remain at 4 lanes Max and that will just go from PCIe 4.0 to PCIe 5.0 instead to support TB5 and USB4-V2 bandwidth but whatever PCIe standard utilized OCuLink will always have lower overhead and lower latency than TB/Whatever or USB4/Whatever as with OCuLink that's skipping the extra tunneling protocol steps required.
Plus by extension and with any OCuLink Ports being pure PCIe Protocol Based, that opens up the possibility of OCuLink to TB/USB/Whatever Adapters being utilized for maximum flexibility for other use cases as well.
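For reference, the 64 Gb/s figure quoted throughout this thread follows from the per-lane signalling rate times the lane count. The sketch below works that out for an x4 link at PCIe 3.0, 4.0, and 5.0, and also applies the 128b/130b line encoding those generations use. It deliberately ignores packet and protocol overhead and any Thunderbolt/USB4 tunnelling, so it describes raw link rates only and says nothing about the latency question being argued here; the function and variable names are just illustrative.

```python
# Per-lane signalling rate in GT/s for each PCIe generation.
GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32}

# PCIe 3.0 and later send 128 payload bits in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbps(gen: str, lanes: int = 4) -> tuple:
    """Return (raw Gb/s, Gb/s after line encoding) for one direction of a link."""
    raw = GT_PER_S[gen] * lanes
    return raw, raw * ENCODING_EFFICIENCY


for gen in GT_PER_S:
    raw, usable = pcie_gbps(gen)
    print(f"PCIe {gen} x4: {raw} Gb/s raw, {usable:.1f} Gb/s after encoding")
```

An x4 link therefore doubles from 64 Gb/s to 128 Gb/s raw when moving from PCIe 4.0 to 5.0 without adding lanes, which is the scenario described in the comment above.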
Exotica - Wednesday, September 20, 2023
OCuLink has merit for sure, but again, it is clunky. Unlike Thunderbolt, it doesn't tunnel DisplayPort or provide power delivery. It also doesn't support hotplugging. That is why it will most likely remain a niche offering. Also you're saying OCuLink is lower latency, but by how much? Where is the test data to prove that?
And does it really matter? Operating systems can be run directly off of Thunderbolt NVMe storage, the latency is low enough for a smooth experience. And even if OCuLink is technically faster, a GPU such as a 4080 or 4090 or 7900XTX in a PCIe4x4 or even PCIe5x4 eGPU Thunderbolt 5 enclosure will be much faster than the iGPU or even internal graphics. And if the eGPU enclosure is Thunderbolt enabled, it can power the laptop or host device and probably act as a dock and provide additional downstream Thunderbolt ports and possibly USB as well. Thunderbolt provides flexibility that OCuLink does not. Both standards have merit.
But I have a feeling Thunderbolt 5, if implemented properly in terms of bug-free firmware NVMs from Intel, will gain mass market appeal. The mass market is hungry for the additional bandwidth. AsMedia will probably do extremely well as well with its USB4 and upcoming USB4v2 offerings.
TheinsanegamerN - Thursday, September 21, 2023
Dont waste your time, Trampoline is an OCUlink shill who will ignore any criticism for his beloved zuckertech. The idea that most people dont want to disassemble a laptop to use a dock is totally alien to him.
FWhitTrampoline - Thursday, September 21, 2023
LOL, OCuLink's creator PCI-SIG is a not for profit Standards Organization that's responsible for the PCIe standards so it's not like they are any Business Interest with a Fiduciary responsibility to any investors.
OCuLink is just a Port/electrical PCIe extension cabling standard that was in fact originally intended to be used in consumer products but Intel, a member of PCI-SIG along with other industry members, had a vested interest in that Intel/Apple co-developed Thunderbolt IP, because of TB controllers and sales of TB controllers related interests.
And TB4/Later and USB4/Later will never have as low latency owing to the fact that any PCIe signalling will have to be intercepted and encapsulated by the TB/USB/Whatever protocol controller in order to be sent down the TB cabling whereas over the OCuLink ports/cabling that's just the PCIe signalling/packets there and no extra delays there related to any extra tunneling protocol encoding/encapsulation and decoding/de-encapsulating steps required.
So OCuLink represents the maximum flexibility as that's the better lowest latency solution for eGPUs being just pure unadulterated PCIe signaling. And because it's just PCIe that opens up the possibility of all sorts of external adapters that take in PCIe and can convert that to Display Port/HDMI/USB/TB/Whatever the end users need because all Motherboard external I/O, for the most part, is in the form of PCIe and OCuLink just brings that PCIe directly out of devices via Ports/External cables.
And to be so dogmatically opposed to OCuLink is the same as being opposed to PCIe! And does any rational person think that that's logical! OCuLink is External PCIe and that's all there is to that and it's the lowest latency method to interface with GPUs via any PCIe Slot or externally via an OCuLink connection (PCIe is PCIe).
Give me a Laptop with at least One OCuLink PCIe X4/4.0 port and with that I can interface to an eGPU at 64Gbs bandwidth/lowest latency possible! And there can and will be adapters that can be plugged into that One OCuLink port that can do what any other ports on the laptop can do because those ports are all just connected to some MB PCIe lanes in the first place.
Kevin G - Wednesday, September 20, 2023
The main advantage of TB4 is that the form factor is USB-C which can be configured for various other IO. This is highly desirable in a portable form factor like laptops or tablets. Performance is 'good enough' for external GPU usage. OCuLink may be faster but doesn't have the flexibility that TB4 over the USB-C connector does. OCuLink has its niche but a mainstream consumer IO solution is not one of them.
FWhitTrampoline - Thursday, September 21, 2023
OCuLink is just externally routed PCIe lanes and really there can be one OCuLink port on every laptop specifically for the best and lowest latency eGPU interfacing and even OCuLink to HDMI/Display Port/whatever adapters that can make the OCuLink port into any other port at the end user's discretion. So for eGPUs/Enclosures that have OCuLink ports that's 64Gbs/Lowest latency there and for any Legacy TB4/USB only external eGPU devices just get an OCuLink to TB4/USB4 adapter in the interim and live with the lower bandwidth and higher latency.
GPD already has a line of Handheld Gaming devices that utilize a dedicated OCuLink port and a portable eGPU that supports both OCuLink interfacing and TB4/USB4 interfacing. And I do hope that GPD Branches out into the regular laptop market as GPD's external portable eGPU works with other makers' products and even products that have M.2/NVMe capable slots available via an M.2/NVMe to OCuLink adapter! LOL, only Vested Interests would Object to OCuLink in the consumer market space, specifically those Vested Interests with Business Models that do not like any competition.
TheinsanegamerN - Thursday, September 21, 2023
Because most people dont want to disassemble their laptop to plug in a m.2 adapter, you knucklehead.
FWhitTrampoline - Thursday, September 21, 2023
No one is forcing you to do that and for others that's an option, albeit an inconvenient one. But really the adapters are not meant for Laptops in the first place and even for Mini Desktop PCs it's not an easy task there but still more manageable than doing that with a laptop. It would just be better if there were more Mini Desktop PC OEMs/Laptop OEMs that would adopt an OCuLink PCIe 4.0/x4 Port for eGPU usage like GPD has done with their line of handheld gaming devices. And with mass adoption of OCuLink there could also be adapters as well to support all the other standards as OCuLink being PCIe based by extension will support that as well.