34 Comments
soresu - Wednesday, April 1, 2020 - link
Wow, they really didn't learn their lesson from the AVX512 fragmentation mess at all.
ksec - Wednesday, April 1, 2020 - link
Exactly, I think what Intel is trying to do is keep Cooper Lake for 4S/8S, while Ice Lake covers 1S and 2S without BFLOAT16.
This is just another sign they are rushing ahead to 7nm. But then Alder Lake is 10nm++. So there won't be Intel 7nm in a laptop chip even in 2022?
Santoval - Thursday, April 2, 2020 - link
Tiger Lake will be released with up to 6 cores for the -H version (45W). Strangely it will also be fabbed at 10nm+, the same as Ice Lake, so I am not sure what kind of "transistor optimizations" it could have since it's the same node variant (unless Intel calls the node of Ice Lake "10nm", pretending that Cannon Lake and the beta 10nm-- node it was fabbed on never existed).
Since Alder Lake will be fabbed on 10nm++ perhaps they will manage to reach up to 8 cores, though still only for -H parts; presumably with up to 6 cores for -U parts. So yes, 2022 will be the year of Alder Lake laptops - assuming minimal delays. Since Alder Lake will sport Golden Cove cores, its 7nm successor that will be released in 2023 (at the *absolute* earliest), tentatively called "Meteor Lake" as of today, should sport Ocean Cove cores. That's the μarch Jim Keller and his team have been developing for a few years now, which is said to be Intel's *true* next-gen μarch (Intel's Zen equivalent).
Deicidium369 - Tuesday, April 28, 2020 - link
New architecture (Sunny Cove / Willow Cove) as well as a new process. Look at the most current 14nm and the very first 14nm... quite different.
Cannon Lake was 10nm-, the 2.7x density was too aggressive - so 10nm is the 2x density increase and 10nm+ is the 2.7x density increase. Tiger Lake is 10nm+ and I would think that ICL Xeon would be as well.
Skylake is Intel's Zen equiv.
Deicidium369 - Tuesday, April 28, 2020 - link
2 completely different markets. Most hyperscalers are interested in bfloat16 and the high socket count - so makes sense. The people buying Ice Lake Xeons are going to be replacing 2 socket with 2 socket.
2022 is the general launch date for that node - 1st will be Xe HPC (Ponte Vecchio) and Sapphire Rapids (Xeon) - I am sure there will be a 7nm laptop SOC - but I doubt it will be in the initial 7nm launch.
mode_13h - Wednesday, April 1, 2020 - link
Well, if the main use case for BFloat16 is training, then you're not going to have people doing a lot of training on their laptop CPUs.
It'd be great to have all instructions everywhere, but when you're talking about 512-bit, some of those extensions eat a lot of die space.
CityBlue - Wednesday, April 1, 2020 - link
Performance fixes. Security fixes. It's all lipstick on a pig.
Does anyone have any faith/trust in SGX or TSX after the various security debacles? Can Intel really be trusted with total memory encryption given their track record with, you know, security vulnerabilities and system crippling exploits?
Intel need to go back to the drawing board and start from scratch, eliminating all the security shortcuts they took in their misguided quest for speed. Tacking on new instructions to work around their defective and compromised microarch isn't going to cut it.
Why wait 2 years for "new" (basically tweaked) hardware from Intel that only papers over the cracks when you can instead buy the competition today with more performance, fewer security worries and significantly cheaper. Win, win, win.
azfacea - Wednesday, April 1, 2020 - link
What good is a new instruction, if your products exist only in slides
JayNor - Wednesday, April 1, 2020 - link
I've seen the TPAUSE, UMONITOR, and MOVDIR instructions associated with fast accelerator accesses in a Tremont description. Interesting that they appear in Sapphire Rapids.
Were any of the new instructions associated with the CXL operations?
Deicidium369 - Tuesday, April 28, 2020 - link
Interesting in that they will be used with Intel Xe, Intel Agilex (FPGA) and the Habana AI stuff - and programmed under One API. They are all gunning for CXL which is transported over PCIe5 - which is already in Agilex and will be in Sapphire Rapids.
Duncan Macdonald - Wednesday, April 1, 2020 - link
The problem for Intel is that if they removed all the shortcuts then the performance would suffer - quite possibly to the point where even the fastest Intel CPUs were slower than AMD's CPUs.
Intel already has the problem that its current monolithic architecture cannot match the number of CPU cores per package that AMD can get with its chiplet design (e.g. in the EPYC 7742). If Intel moved from its current monolithic design to a chiplet design like AMD and fixed the security busting shortcuts then it might well have difficulty matching the performance of the current AMD Zen 2 devices at the time that AMD Zen 3 is on the market.
yeeeeman - Wednesday, April 1, 2020 - link
Why would it have problems matching the performance of Zen 3? Ice Lake in mobile form factor (so smaller cache size) already has 10% better IPC than Zen 2. Given Zen 3 is rumoured to have 10-15% better IPC than Zen 2, then it would be a tie between Ice Lake core for server and Zen 3.
The biggest advantage that AMD has IMO is the 7nm process, courtesy of TSMC.
Chiplets wouldn't be such a success with an old process like 14nm from Intel. Efficiency would be crap.
So all in all, sure, Intel is blamable for its stupid mistakes related to security. But there isn't really much they can do given the process issue still persisting and TSMC going along so well.
yeeeeman - Wednesday, April 1, 2020 - link
And again, the Zen 2 core is nothing spectacular on its own, it barely overtakes Skylake in IPC, which is ancient now by technology standards.
Again, 90% of AMD success is thanks to TSMC, not a real technical superiority that AMD has.
Qasar - Wednesday, April 1, 2020 - link
too bad that in most places, ice lake is nowhere to be found. "it barely overtakes Skylake in IPC" um yea ok. clock any intel cpu the same as Zen 2, and see what happens, the only reason intel has any performance advantage, is because of clock speed, and it gets that performance, while using more power.
name99 - Wednesday, April 1, 2020 - link
"90% of success is showing up"
And 90% of business success is knowing what to do in-house, and what to sub-contract out to experts...
mode_13h - Wednesday, April 1, 2020 - link
You're forgetting the decade (2005 to 2015, roughly) where Intel basically led the world in semiconductor manufacturing, by up to a couple years.
Intel's mistake wasn't failure to subcontract out, but rather its failure to adequately invest in protecting that competitive advantage.
Kevin G - Wednesday, April 1, 2020 - link
The problem is that the basic Sky Lake core is essentially what Coffee Lake currently is. Matching IPC means the performance difference stems mostly from clock speeds, where Intel has a clear edge at the high end, but due to their product segmentation they’re keeping their midrange and lowend chips handicapped a bit in terms of clocks and cache.
The other factor is that for those keeping up-to-date on security patches, Intel’s platforms have gotten slower over time because of them. This varies based upon workload as I haven’t felt much of a difference at home but an older box at work did indeed take a 25% performance hit (IO heavy and uses VMs). AMD’s technical superiority here is clear and all they had to do was not screw up security as badly (they’re not perfect).
Jorgp2 - Wednesday, April 1, 2020 - link
you're wasting your breath.
These people don't think before they speak
kaspar737 - Wednesday, April 1, 2020 - link
Ice Lake 10% better IPC than Zen 2 if security mitigations are disabled?
Deicidium369 - Tuesday, April 28, 2020 - link
Not a single one of those "security issues" can be exploited in the wild and can not even be reliably exploited in a lab environment.
Intel didn't need to salvage every single tiny little piece of silicon because they made a TERRIBLE deal with TSMC. Intel was able to make very large monolithic dies that TSMC can only approach with a GPU (a lot more forgiving of defects).
If Intel 14nm followed the TSMC model - 14+ would be 13nm, 14++ would be 12nm and 14+++ would be 11nm - and we know Intel's 10nm+ is denser than any of TSMC's "7nm" fluff.
Jorgp2 - Wednesday, April 1, 2020 - link
It's crazy how many people have no idea what they're going on about
mode_13h - Wednesday, April 1, 2020 - link
Thanks for the useless comment. Try enlightening us.
Qasar - Thursday, April 2, 2020 - link
why would he ? timecop1818 says things like this, and he never does :-) :-)
Sahrin - Wednesday, April 1, 2020 - link
>Tiger Lake (what we know so far) is a quad-core mobile chip due for launch at the end of 2020
Uh...good luck with that, Intel.
mode_13h - Wednesday, April 1, 2020 - link
I believe it'll launch. Whether or not you can actually buy them is a separate question.
PaulHoule - Wednesday, April 1, 2020 - link
If I understand this right, it seems that BF16 will be hard to find in the near term. That is, if I want to use it for inference on the client, no dice.
Is that right?
mode_13h - Wednesday, April 1, 2020 - link
Yes, the only CPUs set to have it are server chips, one of which won't even be generally available (i.e. Cooper Lake).
For inference, you'll probably find fp16 more than adequate, and still better than int8. The two main reasons BFloat16 is gaining popularity are: better training efficiency and smaller silicon footprint. Easy conversion to fp32 is an added bonus, but Intel had fp16/fp32 conversion instructions since Ivy Bridge.
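To illustrate the "easy conversion to fp32" point: bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of an fp32 value, so converting amounts to dropping or restoring the low 16 bits. The following is only a software sketch of that idea, not how any Intel instruction is implemented; the function names are made up for this example, and round-to-nearest-even is an assumed rounding mode.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumes round-to-nearest-even; NaNs are kept as quiet NaNs.
    """
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: exponent all ones and a nonzero mantissa. Truncate and force a
    # mantissa bit so the result is still a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, then keep the upper 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a bfloat16 pattern back to fp32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1e38):
        b = fp32_to_bf16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bf16_bits_to_fp32(b)!r}")
```

The widening direction is a plain shift, which is why it is cheap in hardware. The cost is precision: only about two to three decimal digits survive, while the exponent range stays the same as fp32.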
p1esk - Thursday, April 2, 2020 - link
No one in their right mind would use fp16 or bfp16 for inference. It makes zero sense because even INT4 is enough. Nvidia gets it, Intel still doesn’t.
Machinus - Thursday, April 2, 2020 - link
14nm is not a real product anymore.
JayNor - Thursday, April 2, 2020 - link
Intel's Agilex FPGAs also support bfloat16, according to this:
https://www.intel.com/content/dam/www/programmable...
"hard fixed-point and IEEE 754 compliant hard floating-point variable precision digital signal processing (DSP) blocks providing up to 40 TFLOPS of FP16 or BFLOAT16 compute performance"
JayNor - Thursday, April 2, 2020 - link
The main advantage of bfloat16 is that it has been demonstrated to substitute well for fp32 in training. In Intel's case Cooper Lake avx512 should be able to double the fp32 equivalent training operations per cycle.
https://arxiv.org/abs/1905.12322
"Our results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors in the same number of iterations and with no changes to hyper-parameters. "
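The "double the operations per cycle" figure above follows from element width alone, so it can be checked with a line of arithmetic. This is a back-of-the-envelope sketch that counts only how many elements fit in one 512-bit register; real throughput also depends on which instructions are used and how many execution ports can issue them, which is not modeled here. The names are invented for the example.

```python
VECTOR_BITS = 512  # width of one AVX-512 register


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one vector register."""
    return VECTOR_BITS // element_bits


fp32_lanes = lanes(32)   # 16 single-precision values per register
bf16_lanes = lanes(16)   # 32 bfloat16 values per register

if __name__ == "__main__":
    print(f"fp32 lanes: {fp32_lanes}, bf16 lanes: {bf16_lanes}, "
          f"ratio: {bf16_lanes // fp32_lanes}x")
```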
abufrejoval - Monday, April 6, 2020 - link
Never read beyond April 1st...
Yet I wonder, why I didn't see it pop up before? Did it get delayed in some pipeline?
Where is the Renoir vs. iCan'tCompeteAny14ore comparison promised for today?
JayNor - Sunday, April 12, 2020 - link
MSFT uses ms-fp8 in their Brainwave project, which is about 3x faster than int8 on an Intel Stratix 10 or Arria 10 FPGA.