33 Comments
Raqia - Wednesday, December 18, 2019 - link
It looks like nVidia is also starting to leave the custom ARM v8 core space in favor of ARM's designs.
Yojimbo - Wednesday, December 18, 2019 - link
Probably, although there is always the possibility they moved their ARM design team from designing SoC cores to server chip cores. They are in the process of porting their entire software stack to have full ARM support in order to target edge servers such as 5G radio access networks. It's possible that with the addition of Mellanox they want to offer most of the high-margin hardware for that space: the CPU, the GPU, and the interconnect. Just a possibility...
blazeoptimus - Wednesday, December 18, 2019 - link
What I'm most interested to see is how this will play into Nintendo's future device strategy. While I think choosing the Tegra X1 for their current console made sense, I see this as causing them long-term issues. I feel fairly certain Nvidia gave them a discount on the X1 since it was designed for mobile devices but had no takers. As Nvidia has designed subsequent generations of Tegra chips, they've moved further and further away from chips that would work well in a future mobile console. I fear that this puts Nintendo in a situation similar to where they were with the Wii U - working with a vendor that will do the bare minimum to increase performance - because it's no longer in their interest to develop the line. As their technology falls further behind, they get into an untenable position. The Wii U used processors designed in the late 90s, and I firmly believe that was part of the console's failure. I don't believe you have to be the fastest, but I do believe you need to be current.
Raqia - Wednesday, December 18, 2019 - link
The PowerPC 750 derivative they used is a sexy little beast and a poster child of efficiency for its time: it had OoOE, branch prediction, and caches, and came in at under 7 million transistors! I think it was about as simple as it could have been for the features it had, which I think is great.
Nintendo always leads with more interesting interfaces and form factors rather than processing horsepower. With that in mind, I think Qualcomm would actually be the most natural partner for them to work with for their next-gen console, given their investment into VR and AR. They could probably get a deal on some long-in-the-tooth 835 derivatives and release a proper Virtual Boy 2... :)
Raqia - Wednesday, December 18, 2019 - link
To sing more praises of the 750, the radiation hardened versions (RAD750) are commonly found on space probes like the Curiosity rover. The part is well proven and loved for a reason.
blazeoptimus - Thursday, December 19, 2019 - link
I don't disagree that the 750 was a good chip for its time. I'm also aware of the RAD750. I'm also aware that it was used because 1. The specs for the Curiosity rover were finalized years before its launch (testing began in the 2004-2005 timeframe), putting it much closer to the original PPC750 release. And 2. Developing a new radiation-hardened chip is not economically advantageous for the companies that sell them (complex tooling and design process for literally just a few chips). They are finally designing a replacement for the RAD750 specifically because it's so old.
As to it being the poster child of efficiency, I think you nailed it on the head when you said 'for its time', which was 1998. The Wii U was released in 2012. By that time there were processors, from a myriad of vendors and price points, that exceeded the Wii U's triple-core PPC750 (which was never designed for multiprocessing). They were bolting features onto an outdated architecture to try to eke out enough performance to not bottleneck their low-end graphics solution, and they were making compromises to do it. I love classic computing, and I actually think it was pretty cool that they were able to take the 750 that far - but I don't think it was a wise business decision.
As to using Qualcomm, I don't think it's a good fit. Qualcomm tailors their chip designs for the much higher volume of mobile phone sales. At least with the Switch, they can claim that at the time of the Switch's original release it was competitive with the other mobile platforms graphics-wise. They wouldn't be able to do that with a Qualcomm solution. Long term, if AMD were more committed to the ARM platform, they would be an ideal choice. They have quite a bit of experience with semi-custom designs and can provide top-tier graphics capabilities. Or perhaps I'm wrong about Nvidia and an adapted Orin chip would do just the trick. I suppose we'll see. I just know that historically, Nintendo has failed to capitalize on newer technology in the way that they should. I agree that they primarily rely on the gaming experience, but at some point the hardware needs to move forward as well. You could easily make the N64 into a mobile gaming device and add a couple of nunchuck controllers - but that doesn't mean that people would buy it.
Yojimbo - Friday, December 20, 2019 - link
The Orin chip is huge and expensive. No way it finds its way anywhere near a Switch or Switch follow-on.
Alistair - Sunday, March 21, 2021 - link
Not true, it is coming out two years later, most likely on 8nm or 5nm EUV. It won't be that large two years later, with just twice as many GPU cores as we already have in Xavier.
levizx - Sunday, December 22, 2019 - link
Qualcomm still has the likes of the 8cx and its successors. And they did a customized version for Microsoft. I don't see why they can't/won't do it again for Nintendo.
blazeoptimus - Thursday, December 19, 2019 - link
As an additional counterpoint, I could sing the praises of the Z80, which still has modern incarnations in production and use. It's likewise a model of efficiency, utilizing only about 500k transistors (that's a rough guess) :). It was also used in video game consoles. All that being said, I don't think it'd be a good decision to use it as a console's primary compute processor.
Yojimbo - Friday, December 20, 2019 - link
When the Switch was announced, NVIDIA talked about a long-term partnership with Nintendo. I think something like 20 years was mentioned. I'm pretty sure the Switch has high enough volumes to warrant NVIDIA creating an SoC for it. That SoC can also be used for Shield devices, wearables, and future generations of infotainment processors. I don't think NVIDIA is interested in doing the bare minimum to increase performance. They will either make the SoC, in which case it will be up to Nintendo how much they want to pay to get what level of performance, or they won't. They will have all the pieces in place to make as powerful an SoC as Nintendo might request.
jeremyshaw - Thursday, December 19, 2019 - link
Nvidia has always chosen whichever CPU is more prudent at the moment - no vanity in forcing a CPU design into a product just to show off, like Samsung (or Qualcomm in the past). This isn't the first time they have "ditched" their in-house ARMv8 design "in favor" of ARM.
TheinsanegamerN - Wednesday, December 18, 2019 - link
I miss the Tegra chips. Shame Nvidia never bothered to support that line any further.
drexnx - Wednesday, December 18, 2019 - link
Something about them just didn't play nice with LTE modems; that's what really killed the Tegra line.
Yojimbo - Wednesday, December 18, 2019 - link
NVIDIA couldn't get past Qualcomm's stranglehold on the smartphone market due to Qualcomm's modem capabilities. NVIDIA accused Qualcomm of being anticompetitive, and Qualcomm was eventually fined for just that, but by that time NVIDIA had exited the market. NVIDIA tried to target the tablet market, but the non-iPad tablet market never really took off, so they abandoned that too and pivoted to in-car infotainment systems. They found the margins there too low for their liking and so now are re-pivoting to autonomous machines (self-driving cars and robots) and automotive AI cockpits, as well as the Nintendo Switch and the Shield TV, the last of which probably only still exists because of the Switch.
Raqia - Wednesday, December 18, 2019 - link
The main issue was that their Icera unit employed soft modems that were never going to be competitive with hardware-accelerated solutions from other vendors.
Yojimbo - Wednesday, December 18, 2019 - link
Maybe. I never really saw any comparisons. There certainly was room at the time for vendors to try soft modems or two-chip solutions. Qualcomm was found to have sold their modem chipsets below cost to force Icera out of the market; that much I do know.
Raqia - Wednesday, December 18, 2019 - link
That was the EU's interpretation of a pricing discount on a very low volume of chip shipments, whereas the average cost over the lifetime of a contract was in no way below cost or anticompetitive. Cost recovery is a standard business practice and didn't affect the longer-term business decisions here: the main issue remains that Icera didn't have a competitive modem with which to secure a longer-term contract.
Yojimbo - Friday, December 20, 2019 - link
Qualcomm's anti-competitive practices affected both the long-term business decisions and the EU's decision. I'm not sure what you are trying to say here. If the EU felt that the reason Icera failed was an inferior product, then there would be no reason to issue the ruling.
Karmena - Friday, December 20, 2019 - link
LTE was not integrated into them. That is the issue. Apple had/has the same issue - if you look at battery tests for iPhones on Wi-Fi versus 4G there is a stark contrast. While everyone praises high-power cores, the real action is in the low-power cores handling background tasks, and this is where the A55 (or whatever) fails.
webdoctors - Wednesday, December 18, 2019 - link
???
I just got a Shield that has a Tegra chip, same with the Nintendo Switch and the Jetson boards. Obviously you won't see the automotive chips in consumer tablets, and I think consumer tablets don't have any demand for faster and faster SoCs. Look at how old the SoCs in the Amazon tablets are.
Alistair - Wednesday, December 18, 2019 - link
That chip is based on 5-year-old Cortex-A57 cores (from the Galaxy Note 4...). That's not a new chip by any stretch of the imagination.
blazeoptimus - Wednesday, December 18, 2019 - link
Actually, let's do look at Amazon tablets. The Fire HD 10 uses an octa-core high/low CPU released by MediaTek at the end of last year. The Cortex-A73 cores that it utilizes were released by Arm in 2016. Given that the tablet can be had for $100, and how long it takes from the release of a design to it actually making it into devices, a 3-year gap is reasonable. Amazon's SoCs aren't old, they're just cheap - which is to be expected of very low priced devices. As to the demand, I'm pretty sure Apple and Microsoft's very profitable tablet divisions would disagree with you.
Stoly - Wednesday, December 18, 2019 - link
I think it's better for Nvidia to use ARM designs, which are pretty good BTW.
Even though Nvidia has the resources and engineering team to do its own, they don't really need an ultra-fast CPU. Their forte is the GPU, which is where they should put their efforts.
Fataliity - Thursday, December 19, 2019 - link
You guys are comparing to the wrong chips, just like the other websites. Seriously, did Nvidia provide you with the list and you just copied it?
Nvidia has a very similar chip to this ALREADY OUT. It's called the Tesla T4. It's a 2060 SUPER underclocked to 75 watts. It gets 130 TOPS, about 2 TOPS/watt. So compared to current gen, it's 30% more efficient when underclocked.
Meanwhile a TU102 is 295 watts and 206 TOPS. With 30% efficiency added, you get 267.8 TOPS, and at a 320W envelope you'd have about 300 TOPS.
_____
And for AI, judging by claims from Tesla, they only hit about 3.4 out of 30 TOPS real world (10%), because TOPS is calculated synthetically, with the easiest possible calculation (like 1+1 over and over). Real-world uses don't work like this.
Even in ResNet-50, only about 25% of the TOPS can be used, and that's still synthetic, but compared to real world it's the best case. If you lower the batch size from 15-30 down to 1, the performance is closer to what Tesla got, about 10%.
Tesla's design is getting 50% efficiency (37 of 74 TOPS).
So even Orin at 200 TOPS would only get about 20 in the real world.
And that's for its intended purpose: autonomous driving.
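A rough sketch of that arithmetic, for anyone who wants to retrace it (the 30% uplift, the wattages, and the 10-25% utilization rates are the commenter's assumptions, not published figures):

```python
# Back-of-envelope TOPS math from the comment above.
# All inputs are the commenter's assumptions, not official specs.

t4_tops, t4_watts = 130, 75            # Tesla T4 peak TOPS at 75 W
tu102_tops, tu102_watts = 206, 295     # full Turing die at gaming clocks
efficiency_gain = 0.30                 # assumed gen-over-gen improvement

print(f"T4 efficiency: {t4_tops / t4_watts:.2f} TOPS/W")    # ~1.7, i.e. "about 2"

# Peak-rate scaling: apply the assumed 30% gain to the TU102 figure,
# then extrapolate linearly to a 320 W envelope.
scaled_tops = tu102_tops * (1 + efficiency_gain)            # 267.8 TOPS
tops_at_320w = scaled_tops / tu102_watts * 320              # ~290, "about 300"
print(f"Scaled peak: {scaled_tops:.1f} TOPS; at 320 W: {tops_at_320w:.0f} TOPS")

# Real-world utilization: peak TOPS assume back-to-back dense math, so the
# comment applies ~25% (ResNet-50, large batch) or ~10% (batch size 1).
orin_peak = 200
print(f"Orin at 25% utilization: {orin_peak * 0.25:.0f} TOPS")
print(f"Orin at 10% utilization: {orin_peak * 0.10:.0f} TOPS")
```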
Fataliity - Thursday, December 19, 2019 - link
Edit: And if the CPU cores include AI accelerators, like the SoCs for mobile phones do, and those are included in the TOPS calculation, then it's even further off. So 30% more efficiency vs Turing is the best case. And that's with a node shrink.
Fataliity - Thursday, December 19, 2019 - link
Edit: 50% more efficient than the Tesla T4. Once you go up to gaming clock speeds, 30% or less is more likely. But most of the performance will be in RT cores.
Drive Xavier (Pascal) was 30 TOPS / 30 watts.
Turing (non-RT cores, 30% improvement): 40 TOPS / 30 watts. x2 = 80 TOPS / 60 watts.
Turing with RT cores, Tesla T4: 130 TOPS, 70 watts.
So about 50 TOPS is the RT cores.
Add another 60 TOPS in RT cores (about 77, so 150 total), and a 12.5% improvement to Turing itself (12 TOPS) = 200 TOPS.
Or double RT cores to 128 (100 TOPS), and 20 TOPS from arch enhancement, so a 25% boost = 200 TOPS.
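A sketch of that decomposition, using the rounded figures above (the per-block splits are the commenter's guesses; Nvidia does not publish such a breakdown):

```python
# Splitting peak TOPS into "non-RT" and "RT core" contributions,
# following the rounded numbers in the comment (not official figures).

xavier_tops = 30                      # Drive Xavier: 30 TOPS / 30 W, per the comment
non_rt_turing = 40                    # assumed ~30% non-RT uplift over Xavier, rounded
doubled_non_rt = 2 * non_rt_turing    # two of those: 80 TOPS / 60 W

t4_total = 130                         # Tesla T4 peak TOPS
rt_share = t4_total - doubled_non_rt   # ~50 TOPS attributed to RT cores

# One route to ~200 TOPS for a next-gen part: double the RT-core contribution
# (~100 TOPS) and give the non-RT portion a ~25% architectural boost.
next_gen = 2 * rt_share + doubled_non_rt * 1.25

print(f"Non-RT Turing portion: {doubled_non_rt} TOPS")
print(f"RT-core share of T4:   {rt_share} TOPS")
print(f"Next-gen estimate:     {next_gen:.0f} TOPS")
```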
BenSkywalker - Thursday, December 19, 2019 - link
When was Tesla using Xavier? I never saw any information on them using any of nVidia's tensor core products, can't find any now either.
edzieba - Thursday, December 19, 2019 - link
"Nvidia has a very similar chip to this ALREADY OUT. It's called the Tesla T4."The Tesla T4 (like the rest of the Tesla lineup) is an add-in accelerator card. The Drive series chipsa re SoCs that include not only the GPU, but also multiple CPU cores, interfaces (for a larger number of cameras and network ports), and other ancillary fixed-function blocks. The two are not comparable products.
edzieba - Thursday, December 19, 2019 - link
And on top of that there's all the internal redundancy and fail-safe error handling that goes into ASIL certification. In practice, the T4 is not a comparable product on that issue alone: you cannot just drop a few T4s into a system, shove it in the boot of a car, and go "here's our self-driving system!".
Fataliity - Monday, December 23, 2019 - link
That's what the ARM core does: the redundancy.
My point was that the T4 is a current-gen product that you can compare performance against, instead of using 2-3 gen old hardware. And in Xavier, their DL TOPS from the GPU itself was only 20-21; they added 10 TOPS by counting the ARM core.
The T4 is current gen, so you can compare it to the future gen and get an idea of the generational improvement they are offering.
Xavier was 30 TOPS, but that was 20 TOPS per GPU (for redundancy) and 10 TOPS from the ARM core.
Fataliity - Thursday, December 19, 2019 - link
Where has Ian been for this stuff? He actually does research and makes sure what Nvidia or Intel or other companies provide in slides is actually relevant to what their current products are. It's easy to make slides look good when you're comparing to a 4-5 year old arch by now.
MASSAMKULABOX - Thursday, December 19, 2019 - link
I thought of Nvidia driving my car, then I thought of their gaming heritage, and then the place where many buy their games, the Steam platform. I then realized that in the future I may have a steam-powered car, Chitty Chitty Bang Bang (2 large cores and 2 little cores).