Machine Inference Performance

The core aspect of the Xavier platform is its machine inference performance. The Volta GPU and the DLA cores together represent significant processing power in a compact, low-power platform.

To demonstrate the machine learning inference prowess of the system, NVIDIA provides the Jetson boards with a slew of software development kits as well as hand-tuned frameworks. The TensorRT framework in particular does a lot of the heavy lifting for developers, and represents the main API through which the GPU's Tensor cores as well as the DLA are exposed.

NVIDIA prepared a set of popular ML models for us to test out, and we were able to precisely configure how the models were run on the platform. All the models running on the GPU and its Tensor cores could be run in quantized INT8 form, or in FP16 or FP32. The batch size was also configurable, but we've kept it simple by showcasing only the results with a batch size of 32 images, as NVIDIA claims this is the most representative use-case for autonomous machines.
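For readers curious what this configuration looks like in practice, here is a minimal sketch of building a model into a TensorRT engine with a chosen precision and batch size, using the TensorRT 5-era Python API (the model file name is a placeholder, and INT8 additionally requires supplying a calibrator):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)

# "model.onnx" is a placeholder for one of the prepared networks
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

builder.max_batch_size = 32      # the batch size we benchmarked with
builder.fp16_mode = True         # run in FP16; leave both flags off for FP32
# builder.int8_mode = True       # INT8 additionally needs an int8_calibrator

engine = builder.build_cuda_engine(network)
```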

[Graph: Tegra Xavier AGX - NVIDIA TensorRT - GPU Performance]

The results of the GPU benchmarks are a bit esoteric, as we have few comparison points against which to evaluate the AGX's performance. The clearest takeaway is that absolute inferencing throughput reaches rather high rates, particularly in the INT8 and FP16 modes, which represents sufficient performance to run a variety of inferencing tasks on a large number of input sets per second. The only figure we can compare to anything in the mobile market is the VGG16 result, set against the AImark score in our most recent iPhone XS review, where Apple's new NPU achieved 39 inferences per second.

[Graph: Tegra Xavier AGX - NVIDIA TensorRT - DLA vs GPU Performance]

NVIDIA also made it possible to benchmark the DLA blocks, however this came with some caveats: the current version of the TensorRT framework is still a bit immature and does not yet allow running the models in INT8 mode, forcing us to resort to comparisons in FP16 mode. Furthermore, I wasn't able to run the tests with the same large batch size as on the GPU, so I reverted to smaller batch sizes of 16 and 8 where appropriate. Smaller batch sizes carry proportionally more overhead, as a larger share of each run is spent on the API side rather than on actual processing on the hardware.
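To give an idea of how a network gets steered onto the DLA rather than the GPU, here is a hedged sketch using TensorRT's builder-config interface; exact attribute names vary across TensorRT releases, and the model file name is again a placeholder:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:  # placeholder model, as before
    parser.parse(f.read())

config = builder.create_builder_config()
# The DLA only runs reduced-precision layers, hence the FP16 comparisons
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # unsupported layers fall back to the GPU
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0  # Xavier exposes two DLA cores, 0 and 1

engine = builder.build_engine(network, config)
```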

The performance of the DLA blocks at first glance seems a bit disappointing, as it is just a fraction of what the Volta GPU can showcase. However, raw performance isn't the DLA's main task; it serves as a specialised offload block that can operate at higher efficiency points than the GPU. Unfortunately, I wasn't able to directly measure the power difference between the GPU and the DLA, as introducing my power measurement equipment into the DC power input of the board led to system instability, particularly during the current spikes when the benchmarks launched their workloads. The GPU inference workloads did push board power to around 45W in the platform's peak performance mode.

NVIDIA's VisionWorks Demos

All the talk about the machine vision and inferencing capabilities of the platform can be very hard to grasp if you don't have a more intimate knowledge of the industry's use-cases. Luckily, NVIDIA's VisionWorks SDK comes with a slew of example demos and source-code projects that one can use as a baseline for commercial applications. Compiling the demos was a breeze, as everything was already set up for us on the review platform.

Alongside the demo videos, I also opted to showcase the power consumption of the Jetson AGX board. Here we're measuring platform power at the 19V DC input with the board in its maximum, unlimited performance mode. I had the board's own fan disabled (it can be annoyingly loud) and instead used an externally powered 120mm bench fan blowing onto the kit. As a baseline, the board drew ~8.7-9W while sitting idle, actively outputting to a 1080p screen via HDMI, and connected to Gigabit Ethernet.

The first demo showcases the AGX's feature-tracking capabilities. The input source is a pre-recorded video to facilitate testing. While the video output was limited to 30fps, the algorithm itself was running in excess of 200-300fps. I did see quite a wide range of jitter in the algorithm's framerate, although this could be attributed to scheduling noise, given the workload's short per-frame runtime while in a limited-FPS output mode. In terms of power, total system consumption hovered around 14W, an active power increase of 5W over idle.
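The shipping demo is written against VisionWorks' OpenVX C API, but the underlying technique (sparse corner detection followed by pyramidal Lucas-Kanade optical flow) can be sketched in a few lines of OpenCV Python. This is purely an illustrative analogue, not NVIDIA's code, and "input.mp4" stands in for the demo's pre-recorded clip:

```python
import cv2

cap = cv2.VideoCapture("input.mp4")  # pre-recorded source, as in the demo
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
# Shi-Tomasi corners form the initial set of features to track
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                              qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade optical flow tracks each corner into the new frame
    nxt, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = nxt[status.flatten() == 1].reshape(-1, 1, 2)
    prev_gray = gray
    if len(pts) < 100:  # re-detect once too many tracks are lost
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                      qualityLevel=0.01, minDistance=7)
```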

The second demo applies a Hough transform filter, which serves as a feature-extraction step for further image analysis. As with the first demo, the algorithm can run at a very high framerate on a single stream, but a real use-case would typically process multiple input streams. Power consumption is again in the 14W range for the platform, with an average active power of ~4.5W.
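As a rough illustration of what the filter does, the probabilistic Hough transform in OpenCV extracts line segments from an edge map in much the same way. Again, this is an analogue rather than the VisionWorks code, and the input image name is a placeholder:

```python
import cv2
import numpy as np

frame = cv2.imread("road_frame.png")  # placeholder input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)  # edge map feeds the Hough accumulator

# Probabilistic Hough transform: returns line segments as (x1, y1, x2, y2)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=30, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
```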

The motion estimation demo determines motion vectors of moving objects in a stream, a relatively straightforward use-case in automotive applications.
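A dense optical-flow pass is one common way to obtain such motion vectors. As a hedged OpenCV analogue of the idea (the frame file names are placeholders):

```python
import cv2

prev = cv2.cvtColor(cv2.imread("frame_t0.png"), cv2.COLOR_BGR2GRAY)
curr = cv2.cvtColor(cv2.imread("frame_t1.png"), cv2.COLOR_BGR2GRAY)

# Farneback dense optical flow: one (dx, dy) motion vector per pixel
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
moving = mag > 2.0  # crude mask of pixels with significant apparent motion
```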

The fourth VisionWorks demo is a computational implementation of EIS (electronic image stabilisation): given an input video stream, the system crops margins off the frame and uses this space as a stabilisation window within which the output stream can elastically bounce, smoothing out smaller, juddery motions.
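A very rough sketch of that idea, assuming simple translational jitter and using OpenCV as a stand-in for the VisionWorks implementation ("shaky.mp4" is a placeholder), could look like this:

```python
import cv2
import numpy as np

MARGIN = 0.1   # fraction of each frame edge reserved as the stabilisation window
ALPHA = 0.9    # smoothing factor for the accumulated camera path

cap = cv2.VideoCapture("shaky.mp4")  # placeholder input
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
h, w = prev_gray.shape
mx, my = int(w * MARGIN), int(h * MARGIN)
path = np.zeros(2)      # raw accumulated camera motion
smooth = np.zeros(2)    # low-pass-filtered version of the same path

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Estimate global translation between consecutive frames from tracked corners
    p0 = cv2.goodFeaturesToTrack(prev_gray, 200, 0.01, 10)
    p1, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    path += (p1 - p0)[st.flatten() == 1].reshape(-1, 2).mean(axis=0)
    smooth = ALPHA * smooth + (1 - ALPHA) * path
    # Shift the crop window by the jitter (raw minus smoothed path),
    # clamped so it stays inside the reserved margins
    dx, dy = np.clip(path - smooth, (-mx, -my), (mx, my)).astype(int)
    stabilised = frame[my + dy : h - my + dy, mx + dx : w - mx + dx]
    prev_gray = gray
```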

Finally, the most impressive demo NVIDIA provided was the "DeepStream" demo. Here a total of 25 720p video input streams are played back simultaneously while the system performs basic object detection on every single one of them. This workload represents a much more realistic heavy use-case, able to take advantage of the processing power of the AGX module. As you might expect, power consumption of the board also rose dramatically, averaging around 40W (31W of active power).
