
Original Link: https://www.anandtech.com/show/1017
NVIDIA's nForce2 Part II: Diving Deeper
by Anand Lal Shimpi on October 21, 2002 4:05 PM EST - Posted in CPUs
When we first looked at NVIDIA's nForce2 platform at the beginning of this month, it was alongside AMD's Athlon XP 2800+. The only motherboard available was a hand-picked ASUS A7N8X using pre-production nForce2 silicon.
We were disappointed that NVIDIA launched a product that had only just begun mass production, but we saw a great deal of potential in nForce2 as a platform. One of the biggest drawbacks of the original nForce was that it could not outperform VIA's cheaper KT333, which kept its incredible feature set out of the hands of most enthusiasts.
With the highly anticipated successor to NVIDIA's nForce, there were a number of technologies at work that nullified the performance debate. NVIDIA had finally produced a platform that could not only deliver an outstanding feature set but also perform just as well as its closest competitor from VIA.
In the weeks since our original nForce2 review we've been hard at work on a follow-up to Part I of our coverage. We left a number of loose ends with the first review, including the performance of the integrated graphics and a thorough comparison of 64 and 128-bit memory configurations; with this follow-up we're able to provide those data points as well as answer a number of questions that remained a mystery from the first review.
The timing of Part II couldn't have been better; the nForce2 IGP and SPP are both in mass production, and motherboards are starting to make their way into our West Coast Motherboard Evaluation labs. So before you start seeing reviews of the motherboards you'll be able to purchase soon, let's delve deeper into the nForce2 chasm. There's a lot about this chipset that you don't know…
nForce2: More than meets the eye
In Part I we ran the vast majority of our benchmarks with a single stick of memory, utilizing only one of the two 64-bit memory controllers the nForce2 IGP/SPP is equipped with. Our reasoning was that the added bandwidth provided by a 128-bit memory interface is not utilized unless integrated graphics is enabled; this turned out to be both true and false.
If you'll remember back to our review of the original nForce chipset, the nForce-420 (128-bit dual channel DDR) was not any faster than nForce-220 (64-bit single channel DDR); the reason being that the Athlon XP was being provided enough memory bandwidth by a single 64-bit DDR channel, rendering the additional 64-bit channel relatively useless. The exception to this was if integrated graphics was enabled since, as we all know, graphics cores are very bandwidth dependent and will easily make use of any additional bandwidth they share with a CPU.
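The bandwidth math behind this is worth spelling out. A quick back-of-the-envelope calculation (our own arithmetic, using peak theoretical numbers) shows why the Athlon XP's bus is the bottleneck:

```python
# Peak-bandwidth arithmetic (theoretical maximums, our own calculation):
# why one 64-bit DDR channel is enough for the Athlon XP on its own.
def bandwidth_gb_s(mega_transfers_per_s, bus_width_bits):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * 1e6 * (bus_width_bits / 8) / 1e9

fsb = bandwidth_gb_s(333, 64)             # Athlon XP's 333MHz EV6 front-side bus
single_ddr333 = bandwidth_gb_s(333, 64)   # one 64-bit DDR333 channel (PC2700)
dual_ddr333 = bandwidth_gb_s(333, 128)    # both nForce2 memory controllers

print(f"FSB peak:      {fsb:.2f} GB/s")
print(f"Single DDR333: {single_ddr333:.2f} GB/s")
print(f"Dual DDR333:   {dual_ddr333:.2f} GB/s")
```

The CPU's bus can only move roughly 2.7GB/s, which a single DDR333 channel already matches; the second channel's extra bandwidth is only consumed when other bus masters, such as an integrated GPU, are drawing from memory at the same time.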
The nForce2 chipset behaves similarly; in most applications, the benefit of going to 128-bit DDR mode (DualDDR) is under 3% - less than the normal variance in these benchmarks. Unlike the original nForce, however, there are some exceptions to the rule with nForce2.
Let's start out by taking a look at the performance gains resulting from running in DualDDR mode:
**DDR vs. DualDDR**

| Benchmark | DualDDR333 vs. DDR333 (% Gain) | DualDDR400 vs. DDR400 (% Gain) |
|---|---|---|
| Content Creation Winstone 2002 | 1.3% | 1.1% |
| Business Winstone 2002 | 0.7% | 0.6% |
| SYSMark 2002 - Internet Content Creation | 3.2% | -0.4% |
| SYSMark 2002 - Office Productivity | 0.0% | 0.0% |
| 3DSMAX5 - SinglePipe2.max | 0.5% | 0.0% |
| 3DSMAX5 - Underwater_Environment_Finished.max | 0.0% | 0.0% |
| Maya 4.0.1 - Rendertest | 1.4% | 0.0% |
| Lightwave 7.5 - Raytrace | 0.2% | 0.0% |
| Lightwave 7.5 - Radiosity Reflective Things | 0.2% | 0.1% |
| XMpeg DiVX/MPEG-4 Encoding | 0.5% | 1.9% |
| LAME MP3 Encoding | 0.0% | 0.0% |
| UnrealTournament 2003 Demo Flyby | 0.5% | 0.3% |
| Jedi Knight 2 | 1.4% | 1.3% |
| Serious Sam 2 | 0.3% | 0.2% |
| Comanche 4 | 1.2% | 0.5% |
| SPECviewperf 7 - 3dsmax-01 | 8.3% | 5.1% |
| SPECviewperf 7 - ugs-01 | 23.0% | 15.6% |
| SPECviewperf 7 - proe-01 | 20.5% | 7.7% |
| SPECviewperf 7 - drv-08 | 2.1% | 0.8% |
| SPECviewperf 7 - dx-07 | 16.9% | 11.3% |
| SPECviewperf 7 - light-05 | 0.3% | 0.1% |
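For reference, the percentages in this kind of table are simple relative gains of the DualDDR score over the matching single-channel score (negative values indicate a regression). A minimal sketch, with made-up scores for illustration:

```python
# Percent gain of a dual-channel run over a single-channel run.
# (The scores below are hypothetical; the table above holds the real data.)
def pct_gain(single_score, dual_score):
    """Relative improvement, in percent, of dual over single (higher is better)."""
    return (dual_score - single_score) / single_score * 100

example = pct_gain(100.0, 123.0)   # a hypothetical 123-vs-100 result
print(f"{example:.1f}%")           # -> 23.0%
```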
Everything here pretty much follows our hypothesis with the exception of SPECviewperf where we see some incredible performance gains when going to DualDDR. Boosts of over 20% in some cases are larger than you'd get from upgrading an Athlon XP 2000+ to a new 2800+, but why are they isolated to SPECviewperf and not the rest of the benchmark suite?
In order to figure out exactly what was going on we went back to NVIDIA with our data, hoping for an explanation; we got that and much more in return.
First of all, it turns out that as the nForce2 chipset, BIOSes and motherboards matured through numerous revisions, the performance difference between single channel and dual channel DDR configurations narrowed. NVIDIA's original numbers for DDR333 vs. DualDDR333 showed 5 - 8% gains in business/content creation tests, while we're now seeing numbers well under 3%.
Secondly, a lot has been improved under the hood of NVIDIA's nForce2.
DASP Take Two
On paper one of the most attractive features of the original nForce was its Dynamic Adaptive Speculative Pre-Processor (DASP). We explained the idea behind DASP in our original nForce2 piece:
"As you will remember from our nForce Computer 2001 Preview, NVIDIA's DASP acts much like the hardware prefetch logic found on Pentium 4s and Athlon XP processors. The logic makes educated guesses about future memory accesses based on where in main memory data was recently accessed from as well as how frequently it was accessed in the past. After making these guesses the logic pre-fetches the data it thinks will be requested into its buffer; should the data be required by the CPU then access latency is reduced by tens of nanoseconds by not requiring a memory access. If the data is never requested by the CPU then it will eventually get replaced in the DASP buffer by other pre-fetched data without incurring a performance hit or gain."
Unfortunately with the original nForce, there were a handful of problems with its DASP; the two most significant being that:
1) Prefetches got in the way of "real work", meaning they took bandwidth away from the CPU when it needed it, and
2) The latency reduction resulting from a correctly predicted prefetch wasn't as high as it could have been.
With the second-generation DASP in nForce2, NVIDIA learned from their mistakes with the original nForce and improved things considerably.
The nForce2 DASP now has better prefetching intelligence, allowing it to more accurately predict which data will be needed next. The improvement comes partially from a better algorithm for detecting when to prefetch data streams.
Even more important to the prefetching intelligence, however, is that the memory arbiter has an improved algorithm for dealing with prefetches and memory requests coming from the CPU. The improvements in the way the memory arbiter handles these two types of requests result in nForce2 doing a better job of keeping prefetches out of the way of "real work" (memory requests coming from the CPU).
The second improvement NVIDIA made to their DASP comes from sheer optimization of the nForce2 IGP/SPP's internal datapaths. The result of these datapath level optimizations (based on data collected from nForce performance tests) is a significantly larger reduction in latency when the DASP correctly prefetches data into its cache.
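NVIDIA has never published the DASP's internals, but the general idea in the quoted description - watch the address stream, detect a constant stride, and speculatively fetch ahead - can be sketched in a few lines. Everything below (the table-less design, the prefetch depth, the hit counting) is our own simplification, not NVIDIA's implementation:

```python
# A toy stream/stride prefetcher in the spirit of DASP (the real logic is
# undocumented; the stride detection and prefetch depth here are invented).
class StridePrefetcher:
    def __init__(self, depth=2):
        self.last_addr = None
        self.last_stride = None
        self.depth = depth            # how many lines to fetch ahead
        self.buffer = set()           # addresses speculatively prefetched
        self.hits = 0

    def access(self, addr):
        """Called on every CPU memory request; returns True on a buffer hit."""
        hit = addr in self.buffer
        if hit:
            self.hits += 1
            self.buffer.discard(addr)   # line handed over to the CPU
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                # Two consecutive accesses with the same stride: assume a
                # stream and speculatively fetch the next few lines.
                for i in range(1, self.depth + 1):
                    self.buffer.add(addr + stride * i)
            self.last_stride = stride
        self.last_addr = addr
        return hit

p = StridePrefetcher()
results = [p.access(a) for a in range(0, 640, 64)]  # a 64-byte-stride stream
print(p.hits)   # 7 of the 10 streaming accesses hit the prefetch buffer
```

A streaming workload like SPECviewperf's large sequential datasets locks onto the stride almost immediately and serves most requests out of the buffer, which is exactly the behavior NVIDIA describes; random access patterns never establish a stride and are left alone.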
The combination of these two improvements to NVIDIA's DASP results in very competitive performance and, in some cases, a significant performance boost when using DualDDR. It is this improved DASP to which NVIDIA attributes the extremely large performance gains in SPECviewperf. But if SPECviewperf were the only situation the second-generation DASP improved, it wouldn't be all that useful; so where else will the new DASP increase performance?
Unfortunately most of the situations where DASP and DualDDR will really make a performance difference are difficult to quantify; it's a problem we, as well as NVIDIA, have had a tough time solving. SPECviewperf is a good example of a situation where you're bound by the ability of the memory controller to fulfill requests which is where nForce2's dual memory controllers can come in handy.
The chipset's success in SPECviewperf and the architecture behind it lead us to believe that nForce2 (and its successors) would do very well in workstation and server applications. We're currently putting our theories to the test in the server arena, but the results were not ready in time for publication in this article; who knows, there may even be a Part III in the works if things turn out right.
What's even more interesting is that NVIDIA's improved DASP would seem to be the perfect companion to Intel's Hyper-Threading technology. In situations where there isn't a lot of locality between concurrently executing threads, the number of memory requests will increase. The more memory requests that exist, the more likely DASP will generate positive results and the potential for the crossbar based memory controllers to shine increases as well. Only time will tell how long it will be before nForce meets Intel outside of the Xbox, but in theory it would be a great fit.
Update on Networking
In our first nForce2 article we investigated the performance of NVIDIA's two integrated Ethernet MACs (Media Access Controllers); we concluded that the integrated NVIDIA MAC was an excellent high-performance Ethernet controller, able to rival even Intel's highly regarded PCI NICs.
However, in our tests we were not so pleased with the secondary 3Com MAC which is present in the more expensive MCP-T. We once again went to NVIDIA with our results and got a couple of responses to our problem:
1) The 3Com MAC and drivers are not as tuned for performance as the NVIDIA MAC is. This supports the idea that the main purpose of the 3Com MAC is to win over the corporate community with the 3Com brand name; NVIDIA could have just as easily outfitted the MCP-T with two of their own controllers instead.
2) The motherboards that AMD sent out for the XP 2800+ launch were not in fact using final nForce2 silicon. The performance we saw was a result of the A02 rev of the nForce2 MCP-T, while the bug is fixed in A03. The performance of the 3Com MAC is virtually identical to the NVIDIA MAC in the bandwidth tests, although CPU utilization is still higher. We have an explanation for this below.
The A03 stepping of the NVIDIA MCP-T fixes the 3Com performance issues
We also got a listing of the differences between the NVIDIA and 3Com MACs directly from NVIDIA:
1) The 3Com MAC supports IP, TCP and UDP Checksum offloads while the NVIDIA MAC does not.
2) The 3Com drivers include diagnostics software for DOS and Windows (a huge plus with the corporate community) while the NVIDIA drivers do not have that functionality yet.
3) The NVIDIA MAC supports interrupt moderation, resulting in lower CPU utilization; the 3Com MAC does not.
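Interrupt moderation is worth a quick illustration: instead of raising one interrupt per received packet, the MAC coalesces arrivals and raises one interrupt per batch, trading a sliver of latency for far less CPU time spent in interrupt handlers. A generic sketch (the batch size and traffic figures are ours, not NVIDIA's):

```python
# Generic interrupt-coalescing arithmetic (not NVIDIA's actual scheme):
# one interrupt per batch of packets instead of one per packet.
def interrupts_raised(packets, max_batch):
    """Interrupts needed to signal `packets` arrivals when the MAC may
    coalesce up to `max_batch` packets into a single interrupt."""
    return -(-packets // max_batch)   # ceiling division

no_moderation = interrupts_raised(10_000, 1)    # one interrupt per packet
moderated = interrupts_raised(10_000, 32)       # coalesce up to 32 packets
print(no_moderation, moderated)   # 10000 vs. 313 interrupts for the same traffic
```

At the wire speeds in our bandwidth tests, cutting the interrupt count by an order of magnitude is exactly the kind of thing that shows up as lower CPU utilization rather than higher throughput.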
NVIDIA's Secret Weapon
The next issue we had was that the performance of the nForce2 platform in Business and Content Creation applications wasn't on par with our expectations. As you'll remember from our original article, the platform performed just as well as our KT333 test bed but NVIDIA had promised much more. It turns out that the added performance was there, it just needed to be unlocked.
The latest revision of the nForce2 drivers (2.81) has a bug in which the IDE drivers are not installed; this is quite unfortunate, as a good deal of the nForce2's performance advantage over the competition in I/O bound situations (e.g. business/office applications) comes from those IDE drivers. The improvements in NVIDIA's IDE drivers are much like what we saw with the Intel Application Accelerator on Intel's 845/850 chipsets, leading us to believe that the internal driver improvements are similar in nature. We also tried the new drivers with the original nForce; unfortunately, the result was an infinitely rebooting system.
It took Intel around a year to produce the first revision of their IAA tools and it has been more than a year since NVIDIA released the first nForce, more than enough time to work on similar optimizations.
Armed with the latest version of NVIDIA's chipset drivers, an ATA registry patch and the latest IDE drivers we re-visited the nForce2 vs. KT333 comparison to see how things have changed…
**nForce2 vs. KT333**

| Benchmark | nForce2 | KT333 | nForce2 Advantage |
|---|---|---|---|
| Content Creation Winstone 2002 | 46.7 | 41.4 | 12.8% |
| Business Winstone 2002 | 86.3 | 69.1 | 24.9% |
| SYSMark 2002 - Internet Content Creation | 287 | 281 | 2.1% |
| SYSMark 2002 - Office Productivity | 193 | 183 | 5.5% |
|
It is well known that the Ziff Davis Media Winstone benchmarks are considerably I/O bound (as is most daily PC usage), which is why the nForce2 does so well in those tests. Once again, the performance advantage isn't due to DASP or DualDDR (recall our earlier DualDDR benchmarks) but rather NVIDIA's significantly improved IDE drivers.
The performance gains are much smaller in SYSMark, as the benchmark never uses more than 10MB/s of disk bandwidth, but they are still present.
NVIDIA mentioned to us that they've received a lot of subjective feedback saying that nForce2 systems just "feel" faster than the competition. With these sorts of improvements in I/O intensive scenarios, we can see why. After all, when does your PC feel the slowest? When you're waiting on that poorly warrantied hard disk, of course.
Just for completeness we've included benchmarks in the rest of our application suite to compare to KT333; as you can see, the large performance advantages vanish in most of the tests as they barely hit the disk.
**nForce2 vs. KT333**

| Benchmark | nForce2 | KT333 | nForce2 Advantage |
|---|---|---|---|
| 3DSMAX5 - SinglePipe2.max* | 219 | 220 | 0.5% |
| 3DSMAX5 - Underwater_Environment_Finished.max* | 292 | 293 | 0.3% |
| Maya 4.0.1 - Rendertest* | 70 | 71 | 1.4% |
| Lightwave 7.5 - Raytrace* | 131.6 | 132.9 | 1.0% |
| Lightwave 7.5 - Radiosity Reflective Things* | 88.7 | 89.3 | 0.7% |
| XMpeg DiVX/MPEG-4 Encoding | 55.1 | 66.9 | -17.6% |
| LAME MP3 Encoding* | 84 | 84 | 0.0% |
| UnrealTournament 2003 Demo Flyby | 158.6 | 156.5 | 1.3% |
| Jedi Knight 2 | 164.1 | 156.8 | 4.7% |
| Serious Sam 2 | 134.7 | 135.0 | -0.2% |
| Comanche 4 | 49.73 | 48.25 | 3.1% |
| SPECviewperf 7 - 3dsmax-01 | 9.352 | 8.724 | 7.2% |
| SPECviewperf 7 - ugs-01 | 51.14 | 45.62 | 12.1% |
| SPECviewperf 7 - proe-01 | 58.83 | 52.44 | 12.2% |
| SPECviewperf 7 - drv-08 | 13.44 | 12.44 | 8.0% |
| SPECviewperf 7 - dx-07 | 12.4 | 10.54 | 17.6% |
| SPECviewperf 7 - light-05 | 5.306 | 5.291 | 0.3% |

\* Score in seconds, lower is better
The nForce2 leadership in SPECviewperf can be attributed to DASP and DualDDR as we discussed earlier.
We did notice a performance anomaly with our MPEG-4 encoding test, where the nForce2 platform was not able to come close to the KT333's performance. Considering the results in all of the other areas, we're assuming that this anomaly is a bug in the current nForce2 drivers with XMpeg and the DiVX codec; we'll let you all know as soon as we have more information on the issue.
Integrated Graphics Performance
Our final area of investigation on nForce2 resides inside the North Bridge, within the GeForce4 MX graphics core that gives the IGP its name. We weren't able to bring you IGP performance in our first review because the ASUS board used the nForce2 SPP and NVIDIA's reference board refused to POST.
An updated NVIDIA reference board with a fully functional IGP gave us the opportunity to see exactly how competitive the nForce2 IGP actually is. Although we wanted to compare to the original nForce, none of the boards in the lab supported our XP 2800+ testbed, so for competitive performance you'll have to remember that the original nForce IGP performed a lot like a GeForce2 MX 400. As you'll see from the following benchmarks, a GeForce2 MX 400 is nowhere near the performance of nForce2, and thus NVIDIA has raised the bar for their integrated graphics performance.
Before we get to "nForce2 IGP vs. The World" let's look at the effects of DualDDR on integrated graphics performance. As we've mentioned before, DualDDR's primary use is with integrated graphics enabled in order to feed the bandwidth hungry GPU that lurks within nForce2. How much of a gain do we see?
**DDR vs. DualDDR - IGP Enabled**

| Benchmark | DualDDR333 vs. DDR333 (% Gain) | DualDDR400 vs. DDR400 (% Gain) |
|---|---|---|
| Content Creation Winstone 2002 | 3.1% | 1.8% |
| Business Winstone 2002 | 2.0% | 0.6% |
| SYSMark 2002 - Internet Content Creation | 2.2% | 1.1% |
| SYSMark 2002 - Office Productivity | 2.1% | 2.1% |
| UnrealTournament 2003 Demo Flyby | 56.3% | 48.1% |
| Jedi Knight 2 | 51.1% | 44.3% |
| Serious Sam 2 | 55.7% | 44.9% |
| Comanche 4 | 25.0% | 18.2% |
In 2D applications there's barely 3% to be gained from moving to DualDDR, which isn't out of the ordinary. These 2D situations don't consume the gigabytes per second of memory bandwidth that we see in 3D games; they sit in the tens to hundreds of megabytes per second range.
In 3D games, the performance gain is phenomenal. The DDR400 to DualDDR400 gains are smaller than the DDR333 to DualDDR333 gains because single channel DDR400 offers more memory bandwidth to start with, so the step up to DualDDR has somewhat less of an impact. As you're about to find out, however, unlike in the previous tests where DDR400 offered no performance advantage over DDR333, in the IGP world things are much different.
Before we get to that, there's one other question that must be answered. One of the biggest drawbacks to older integrated graphics chipsets was that enabling the integrated graphics would reduce 2D performance by 5 - 15% over a conventional add-in graphics card. With the nForce2 IGP core based off of a GeForce4 MX and using NVIDIA's excellent driver set, would the same hold true here?
**Discrete Graphics vs. nForce2 IGP - 2D Performance**

| Benchmark | Discrete Graphics | nForce2 IGP | Discrete Advantage |
|---|---|---|---|
| Content Creation Winstone 2002 | 46.7 | 46.4 | 0.6% |
| Business Winstone 2002 | 86.3 | 85.2 | 1.3% |
| SYSMark 2002 - Internet Content Creation | 287 | 284 | 1.1% |
| SYSMark 2002 - Office Productivity | 193 | 193 | 0.0% |
The conclusion: there's no performance drop in 2D when using the nForce2's IGP instead of a GeForce4 Ti 4600. To give you an idea of how Intel's 845G fares (we're running similar benchmarks for another review), discrete graphics is around 11% faster than the 845G's integrated graphics in Business Winstone 2001. Do you see one reason why nForce isn't high on Intel's list of chipsets to support the Pentium 4?
Integrated Graphics Performance (continued)
Now it's time for the grand finale - the gaming tests. We compared the nForce2 (in all of its memory configurations) to a discrete GeForce4 MX 420, GeForce4 MX 440 and an ATI Radeon 9000 Pro.
All of the aforementioned graphics cards are considered entry-level cards, all with a street price of less than $100. We ran all benchmarks on the same nForce2 board at 1024x768. For all discrete tests we ran the nForce2 in DualDDR mode, as it provided the highest performance with the IGP disabled and it's how most nForce2 owners using discrete graphics will configure their systems.
3DMark 2001
| Card | 3DMark Score @ 1024x768 | Multitextured Fill Rate (MTexels/s) | High Polygon Count - 1 Light (MTriangles/s) | High Polygon Count - 8 Lights (MTriangles/s) | EMBM (fps) | DOT3 (fps) | Vertex Shader (fps) | Pixel Shader (fps) | Advanced Shader (fps) | Point Sprites (MSprites/s) |
|---|---|---|---|---|---|---|---|---|---|---|
| ATI Radeon 9000 Pro | 8301 | 1085.5 | 21.6 | 5.0 | 142.1 | 80.5 | 85.8 | 111.4 | 77.7 | 18.0 |
| NVIDIA GeForce4 MX 420 | 4121 | 491.6 | 24.7 | 5.8 | N/A | 32.0 | 35.0 | N/A | N/A | 6.2 |
| NVIDIA GeForce4 MX 440 | 7487 | 1090.7 | 34.4 | 6.7 | N/A | 77.5 | 52.2 | N/A | N/A | 11.0 |
| NVIDIA nForce2 IGP (DDR333) | 2949 | 461.9 | 19.8 | 4.7 | N/A | 29.8 | 32.0 | N/A | N/A | 8.2 |
| NVIDIA nForce2 IGP (DDR400) | 3525 | 557.3 | 21.9 | 4.7 | N/A | 36.0 | 38.7 | N/A | N/A | 9.7 |
| NVIDIA nForce2 IGP (DualDDR333) | 4618 | 763.9 | 24.0 | 4.8 | N/A | 50.0 | 49.7 | N/A | N/A | 12.2 |
| NVIDIA nForce2 IGP (DualDDR400) | 5183 | 791.0 | 24.6 | 4.8 | N/A | 55.8 | 51.5 | N/A | N/A | 13.2 |
As you can see by these preliminary numbers, with only a single channel of DDR333 memory the nForce2 IGP cannot outperform even the slowest GeForce4 MX 420.
In fact, performance isn't reasonable unless you utilize both 64-bit memory controllers and enable DualDDR; only then is the nForce2 IGP able to outperform the GeForce4 MX 420, although it is still unable to reach the performance of the MX 440.
IGP Performance - Unreal Tournament 2003
We've been using Unreal Tournament 2003 (the engine, at least) for nine months now, but we're finally able to switch to the final demo and retail releases of the game for our benchmarks. In this test we're still using the demo version of the game, but you have to change a couple of things in order to produce comparable numbers across different graphics cards.
By default the game will detect your video card and assign its internal defaults based on the capabilities of your video card to optimize the game for performance. In order to fairly compare different video cards you have to tell the engine to always use the same set of defaults which is accomplished by editing the .bat files in the X:\UT2003Demo\Benchmark\ directory.
Add the following parameters to the statements in every one of the .bat files located in that directory:
-ini=..\\Benchmark\\Stuff\\MaxDetail.ini -userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini
For example, botmatch-antalus.bat will look like this after the additions:
..\System\ut2003 dm-antalus?spectatoronly=true?numbots=12?quickstart=true -benchmark -seconds=77 -exec=..\Benchmark\Stuff\botmatchexec.txt -ini=..\\Benchmark\\Stuff\\MaxDetail.ini -userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini -nosound
Remember to do this to all seven .bat files in that directory before running Benchmark.exe.
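If you'd rather not edit the seven files by hand, a short script can append the flags for you. Treat this as an untested convenience sketch: the helper names are ours, and you'll need to point it at your own install directory.

```python
# Convenience sketch (our own, not from NVIDIA or Epic): append the
# MaxDetail .ini overrides to every ut2003 launch line in the demo's
# Benchmark directory, skipping files that are already patched.
from pathlib import Path

EXTRA = r" -ini=..\\Benchmark\\Stuff\\MaxDetail.ini -userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini"

def patch_bat(text: str) -> str:
    """Append the override flags to each ut2003 launch line, exactly once."""
    if "MaxDetail.ini" in text:
        return text                    # already patched, leave untouched
    return "\n".join(
        line + EXTRA if "ut2003" in line.lower() else line
        for line in text.splitlines()
    )

def patch_dir(bench_dir: Path) -> None:
    """Patch every .bat file in the benchmark directory in place."""
    for bat in bench_dir.glob("*.bat"):
        bat.write_text(patch_bat(bat.read_text()))
        print(f"patched {bat.name}")

# e.g. patch_dir(Path(r"X:\UT2003Demo\Benchmark"))
sample = r"..\System\ut2003 dm-antalus?quickstart=true -benchmark -nosound"
patched = patch_bat(sample)
```

Because `patch_bat` checks for `MaxDetail.ini` before touching a file, running the script twice is harmless.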
With only a single channel of DDR SDRAM, the performance of the nForce2 IGP hovers around the GeForce4 MX 420. Going to DualDDR helps tremendously but still not enough to match the GeForce4 MX 440.
We see much the same picture in the Botmatch performance comparison; there's still not enough memory bandwidth to compete with the Radeon 9000 Pro or GeForce4 MX 440.
IGP Performance - Comanche 4, Jedi Knight 2 & Serious Sam 2
We finish off our IGP performance investigation with numbers from Comanche 4, Jedi Knight 2 and Serious Sam 2. The performance standings continue to be as they were under UT2003.
Final Words
When we first looked at the nForce2 platform we held it in high regard, but after BIOS revisions, improved drivers and simply more time with the solution, it's clear that the chipset is nothing short of amazing.
There's no question that nForce2 is the chipset to pair with AMD's Athlon XP. It is not only the clear performance leader in all situations, but it is also at the forefront of the race to integrate features into chipsets. The only thing NVIDIA is lacking with nForce2 is support for integrated wireless networking which is still at least a year away.
Motherboard manufacturers are also keen on taking advantage of the extensive nForce2 feature set. For example, a number of manufacturers will support on-board multimonitor configurations of their IGP boards - a configuration that no other integrated chipset offers. As you'll remember, TwinView functionality was not present in the original nForce chipset although it was in the discrete GeForce2 MX cards.
Our only performance complaint with nForce2 is the performance of the integrated GPU. Its performance is understandable, considering the limited memory bandwidth at its disposal; with that said, in order for the IGP to be taken seriously even as a gaming platform we'll need to see some serious improvements in performance. The nForce2 IGP is already the fastest integrated graphics solution out for any platform, AMD or Intel; but NVIDIA will have to continue to raise the bar in order to undo the impact of years of poor integrated graphics solutions on the minds of consumers and enthusiasts alike. We expect the first major step to come with the NV34-based nForce solution due out next year; with an NV30-derived core at its disposal, NVIDIA should have an even more compelling chipset a year from now.
With boards in production now, we're hoping to see retail availability of nForce2 solutions before Comdex. The nForce2 launch still left us with a sour taste but unlike the first nForce, this one looks like it'll be well worth the wait.