Testing the 10GBase-T

Many thanks to Brett Howse for his help!

As an indication of how far 10GBase-T is from mainstream adoption, our testing setup was not ready to receive it: we have no switches or other PCIe cards in house to test the ports. Apart from that, one of the downsides of testing network ports as a whole is that any test goes out of one machine and into another, meaning that any result will always be at the whims of the lowest common denominator between the two systems. The easiest way to test is therefore to loop the board back on itself. As the X540-BT2 is a dual port solution, this was ideal for the testing scenario.

We set up the system with ESXi and two Windows Server 2012 VMs, each allocated 8 threads, 16 GB of DRAM and one of the 10GBase-T ports with a custom IP. We then used LAN Speed Test to set up a server on one VM and a client on the other. LAN Speed Test allows us to simulate multiple clients through the same cable, effectively probing our ports in a way similar to a SOHO/SMB environment.

We organized a series of send/receive commands to go through the connections with differing numbers of streams and differing amounts of data per stream. The system was set to repeat for 10 minutes, and the peak transfer rate was recorded. Results shown are in Gbps.
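
The procedure above can be sketched in a few lines of Python. This is not LAN Speed Test and not the exact setup used here: it is a minimal sketch that opens several concurrent TCP streams over the loopback interface, pushes a fixed amount of data through each, and keeps the peak aggregate rate over a few repeats. The function names (`measure_gbps`, `peak_gbps`), the 64 KB chunk size and the loopback address are all assumptions made for illustration, and a loopback run exercises the CPU and memory rather than any NIC.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call; an arbitrary choice for this sketch


def _sink(server: socket.socket, n_streams: int) -> None:
    """Accept n_streams connections and discard everything they send."""
    def drain(conn: socket.socket) -> None:
        with conn:
            while conn.recv(CHUNK):
                pass

    workers = []
    for _ in range(n_streams):
        conn, _addr = server.accept()
        worker = threading.Thread(target=drain, args=(conn,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def _send(addr, n_bytes: int) -> None:
    """Open one stream to addr and push n_bytes of zeros through it."""
    payload = b"\0" * CHUNK
    with socket.create_connection(addr) as sock:
        sent = 0
        while sent < n_bytes:
            size = min(CHUNK, n_bytes - sent)
            sock.sendall(payload[:size])
            sent += size


def measure_gbps(n_streams: int, bytes_per_stream: int,
                 host: str = "127.0.0.1") -> float:
    """Aggregate rate in Gbps for n_streams concurrent streams (one run)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))          # port 0: let the OS pick a free port
        server.listen(n_streams)
        addr = server.getsockname()
        sink = threading.Thread(target=_sink, args=(server, n_streams))
        sink.start()
        senders = [threading.Thread(target=_send, args=(addr, bytes_per_stream))
                   for _ in range(n_streams)]
        start = time.perf_counter()
        for sender in senders:
            sender.start()
        for sender in senders:
            sender.join()
        sink.join()                     # wait until all data has been received
        elapsed = time.perf_counter() - start
    return n_streams * bytes_per_stream * 8 / elapsed / 1e9


def peak_gbps(n_streams: int, bytes_per_stream: int, repeats: int = 3) -> float:
    """Repeat the run and keep the peak, mirroring how our results were recorded."""
    return max(measure_gbps(n_streams, bytes_per_stream) for _ in range(repeats))
```

For example, `peak_gbps(4, 8 * 1024 * 1024)` reports the best of three runs with four streams of 8 MB each. Sweeping both arguments gives the same kind of streams-versus-size grid as our results, although the absolute numbers will say nothing about a 10GBase-T port.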

There are several key points to note with these results.

  • Firstly, single client speed never broke 2.2 Gbps. This puts an absolute limit on individual point-to-point communication speed in our system setup.
  • Next, with between 4 and 9 clients the speed is fairly consistent at 6.7 to 8 Gbps, no matter what the size of the transfer is.
  • Thus in order to get peak throughput, at least 10 concurrent connections need to be made. This is when 8+ Gbps is observed.
  • With a small number of clients, a longer transfer results in higher overall speed. However, with a larger number of clients, shorter transfers result in higher peak speed.
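
The gap between the first and third points is easier to see with a little arithmetic. The sketch below takes the 2.2 Gbps single-client ceiling and the roughly 8 Gbps seen at 10 clients from the list above, and assumes a nominal 10 Gbps line rate with no protocol overhead; the function names are hypothetical. If streams scaled perfectly, five would be enough to fill the link, whereas in practice ten streams averaged roughly 0.8 Gbps each.

```python
import math

LINE_RATE_GBPS = 10.0      # nominal 10GBase-T line rate (assumption: overhead ignored)
SINGLE_STREAM_GBPS = 2.2   # single-client ceiling from the results above
OBSERVED_GBPS = 8.0        # approximate aggregate rate at 10 clients
OBSERVED_CLIENTS = 10


def ideal_streams(line_rate: float, per_stream: float) -> int:
    """Streams needed to fill the link if every stream ran at its ceiling."""
    return math.ceil(line_rate / per_stream)


def per_stream_rate(aggregate: float, clients: int) -> float:
    """Average rate each client actually achieved."""
    return aggregate / clients


def seconds_per_gigabyte(rate_gbps: float) -> float:
    """Time to move 1 GB (decimal, 8 Gb) at the given rate."""
    return 8.0 / rate_gbps


print("ideal streams to fill link:", ideal_streams(LINE_RATE_GBPS, SINGLE_STREAM_GBPS))
print("observed Gbps per client:", per_stream_rate(OBSERVED_GBPS, OBSERVED_CLIENTS))
print("seconds per 1 GB at single-client ceiling:",
      round(seconds_per_gigabyte(SINGLE_STREAM_GBPS), 2))
```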

As part of the test, we also examined CPU usage when each stream was set for 1GB transfers. CPU usage for a standard 1Gbit Ethernet port is normally minimal, although it is one of the factors that some companies like to point to when promising better gaming performance. It rarely goes above a few percent on a quad core system, but with the X540-BT2 that changed, especially when the system was being hammered.

As the main VM alternated between reading from and writing to the server VM, read CPU usage peaked after 10 concurrent clients, but write CPU usage shot up very quickly to 50% and stayed there. This might explain the slow increase in peak performance we observed, if the software was only geared for four threads and simply ran out of them.

With a 4 core VM, CPU performance monitoring showed 100% usage during writes.
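
The 50% and 100% figures fit the four-thread explanation if one assumes that the transfer software keeps at most four threads busy and that the smaller VM had four virtual CPUs; neither is confirmed. Under those assumptions, overall CPU usage is capped at the share of the VM's threads that the software can occupy, as this small sketch (with a hypothetical `max_cpu_usage` helper) shows.

```python
def max_cpu_usage(busy_threads: int, vm_threads: int) -> float:
    """Upper bound on overall CPU usage (percent) when only busy_threads can be saturated."""
    usable = min(busy_threads, vm_threads)
    return 100.0 * usable / vm_threads


print(max_cpu_usage(4, 8))  # 8-thread VM → 50.0
print(max_cpu_usage(4, 4))  # 4-thread VM (assumed) → 100.0
```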

This marks an interesting juncture, suggesting that a faster single threaded CPU could deliver better performance and that the X540-BT2 would be better attached to a Haswell/Broadwell platform, at least in our testing scenario. The truth of the matter is that fast connectivity technology runs on optimized FPGAs because general purpose CPUs cannot keep up with what is needed.

So where does this leave the X99 WS-E/10G? The best example I could propose is in that SOHO/SMB environment where the system is connected to a 10Gbit/1Gbit mixed switch that has 24+ clients that all need to access the system, either as a virtualized workspace, some form of storage or a joint streaming venture. In the first case, it allows the rest of the office to use very basic machines and rely on the grunt of the virtualized environment to perform tasks.

Additional: As mentioned in the comments, this is almost the complete out-of-the-box scenario, where the only things configured are the RAMDisk transfers and the multiple stream application, whereas most users might be limited by SSD speed. Jammrock in the comments has posted a list of steps to help optimize the connection solely for individual point-to-point transfers, and it is worth a read.


  • AngelosC - Wednesday, January 7, 2015

    They could have tested it on Linux KVM with SR-IOV or just run iperf on Linux between the 2 interfaces.

    They ruined the test.
  • eanazag - Monday, December 15, 2014

    Okay, so the use case of a board like this is for network attached storage using iSCSI or SMB3. That network storage has to be able to perform above 1GbE bandwidth for a single stream. 1 GbE = ~1024 Mbps = ~128 MBps, not counting overhead. Any single SSD these days can outperform a 1GbE connection.

    If you're considering this board, there is a Johan written article on Anand that is a couple of years old about 10GbE performance. It will cover why it is worth it. I did the leg work and found them.

  • extide - Monday, December 15, 2014

    At the end of the day, I still think I'd rather the X99 Extreme 11.
  • tuxRoller - Monday, December 15, 2014

    How Is the DPC measurement made? Average (which?), worst case, or just once?
  • Ian Cutress - Tuesday, November 1, 2016

    Peak (worst value) during our testing period, which is usually a minute at 'idle'
  • TAC-2 - Tuesday, December 16, 2014

    Either there's something wrong with your test of the NICs or there is a problem with this board. I've been using 10GBase-T for years now, even with default settings I can push 500-1000 MB/s using intel NICs.
  • AngelosC - Wednesday, January 7, 2015

    I reckon they were not testing this board's most important feature properly.

    The reviewer makes it sound like they don't know how to test…
  • jamescox - Tuesday, December 16, 2014

    This seems more like a marketing thing; who will actually buy this? Given the current technology, it seems like it is much better to buy a discrete card, if you actually need 10GB.

    The feature I would like to see come down to the consumer market is ECC memory. I have had memory start to get errors after installation. I always run exhaustive memory test when building a system (memtest86 or other hardware specific test). I did not have any stability issues. I only noticed that something was wrong when I found that recently written files were corrupted. Almost everything passes through system memory at some point. Why is it okay for this not to be ECC protected? Given how far system memory is from the cpu (with L3 cache, and soon to be L4 with stacked memory), the speed is actually less important. Everything should be ECC protected.

    There may be some argument that the gpu memory doesn't need to be ECC, since if it is just being used for display; errors will only result in display artifacts. I am not sure if this is actually the case anymore though with what gpus are being used for. Can a single bit error in gpu memory cause a system crash? I may have to start running gpu memory test also.
  • petar_b - Thursday, December 18, 2014

    ASROCK solely targets users with need of 10G network. If network card was an discrete option price would be lower and they would target wider audience. I like two PLXes, as I can attach all kind of Network, SAS and GPU cards. PLX and ASROCK quality is the reason I use their mobos.

    Regarding ECC memory for GPU, not agree there. If GPU is used to do math with OpenCL, then avoiding memory errors is very important.
  • akula2 - Thursday, December 18, 2014

    Avoiding memory errors is beyond extremely important in my case when I churn tons of Science and Engineering things out of those Nvidia Titan Black, Quadro and Tesla cards. AMD did an amazing job with FirePro W9100 cards too.
