No more mysteries: Apple's G5 versus x86, Mac OS X versus Linux
by Johan De Gelas on June 3, 2005 7:48 AM EST - Posted in Mac
Mac OS X versus Linux
Lmbench 2.04 provides a suite of micro benchmarks that measure bottlenecks at the Unix operating system and CPU level. This makes it very suitable for testing the theory that Mac OS X might be the culprit for the terrible server performance of the Apple platform.
Signals allow processes (and thus threads) to interrupt other processes. In a database system such as MySQL 4.x, where many processes/threads (60 in our MySQL screenshot) and many accesses to the kernel must be managed, signal handling is a critical performance factor.
Larry McVoy (SGI) and Carl Staelin (HP):
"Lmbench measures both signal installation and signal dispatching in two separate loops, within the context of one process. It measures signal handling by installing a signal handler and then repeatedly sending itself the signal."
Host | OS | MHz | null call | null I/O | stat | open clos | slct TCP | sig inst | sig hndl |
Xeon 3.06 GHz | Linux 2.4 | 3056 | 0.42 | 0.63 | 4.47 | 5.58 | 18.2 | 0.68 | 2.33 |
G5 2.7 GHz | Darwin 8.1 | 2700 | 1.13 | 1.91 | 4.64 | 8.60 | 21.9 | 1.67 | 6.20 |
Xeon 3.6 GHz | Linux 2.6 | 3585 | 0.19 | 0.25 | 2.30 | 2.88 | 9.00 | 0.28 | 2.70 |
Opteron 850 | Linux 2.6 | 2404 | 0.08 | 0.17 | 2.11 | 2.69 | 12.4 | 0.17 | 1.14 |
All numbers are expressed in microseconds, so lower is better. First of all, you can see that kernel 2.6 is in most cases a lot more efficient. Secondly, although this is not the most accurate benchmark, the message is clear: Darwin, the foundation of Mac OS X Server, handles signals the slowest. In some cases, Darwin is even several times slower.
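The measurement that McVoy and Staelin describe can be approximated in a few lines. The following is only a rough sketch in Python, not lmbench's own code: the function names, the choice of SIGUSR1 and the iteration counts are assumptions made for illustration, and the interpreter overhead means that the absolute numbers will be far higher than the figures in the table above. It shows the two separate loops, one for installing a handler and one for a process sending itself the signal.

```python
import os
import signal
import time


def _restore(previous):
    # signal.signal() returns None if the old handler was not installed from Python.
    signal.signal(signal.SIGUSR1, previous if previous is not None else signal.SIG_DFL)


def time_signal_install(iterations=50_000):
    """Average microseconds to install a SIGUSR1 handler (hypothetical helper)."""
    def handler(signum, frame):
        pass

    previous = signal.signal(signal.SIGUSR1, handler)
    start = time.perf_counter()
    for _ in range(iterations):
        signal.signal(signal.SIGUSR1, handler)  # loop 1: installation only
    elapsed = time.perf_counter() - start
    _restore(previous)
    return elapsed / iterations * 1e6


def time_signal_dispatch(iterations=50_000):
    """Average microseconds for the process to send itself SIGUSR1 (hypothetical helper)."""
    delivered = 0

    def handler(signum, frame):
        nonlocal delivered
        delivered += 1

    previous = signal.signal(signal.SIGUSR1, handler)
    pid = os.getpid()
    start = time.perf_counter()
    for _ in range(iterations):
        os.kill(pid, signal.SIGUSR1)  # loop 2: send the signal to ourselves
    elapsed = time.perf_counter() - start
    _restore(previous)
    return elapsed / iterations * 1e6, delivered


if __name__ == "__main__":
    # Must run in the main thread of a Unix-like system.
    print("sig inst: %.2f us" % time_signal_install())
    dispatch_us, delivered = time_signal_dispatch()
    print("sig hndl: %.2f us (%d handler calls)" % (dispatch_us, delivered))
```

Running the same script on two operating systems gives a relative comparison only. Standard signals are not queued, so the handler call count may be slightly lower than the number of signals sent.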
As we increase the level of concurrency in our database test, many threads must be created. Unix process/thread creation is called "forking", as a copy of the calling process is made.
lmbench "fork" measures simple process creation by creating a process and immediately exiting the child process. The parent process waits for the child process to exit. The benchmark is intended to measure the overhead for creating a new thread of control, so it includes the fork and the exit time.
lmbench "exec" measures the time to create a completely new process, while "sh" measures the time to start a new process and run a little program via /bin/sh (complicated new process creation).
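To make the "fork" and "exec" measurements described above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions rather than lmbench's implementation: the function names and iteration counts are invented for this example, `/bin/sh -c "exit 0"` stands in for the small program that lmbench runs, and Python's own overhead inflates the absolute timings.

```python
import os
import time


def time_fork_exit(iterations=200):
    """Average microseconds to fork a child that exits at once, including the wait."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: exit immediately, skipping interpreter cleanup
        os.waitpid(pid, 0)     # parent: wait for the child to exit
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def time_fork_exec(argv, iterations=50):
    """Average microseconds to fork a child and replace it with the program in argv."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            try:
                os.execv(argv[0], argv)   # returns only by raising on failure
            except OSError:
                os._exit(127)
        os.waitpid(pid, 0)
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print("fork+exit: %.0f us" % time_fork_exit())
    print("fork+exec: %.0f us" % time_fork_exec(["/bin/sh", "-c", "exit 0"]))
```

As in the lmbench description, the parent's wait is inside the timed loop, so each figure covers creation and exit of the child.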
Host | OS | MHz | fork proc | exec proc | sh proc |
Xeon 3.06 GHz | Linux | 3056 | 163 | 544 | 3021 |
G5 2.7 GHz | Darwin | 2700 | 659 | 2308 | 4960 |
Xeon 3.6 GHz | Linux | 3585 | 158 | 467 | 2688 |
Opteron 850 | Linux | 2404 | 125 | 471 | 2393 |
Mac OS X is incredibly slow, between 2 and 5(!) times slower, in creating new threads, as it doesn't use kernel threads and has to go through extra layers (wrappers). No need to continue our search: the G5 might not be the fastest integer CPU on earth, but its database performance is completely crippled by an asthmatic operating system that needs up to 5 times more time to handle and create threads.
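The slowdown factors quoted above can be checked directly against the process creation table. The short Python sketch below simply divides the Darwin figures by each Linux result; the dictionary layout and the function name are assumptions made for this illustration, and the numbers are copied from the table.

```python
# Process creation times from the table above (same unit in every row).
FORK_TABLE = {
    "Xeon 3.06 GHz / Linux": {"fork": 163, "exec": 544, "sh": 3021},
    "G5 2.7 GHz / Darwin": {"fork": 659, "exec": 2308, "sh": 4960},
    "Xeon 3.6 GHz / Linux": {"fork": 158, "exec": 467, "sh": 2688},
    "Opteron 850 / Linux": {"fork": 125, "exec": 471, "sh": 2393},
}


def slowdown_factors(table, slow_host):
    """Return how many times slower slow_host is than every other host, per metric."""
    slow = table[slow_host]
    return {
        host: {metric: slow[metric] / value for metric, value in row.items()}
        for host, row in table.items()
        if host != slow_host
    }


if __name__ == "__main__":
    factors = slowdown_factors(FORK_TABLE, "G5 2.7 GHz / Darwin")
    for host, row in factors.items():
        summary = ", ".join("%s %.1fx" % (m, f) for m, f in row.items())
        print("%s: %s" % (host, summary))
```

With these figures the ratios run from about 1.6x ("sh" against the 3.06 GHz Xeon) to about 5.3x ("fork" against the Opteron), which is roughly the "between 2 and 5 times" range stated above.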
116 Comments
Reflex - Friday, June 3, 2005
NT was designed primarily by Dave Cutler, who was one of the guys behind VMS at DEC. NT is not based on Mach and has no relation to it, although it shares some similarities with BSD and VMS.
tfranzese - Friday, June 3, 2005
#35, Apple's platform uses HT links (don't ask me specifics).
minsctdp - Friday, June 3, 2005
What's with the 24 MB/s memory write time on the Xeon, vs. nearly 2GB/s for the others? Looks bogus.
querymc - Friday, June 3, 2005
I'd still like to see a Linux on G5 test. Without one, we still don't know for sure whether the bad performance is due to OS X or the hardware. And it's definitely useful for G5 owners to know whether they can expect Linux to improve server performance.
querymc - Friday, June 3, 2005
NT is not built on Mach. NT itself was originally a microkernel-based OS, derived from the design of DEC's VMS OS via the lead architect of both, Dave Cutler. It's currently very monolithic, a bit more than OS X because they stuffed a lot of userspace cruft from Windows 9X in the XP kernel for binary compatibility.
Rick Rashid(sp?) was one of the co-developers of Mach, and he went to Microsoft, which is probably what OddTSI is referring to. I don't recall whether he went to research or the OS group, though. Either way, NT has no Mach code and does not share Mach's design.
Netopia - Friday, June 3, 2005
OddTSI (Poster 37)-- Do you have any supporting data for saying that NT is built on Mach?
Joe
AluminumStudios - Friday, June 3, 2005
Interesting article. I wish you hadn't left out AfterEffects though because I use it heavily and I'd love to see a comparison between the Mac and x86 on it.
OddTSi - Friday, June 3, 2005
There's a semi-big error in your discussion on page 7. NT (and the subsequent Windows OSes based on it) is NOT a monolithic OS. In fact NT is BASED ON MACH. The main developer for the Mach micro-kernel was one of the lead developers of NT.
octanelover - Friday, June 3, 2005
I think it would be interesting, on the server side of things, to include Solaris 10 on Opteron in your benchmark list. Seeing as how Solaris is still a major player in the server world it would be nice to see how it fares along with Linux and Mac OSX.
By the way, this article, IMHO, is darn near groundbreaking. Excellent work and very illuminating.
exdeath - Friday, June 3, 2005
And before we talk about 10 Gb/sec busses, don't forget the Opteron can have like what 3 HT channels?
And Hyper Transport specs allow for 22 GB/sec per channel (11 GB/sec bidirectional?)