[E-Lang] Performance and complete mediation
Jonathan S. Shapiro
shap@eros-os.org
Mon, 6 Aug 2001 17:37:23 -0400
> Nope. You didn't miss anything, but don't attribute my numbers to other
> capability systems. EROS, for example, will be orders of magnitude
> faster.
Hmm. Apparently *I* missed something.
> Also, those times were measured on a 300 MHz machine; on a modern 1 GHz
> machine the performance would be proportionally better.
I am always leery of this argument, because it assumes that demand does
not also increase proportionately. Is there some reason to believe that the
frequency of cross-domain calls is a function of user speed rather than
clock speed?
> A value of 300,000 machine cycles (1 ms at 300 MHz) is too large to compete
> with operating system calls. However, as I noted, about 80% of that time
> was context switches.
As you weren't in a position to hack the OS, I hope you will not take it
personally if I say that this is disgustingly bad performance. If I recall,
you said that this 1 ms included 4 domain transfers, which would be
(0.8 * 300,000 / 4) = 60,000 cycles per context switch. On a 200 MHz machine,
using an earlier and poorer x86 chip design, both Liedtke and I showed
best-case times of 135 cycles, and typical-case times in the 240-cycle range.
I can only state the obvious: a factor of 250 actually matters in
performance-critical operations. :-) Reaching 240 cycles was a bitch of a
lot of work, but 400 to 450 cycles wasn't hard to reach.
> The biggest part of the time was
> spent in copying the messages from one buffer to another. Had the other
> parts of the overhead been smaller, we could have optimized that piece.
We found this as well, but we also found that it was orthogonal to the rest
of the context switch time. Once you have to copy a string at all, you bite
off marginal overhead in address space manipulation. This overhead is a pure
function of the string size (with a possible constant addition due to copy
misalignment causing a single marginal page mapping on each side). On the
Pentium in particular, copy speed is greatly influenced by differences in
cache handling in each generation, and also by differences in string
alignments. Later caches did string copy in the cache if the copy was
32-byte aligned, which really helped, but your string copy routine had to be
conditioned (at least by startup-time initialization) on the CPU type to get
this well optimized.
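A minimal C sketch of the two effects described above, under stated assumptions. `pages_spanned` counts how many pages a byte range touches, showing that a misaligned buffer costs at most one extra page mapping. `select_copy` picks a copy routine once at startup. All names, including the `cpu_likes_aligned_copy` flag, are hypothetical, and the chunked `memcpy` merely stands in for a CPU-specific block move; this is not EROS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of pages touched by the byte range [addr, addr + len).
   A misaligned start adds at most one page compared with an aligned one. */
static size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr / page_size;
    uintptr_t last = (addr + len - 1) / page_size;
    return (size_t)(last - first + 1);
}

typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Generic path: makes no alignment assumptions. */
static void copy_generic(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Path for CPUs that copy faster when both buffers are 32-byte aligned:
   move whole 32-byte blocks, then the tail. memcpy stands in for the
   CPU-specific block move. Falls back to the generic copy if unaligned. */
static void copy_aligned32(void *dst, const void *src, size_t len)
{
    if (((uintptr_t)dst | (uintptr_t)src) % 32 != 0) {
        copy_generic(dst, src, len);
        return;
    }
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t blocks = len / 32;
    for (size_t i = 0; i < blocks; i++)
        memcpy(d + 32 * i, s + 32 * i, 32);
    memcpy(d + 32 * blocks, s + 32 * blocks, len % 32);
}

/* Chosen once at startup. cpu_likes_aligned_copy is a hypothetical flag
   that a real kernel would derive from the detected CPU model. */
static copy_fn select_copy(int cpu_likes_aligned_copy)
{
    return cpu_likes_aligned_copy ? copy_aligned32 : copy_generic;
}
```

A kernel would call `select_copy` once during initialization and keep the result in a function pointer, so the per-call IPC path pays no CPU-type test.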
Jonathan