[E-Lang] Performance and complete mediation

Karp, Alan alan_karp@hp.com
Wed, 8 Aug 2001 20:23:41 -0700


> -----Original Message-----
> From: Jonathan S. Shapiro [mailto:shap@eros-os.org]
> Sent: Monday, August 06, 2001 2:37 PM
> To: e-lang@mail.eros-os.org
> Subject: Re: [E-Lang] Performance and complete mediation
> 
> 
> > Nope.  You didn't miss anything, but don't attribute my numbers to other
> > capability systems.  EROS, for example, will be orders of magnitude
> > faster.
> 
> Hmm. Apparently *I* missed something.
> 
> > Also, those times were measured on a 300 MHz machine; on a modern 1 GHz
> > machine the performance would be proportionally better.
> 
> I am always leery of this argument, because it assumes that the demand does
> not also increase proportionately. Is there some reason to believe that the
> frequency of cross-domain calls is a function of user speed rather than
> clock speed?

It doesn't matter if all you're measuring is minimum achievable latency.  Of
course, under load, you're right; the load goes up proportionally.

> 
> > A value of 300,000 machine cycles (1 ms * 300 MHz) is too large to compete
> > with operating system calls.  However, as I noted, about 80% of that time
> > was context switches.
> 
> As you weren't in a position to hack the OS, I hope you will not take it
> personally if I say that this is disgustingly bad performance. If I recall,
> you said that this 1 ms included 4 domain transfers, which would be
> (0.8 * 300,000 / 4) = 60,000 cycles per context switch. On a 200 MHz machine,
> using an earlier and poorer x86 chip design, both Liedtke and I showed
> best-case times of 135 cycles, and typical-case times in the 240-cycle
> range. I can only state the obvious: factors of 250 actually matter in
> performance-critical operations. :-) Reaching 240 cycles was a bitch of a
> lot of work, but 400 to 450 wasn't hard to reach.
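The cycle figures traded above can be checked directly. A minimal sketch of the arithmetic, using only the numbers stated in the thread (1 ms per call on a 300 MHz machine, about 80% of it in context switches, 4 domain transfers per call, and a typical 240-cycle switch on a tuned kernel); the helper name `cycles` is illustrative only.

```python
def cycles(seconds: float, hz: float) -> float:
    """Cycles elapsed in `seconds` on a clock running at `hz`."""
    return seconds * hz

total = cycles(1e-3, 300e6)                   # 1 ms x 300 MHz = 300,000 cycles per call
switch_share = 0.8                            # ~80% of the call was context switches
transfers = 4                                 # domain transfers per call
per_switch = switch_share * total / transfers # cycles spent per context switch

typical_fast_switch = 240                     # typical-case cycles cited for a tuned kernel
ratio = per_switch / typical_fast_switch      # how many times slower the measured switch was

print(f"{total:,.0f} cycles per call")
print(f"{per_switch:,.0f} cycles per context switch")
print(f"{ratio:.0f}x slower than a 240-cycle switch")
```

The same steps reproduce both the 60,000-cycle estimate and the factor of 250.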

Hey.  I don't mind if you insult the context switch time.  We had plenty to
say about it ourselves.  That value is what we measured with NT.  If those
context switches are too slow, take it up with Redmond.  My point is that
the part that was out of our control was already 3/4 of the overhead, so it
wasn't worth tuning the rest of the code we could control.
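This argument is essentially Amdahl's law: when a fraction of the per-call time is fixed (here the NT context switches, stated as roughly 3/4 to 80%), even eliminating all of the controllable remainder bounds the overall speedup. A minimal sketch; the fractions come from the thread, and `max_speedup` is an illustrative helper name.

```python
def max_speedup(fixed_fraction: float, tuned_speedup: float = float("inf")) -> float:
    """Amdahl's law: overall speedup when only the non-fixed share is accelerated.

    fixed_fraction: share of the time outside our control (not sped up).
    tuned_speedup:  factor by which the controllable share is accelerated;
                    infinity models eliminating it entirely.
    """
    tunable = 1.0 - fixed_fraction
    return 1.0 / (fixed_fraction + tunable / tuned_speedup)

for fixed in (0.75, 0.80):
    # Even removing all controllable overhead leaves the fixed share in place.
    print(f"fixed={fixed:.2f}: at most {max_speedup(fixed):.2f}x overall, "
          f"{max_speedup(fixed, 2.0):.2f}x if the rest is made 2x faster")
```

With 80% of the time fixed, perfect tuning of everything else buys at most 1.25x, which is why tuning the remaining code was not worthwhile while the context switch cost stood.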

> 
> > The biggest part of the time was spent in copying the messages from one
> > buffer to another.  Had the other parts of the overhead been smaller, we
> > could have optimized that piece.
> 
> We found this as well, but we also found that it was orthogonal to the rest
> of the context switch time. Once you have to copy a string at all, you bite
> off marginal overhead in address space manipulation. This overhead is a pure
> function of the string size (with a possible constant addition due to copy
> misalignment causing a single marginal page mapping on each side). On the
> Pentium in particular, copy speed is greatly influenced by differences in
> cache handling in each generation, and also by differences in string
> alignments. Later caches did string copy in the cache if the copy was
> 32-byte aligned, which really helped, but your string copy routine had to be
> conditioned (at least by startup-time initialization) on the CPU type to get
> this well optimized.
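The startup-time conditioning described above follows a common dispatch pattern: probe the CPU once, then bind a copy routine that takes a fast path only when source and destination are 32-byte aligned. A minimal Python sketch of that control flow only, under stated assumptions: buffer offsets stand in for memory addresses, the `cpu_copies_in_cache` flag is a hypothetical stand-in for a real CPU-type probe, and all function names are illustrative. Both paths produce identical bytes, since the real speed difference exists only in hardware.

```python
from typing import Callable

ALIGN = 32  # alignment that later Pentium caches rewarded, per the thread

CopyFn = Callable[[bytearray, int, bytes, int, int], None]

def copy_simple(dst: bytearray, d: int, src: bytes, s: int, n: int) -> None:
    """Portable path: plain slice copy, no alignment assumptions.

    Assumes dst already has room for n bytes at offset d.
    """
    dst[d:d + n] = src[s:s + n]

def copy_aligned_chunks(dst: bytearray, d: int, src: bytes, s: int, n: int) -> None:
    """Path meant for CPUs that copy aligned blocks in cache (hypothetical here).

    Moves whole 32-byte chunks when both offsets are aligned, then the tail.
    Falls back to the portable path when either offset is misaligned.
    """
    if d % ALIGN or s % ALIGN:
        copy_simple(dst, d, src, s, n)
        return
    whole = n - n % ALIGN
    for off in range(0, whole, ALIGN):
        dst[d + off:d + off + ALIGN] = src[s + off:s + off + ALIGN]
    copy_simple(dst, d + whole, src, s + whole, n - whole)  # leftover tail bytes

def select_copy(cpu_copies_in_cache: bool) -> CopyFn:
    """Run once at startup: bind the routine suited to the detected CPU."""
    return copy_aligned_chunks if cpu_copies_in_cache else copy_simple

string_copy = select_copy(cpu_copies_in_cache=True)  # hypothetical probe result
```

Binding the routine once keeps the CPU check out of the per-message path; only the cheap alignment test remains on each copy.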
> 

Our measurements showed that buffer copy overhead started to show up at
around 1 KB messages.  By 500 KB, it dominated.
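Shapiro's remark that copy overhead is a function of string size, plus at most one marginal page per side from misalignment, can be made concrete by counting the pages a buffer spans. A minimal sketch assuming 4 KB pages (common on x86, but an assumption here); `pages_spanned` is an illustrative helper, and the 1 KB and 500 KB sizes are the ones from the measurements above.

```python
PAGE = 4096  # assumed page size in bytes

def pages_spanned(offset: int, length: int, page: int = PAGE) -> int:
    """Number of pages touched by `length` bytes starting at byte `offset`."""
    if length <= 0:
        return 0
    first = offset // page
    last = (offset + length - 1) // page
    return last - first + 1

for size in (1024, 500 * 1024):
    aligned = pages_spanned(0, size)
    # Try every starting offset within a page to find the worst case.
    worst = max(pages_spanned(off, size) for off in range(PAGE))
    print(f"{size:>7} bytes: {aligned} page(s) aligned, up to {worst} misaligned")
```

Under this model the misalignment penalty is a constant (one extra page per buffer), so as messages grow, the page count, like the copy itself, tracks message size alone.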

> 
> Jonathan

_________________________
Alan Karp
Principal Scientist
Decision Technology Department
Hewlett-Packard Laboratories MS 1U-3
1501 Page Mill Road
Palo Alto, CA 94304
(650) 857-3967, fax (650) 857-6278
https://ecardfile.com/id/Alan_Karp
http://www.hpl.hp.com/personal/Alan_Karp/