[cap-talk] Object-capability vs. monolithic performance
Jonathan S. Shapiro
shap at eros-os.com
Thu Jan 4 08:29:19 CST 2007
[Subject change. Thank God for Google. I'd *never* find anything on
this list if Google didn't index us.]
Warning: mild flares ahead, but with humor.
On Wed, 2007-01-03 at 17:07 -0800, Jed Donnelley wrote:
>At 09:42 AM 1/3/2007, Marcus Brinkmann wrote:
> >It's hard to argue with your personal judgement of "good performance",
> >but I will admit your point. However, I think there are many
> >applications where "good" may not quite be "good enough". As test
> >cases, I suggest Gigabit Ethernet and live audio processing. That's
> >not particularly high-end, but typical end-user requirements these
> >days.
>
> With appropriate design any "call" overhead from the capability
> interface is comparable to that for a typical system call and
> has no impact on bandwidth intensive applications like
> Gigabit Ethernet or live audio processing.
Jed: since my lab actually *did* this three years ago, I feel pretty
confident that your experience from the 1980s is not predictive. The
necessary latency requirements for Gbit ethernet are sub-microsecond and
the packet rate is many decimal orders of magnitude higher than anything
a disk can put out even today. Also (as you acknowledged indirectly
later in your note), a system designed with proper isolation isn't going
to get away with one system call per packet, while a monolithic design
typically gets away with significantly *less* than one system call per
packet.
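A back-of-the-envelope sketch of where the sub-microsecond figure comes from, assuming standard Ethernet framing (8 bytes of preamble and start-of-frame delimiter plus a 12-byte interframe gap around each frame, 64-byte minimum and 1518-byte maximum untagged frames). The function names are illustrative only and do not come from any of the systems discussed here.

```python
# Per-packet time budget on Gigabit Ethernet at line rate.
# Assumption: standard Ethernet framing overhead per frame on the wire.
LINK_BITS_PER_SEC = 1_000_000_000
PREAMBLE_AND_SFD_BYTES = 8
INTERFRAME_GAP_BYTES = 12


def wire_bytes(frame_bytes):
    """Bytes of wire time consumed by one frame, including framing overhead."""
    return frame_bytes + PREAMBLE_AND_SFD_BYTES + INTERFRAME_GAP_BYTES


def packets_per_second(frame_bytes):
    """Maximum frame rate at line rate for frames of the given size."""
    return LINK_BITS_PER_SEC / (wire_bytes(frame_bytes) * 8)


def budget_ns(frame_bytes):
    """Time available to handle one frame before the next one arrives."""
    return wire_bytes(frame_bytes) * 8 * 1e9 / LINK_BITS_PER_SEC


if __name__ == "__main__":
    for size in (64, 1518):
        print(f"{size:5d}-byte frames: "
              f"{packets_per_second(size):12.0f} packets/s, "
              f"{budget_ns(size):8.0f} ns per packet")
```

For minimum-size frames this works out to roughly 1.49 million packets per second, or 672 ns per packet, which is why a flood of small packets is the hard case and large packets are not.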
Yes, we can achieve comparable performance with a comparably monolithic
and unprincipled design (as you apparently did), but in doing so we
would lose most of the benefit of object-capability systems.
Getting good gigabit ethernet performance with a properly POLA-based
design on a modern, fast processor and fast memory system is goddamn
hard. The architectural challenge lies in the need to simultaneously
maintain a zero-copy networking design and a set of protection barriers
between the wire and the application, all while responding with 1
microsecond latency (or less). For large packets there is no real
problem. The case that is completely ball busting is floods of small
packets. It is tempting to think that all of this is slow interactive
character traffic, but that characterization of small packets hasn't
been true since at least 1985. Small packets now make up a surprisingly
large portion of the network packet demographics, and many of the
protocols involved are high speed and/or low latency.
In the torture case, the best we were able to do in EROS was about 80%
of wire speed, which was about 10%-15% worse than Linux, and at that I
suspect that our CPU utilization was much higher than Linux. [Aside: we
still beat the living crap out of Windows.] The primary source of the
cost was restrictions that arose from a purely synchronous IPC
mechanism. The problem wasn't the call overhead per se, but the
blocking, unblocking, and context switching entailed. Based on
measurement, the context switch overheads are a substantial proportion
of the total processing time.
I emphasize that the network stack in question was pretty carefully
engineered. We were zero copy on output, mostly zero copy on input,
running packet boxcars through the stack wherever possible, and a whole
bunch of other stuff like that. But we were able to preserve all of the
isolation layers that we wanted.
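A toy cost model of why running packet boxcars through the stack helps: the domain-switch cost is paid once per batch and shared by every packet in it. All numbers below are hypothetical placeholders, not EROS measurements, and the function name is made up for illustration.

```python
# Toy model: average cycles per packet when a batch ("boxcar") of packets
# crosses the protection boundaries together.
# Assumption: every constant here is a placeholder, not a measured value.

def cycles_per_packet(work_cycles, switch_cycles, switches_per_batch, batch_size):
    """Per-packet cost: fixed protocol work plus a share of the switch cost."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return work_cycles + (switch_cycles * switches_per_batch) / batch_size


WORK_CYCLES = 500        # hypothetical per-packet protocol processing
SWITCH_CYCLES = 1000     # hypothetical block/unblock/context-switch cost
SWITCHES_PER_BATCH = 4   # hypothetical number of domain crossings per batch

if __name__ == "__main__":
    for batch in (1, 4, 16, 64):
        cost = cycles_per_packet(WORK_CYCLES, SWITCH_CYCLES,
                                 SWITCHES_PER_BATCH, batch)
        print(f"batch of {batch:2d}: {cost:7.1f} cycles per packet")
```

In this model the switch overhead dominates at a batch size of one and fades as the batch grows, but batching only helps when packets arrive fast enough to fill a boxcar without adding latency.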
> To meet these performance goals in some cases
> we needed to push some of the service processes (e.g. the
> "file" server) into what amounted to the kernel (cutting out
> two of four domain changes), but we were still able to keep
> the interface the same, faithful to the object-capability
> model.
Ah. So you cheated. :-)
> The real issue (from my experience) is the number
> of domain transitions required for a given call. This can
> be reduced as much as necessary at the cost of
> weaker integrity, reliability, analyzability, etc. in the
> resulting monolith.
>
> That is, the trade-off isn't in the interface, it's in the
> service implementation.
Yes, the issue is domain switches, but much of this in the ethernet case
is driven by choices in the IPC mechanism. The choice of interface is
critical in the ethernet case because it dictates a lot about the
feasible service implementations. That said, Gbit ethernet is perhaps
*the* extreme sample point in current systems, and I basically agree
with your point.
Good. Now I will switch sides.
Given what we know today about high-performance microkernel
implementation, I claim that we can now achieve performance comparable
to monolithic systems *without* accepting weaker integrity, reliability,
or analyzability when measured on an application benchmark basis. We
cannot afford gratuitous domain switches, but we can afford enough to
preserve these properties.
I assume, for this claim, a decently modern TLB implementation (as is
presently available on all modern processor implementations -- even
later Pentiums) and a consistent cache (which rules out the widely
deployed ARM parts, but not the later core designs).
I'm tempted to require a finite set of registers on the processor, but
I'd only be saying that to pull Alan's nose a bit. Itanium does a
surprisingly fast switch if the implementation is careful.
I set application benchmarks as the basis because there will always be
some small number of uninteresting operations where one system is slower
than another, and people use these to pick nits.
Jed: You're a very talented guy. You really need to stop setting such
low bars for yourself and the community. It's embarrassing. :-)
> ><snip>
> >It took years for the microkernel community to develop a fast IPC
> >mechanism (due to Liedtke).
Um. Marcus: there were some other leaders in that, like myself and Bryan
Ford. The EROS implementation arguably outperformed the L4
implementation, in the sense that it achieved the same number of cycles
but implemented protection where the L4 IPC did not.
> >Given that
> >even a single level of indirection makes these systems non-competitive
> >with traditional monolithic kernels without object/capability system
> >should make one careful about any exaggerated claims.
>
> I described my approach to dealing with a "single level of indirection"
> above. Namely, eliminate it where needed, smash the mechanism
> into a monolithic kernel (which may be needed to provide the required
> performance), but leave the interface the same.
Can't be done. The level of indirection in question is the indirection
necessary to the implementation of protection in the microkernel.
No. Marcus is quite right here. The state of modern microkernel
engineering is at the point where a difference of 3-5 cache misses in
the total IPC path is the difference between winning and losing, and
this stuff is engineered *way* better in modern systems than your glib
"smash and eliminate" implies.
I will pay you $500 if you can eliminate 80 cycles (just two L1 cache
misses) from the production L4 implementation's IPC subsystem in less
than a year's effort. You get to pick the architecture. It has to be a
mainstream, modern L4 production implementation (i.e. not one of the
research experiments). You don't get to alter the interface or
compromise the L4 protection model (such as it is). You *do* need to
explain how you did it to collect.
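For scale, a small sketch of the arithmetic behind these figures. The 40 cycles per miss follows from "80 cycles (just two L1 cache misses)" above; the 2 GHz clock is an assumption chosen only to convert cycles to time, and the names are illustrative.

```python
# Cache-miss arithmetic for the IPC path.
# 40 cycles per L1 miss is implied by "80 cycles (just two L1 cache misses)".
CYCLES_PER_L1_MISS = 80 / 2

# Assumption: a hypothetical 2 GHz clock, used only for the ns conversion.
ASSUMED_CLOCK_GHZ = 2.0


def miss_cost_cycles(misses):
    """Cycles lost to the given number of L1 cache misses."""
    return misses * CYCLES_PER_L1_MISS


def cycles_to_ns(cycles, clock_ghz=ASSUMED_CLOCK_GHZ):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles / clock_ghz


if __name__ == "__main__":
    for misses in (2, 3, 5):
        cycles = miss_cost_cycles(misses)
        print(f"{misses} misses: {cycles:.0f} cycles, "
              f"{cycles_to_ns(cycles):.0f} ns at {ASSUMED_CLOCK_GHZ} GHz")
```

Under these assumptions the 3-5 misses that separate winning from losing amount to 120-200 cycles, so the 80-cycle bounty target is a large fraction of the margin that is left.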
When Coyotos is up and running, I'll put out a higher bounty for that.
I'd put out the higher bounty on EROS, but the production
implementation of that is a decade stale and uses an obsolete trap
interface, so it's hardly a challenge.
[Aside: I'd rather you wait and take the Coyotos challenge. More money
in it for both of us. :-)]
> Why is it that the comparison is between a
> "system S" built on a capability model and a native implementation
> of S, rather than comparable application run on a capability
> operating system, O? Of course I understand the legacy issues,
> but if this is always the comparison then of course it will be
> impossible to compete.
If you believe this, it's time to shut down cap-talk, because we are all
wasting our time. I don't believe this.
shap
--
Jonathan S. Shapiro, Ph.D.
Managing Director
The EROS Group, LLC
+1 443 927 1719 x5100