[cap-talk] Object-capability vs. monolithic performance - 2
Jed Donnelley
jed at nersc.gov
Thu Jan 4 19:01:39 CST 2007
At 06:29 AM 1/4/2007, Jonathan S. Shapiro wrote:
<more details, perhaps only of interest to Shap.>
>Also (as you acknowledged indirectly
>later in your note), a system designed with proper isolation isn't going
>to get away with one system call per packet, while a monolithic design
>typically gets away with significantly *less* than one system call per
>packet.
When you say "system call" above, I map that to something like
"cross domain call". If one is able (as we were) to turn any such
"cross domain call"s internal to the system's implementation into
a subroutine call, then I argue that the performance can be as
good as with any implementation in a monolithic kernel (see diagram/
discussion below).
>Yes, we can achieve comparable performance with a comparably monolithic
>and unprincipled design (as you apparently did),
Of course I bristle at "unprincipled". I believe we had the highest possible
"principles", including that our users' requirements were paramount.
>but in doing so we
>would lose most of the benefit of object-capability systems.
I guess this is a matter of what "most" means to each of us. When
performance requirements are tight, and especially when a domain change
is moderately expensive, one has no choice but to sacrifice protection
domain exchanges (which can provide their own value in terms of
integrity, reliability, analyzability, etc.) for performance. In our
case the expense was dictated by the machine hardware, with a minor
additional cost from the language-imposed subroutine convention
(basically caller vs. callee saving of registers).
What I'm arguing is that even given such sacrifices, the functional
wrappability/mappability and object-capability nature of the "call"
interface need not be sacrificed for performance.
>Getting good gigabit ethernet performance with a properly POLA-based
>design on a modern, fast processor and fast memory system is goddamn
>hard...
I appreciate this. I've been there many times myself, though I accept
it was in different contexts. All I'm taking issue with is the dependence
on the user interface. Let's see if I can "draw" a picture. You start with
some user call, generically:
                  /
Net send/receive: |  An exchange happens, "system" code runs
                  \
One can implement the "system" code with an internally POLA
architecture with domain crossings between various modules. Fine.
Doing so has a latency cost. It also has a modularity, integrity,
possibly reliability, etc. value. If performance is the ascendant
requirement (as in my experience it sometimes is), then these internal
values may need to be sacrificed to achieve the needed performance.
One can (at least we were able to) keep the internal modularity but
turn the domain boundary crossings into subroutine calls to meet
performance requirements at the cost of protection that typically
(as it's all internal to the system's implementation) is not vital.
What I argue is that the interface for the base "Net send/receive" can
still be a wrappable/mappable object-capability interface.
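
A rough sketch of that claim, for illustration only. All class and method
names here are hypothetical and are not taken from any system discussed in
this thread; the "domain crossing" is simulated with a counter. The caller
holds one capability, and the implementation behind it can use either
internal domain crossings or plain subroutine calls:

```python
# Hypothetical sketch: names are illustrative, not from any real system.
from abc import ABC, abstractmethod


class NetCap(ABC):
    """The user-visible capability: the only thing the caller holds."""

    @abstractmethod
    def send(self, packet: bytes) -> int:
        """Send a packet; return the number of bytes handed to the driver."""


class Driver:
    """Internal module that would talk to the hardware."""

    def __init__(self):
        self.sent = []

    def transmit(self, frame: bytes) -> int:
        self.sent.append(frame)
        return len(frame)


class IsolatedNet(NetCap):
    """Internal modules sit in separate domains; each hop counts as a crossing."""

    def __init__(self, driver):
        self._driver = driver
        self.crossings = 0

    def _cross(self, fn, *args):
        self.crossings += 1  # stands in for a real domain switch
        return fn(*args)

    @staticmethod
    def _frame(packet):
        return b"HDR" + packet

    def send(self, packet):
        frame = self._cross(self._frame, packet)
        return self._cross(self._driver.transmit, frame)


class MonolithicNet(NetCap):
    """Same modules, but the internal hops are ordinary subroutine calls."""

    def __init__(self, driver):
        self._driver = driver
        self.crossings = 0

    def send(self, packet):
        return self._driver.transmit(b"HDR" + packet)


def application(net: NetCap) -> int:
    # The application cannot tell which implementation it was handed.
    return net.send(b"hello")
```

The application code is identical in both cases; only the number of
internal crossings differs.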
I guess I should mention here that if one does wrap that interface
(that is one adds a level of emulation through an extension mechanism),
then of course that will add to the latency of the operation. Is that
an issue in this discussion?
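
To make the wrapping case concrete, here is a minimal sketch. The names are
hypothetical and the "extra hop" is just a counter, not a measurement. The
wrapper presents the same send() interface as the thing it wraps, and every
wrapped call pays for one additional level:

```python
# Hypothetical sketch: names are illustrative only.
class BaseNet:
    """Stands in for the system-provided send capability."""

    def __init__(self):
        self.sent = []

    def send(self, packet: bytes) -> int:
        self.sent.append(packet)
        return len(packet)


class LoggingNet:
    """A wrapper exposing the same send() interface as the object it wraps."""

    def __init__(self, inner):
        self._inner = inner
        self.log = []
        self.extra_hops = 0

    def send(self, packet: bytes) -> int:
        self.extra_hops += 1          # the added level of emulation
        self.log.append(len(packet))  # wrapper-specific behavior
        return self._inner.send(packet)


def application(net) -> int:
    # Works unchanged with either the base capability or a wrapper.
    return net.send(b"hello")
```

An unwrapped caller pays nothing for the wrapper's existence; the added
latency appears only on calls that actually go through it.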
>...Based on
>measurement, the context switch overheads are a substantial proportion
>of the total processing time.
If they are internal to your implementation then I argue that you had/have
the same opportunity to optimize them away while still keeping your
same basic modular structure - just with the barriers lowered.
> > To meet these performance goals in some cases
> > we needed to push some of the service processes (e.g. the
> > "file" server) into what amounted to the kernel (cutting out
> > two of four domain changes), but we were still able to keep
> > the interface the same, faithful to the object-capability
> > model.
>
>Ah. So you cheated. :-)
There you go. I argue that opportunity was also available to you, and
without sacrificing the wrappable/mappable object-capability
interface for the user network send/receive call.
> > The real issue (from my experience) is the number
> > of domain transitions required for a given call. This can
> > be reduced as much as necessary at the cost of
> > weaker integrity, reliability, analyzability, etc. in the
> > resulting monolith.
> >
> > That is, the trade-off isn't in the interface, it's in the
> > service implementation.
>
>Yes, the issue is domain switches, but much of this in the ethernet case
>is driven by choices in the IPC mechanism.
Are you referring to the IPC mechanism within the implementation
or as seen by the user level application? In the former case,
"cheat" away as needed. If the latter then we have some details
to get into.
>Good. Now I will switch sides.
>
>Given what we know today about high-performance microkernel
>implementation, I claim that we can now achieve performance comparable
>to monolithic systems *without* accepting weaker integrity, reliability,
>or analyzability when measured on an application benchmark basis. We
>cannot afford gratuitous domain switches, but we can afford enough to
>preserve these properties.
OK, I'll also switch sides (on this narrow issue, not on my fundamental
argument for wrappable/mappable object-capability "user" interfaces).
I believe that no matter what efforts you make to achieve performance
comparable to a monolithic kernel and still preserve your internal
domain changes for stronger properties (integrity, etc.), I can come up with
a monolithic implementation that beats it by something like the
cost of those internal domain changes. Doesn't this seem obvious?
In fact I can use your own implementation with the barriers reduced
to subroutine calls to compete against you. Why fight it?
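
The arithmetic behind that claim can be sketched with a toy cost model. The
function names and all cycle counts are made-up placeholders, not
measurements from any system; the model simply assumes the same useful work
and the same number of internal hops in both versions:

```python
# Hypothetical cost model; the cycle counts used with it are placeholders.
def call_cost(work, hops, cost_per_hop):
    """Total cycles for one user call: useful work plus internal hop overhead."""
    return work + hops * cost_per_hop


def monolith_advantage(work, hops, domain_switch, subroutine_call):
    """Cycles saved by turning each internal domain switch into a subroutine call."""
    isolated = call_cost(work, hops, domain_switch)
    monolithic = call_cost(work, hops, subroutine_call)
    return isolated - monolithic
```

Under these assumptions the advantage is hops * (domain_switch -
subroutine_call), independent of how much useful work the call does. For
example, with 4 internal hops at an assumed 200 cycles per domain switch
and 10 per subroutine call, the difference is 760 cycles per user call.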
>I'm tempted to require a finite set of registers on the processor, but
>I'ld only be saying that to pull Alan's nose a bit. Itanium does a
>surprisingly fast switch if the implementation is careful.
The register business is where you can get into fixed domain
change costs being greater than what amounts to a subroutine
call cost. It seems to me that the trade-offs are pretty obvious
in this area.
What is most important to me is hearing any argument about
the user level interface. That is, anything to the effect that there
is some reason that this can't be a wrappable/mappable
object-capability interface because (abc - including performance
considerations). Remember, I'm not arguing that there won't
be an inevitable cost for doing actual wrapping, just that until
such wrapping is done, the cost of the interface need not differ
from that for the same service in a system like Unix or Windows
by more than something on the order of a single subroutine call
(comparing the call itself against any alternative monolithic
kernel call, not the internal implementation).
>I set application benchmarks as the basis because there will always be
>some small number of uninteresting operations where one system is slower
>than another, and people use these to pick nits.
>
>Jed: You're a very talented guy. You really need to stop setting such
>low bars for yourself and the community. It's embarrassing. :-)
What "low bar" do you believe I've set?
> > I described my approach to dealing with a "single level of indirection"
> > above. Namely, eliminate it where needed, smash the mechanism
> > into a monolithic kernel (which may be needed to provide the required
> > performance), but leave the interface the same.
>
>Can't be done. The level of indirection in question is the indirection
>necessary to the implementation of protection in the microkernel.
I hope the discussion above has clarified this point. Of course I
bristle a bit when I hear somebody say "Can't be done." about
something that we demonstrably did. There is a protection boundary
inevitably crossed for the initial user level call. There may be
additional domain crossings internal to the implementation.
I don't argue that such crossings don't provide protection value,
just that they can be given up to achieve user level performance,
while still maintaining a wrappable/mappable object-capability
system call interface for the user.
>No. Marcus is quite right here. The state of modern microkernel
>engineering is at the point where a difference of 3-5 cache misses in
>the total IPC path is the difference between winning and losing, and
>this stuff is engineered *way* better in modern systems than your glib
>"smash and eliminate" implies.
Of course there was nothing "glib" about the two years of work that
we put into improving the performance of our production operating
system by eliminating two of the four domain changes internal to
the file read/write calls for the solid state disk case (it didn't
matter for the rotating disk case, where the latencies were in the
millisecond range).
I'm delighted if the costs for domain crossings (IPC interfaces)
are less on today's systems. Great. That gives you an opportunity
to keep more of those internal protection boundaries and still
meet your performance requirements (e.g. your Gigabit Ethernet
example).
I still suggest that if you find yourself in a situation where the
difference between an exchange mandated by an IPC interface
vs. the presumed lower cost of a subroutine call makes a difference,
then you have the option of turning that IPC interface into a
subroutine call - While Still Maintaining the functional value of
the wrappable/mappable object-capability interface for the application
level user.
>I will pay you $500 if you can eliminate 80 cycles (just two L1 cache
>misses) from the production L4 implementation's IPC subsystem in less
>than a year's effort. You get to pick the architecture. It has to be a
>mainstream, modern L4 production implementation (i.e. not one of the
>research experiments). You don't get to alter the interface or
>compromise the L4 protection model (such as it is). You *do* need to
>explain how you did it to collect.
Not my area of expertise or interest. $500 is less than one day's
consulting fee. It would have to be honor and not reward that would
motivate me, but I don't have any stake in that area.
> > Why is it that the comparison is between a
> > "system S" built on a capability model and a native implementation
> > of S, rather than comparable application run on a capability
> > operating system, O? Of course I understand the legacy issues,
> > but if this is always the comparison then of course it will be
> > impossible to compete.
>
>If you believe this, it's time to shut down cap-talk, because we are all
>wasting our time. I don't believe this.
By "compete" I meant win. The cost of emulation in one direction will
always be there. You may be able to make it small enough to compete
in the sense that the value that you add is sufficient to gain market
share, but I believe it will always be possible for the monolithic kernel
to perform better without its cost for emulation than your microkernel
will be able to do with its emulation - even with "cheating" by turning
the internals of your micro-kernel into a monolith to eliminate internal
domain exchange costs.
When stated as above the analogy to the old debates about higher
level languages vs. machine (assembly) level languages comes to
mind. The counterpart there is the occasional need to "cheat" by
writing assembly level subroutines for the performance critical
elements in common use (e.g. vector operations for scientific
systems). Even with this occasional "cheating" (indeed partly because
of it), higher level languages dominate for programming. I hope to see
the day when micro kernel architectures dominate in much the same
way.
I personally hope that they do so with a "user" interface (recognizing
that what level a "user" is at is a matter of interpretation) that is
an object-capability interface and that is wrappable and has been mapped
into a network interoperable object-capability standard. My writing to
this list is with that goal in mind.
--Jed http://www.webstart.com/jed/