[cap-talk] Another "core" principle - virtualize capabilities

Jed Donnelley capability at webstart.com
Wed Jan 3 19:07:23 CST 2007


At 09:42 AM 1/3/2007, Marcus Brinkmann wrote:
>At Sun, 31 Dec 2006 13:33:12 -0800,
>Jed Donnelley <capability at webstart.com> wrote:
> > > > All I'm arguing for is the ability to standardize the object-capability
> > > > model sufficiently to map from one system to another - e.g. along
> > > > the lines of the DCCS.  Any such general mapping will automatically
> > > > provide the needed standard and "smooth out" any nuances between
> > > > the systems, provided that the systems were designed with
> > > > virtualizable capabilities only.
> > >
> > >I understand, but some goals are just pipe dreams, and not attainable
> > >in the real world.
> >
> > I disagree.  I believe they have been met or nearly met (needing only
> > minor changes) by many systems.  For example, I believe that the RATS
> > operating system:
>
>I am really confused now.  Are you claiming that any other system that
>was "designed with virtualizable capabilities only" can be ported to
>RATS with only minor modifications to how to express and invoke object
>interfaces, and still perform according to its specification?

No.  I was arguing that the RATS system is an example of an object-capability
system with virtual memory mapping which, with appropriate care, can
be implemented so that all kernel objects are wrappable through the
extension mechanism.

>I think the real problem is that the number of systems that are
>designed with "virtualizable capabilities only" is exactly zero,
>because:
>
>   The capability model is not a complete model of computation.
>
>Also, I claim the following:
>
>   The object model is not a complete model of computation.
>
>Both sentences I think hold for any value of "capability model" and
>"object model" that is usually meant by these terms.  In particular,
>the object model you described in another part of this thread falls
>into this category:
>
>    "There an "object" is the permission to do an 'invocation'
>     consisting of the sending of any amount of data and any
>     number of other objects and the blocking receiving
>     of same."
>
>I claim that this is merely the beginning of a model for computation,
>and that it lacks, among other things, in the words of Jonathan,
>considerations of:
>
>    "durability, resource exhaustion, and synchronicity"
>
>In other words, it lacks *anything* related to the question of
>executing instructions on a real machine architecture.

Hmmm.  I'm afraid I don't understand what you're getting at.
In the object-capability model processes are assumed to
execute the instructions of their underlying processor
architecture in their own memory except for the "invoke"
instruction (virtual instruction if you will) that behaves as
above, namely that it allows:

1.  Some amount of data and some number of capabilities
to be sent to whatever services the invocation, and

2.  The process to block until some amount of data and
some number of capabilities are received in "reply" to
the invocation.
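
To make the two steps above concrete, here is a minimal Python sketch of such an "invoke" primitive. It is only an illustration under stated assumptions: the names Capability, invoke, and echo_upper are hypothetical, threads stand in for processes, and nothing here is taken from RATS, NLTSS, or any other real system.

```python
import queue
import threading

class Capability:
    """Hypothetical capability: the permission to invoke one service."""

    def __init__(self, handler):
        self._requests = queue.Queue()
        # The servicing "process" is modeled as a daemon thread (assumption).
        threading.Thread(target=self._serve, args=(handler,), daemon=True).start()

    def _serve(self, handler):
        while True:
            data, caps, reply = self._requests.get()
            reply.put(handler(data, caps))

    def invoke(self, data=b"", caps=()):
        """Step 1: send data and capabilities. Step 2: block for the reply."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((data, tuple(caps), reply))
        return reply.get()  # (reply_data, reply_caps)

def echo_upper(data, caps):
    # Example service: upper-case the data and hand the capabilities back.
    return data.upper(), caps

echo = Capability(echo_upper)
reply_data, reply_caps = echo.invoke(b"hello", caps=(echo,))
```

Note that capabilities travel in both directions alongside the data, which is all the base model asks for.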

When you say that "resource exhaustion" is not included
in this model, I guess that's true.  I believe "resource exhaustion"
is appropriately included at a higher level.

I'm not quite sure what you mean by "durability", but I expect
that should also be included at a higher level.

"synchronicity" is included (blocking).  Any more asynchronous
models can be developed at a higher level.  E.g. I do an
invocation and immediately receive back a capability that
I can block on later if "I" (an anthropomorphic
process/active object) wish to.
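
A rough sketch of that pattern in Python, assuming a hypothetical ReplyCap object and start helper (neither is from any real system): the call returns at once with a capability, and the caller blocks on it only when it chooses to.

```python
import threading

class ReplyCap:
    """Hypothetical capability returned immediately; invoking it blocks for the result."""

    def __init__(self):
        self._done = threading.Event()
        self._result = None

    def _fulfil(self, result):
        self._result = result
        self._done.set()

    def invoke(self):
        self._done.wait()  # the only blocking step, deferred until the caller wants it
        return self._result

def start(service, data):
    """Begin the work in the background and return a ReplyCap right away."""
    pending = ReplyCap()
    worker = threading.Thread(target=lambda: pending._fulfil(service(data)), daemon=True)
    worker.start()
    return pending

def slow_sum(numbers):
    return sum(numbers)

pending = start(slow_sum, [1, 2, 3])
# ... the caller is free to do other work here ...
total = pending.invoke()  # block only now
```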

>The "nuances" in system design may initially fall outside of the
>capability or object model, however, they have real consequences for
>system construction and reflect on the object and interface design
>itself.  The read/write interface for a file object may look very
>different from system to system, for example, depending on how the
>data is transferred (shared memory vs copy in/out, in-band vs
>out-of-band) and when (synchronously, asynchronously), and using which
>resources (client, server, etc).  And although some of these features
>may be emulated on top of the other, doing so will degrade performance
>in a manner that it disappoints expectations.

All of the above sorts of mechanisms can be implemented within
the pure object-capability model and without significant performance
penalty.  I believe that with even a moderate amount of care such
"file" operations can even be wrapped without significant performance
penalty.  This is certainly true for the "attach" (map) sorts of
virtual memory calls.  An initial call may be wrapped and go through
another process/active object, but once the mapping is complete
then the access is directly to memory.
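
As an illustration of that point, here is a small Python sketch in which a wrapper interposes on the "attach" call but plays no part in the memory accesses that follow. Segment and WrappedSegment are hypothetical names, and a memoryview stands in for a real page mapping.

```python
class Segment:
    """Hypothetical memory-segment object offering an attach (map) call."""

    def __init__(self, size):
        self._memory = bytearray(size)

    def attach(self):
        return memoryview(self._memory)  # stands in for a page mapping

class WrappedSegment:
    """Wrapper that interposes on attach() and counts how often it is involved."""

    def __init__(self, inner):
        self._inner = inner
        self.calls = 0

    def attach(self):
        self.calls += 1  # one extra hop, at setup time only
        return self._inner.attach()

real = Segment(4096)
wrapped = WrappedSegment(real)
mapping = wrapped.attach()
for i in range(1000):
    mapping[i] = i % 256  # direct memory access; the wrapper is not on this path
```

After a thousand stores the wrapper has still been entered exactly once.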

Similarly in a system with direct I/O calls (read, write), the
invocations can be set up so that the ultimate data transfer
goes, in the relevant cases, directly from memory to disk or
vice versa.  There may be some overhead in the setup through
wrapping, but that is generally small compared to disk latency.

>...
> > (note for recent discussion on the topic, I believe that
> > synchronicity of the invoke operation is a matter for the
> > system to decide - though any amount of asynchronous operation
> > can be built from a base synchronous call).
>
>Yes, it is possible, from a functional point of view.  But the
>performance is quite different.  Consider a system that emulates the
>select() POSIX system call using a synchronous call primitive, say on
>5000 file descriptors.  There are not many applications that need this
>functionality, but those that do are decidedly performance critical.
>Sometimes additional threads are not cheap enough.
>
>A system which is designed to support this efficiently depends on
>asynchronously delivered notification that can unblock a blocking
>thread.  It is not feasible to implement such a mechanism in a
>synchronous message system.

Fine.  Provide such an interface.  There's nothing about the base
object-capability model that prohibits doing so.

>Note that people who advocate a synchronous call primitive do not say
>that select() can be implemented on top of it if pressed on this
>point.

Sorry, I do (as I did above).  I also say that I don't particularly
care whether there is some sort of asynchronous communication
primitive in the system.  I view this as orthogonal to the
object-capability request/reply paradigm.  The NLTSS system
that we ran in production for ~10 years:

http://en.wikipedia.org/wiki/NLTSS

had at its base an asynchronous model of communication which is
briefly described in:

http://www.webstart.com/jed/papers/Components/

(figures 11 and 12 and the surrounding text).  Despite that, I argue
that it followed the base object-capability paradigm and had
what I consider to be wrappable capabilities.

I guess I should also mention that I'm not concerned about
what I would call "out of band" perfection in the wrapping
mechanism, specifically anything like "EQ?" or "MyCap?".
It is enough for me that functionally a wrapped object
(capability) behaves in the same way as the original
in terms of what happens when data and capabilities
are sent in and come back.
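
A minimal sketch of this notion of functional equivalence, with capabilities modeled as plain Python callables (an assumption made purely for illustration; make_counter and wrap are hypothetical names): the wrapper returns the same replies as an unwrapped object would, even though an identity test can still tell the wrapper from the object it wraps.

```python
def make_counter():
    """Hypothetical object: accumulates the data sent in and replies with the total."""
    state = {"n": 0}

    def counter(data, caps=()):
        state["n"] += data
        return state["n"], caps

    return counter

def wrap(cap, log):
    """Return a wrapper that records each invocation and forwards it unchanged."""
    def wrapper(data, caps=()):
        log.append(data)
        return cap(data, caps)

    return wrapper

log = []
reference = make_counter()      # an unwrapped object, used for comparison
inner = make_counter()
wrapped = wrap(inner, log)

replies_reference = [reference(n) for n in (1, 2, 3)]
replies_wrapped = [wrapped(n) for n in (1, 2, 3)]
same_behavior = replies_reference == replies_wrapped  # what goes in and comes back matches
same_identity = wrapped is inner                      # an "EQ?"-style test still differs
```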

>Instead, their second line of defense is that select()
>semantics are considered by them to be broken.  That's fine, but that
>means dropping the goal of universality.

I don't consider select() broken and I still hold to the goal of universality.

> > There must be an extension mechanism in the system that allows
> > appropriately enabled processes to create new capabilities that
> > they can communicate and service such "invocation"s on.  The
> > extension mechanism must be such that any of the systems
> > primitive capabilities can be virtualized.  This extension
> > is trivially achieved in language and network level systems,
> > and I believe also easy to achieve with OS level systems.
> >
> > This is not difficult to achieve and with what I regard as
> > good performance.  Regarding:
>
>It's hard to argue with your personal judgement of "good performance",
>but I will admit your point.  However, I think there are many
>applications where "good" may not quite be "good enough".  As test
>cases, I suggest Gigabit Ethernet and live audio processing.  That's
>not particularly high-end, but typical end-user requirements these
>days.

With appropriate design any "call" overhead from the capability
interface is comparable to that for a typical system call and
has no impact on bandwidth intensive applications like
Gigabit Ethernet or live audio processing.  I say this from
some experience.  We had applications with VERY demanding
latency requirements for access to a solid state disk device
which had latencies in the 10 microsecond range in ~1980.
We were able to get comparable latencies with an object-capability
model as were achieved with a more traditional system call
interface.  To meet these performance goals in some cases
we needed to push some of the service processes (e.g. the
"file" server) into what amounted to the kernel (cutting out
two of four domain changes), but we were still able to keep
the interface the same, faithful to the object-capability
model.

> > >Given that you
> > >don't consider resource allocation issues a serious concern
> >
> > You must have been listening to others write about me.  I consider
> > functionality the primary concern, but I certainly do consider
> > "resource allocation" issues a serious concern.  I spent some
> > 5-7 years of my career optimizing for such "resource allocation"
> > issues on the NLTSS system at LLNL (~1983-1989).  I'll note that
> > none of the resource/performance issues that I worked on had
> > anything to do with the nature of the object-capability system
> > call interface or on the extension mechanism.  It always
> > puzzles me why many people seem to feel that an object-capability
> > interface imposes performance problems/limitations on
> > systems (whether at the language, OS, or network levels).
>
>It's not that "an" interface imposes necessarily performance problems.
>However, the "wrong" interface certainly will.  What is the "correct"
>and what is the "wrong" interface depends greatly on system details
>that lie well outside of what is covered by your proposal.

At this point I don't see anything we can do but respectfully disagree.
I have my experience in this area, described above.  I can only
guess that you have some experience where an interface faithful
to the object-capability paradigm imposed unacceptable overhead
and you believe no object-capability interface could provide acceptable
overhead.  Perhaps you can describe your experience in this
regard and we can discuss it.

> > Have there been studies that I've missed about object-capability
> > systems that performed poorly because of OS overhead?
>
>I don't know if you missed them, but there certainly have been
>studies.
>
>As a typical example, let's consider Mach, which fits your description
>of a pure capability system.  It even uses the same message interface
>for user and kernel objects.  Its failures have been carefully
>analyzed here: http://citeseer.ist.psu.edu/chen93impact.html

Heh.  One thing to note about the above is that:

"Both operating systems implement the UNIX system
call interface, although their underlying implementations
differ substantially. Ultrix is a monolithic system in which
all operating system code is implemented in the kernel. A
program running on Ultrix invokes the operating system
through a system call interface. In contrast, Mach 3.0 is a
microkernel that exports and implements a small number
of orthogonal abstractions including interprocess com-
munication (IPC), threads, and virtual memory. Higher-
level operating system services are implemented in a user-
level process called the UNIX server."

The above is very similar to the situation we faced in our
NLTSS implementation.  Namely we provided a new set
of interfaces (e.g. a directed graph of processes through
capabilities) but user applications only "wanted" (used)
an emulation of an older system supported through libraries
and servers (e.g. a linear process string, though it could
have been a tree structure like with Unix).

This of course puts the system using libraries and emulation
at a significant disadvantage.  If you flipped the test and
required the legacy interface to emulate the newer
interface through libraries and servers you could expect
the performance comparisons to be quite different.

Still, despite that lack of balance in the comparison, I
argue (again from experience) that the object-capability
interface can be made to perform as well as any interface
by what amounts to pushing whatever parts of the system
require higher performance into essentially a monolithic
kernel.  The real issue (from my experience) is the number
of domain transitions required for a given call.  This can
be reduced as much as necessary at the cost of
weaker integrity, reliability, analyzability, etc. in the
resulting monolith.

That is, the trade-off isn't in the interface, it's in the
service implementation.
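
A toy Python sketch of that trade-off follows. The two functions present the same interface and return the same reply; only the number of domain changes differs. The counts of four and two follow the file server example given earlier in this message, but the exact breakdown in the comments is an assumption, and every name here is hypothetical.

```python
class Domains:
    """Counts protection-domain changes; purely illustrative."""

    def __init__(self):
        self.changes = 0

    def change(self):
        self.changes += 1

def read_user_level_server(domains, files, name):
    domains.change()  # client -> kernel
    domains.change()  # kernel -> file server process
    data = files[name]
    domains.change()  # file server process -> kernel
    domains.change()  # kernel -> client
    return data

def read_in_kernel_server(domains, files, name):
    domains.change()  # client -> kernel (file server code runs here)
    data = files[name]
    domains.change()  # kernel -> client
    return data

files = {"report": b"contents"}
user_level, in_kernel = Domains(), Domains()
same_reply = (read_user_level_server(user_level, files, "report")
              == read_in_kernel_server(in_kernel, files, "report"))
```

The caller cannot tell the two apart except by timing, which is the sense in which the interface stays the same.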

>An analysis of how these failures can be avoided by careful
>construction of the microkernel primitives is contained in the
>following paper.  Also contains references to other projects.
>http://www.l4ka.org/publications/paper.php?docid=642
>
>This last paper illustrates that the requirements for the
>communication primitives go beyond what you specified.  This does not
>mean that the decisions made by Liedtke are the only feasible ones,
>but they show that the design constraints are tight.  You have to do
>an effort to get a fast IPC mechanism, and you can not design the IPC
>mechanism arbitrarily.

I didn't argue that care isn't required.  Certainly any interface must
be carefully designed with performance in mind.  This is particularly
true for an interface (e.g. an underlying invocation or RPC interface)
on which much of a system will depend.  Even then, however,
I argue that techniques are available for improving performance
that can be done behind the scenes and needn't impact the
base object-capability interface.

><snip>
>It took years for the microkernel community to develop a fast IPC
>mechanism (due to Liedtke).  Now it is taking years for the L4 group
>to solve the resource allocation problems involved in managing the
>mapping database.  And even if they succeed it is unclear if their
>mechanisms are universal.  Comparing KeyKOS/EROS/Coyotos with L4 shows
>many similarities, but also some fundamental differences.  Given that
>even a single level of indirection makes these systems non-competitive
>with traditional monolithic kernels without object/capability system
>should make one careful about any exaggerated claims.

I described my approach to dealing with a "single level of indirection"
above.  Namely, eliminate it where needed, smash the mechanism
into a monolithic kernel (which may be needed to provide the required
performance), but leave the interface the same.  In my experience
this works - though it may not be as pretty or analyzable, etc. as
some may wish.  It can meet the performance goals, and it may
be the only approach that will.  It can also meet the functional
goals of fully wrappable object-capabilities.

Clearly you can't demand a modular design with many domain
changes per request, where domain changes have a significant
cost, and meet a stringent performance goal.  Something
has to give.  If the performance requirement is ascendent
then I argue that the modularity must be sacrificed.  I don't
believe that an appropriately designed object-capability interface
is (in my experience) or will be a significant issue.  That interface
has a value of its own (POLA, virtualization, etc.) that can be
maintained along with meeting performance goals.

> > >Note that I have no problem envisioning such a solution for a set of
> > >high-level applications with no critical demands with regards to
> > >operating system resources.  As soon as you ignore issues of timing
> > >and resource allocation, it all seems very easy, and the solutions are
> > >reasonably well understood.  But not all applications belong into this
> > >category, in fact, in my opinion fewer and fewer do.  Google tries
> > >very hard to keep response time of their web service below a certain
> > >critical threshold (couple hundred of milliseconds).  There is a reason
> > >for that, and that reason applies doubly to interactive computing
> > >environments of all sorts.
> >
> > There is nothing about the performance requirements of the Google or
> > indeed almost any other application that depends critically on
> > OS overhead issues.  These limitations (unless the applications
> > are quite poorly designed) come from basic CPU, memory, and I/O
> > limitations.  It surprises me to hear anybody suggest otherwise.
> > While it would be possible to design a system (e.g. one that
> > required proxying for any capability communication and whose
> > architecture needed many levels of such communication) that
> > would perform poorly and where the "OS" induced overhead
> > would become a significant part of the performance problems,
> > I don't believe it takes very serious thought to ensure that
> > such problems aren't evident.
>
>You make it sound like Mach never happened,

Heh.  For me Mach didn't happen.  I existed in a parallel
world.  One in which we did meet our performance requirements
with an approach as I described above.

>and the whole question of
>resource scheduling in an operating system is just a matter of
>throwing more resources at a problem.  And this although there are
>known worst-case scenarios where more resources (like CPU cache)
>actually can degrade performance[1].  I am quite astonished, and don't
>know what to say in reply.

I hope you see that I accept that one can't simply throw more resources
at a modular architecture and make it perform as well as a monolithic
architecture.  In some cases you can reduce the overhead of the
domain changes (e.g. from an OS based domain change to a
language based thread change) and get adequate performance.

For example, in our case while we "moved" our file server into
the kernel of the system, the mechanism we used was such that
all the file server code and its basic structure was maintained.
In our case it was multiprocessed service threads that handled
the requests on the file objects.  These threads still existed
when moved into our kernel.  If the file server were to fault (e.g.
out of bounds memory reference) the system would crash, but
in practice even with a file server in a separate user level process,
if it faulted the system ground to a halt very quickly.

>To finish this, let me give you a very specific, but hypothetical
>example, of the pitfalls in emulation.
>
>Say you have a system S built on an operating system O, using a
>capability model.

I'll point out that you are arguing from an asymmetric viewpoint
to begin with.  Why is it that the comparison is between a
"system S" built on a capability model and a native implementation
of S, rather than a comparable application run on a capability
operating system, O?  Of course I understand the legacy issues,
but if this is always the comparison then of course it will be
impossible to compete.

>In addition to the standard object invocation
>mechanisms, the following assumptions are made: All object invocations
>are atomic and stateless, and messages are limited in size by 128
>machine words.
>
>Now the system is to be ported to an operating system O', which has
>very much the same semantics as O, however, the maximum message size
>is limited to 64 machine words.

My goodness.  You are a glutton for punishment.

>From a theoretic point of view, it seems clear that the system S can
>be run on O' by emulating a message size of 128 machine words, using
>two consecutive messages instead of one.  However, from a practical
>point of view, this raises some serious issues: To use two messages,
>the invocation becomes stateful.  Stateful invocation means not only
>that the operational semantics of the RPC system change completely.
>On the client side, it is simple enough.  But on the server side,
>there is additional state that needs to be stored for each partial
>invocation.  This additional state requires resources.  As a
>consequence, a number of read-only operations may exhaust server
>resources, something that previously may not have been the case at
>all.  Fixing this may require some serious changes to the overall
>system structure.  In any case, it is not a pretty sight.

All I can say is that in our system we had no limitations on message
size (the buffers could be as large as the process memory).  Any of
our operations could be what you regard as "stateful".  What's the
problem?  Our servers handled the worst case behavior by clients
(as I described elsewhere with "good guy" timeouts - an issue I
hope to get back to) and we considered it quite a "pretty sight."
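
For concreteness, here is a small Python sketch of the hypothetical 128-over-64-word emulation, with a server that bounds the state held for half-finished invocations by timing them out. This is not how NLTSS worked (it had no message-size limit), and the timeout policy, the TIMEOUT value, and all names are assumptions made only for illustration.

```python
MAX_WORDS = 64   # per-message limit on the hypothetical system O'
TIMEOUT = 5.0    # assumed number of seconds a partial invocation may wait

def split(words):
    """Client side: split a message of up to 128 words into two parts."""
    return words[:MAX_WORDS], words[MAX_WORDS:]

class Server:
    def __init__(self):
        self.partial = {}  # client id -> (first half, arrival time)

    def receive(self, client, part, now):
        # Discard partial invocations that were not completed in time.
        self.partial = {c: (p, t) for c, (p, t) in self.partial.items()
                        if now - t <= TIMEOUT}
        if client in self.partial:
            first, _ = self.partial.pop(client)
            return first + part   # the complete 128-word message
        self.partial[client] = (part, now)
        return None               # still waiting for the second half

server = Server()
message = list(range(128))
first, second = split(message)
server.receive("a", first, now=0.0)
whole = server.receive("a", second, now=1.0)
server.receive("b", first, now=2.0)    # client "b" never sends its second half
server.receive("c", first, now=10.0)   # by now the state held for "b" has expired
```

The point of the sketch is only that the extra per-client state can be bounded; whether the resulting behavior is acceptable depends on the system being emulated.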

>One should note that we only changed a single parameter in a trivial
>way, namely the maximum message length.  In the real world, there are
>many parameters that change in non-trivial ways, and things are much
>harder still.

I agree that it's possible to design ineffective interfaces, and it's possible
to incur overhead by transiting many costly domain changes.  I also accept
that if one is designing to changing requirements (emulating one system
at one time and another at a later time), then it will be difficult to ever
meet performance requirements.

What I don't believe is that any of this is related to the value of the
basic object-capability programming model.

One other thing I'll mention in passing regarding your designs
with fixed RPC buffers.  I can only assume you are suggesting
such designs because the user buffers are copied at some point
into system buffers apart from their ultimate destination.  This
is not necessary and may not be a wise choice.  As I mentioned
previously, in the NLTSS system we did not use such an approach.
There are more details of our architecture in:

http://www.eros-os.org/pipermail/cap-talk/2007-January/006807.html

if you're interested.
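
A very rough Python illustration of avoiding the intermediate system buffer: the data is moved once, from the caller's buffer to its destination, with no fixed-size staging area in between. This is only an analogy (a memoryview is not a DMA transfer, and it says nothing about the NLTSS implementation); transfer_direct and the buffer sizes are hypothetical.

```python
user_buffer = bytearray(b"x" * 1600)         # larger than a small fixed RPC buffer
destination = bytearray(len(user_buffer))    # stands in for the ultimate destination

def transfer_direct(source, target):
    view = memoryview(source)   # references the caller's memory; no staging copy
    target[:len(view)] = view   # the single move to the ultimate destination
    return len(view)

moved = transfer_direct(user_buffer, destination)
```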

I sense we may be at something of an impasse.  It seems that nothing
less than redesigning/reimplementing a Unix emulation on top of Mach
with performance equal to a Unix implementation with a monolithic
kernel will satisfy you.  Of course I can't do that, even if I had adequate
time and resources.  What I can tell you is that we implemented an
object-capability system that was required to emulate a monolithic
kernel's interface with performance like that of the monolithic
kernel itself, and even under that constraint we were able to achieve
comparable performance.  For us the base "invocation" interface
was simply not a significant issue.  The performance issues lay
in other areas such as in the number of domain changes of various
types and in the number of copies of data (which we reduced to
zero for all the relevant cases - even for I/O to rotating storage).

That's my experience.  I believe it's applicable to the other architectures
that you mention, though to get good performance may require some
munging of domain boundaries to minimize costly domain changes
and may require some redesign to minimize data copying and perhaps
limitations on buffer sizes.

--Jed http://www.webstart.com/jed/ 



