[cap-talk] Another "core" principle - virtualize capabilities

Marcus Brinkmann marcus.brinkmann at ruhr-uni-bochum.de
Wed Jan 3 11:42:14 CST 2007


At Sun, 31 Dec 2006 13:33:12 -0800,
Jed Donnelley <capability at webstart.com> wrote:
> > > All I'm arguing for is the ability to standardize the object-capability
> > > model sufficiently to map from one system to another - e.g. along
> > > the lines of the DCCS.  Any such general mapping will automatically
> > > provide the needed standard and "smooth out" any nuances between
> > > the systems, provided that the systems were designed with
> > > virtualizable capabilities only.
> >
> >I understand, but some goals are just pipe dreams, and not attainable
> >in the real world.
> 
> I disagree.  I believe they have been met or nearly met (needing only
> minor changes) by many systems.  For example, I believe that the RATS
> operating system:

I am really confused now.  Are you claiming that any other system that
was "designed with virtualizable capabilities only" can be ported to
RATS with only minor modifications to how to express and invoke object
interfaces, and still perform according to its specification?

I think the real problem is that the number of systems that are
designed with "virtualizable capabilities only" is exactly zero,
because:

  The capability model is not a complete model of computation.

Also, I claim the following:

  The object model is not a complete model of computation.

Both sentences I think hold for any value of "capability model" and
"object model" that is usually meant by these terms.  In particular,
the object model you described in another part of this thread falls
into this category:

   "There an "object" is the permission to do an 'invocation'
    consisting of the sending of any amount of data and any
    number of other objects and the blocking receiving
    of same."

I claim that this is merely the beginning of a model for computation,
and that it lacks, among other things, in the words of Jonathan,
considerations of:

   "durability, resource exhaustion, and synchronicity"

In other words, it lacks *anything* related to the question of
executing instructions on a real machine architecture.

The "nuances" in system design may initially fall outside of the
capability or object model, however, they have real consequences for
system construction and reflect on the object and interface design
itself.  The read/write interface for a file object may look very
different from system to system, for example, depending on how the
data is transferred (shared memory vs copy in/out, in-band vs
out-of-band) and when (synchronously, asynchronously), and using which
resources (client, server, etc).  And although some of these features
may be emulated on top of the others, doing so will degrade performance
in a manner that disappoints expectations.
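
To make the copy-in/out versus shared-buffer point concrete, here is a
minimal sketch in Python.  The class and method names (CopyOutFile,
SharedBufferFile, read, read_into) are hypothetical and not taken from
any real system; the sketch only shows that emulating a "fill my
buffer" interface on top of a "return a copy" interface costs one extra
copy per call.

```python
class CopyOutFile:
    """Hypothetical copy-out interface: each read returns a fresh bytes object."""

    def __init__(self, data):
        self._data = data

    def read(self, offset, length):
        # Slicing allocates and copies: this is the "copy out" step.
        return self._data[offset:offset + length]


class SharedBufferFile:
    """Hypothetical shared-buffer interface, emulated on top of read()."""

    def __init__(self, backend):
        self._backend = backend
        self.extra_bytes_copied = 0  # bookkeeping for the emulation cost

    def read_into(self, buffer, offset):
        # First copy happens inside the backend's read().
        chunk = self._backend.read(offset, len(buffer))
        # Second copy: move the data into the caller's buffer.
        buffer[:len(chunk)] = chunk
        self.extra_bytes_copied += len(chunk)
        return len(chunk)
```

A client written against read_into() still works, but every byte now
crosses memory twice, which is exactly the kind of cost that does not
show up in the functional description of the interface.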

This is my whole argument, but let me apply it in specific scenarios
for illustration:

> > > If they aren't then I regard their
> > > design as unwise - both from the perspective of remote mapping
> > > and from the perspective of many other "wrapping" mechanisms that
> > > we've discussed like membranes and delegation with responsibility
> > > tracking.
> > >
> > > I regard this virtualizability as relatively easy to achieve if it
> > > is considered worthwhile from day one.  I don't believe there are
> > > any fundamental issues with it.  E.g. I don't believe it conflicts
> > > with any need to provide flexible memory mapping mechanisms as
> > > Shap. seems to suggest.
> >
> >Well, make a proposal for such a set of fundamental mechanisms that is
> >sufficient for all large scale systems built on it.
> 
> Sigh.  I believe I've done so many times - always the same proposal.
> Perhaps you can tell me why you believe it to be inadequate.  The
> proposal is that any object-capability system (whether at the language,
> OS, or network level) provide:
> 
> A single operation, "invoke", on a capability which can send any
> amount of data and capabilities in and receive any amount of data
> and capabilities back (subject to the usual buffer sorts of
> limitations on sending and receiving).
> 
> (note for recent discussion on the topic, I believe that
> synchronicity of the invoke operation is a matter for the
> system decide - though any amount of asynchronous operation
> can be build from a base synchronous call).

Yes, it is possible, from a functional point of view.  But the
performance is quite different.  Consider a system that emulates the
select() POSIX system call using a synchronous call primitive, say on
5000 file descriptors.  There are not many applications that need this
functionality, but those that do are decidedly performance critical.
Sometimes additional threads are not cheap enough.

A system which is designed to support this efficiently depends on
asynchronously delivered notifications that can unblock a blocked
thread.  It is not feasible to implement such a mechanism in a
synchronous message system.
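
As a rough illustration of the cost, here is a minimal sketch in Python
of what waiting on many objects looks like when the only primitive is a
blocking call.  The names blocking_read and emulated_select are
hypothetical, and queue.Queue merely stands in for an object that
answers a synchronous invocation; this is not how any particular system
implements select().

```python
import queue
import threading


def blocking_read(source):
    """Stand-in for a synchronous invoke: blocks until the source has data."""
    return source.get()


def emulated_select(sources):
    """Return (index, data, helper_count) for the first source with data.

    Each blocking call can wait on exactly one object, so the emulation
    needs one helper thread per source.
    """
    ready = queue.Queue()
    helpers = []
    for index, source in enumerate(sources):
        helper = threading.Thread(
            # Default arguments bind the current index and source.
            target=lambda i=index, s=source: ready.put((i, blocking_read(s))),
            daemon=True,
        )
        helper.start()
        helpers.append(helper)
    # The caller itself makes a single blocking call on the result queue.
    index, data = ready.get()
    return index, data, len(helpers)
```

The functional result is the same as with select(), but the number of
helper threads grows with the number of objects waited on, so 5000
descriptors mean 5000 threads.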

Note that people who advocate a synchronous call primitive do not, if
pressed on this point, claim that select() can be implemented on top
of it.  Instead, their second line of defense is that they consider
select() semantics to be broken.  That's fine, but it means dropping
the goal of universality.

> There must be an extension mechanism in the system that allows
> appropriately enabled processes to create new capabilities that
> they can communicate and service such "invocation"s on.  The
> extension mechanism must be such that any of the systems
> primitive capabilities can be virtualized.  This extension
> is trivially achieved in language and network level systems,
> and I believe also easy to achieve with OS level systems.
> 
> This is not difficult to achieve and with what I regard as
> good performance.  Regarding:

It's hard to argue with your personal judgement of "good performance",
but I will admit your point.  However, I think there are many
applications where "good" may not quite be "good enough".  As test
cases, I suggest Gigabit Ethernet and live audio processing.  That's
not particularly high-end, but typical end-user requirements these
days.

> >Given that you
> >don't consider resource allocation issues a serious concern
> 
> You must have been listening to others write about me.  I consider
> functionality the primary concern, but I certainly do consider
> "resource allocation" issues a serious concern.  I spent some
> 5-7 years of my career optimizing for such "resource allocation"
> issues on the NLTSS system at LLNL (~1983-1989).  I'll note that
> none of the resource/performance issues that I worked on had
> anything to do with the nature of the object-capability system
> call interface or on the extension mechanism.  It always
> puzzles me why many people seem to feel that an object-capability
> interface imposes performance problems/limitations on
> systems (whether at the language, OS, or network levels).

It's not that "an" interface necessarily imposes performance problems.
However, the "wrong" interface certainly will.  What is the "correct"
and what is the "wrong" interface depends greatly on system details
that lie well outside of what is covered by your proposal.

> Have there been studies that I've missed about object-capability
> systems that performed poorly because of OS overhead?

I don't know if you missed them, but there certainly have been
studies.

As a typical example, let's consider Mach, which fits your description
of a pure capability system.  It even uses the same message interface
for user and kernel objects.  Its failures have been carefully
analyzed here: http://citeseer.ist.psu.edu/chen93impact.html

An analysis of how these failures can be avoided by careful
construction of the microkernel primitives is contained in the
following paper, which also contains references to other projects:
http://www.l4ka.org/publications/paper.php?docid=642

This last paper illustrates that the requirements for the
communication primitives go beyond what you specified.  This does not
mean that the decisions made by Liedtke are the only feasible ones,
but they show that the design constraints are tight.  You have to make
an effort to get a fast IPC mechanism, and you cannot design the IPC
mechanism arbitrarily.

> >I am
> >however doubtful that your confidence in this matter is on solid
> >grounds.  I have thought quite a bit about the differences in the
> >design of OS level features and I don't see a common set of
> >fundamentals that would be sufficient for all large scale systems
> >built on top of them.
> 
> Tell me then what you regard as inadequate about the above.   Is
> there something fundamental about such a simple interface that
> suggests a resource allocation (performance?) problems?

I am not sure what you are looking for here.  The proposal is not
specific enough to entice a specific response to this question.  The
claim was, as far as I understood it, that a single mechanism is good
enough for everybody, and that small differences between mechanisms
can be smoothed out by emulation of features.  I think that this claim
is wrong because it ignores that the capability model is not a
complete model of computation and that a sufficiently detailed
model will bear characteristics which make it non-universal, at least
to my knowledge.  A universal set of mechanisms is very much sought
after.

It took years for the microkernel community to develop a fast IPC
mechanism (due to Liedtke).  Now it is taking years for the L4 group
to solve the resource allocation problems involved in managing the
mapping database.  And even if they succeed it is unclear if their
mechanisms are universal.  Comparing KeyKOS/EROS/Coyotos with L4 shows
many similarities, but also some fundamental differences.  The fact
that even a single level of indirection makes these systems
non-competitive with traditional monolithic kernels without an
object/capability system should make one careful about any
exaggerated claims.
 
> >Note that I have no problem envisioning such a solution for a set of
> >high-level applications with no critical demands with regards to
> >operating system resources.  As soon as you ignore issues of timing
> >and resource allocation, it all seems very easy, and the solutions are
> >reasonably well understood.  But not all applications belong into this
> >category, in fact, in my opinion fewer and fewer do.  Google tries
> >very hard to keep response time of their web service below a certain
> >critical threshold (couple hundred of miliseconds).  There is a reason
> >for that, and that reason applies doubly to interactive computing
> >environments of all sorts.
> 
> There is nothing about the performance requirements of the Google or
> indeed almost any other application that depends critically on
> OS overhead issues.  These limitations (unless the applications
> are quite poorly designed) come from basic CPU, memory, and I/O
> limitations.  It surprises me to hear anybody suggest otherwise.
> While it would be possible to design a system (e.g. one that
> required proxying for any capability communication and whose
> architecture needed many levels of such communication) that
> would perform poorly and where the "OS" induced overhead
> would become a significant part of the performance problems,
> I don't believe it takes very serious thought to insure that
> such problems aren't evident.

You make it sound like Mach never happened, and the whole question of
resource scheduling in an operating system is just a matter of
throwing more resources at a problem.  And this even though there are
known worst-case scenarios where more resources (like CPU cache)
can actually degrade performance[1].  I am quite astonished, and don't
know what to say in reply.

To finish this, let me give you a very specific but hypothetical
example of the pitfalls in emulation.

Say you have a system S built on an operating system O, using a
capability model.  In addition to the standard object invocation
mechanisms, the following assumptions are made: All object invocations
are atomic and stateless, and messages are limited in size to 128
machine words.

Now the system is to be ported to an operating system O', which has
very much the same semantics as O, however, the maximum message size
is limited to 64 machine words.

From a theoretical point of view, it seems clear that the system S can
be run on O' by emulating a message size of 128 machine words, using
two consecutive messages instead of one.  However, from a practical
point of view, this raises some serious issues: With two messages,
the invocation becomes stateful.  Stateful invocation means that the
operational semantics of the RPC system change completely.
On the client side, it is simple enough.  But on the server side,
there is additional state that needs to be stored for each partial
invocation.  This additional state requires resources.  As a
consequence, a number of read-only operations may exhaust server
resources, something that previously may not have been the case at
all.  Fixing this may require some serious changes to the overall
system structure.  In any case, it is not a pretty sight.
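
The server-side problem can be sketched in a few lines of Python.
Everything here is hypothetical (the Server class, the receive and
invoke functions, and the use of sum() as a stand-in for a read-only
operation); the sketch only illustrates where the extra state comes
from when one 128-word invocation is split into two 64-word messages.

```python
MAX_WORDS = 64  # hypothetical message size limit of O'


class Server:
    def __init__(self):
        # State that did not exist on O: first halves waiting for completion.
        self.pending = {}

    def receive(self, client, fragment, last):
        assert len(fragment) <= MAX_WORDS
        if not last:
            # The server must remember the partial invocation.
            self.pending[client] = list(fragment)
            return None
        words = self.pending.pop(client, []) + list(fragment)
        return sum(words)  # stand-in for a read-only operation


def invoke(server, client, words):
    """Emulate one 128-word invocation as two messages of at most 64 words."""
    first, second = words[:MAX_WORDS], words[MAX_WORDS:]
    server.receive(client, first, last=False)
    return server.receive(client, second, last=True)
```

A well-behaved client leaves no trace behind, but a client that sends
only the first half of each invocation makes server.pending grow
without bound, even though every operation involved is nominally
read-only.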

One should note that we only changed a single parameter in a trivial
way, namely the maximum message length.  In the real world, there are
many parameters that change in non-trivial ways, and things are much
harder still.

Thanks,
Marcus

[1] http://www.l4ka.org/publications/paper.php?docid=644



