[cap-talk] Object-capability vs. monolithic performance - 1
Jed Donnelley
capability at webstart.com
Thu Jan 4 19:03:04 CST 2007
At 06:29 AM 1/4/2007, Jonathan S. Shapiro wrote:
>[Subject change. Thank God for Google. I'ld *never* find anything on
>this list if Google didn't index us.]
>
>Warning: mild flares ahead, but with humor.
>
>On Wed, 2007-01-03 at 17:07 -0800, Jed Donnelley wrote:
> >At 09:42 AM 1/3/2007, Marcus Brinkmann wrote:
>
> > >It's hard to argue with your personal judgement of "good performance",
> > >but I will admit your point. However, I think there are many
> > >applications where "good" may not quite be "good enough". As test
> > >cases, I suggest Gigabit Ethernet and live audio processing. That's
> > >not particularly high-end, but typical end-user requirements these
> > >days.
> >
> > With appropriate design any "call" overhead from the capability
> > interface is comparable to that for a typical system call and
> > has no impact on bandwidth intensive applications like
> > Gigabit Ethernet or live audio processing.
>
>Jed: since my lab actually *did* this
Did what? Compared two system call interfaces for performance, one a
wrappable/mappable object-capability call interface and another not
(hard to describe, e.g. whatever it comes down to in a Unix system)?
It sounds (below) as though you were comparing two internal implementations,
not the cost of the system call interface. It's the relative cost of the
system call interface that is the issue I'm concerned about. More
details below.
>three years ago, I feel pretty
>confident that your experience from the 1980s is not predictive. The
>necessary latency requirements for Gbit ethernet are sub microsecond and
>the packet rate is many decimal orders of magnitude higher than anything
>a disk can put out even today.
I believe that in our time frame the latency requirements for the solid
state disk had those same properties. Remember, we're talking about
the latency of access to memory as seen through a file system
interface. Of course it was absolutely impossible to get the latency
through a "file" read/write capability invocation down to the level of a memory
fetch/store, but fortunately we didn't have to do that. 'All' we had to do
was to get the latency down to the level that was available through the
competing more traditional system call interface to a monolithic kernel.
What we found (obvious if you think about it) was:
1. The latency for this user level service was bounded below by the
latency of the domain exchange mechanism from the user to the
"system." At some point no amount of work could improve that cost,
whether through the more traditional system call interface to the
monolithic kernel system or through the wrappable/mappable
object-capability interface. The cost of this exchange was
rather high for us - some thousands of instruction times.
2. There were initially substantial additional costs in our micro-kernel
implementation, but these costs were dominated by the domain
change overhead in switching between the protected modules within
the system. In our case our file server ran in user mode. This meant
that to process a file I/O call (independent of the interface) we had to
exchange to the kernel, then back up to the file server, then back
to the kernel, and finally back to the user. By moving the file server
into the kernel (even with the same multithreaded code, same
processing of requests as through a capability invocation, etc., etc.)
we cut the domain exchanges per call from four to two and so reduced our
latency by about a factor of two, bringing it in line
with the latency for the monolithic kernel system - which was deemed
adequate for the solid state disk access application (though I'm not
sure why as higher performance was possible if application level
code could run in the kernel).
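A minimal sketch of the arithmetic in point 2, not code from the system
described. The per-exchange cost is a placeholder (the text above says only
"some thousands of instruction times"), and the domain names are illustrative
labels for the path a single file I/O call takes.

```python
# Hypothetical cost model; every number and name here is an assumption.
DOMAIN_EXCHANGE_COST = 2000  # placeholder for "some thousands" of instruction times

# Protection domains visited by one file I/O call, in order.
USER_LEVEL_SERVER = ["user", "kernel", "file_server", "kernel", "user"]
IN_KERNEL_SERVER = ["user", "kernel", "user"]


def exchanges(path):
    """Count domain exchanges: one per transition between adjacent domains."""
    return len(path) - 1


def latency_floor(path, cost=DOMAIN_EXCHANGE_COST):
    """Lower bound on latency if domain exchanges dominate all other costs."""
    return exchanges(path) * cost


if __name__ == "__main__":
    print(exchanges(USER_LEVEL_SERVER), exchanges(IN_KERNEL_SERVER))
    print(latency_floor(USER_LEVEL_SERVER) / latency_floor(IN_KERNEL_SERVER))
```

Whatever the real per-exchange cost was, it cancels out of the ratio, which is
why the improvement comes out near a factor of two as long as exchanges
dominate.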
Perhaps you can imagine how many years (literally) I spent poring
over system traces, cycle analyses, etc. in the effort to improve
this performance and how much work was involved by my whole
team with this "server in the kernel" effort (the way we referred
to it). There was nothing "glib" about it. We were fighting for our
lives. As I recall this performance improvement was one of the
milestones that we had to reach for "Big P" production:
http://www.computer-history.info/Page5.dir/pages/Chronicles.dir/images/big-p.gif
(for your amusement, part of:
http://www.computer-history.info/Page5.dir/pages/Chronicles.dir/index.html
). That's me with the hat at the far right, chasing that big P. The
Condor flying over us is one of our benchmark applications from the time, also
seen in this cartoon:
http://www.computer-history.info/Page5.dir/pages/Chronicles.dir/images/nltss-road-work.gif
where I'm again the one in the hat.
One thing we were particularly proud of in this effort was that
at the end of the process we could basically throw a switch and
build a system with any given service process in the kernel
(lowering overhead but increasing risk) or out of the kernel
(increasing overhead but decreasing risk). To quite a low level
the server code was the same. This "switch" throwing basically
amounted to loading a different set of libraries for the two different
cases.
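The build-time "switch" can be sketched roughly as follows. This is an
illustration under assumptions, not the original code: the class names, the
request format, and the exchange counters are all hypothetical. The point it
tries to show is that the server logic is identical in both configurations
and only the transport layer (standing in for the two sets of libraries)
differs.

```python
class FileServer:
    """Service logic, identical whether it runs in or out of the kernel."""

    def __init__(self):
        self.files = {}

    def handle(self, request):
        op, name, *data = request
        if op == "write":
            self.files[name] = data[0]
            return len(data[0])
        if op == "read":
            return self.files.get(name, b"")
        raise ValueError(f"unknown operation: {op}")


class InKernelTransport:
    """Hypothetical library set A: server in the kernel (user -> kernel -> user)."""

    def __init__(self, server):
        self.server = server
        self.exchanges = 0

    def communicate(self, request):
        self.exchanges += 1  # user -> kernel
        reply = self.server.handle(request)
        self.exchanges += 1  # kernel -> user
        return reply


class UserLevelTransport:
    """Hypothetical library set B: server as a user-level process."""

    def __init__(self, server):
        self.server = server
        self.exchanges = 0

    def communicate(self, request):
        self.exchanges += 2  # user -> kernel, kernel -> file server
        reply = self.server.handle(request)
        self.exchanges += 2  # file server -> kernel, kernel -> user
        return reply


def build_system(server_in_kernel):
    """The 'switch': pick a transport; the server code does not change."""
    transport = InKernelTransport if server_in_kernel else UserLevelTransport
    return transport(FileServer())


if __name__ == "__main__":
    for in_kernel in (True, False):
        system = build_system(in_kernel)
        system.communicate(("write", "a.dat", b"hello"))
        print(in_kernel, system.communicate(("read", "a.dat")), system.exchanges)
```

The caller sees the same communicate() interface and the same replies either
way; only the number of domain exchanges per request differs.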
Of course for our production systems (with their requirement for
low latency access to the solid state disk through the file
I/O interface) we never ended up building another system with
the file server as a user level process. Still, the opportunity
was there and we did build such systems for testing. The users
were content and we were content. From our perspective the user
interface was still wrappable/mappable. This file server in the kernel
could even receive requests directly from remote user applications
on the network and process them appropriately (safely) just as it did
for local "user" or "system" processes.
I guess every generation has some challenges which, if met,
result in a certain amount of pride. Your $500 challenge below
suggests such an area for you. You can see at least one of
ours in the above.
Since I sidetracked so far anyway, I'll mention again for emphasis
that all the above work was done without any need for changes in the
user level "system call" interface that was essentially wrappable/mappable
object-capability. While this consistency made life easier for our users,
it wasn't strictly necessary. All our applications used libraries that
effectively hid the base system call interface. We could (at high cost) and
occasionally
did change the base "invocation" call (for us it was a "communicate" call,
the only "system" call that our system supported). This required our
codes to be reloaded with new low level libraries (perhaps the equivalent
of glibc), but the applications themselves didn't need to change. I'll note
that the only times we found it worthwhile to do so were for functional
reasons, not for performance reasons - though as you see above performance
was vital to us.
<more details in #2>
--Jed http://www.webstart.com/jed/