[cap-talk] Claim: correct generic wrapping is not possible in principle
Jed Donnelley
capability at webstart.com
Tue Jan 2 01:30:18 CST 2007
At 09:15 AM 1/1/2007, Jonathan S. Shapiro wrote:
>On Sun, 2006-12-31 at 19:58 -0800, Jed Donnelley wrote:
>
> > A general solution seems simple to me - namely partial buffering
> > of messages, both for data and for capabilities. This immediately
> > occurs to me because it was the approach used in NLTSS (though
> > there only for data as capabilities were data).
>
>I suspect that you are once again thinking first about the network case.
>Yes?
>
>In the local case:
>
> Partial buffering with dynamic bound leads to unpredictable blocking
> behavior and kernel resource exhaustion.
I don't agree. I have detailed experience with the NLTSS system
in this regard (only data buffering, not capability buffering, but
I don't think the distinction is relevant). I'm sure you agree
that the buffers in the sending and receiving process ("user" level
buffers) are necessary. In the NLTSS system almost all local
transfers were from user memory to user memory. The only exception
was the very rare case where it wasn't possible (or likely) to
get both the sending and receiving buffers into memory at the
same time. In that case the transfer worked like a network
transfer (packet by packet) even in the local case. In practice
that never happened except in some testing I did with a test of
a very small memory configuration and large application buffers.
> Partial buffering with static bound gives nominally predictable
> behavior, but pragmatically unpredictable behavior as programmers
> come to rely on the fact that short message sends always go to
> available buffer space. In practice, programmers come to assume that
> short messages are non-blocking and this has been observed to lead to
> deadlock in real systems.
No such deadlock situations showed up during the ~11 years that the
NLTSS ran production scientific computing at LLNL.
> ANY buffering costs a factor of four in communication performance.
> There's really good measurement on this available in the literature.
You must be referring to buffering beyond the sending and receiving
buffers in the communicating processes which are of course necessary.
In the NLTSS system there was almost never any additional buffering,
so the costs you refer to weren't there. I don't see why this same
basic approach can't work in any system.
>So yes. Partial static buffering is good, and in the local case the
>empirically correct number of bytes and capabilities to buffer is zero.
>
> > Perhaps I should also point out that for long delays or
> > issues managing receive buffers, the buffer sizes can
> > be reduced to zero so that the receiver can be notified
> > when any data is available (when buffers can be allocated).
>
>In the local case, the concerns with buffering mostly aren't on the
>receiver side. They are on the sender side.
>
>There is exhaustive discussion of the (local case) IPC buffering issue
>in the OSDI/SOSP literature, and it's now universally agreed that
>buffering in the local case is very bad from almost every conceivable
>perspective.
Again assuming you are referring to buffers beyond those necessary
sending and receiving buffers in the processes, then I agree. That's
why we designed a system that almost never needed any such buffering.
Any such buffering was only required in what amounts to a thrashing
situation. In that case of course performance is bad as I assume
it is in any system - regardless of external buffering needed for
communication.
>In the network case, buffering is inevitable and the source of a bunch
>of problems.
What problems? Are you referring to something like network congestion?
While I agree that congestion can be a problem, in practice this
problem can be managed - efficiently.
> > > > but this is
> > > > not the main source of my objection.
> > > >
> > > > Let is consider solutions to (a):
> > > >
> > > > (i) We might assume infinite storage that is trusted to be promptly
> > > > allocatable. A system embedding this assumption is
> intrinsically
> > > > incorrect, because it does not admit ANY guarantee of liveness.
> > > >
> > > > (ii) We might ask somebody for the protocol schema. Ultimately, the
> > > > only party who *really* knows the protocol is the implementing
> > > > server. Unfortunately, if the object we are wrapping is hostile
> > > > it will lie (either to us or to the hypothetical protocol
> > > > registry).
> > > >
> > > > (iii) We might rely on a higher-level protocol (one that the service
> > > > also relies on). For example, we might design the transport
> > > > layer to disclose to us how many bytes we received as distinct
> > > > from how many bytes were transmitted, providing a basis to
> > > > allocate more receive buffer and ask for a message retransmit.
> > > >
> > > > Note, however, that this approach is not fully transparent,
> > > > because it may involve an observable deviation from
> the protocol
> > > > of the real object.
> > > >
> > > > (iv) We might implement unbounded message transmit, allowing storage
> > > > allocation while transmitting but admitting allocation failure
> > > > as a reason for transmission failure.
> > > >
> > > > Unfortunately, this does not work when a server replies to a
> > > > presumed-untrusted client. This is a long, complicated issue.
> > > > The paper:
> > > >
> > > > Vulnerabilities in Synchronous IPC Designs
> > > > http://www.eros-os.org/papers/IPC-Assurance.ps
> > > >
> > > > scratches the surface, but isn't comprehensive.
> > > >
> > > > If we do not know the protocol, we cannot anticipate the
> maximal size of
> > > > incoming messages that will be forwarded.
> > > >
> > > > As you try to sort out solutions to all of this,
> >
> > Sorry, but I really don't see the problem. The above issues are common
> > to many network services and have been managed just fine for many years.
>
>No. Respectfully, they have not. They have been universally "managed
>just fine" only in systems that accept one of the constraints above. In
>practice, the overwhelming majority of network systems blindly assume
>infinite resource without actually checking that assumption.
>
>However, I was focusing on the local case.
As am I. Since I designed and implemented a system that dealt with
these issues "just fine" in production for many years (NLTSS:
http://www.webstart.com/jed/papers/Components/
) I guess we will have to remain in respectful disagreement. To try to
make some progress (though I despair of this being an effective forum)
I'll try to tease out some more detail from your cases above and
from your paper in this message. I'll refer to the production NLTSS
systems in all my notes. Let me just describe at a high level how
those systems worked. As described briefly in figures 11 and 12
and surrounding text in:
http://www.webstart.com/jed/papers/Components/
that system used what can be partial or full client send and
receive buffering for all communication. If both buffers
were present in real memory when initiated, the transfer
went memory to memory with no additional buffering. If one
or the other buffer wasn't in real memory then the transfer
started it worked as if it was a network transfer - namely a
packet was buffered. Just as with any network transfer, it was
assumed that this packet could be lost to congestion. In
the overwhelmingly usual case the receive buffer was already
in memory when the send happened (usual with servers). In
the next most likely case (very rare actually) the receive
buffer would be brought into memory while the send buffer
was still in memory and the transfer would complete memory
to memory - with the buffered packet being thrown away.
In the case that was so rare as to only exist under
artificial test conditions (small machine memory
configuration and artificially large send/receive
memory segments), the transfer proceeded as a network
transfer, with the send and receive buffers being
successively pulled into memory (thrashing) and the
data transferred by packets. Performance was of
course horrible in this case, but it worked and
was never observed in the wild (a production system).
This system did run on multiple computer systems
(generally large Crays) on a network between about
1984 and 1995.
So, considering this implementation with your above cases:
(i) Of course infinite memory was not assumed. Even in
the exceedingly rare case of actual buffering (local packet
transfer), the packets could be thrown away at any time
(like the network congestion case) and the transfer could
still complete eventually.
(ii) At the point where the kernel gets involved we're talking
about one send buffer and one receive buffer. That data or any
"protocol" notions are at a higher level and properly ignored
at this level. I don't even understand how they might come into
play.
(iii) I'm not sure I understand what you mean by this case.
In our system delivery of data was visible in an incremented
pointer in the buffers (send and receive). When a receive
buffer was terminated, it could happen either because it
was full (partial transfer) or because the transfer was
complete. Do you see a problem with that approach?
(iv) I guess this is the case we implemented - unbounded
message transfer. The overwhelmingly most common case was
full transfers buffer to buffer. A vanishingly small
case in terms of numbers, but still important in terms
of overall system performance was the transfer of large
amounts of data to or from a file. This worked as a single
large (unbounded - except for a fixed 64 bit file
address) transfer with partial buffers in the writing or
reading process. The NLTSS system even avoided buffering
data outside these processes for transfers to/from disk.
The send or receive buffers were locked in memory during
the disk transfer (except for partial sector transfers
at the beginning and end - which of course programs
interested in performance avoided).
At the beginning of the above paper you say:
>Recent advances in interprocess communication (IPC)
>performance have been exclusively based on threadmigrating
>IPC designs. Thread-migrating designs assume
>that IPC interactions are synchronous, and that user-level
>execution will usually resume with the invoked process
>(modulo preemption). This IPC design approach offers
>shorter instruction path lengths, requires fewer locks, has
>smaller instruction and data cache footprints, dramatically
>reduces TLB overheads, and consequently offers
>higher performance and lower timing variance than previous
>IPC designs. With care, it can be performed as an
>atomic unit of operation.
I'm not quite sure how to describe what we implemented
in terms of synchronous vs. asynchronous. Our design
supported both styles of communication (so I guess in
that sense it was asynchronous), but the overwhelmingly
most common use was for synchronous communication as
with a remote procedure call - which meant that
performance was quite good for that case.
<note, in this paragraph I hadn't yet adopted what
is apparently your terminology of referring to buffered
communication as asynchronous and un buffered as
synchronous>
> > > > be sure to consider how
> > > > your solution might be exploited by hostile clients to implement denial
> > > > of resource and/or denial of service attacks.
> >
> > Denial of service attacks can be a problem in any network service.
>
>I was not concerned about network services. I was concerned about the
>*local* case.
Sorry, I should have said in any communication service. Denial of
service attacks work just as well locally as in the network case.
In fact local attacks are much more serious because the bandwidth
available locally is so much higher and the latencies can be so
much lower. Administratively denial of service attacks locally
in our experience tend to be less common. The most common sort
from our experience tend to be like this:
http://www.computer-history.info/Page5.dir/pages/Boot.dir/
where the problem was a self inflicted mistake. In that
case performance tends to go to hell for a while until
an unintended over consumer can be shut down. This seems
pretty inevitable to me. Trying to somehow deny this
case would deny equivalent intended cases.
> > I agree with this as well. These issues exist today and are dealt
> > with in current servers. What do these resource issues have to
> > do with wrapping? I don't see anything in the above that's
> > specific to the wrapping problem.
>
>A correct wrapper must not issue errors or observable differences in
>behavior w.r.t. the things that it wraps. If it needs to allocate
>storage, and consequently needs to demand a message re-send, it isn't
>doing a faithful virtualization.
>
>Jed: please read the paper I cited. I'm simply not going to replay the
>entire paper here when I've already dealt with this in the literature
>with considerable care.
I read the above paper. Naturally it can't show that something
which was demonstrated in production for over 10 years isn't possible.
You seem to suggest that the only alternatives are what you refer
to as synchronous (non buffering) or asynchronous (buffering) - though
that terminology is a bit odd to me (I usually think of synchronous
as blocking send/receive and asynchronous as separated sending and
receiving, but I will try to adapt to your terminology)
Of course what we implemented was a hybrid where the dominant
case was synchronous, but asynchronous transfers were used
when needed (e.g. large blocks being buffered to rotating
storage).
Regarding "reproducibility" - any timeouts at the application
level necessarily leads to the possibility of non reproducibility.
I don't see how this can be avoided. In our system I introduced
what was referred to as "good guy" timers for use at the application
level. The basic idea was that a timeout was involved, but that
even if the timer expired the timeout event wasn't triggered
unless there was a simultaneous need for the event because of
a resource oversubscription. A generic example might help describe
this. Servers often must allocate receive buffers to accept
requests. If a partial request is received then a timer is
initiated awaiting the completion. As long as there are plenty
of receive buffers, the event that triggers the throwing away
of some number of old receive buffers need not be triggered.
However, if some pool of receive buffers has been exhausted
and the timeout expires, then some receive buffers are
terminated and thrown away - resulting in a failure.
We found this "good guy" approach to be quite effective in
dealing with timeouts. For example, it allowed us to effectively
do interactive debugging (breakpoints, etc.) of our test systems
without having timeouts cause problems all the time.
Still, I argue that there is no way to guarantee reproducibility
in the face of application level timeouts. I suppose you can
try to get around this by not making a clock or a timeout
mechanism available. This seems unwise to me.
I'm afraid your discussion of issues related to revelation of
a sender's "task id" are lost on me. Any such "task id" was
hidden within the process servers designation and visible only
to another process in possession of a process capability in
our system.
Regarding Asymmetric Trust, I believe the NLTSS system worked
perfectly fine. As discussed above it was in principle un buffered,
though buffering could be used in the thrashing situation as
with network communication. NLTSS also used multithreaded
servers. There were a fixed number of such threads. Timeouts
protected against denial of service attacks. There were never
a practical problem. When the inadvertent denial of service
attack occurred it was quickly remedied and the only consequence
was a slight degradation of system performance.
Regarding timeouts, I described our "good guy" approach above.
It seemed to work fine for us.
Regarding "Invokee in Wrong State" and the issues of "open" or
"closed" waits - if I'm understanding correctly we used open
waits. As you note with a sufficient number of requests before
a server ran, these open waits could be exhausted, resulting in
a client having a send blocked. This is another case that only
happened in testing and rare unintended denial of service
attacks. Even in those cases the sends were eventually
processes. In the vast majority of RPC like cases, the requestor
was waiting not on the send to be delivered but on their
receive to complete, so the temporary blocking of the
send was invisible to them.
I view the "Trusted Object Buffer" mechanism that you describe
as over the top. That is to me it's addressing a problem that
is not a practical one and introduces too much mechanism for
the value that it attempts to provide.
Regarding your concluding remarks such as "To our knowledge,
no previous papers have been published that expose in depth
or adequately address the interprocess denial of service
vulnerabilities that are implicit in synchronous IPC designs."
We were aware of many systems that had dealt successfully with
the same issues that you describe and that we dealt with
effectively in the NLTSS system, so we saw no need to publish
what seemed to us as obvious. Once again this seems to me
another case where "The more things change the more they stay
the same." Seems sad in some ways.
These issues are not of vital interest to me at this time,
so I'm afraid I won't want to allocate too much time dealing
with them. The only aspect that is vital to me is that
object-capability system can have all fully wrappable
objects. I don't believe this discussion bears materially
on that topic.
--Jed http://www.webstart.com/jed-signature.html
More information about the cap-talk
mailing list