[EROS-Arch] Questioning need for Call Count
Jonathan S. Shapiro
shap@eros-os.org
Fri, 10 Nov 2000 10:24:59 -0500
Charles Landau wrote:
>
> So far I can see five problems with this proposal (listed in order of
> increasing severity).
>
> 1. The resume capability will most likely be deprepared/unlinked
> eventually. If this is deferred, it is less likely that its neighbors
> (nearly always just the client process) are in the cache.
I agree that this will occur, but I'm not sure it's a problem. Start
keys, for example, already exhibit the behavior you are describing, and
I don't see how resume keys will be different in this regard. Am I
missing something?
In any case, I have an unrelated design proposal coming (a TR will go up
at our lab as soon as I can rerun LaTeX -- it's advising silly season
here at Hopkins) that should mitigate such issues.
> 2. By duplicating the code to check the call count in every stub you
> increase the size of code.
Agreed. This is a case of trading four instructions (six on a RISC
machine) in every stub for an improvement in speed overall and a
reduction in kernel complexity.
> 3. To work reliably the way EROS and KeyKOS currently do, every call in a
> process needs to increment the same counter. Is this to be a global
> variable? Then how do you handle someone who wants to use a thread model,
> with multiple threads sharing the same address space and the same globals?
> The stubs would need to use a counter in the per-thread data, wherever that
> is. This problem may be solvable, but it adds complexity.
I thought about this. It works with either a per-thread variable or a
shared global variable, as long as there is a reasonably efficient means
for atomic increment. Compare and swap would do it, for example.
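A minimal sketch of what such a stub-side counter could look like, using C11
atomics and a compare-and-swap loop. The names (call_count, next_call_count,
struct reply, reply_is_current) and the idea that the server echoes the count
back in its reply are assumptions made for illustration; they are not taken
from the EROS or KeyKOS sources. Declaring the counter _Thread_local instead
of shared would give the per-thread variant.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter shared by all threads in the address space. */
static _Atomic uint32_t call_count = 0;

/* Hypothetical reply layout: the server echoes the count it was sent. */
struct reply {
    uint32_t echoed_count;
    uint32_t result;
};

/* Atomically advance the counter with compare-and-swap and return the
 * new value. Wraps around on overflow. */
static uint32_t next_call_count(void)
{
    uint32_t old = atomic_load(&call_count);
    /* On failure, old is reloaded with the current value, so retry. */
    while (!atomic_compare_exchange_weak(&call_count, &old, old + 1)) {
    }
    return old + 1;
}

/* Stub-side check: accept a reply only if it carries the count that
 * was sent with the matching call. */
static bool reply_is_current(const struct reply *r, uint32_t expected)
{
    return r->echoed_count == expected;
}
```

A stub would call next_call_count() before the invocation, pass the value
along with the request, and discard any reply for which reply_is_current()
is false.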
> 4. This won't work with keeper invocations, where there is no client code
> executed.
I don't think this is true. There would appear to be two cases here:
4a) indirect invocation via a red segment. In this case the protocol is
between caller and receiver, as before.
4b) invocation during page fault or exception by the kernel. In this
case the caller (the process) already trusts the callee (the fault
handler), so there is no need for a counter at all and zero can be used.
> > Frankly, a malicious server is
> > able to mess with the client in so many other ways that I don't really
> > see this as a compelling problem.
>
> 5. Consider the other side. Servers rely on the fact that invoking a resume
> key is prompt. Your proposal introduces the possibility that a valid
> invocation of a resume key will find the client busy (because he was
> already invoked via a stale resume key). A malicious client can mess with a
> server by remaining busy.
This may be a misunderstanding. The current resume capability behavior
is that if a client is busy the resume key is invalid (actually, in the
current system this simply cannot occur). I propose that if the server
invokes a resume capability via the returner, and the client is not in a
waiting state, then it behaves as a resume to the zero number capability
(which is what the resume key would have been otherwise).
So a malicious client cannot mess with a server this way.
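The proposed rule can be sketched as a small state check. This is only a toy
model under stated assumptions: struct process, the two-state enum, and
resume_via_returner are hypothetical names, and "behaves as a resume to the
zero number capability" is modeled here simply as discarding the message.
The point it illustrates is that the server's invocation completes
immediately in both cases.

```c
#include <stdbool.h>

/* Hypothetical, simplified client states. */
enum proc_state { PROC_RUNNING, PROC_WAITING };

struct process {
    enum proc_state state;
    int last_reply;   /* payload of the most recent delivered reply */
    bool has_reply;   /* true once a reply has been delivered */
};

/* Model of a resume invoked via the returner. Returns true if the reply
 * reached the client, false if it was discarded because the client was
 * not waiting. Never blocks the server in either case. */
static bool resume_via_returner(struct process *client, int reply)
{
    if (client->state != PROC_WAITING) {
        /* Client is busy: treat the key as if it were the zero number
         * capability and drop the message (assumed behavior). */
        return false;
    }
    client->last_reply = reply;
    client->has_reply = true;
    client->state = PROC_RUNNING;
    return true;
}
```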
There remains the possibility that a malicious server can cause the
client to get bad responses, but per our earlier discussion there was
already an exposure to malicious servers.
Basically, I am proposing that we step back from the current design for
a moment. We need the resume/start distinction to capture the difference
between top of event loop and waiting for LRPC response. It's not
actually clear to me that we need a call count **at all**, but there is
some marginal protection in having one and it doesn't cost that much.
Jonathan