[Rick: since this is of interest to others, I'm forwarding this specific response to cap-talk as well as to you. ]
>> Thus, the EROS kernel does not need a per-process stack. On a uniprocessor,
>> there is a single kernel stack that is used to hold the state of the current
>It sounds like process state is some other structure, like a tree or a
The *only* per-process state known to the kernel is whether a process is currently running, or if not, what in-kernel queue it is sleeping on. While the kernel does cache process state internally for ease of reference, the definitive version of the process state is the state that resides in nodes.
>> If an (EROS) process suspends within the kernel, every mutation it has caused
>> is undone and the process is punted all the way out. On resumption, the
>> resumes at user level by re-invoking the trap instruction.
>What's the cost of "every mutation it has caused is undone" ?
First, I slightly misstated.
All kernel operations require a bounded number of objects to be in memory to complete. On every kernel-implemented operation, the operation proceeds in two phases. First, any entities (pages, nodes, processes) needed are forced into memory and "pinned" for the duration of the operation. If all needed objects are in memory and successfully pinned, the operation is then "committed" and runs to completion. Thus, if some necessary component is missing there haven't actually been any mutations to undo.
While at OSDI in New Orleans, Jay Lepreau and I had a discussion about the overhead of re-entering the kernel versus keeping an in-kernel stack. The short version is that if you're a little careful about it it's actually better to blow away the in-kernel stack. The problem is locks. The main reason to keep an in-kernel stack is so that an in-kernel process can hang onto locks across suspensions. The catch is that some other process may need to steal these locks (if it cannot, you can deadlock), at which point you end up needing to re-acquire the locks when the process resumes. All of the code to manage and perform this acquisition and theft and to deal with the consequences of lost locks costs a lot. If the operations are small and indivisible, you're better off just redoing the whole thing.
One exception to this statement is re-crossing the user/supervisor boundary. There are some tricks to avoid doing this.
There is an ugly exception to the no-undo rule buried in the I/O system, where initiating an I/O requires several data structures to be successfully allocated because of the duplexing design (one for the overall request, one for each disk queue). It proves that they all happen in rapid sequence, and undoing them appropriately isn't really so bad. Actually, this is more of an issue for duplexed reads than for duplexed writes, and we have a paper design to eliminate it.
Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595