Re: "Harprit Kaur Barha": Info on Eros shapj@us.ibm.com
Mon, 26 Apr 1999 09:52:27 -0400

> I'm interested in knowing more about Eros, but when visiting
> the site i noticed that some of the vital information is unavailable
> since the pages are still under progress. In particular, i am
> interested in learning more about the features rearding the handling
> of deadlocks and distribution, which appear on the page

Harprit:

The page you chanced to look at is one of the very earliest pages on the EROS web. After a lot of work, I now know how to make EROS distribute wihtin a cluster, but the current system does not do so. It is not clear that distributing a single level store beyond a cluster makes sense -- there are too many places where the resulting rollbacks would present practical difficulties. The rule of thumb seems to be that if you can afford to treat the collection of machines in the distributed system as a single point of failure then distribution will work fine, otherwise you shouldn't distribute.

As to deadlocks, there is nothing to handle because EROS is deadlock-free. This is accomplished very simply:

  1. All operations performed by the kernel are atomic. They either execute completely or not at all. If, for example, a process "blocks" for a page fault while performing a kernel operation, it gets bounced all the way out of the kernel. It sleeps while the page is coming in, and then retries the operation from the beginning. Because of this design, no resources are held in the kernel by sleeping processes, and there are therefore no cross-process locking dependencies. This removes one common source of deadlock.
  2. EROS does not allocate any memory after initialization. It is therefore not possible for an action in the kernel to become blocked for lack of memory, and memory starvation is therefore not a possible source of deadlock.
  3. The EROS kernel contains no unbounded loops or recursions. While there are algorithms in the kernel that are expressed recursively, *all* of these check the recursion depth and quit at some known depth. All of these algorithms could, with sufficient inlining, be expressed as for(;;) loops with some upper bound on the number of iterations. Because of this all kernel operations have bounded duration and bounded stack depth. Bounded duration is important, because it allows us to use a design in which only one process is conceptually in the kernel at a time. If only one process can be in the kernel, many issues of deadlock are avoided. Bounded stack is important to make sure that the kernel stack does not overflow.

All that having been said, much of this would break down in a multiprocessor implementation. The kernel includes calls in all of the necessary places to pin objects in memory, but the pinning architecture would need to be modified for multiprocessors, and it is possible with low likelihood that two processors executing in the kernel simultaneously on behalf of two processes would arrive at deadlock situations. A couple of observations about this and how to solve it in practice:

  1. First, the fact that the pins() exist solves the main problem -- which is getting all the right lock() operations to happen in all the right places. The current calls to pin() would need to be replaced by locking calls, but this is not hard.
  2. There exists a design for a locking discipline that does not require unlocks. It is sketched in my dissertation. This has *dramatic* performance consequences.
  3. For systems with only two to four processes, a single grand universal kernel lock would probably suffice, and would certainly be more efficient than introducing locking overhead pervasively. I'm actually dubious about this, but it's the obvious first thing to try and would "solve" the problem for commodity multiprocessors. Keep in mind that the operations performed by the EROS kernel are very fine grain relative to POSIX system calls. This solution therefore *should* work rather better than it did under, say, Linux.
  4. Once you have to do the full multiprocessor design I think that the right solution to deadlock will prove to be optimistic locking with scheduled fallback. Basically, you do shared read locks and exclusive write locks. If a process tries to lock something for writing that was previously locked for writing (or reading) by someone else, it backs all the way out of the kernel and blocks on the party who locked that resource. To prevent starvation, the "retry" in this case grabs the grand unified kernel lock, thus assuring uncontended access to resources and preventing starvation. Here again, it's a painfully simple design that ought to go a long way toward solving the problem.

Anyway, I hope that some of these notes are useful to you. Please feel free to follow up with me.

Jonathan S. Shapiro
IBM T.J. Watson Research Center
Email: shapj@us.ibm.com
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595