Re: Is a VM really required? shapj@us.ibm.com
Sun, 15 Aug 1999 21:14:00 -0400

R.J. Writes:

> As I understand it, the virtual machine provides
> the "primitive services and a few basic object
> types" and the operating system puts them to use,
> providing its own services for the user and the
> application.

> 1. - am I correct in thinking that the design
> intentionally allows for more than one of these
> operating systems to run under the VM?

Yes, though in this case you would ordinarily expect to hand each operating system access to virtualized devices rather than to the real ones. A physical device would get terribly confused if two operating systems sent it conflicting commands, for example. Generally speaking, control of these devices is owned by a single operating system (the "core EROS" system) and from there is virtualized for use by other systems (e.g. the rehosted POSIX system).
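
To make the ownership idea concrete, here is a minimal sketch. It is an illustration only, not EROS code: the class names (PhysicalDevice, DeviceOwner, VirtualDevice) and the string requests are hypothetical. One owner holds the real device, each guest system gets a virtual handle, and only the owner ever issues commands to the hardware, one complete request at a time.

```python
from collections import deque


class PhysicalDevice:
    """Stand-in for real hardware; it just records what it was told to do."""

    def __init__(self):
        self.log = []

    def execute(self, request):
        self.log.append(request)


class VirtualDevice:
    """What a guest system holds: it can submit requests, nothing more."""

    def __init__(self, owner, guest_name):
        self._owner = owner
        self._guest_name = guest_name

    def send(self, request):
        self._owner.submit(self._guest_name, request)


class DeviceOwner:
    """The single system that controls the device (the "core" role)."""

    def __init__(self, device):
        self._device = device
        self._pending = deque()

    def virtual_device(self, guest_name):
        return VirtualDevice(self, guest_name)

    def submit(self, guest_name, request):
        # Requests are queued, never passed straight through to hardware.
        self._pending.append((guest_name, request))

    def drain(self):
        # The owner alone decides the order in which the device sees requests.
        while self._pending:
            self._device.execute(self._pending.popleft())
```

Because guests never hold a reference to the PhysicalDevice itself, two of them cannot interleave conflicting commands; at worst they queue competing whole requests that the owner serializes.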

> 2. - the kernel (system image?) combines the
> VM and the hardware drivers, while the operating
> system is a separate domain?

The operating system may consist of a number of separate domains. In "core EROS", for example, there are about 10 or 15 domains (ignoring drivers) that together perform roughly the function of a traditional operating system kernel. There are a bunch more that (eventually will) deal with login, database implementations, password management, etc.

> 3. - or are the drivers running in separate domains?

In the EROS case, the drivers are split into upper and lower halves. The lower half lives in the kernel and has direct access to the hardware, so it is necessarily trusted. The upper half generally handles queueing, strategizing, and device handling policy, and it generally has exclusive access to the lower half. In some cases the upper half may be prepared to have multiple clients; in others it isn't. We need to get some examples of this into the source tree.
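
Until real examples land in the tree, the split can be pictured with the following minimal sketch. It is not the EROS driver interface: LowerHalf, UpperHalf, bind_exclusive, and the ascending-block-order policy are all assumptions made for illustration. The lower half is the only code that touches the (simulated) hardware and hands out its entry point exactly once; the upper half owns the queue and the ordering policy.

```python
class LowerHalf:
    """In-kernel half: the only code that touches the (simulated) hardware."""

    def __init__(self):
        self.issued = []      # stands in for writes to device registers
        self._bound = False

    def bind_exclusive(self):
        # Hand the I/O entry point to exactly one upper half.
        if self._bound:
            raise PermissionError("lower half already has an upper half")
        self._bound = True
        return self._start_io

    def _start_io(self, block):
        self.issued.append(block)


class UpperHalf:
    """Queueing and policy; here the policy is ascending block order."""

    def __init__(self, lower):
        self._start_io = lower.bind_exclusive()
        self._queue = []

    def enqueue(self, block):
        self._queue.append(block)

    def run(self):
        # The ordering decision lives entirely in the upper half.
        for block in sorted(self._queue):
            self._start_io(block)
        self._queue.clear()
```

An upper half that is prepared to serve multiple clients would simply accept enqueue calls from several of them; the lower half would not need to change.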

> So, either
> 2.a. - an OS provides two new services: an io bank
> and an irq bank. The VM isn't the right place to add
> these services since the VM is 'abstraction neutral'?
>
> or, simpler,
>
> 2.b. - a new 'domain' (in the sense of the OS and
> VM being separate) provides these services.

Setting aside issues of lexicon: IRQs, DMA channels, and device registers are sufficiently universal that they may be considered "abstraction neutral."

That said, I explained in the previous note why splitting this stuff out doesn't add protection. It could greatly simplify dynamically loaded drivers, which is worth considering. Another option I have considered is having a central database that knows how to do the hardware interface for each card and contains a kernel-downloadable driver implemented in something like VCODE (Engler). A higher-level program can then be given a capability to the in-kernel driver without compromising things like physical memory.

However, in the current EROS and KeyKOS designs, all code that has access to the hardware resides in the kernel. This is as much to limit the complexity of eventual certification as for any other reason. Also, it limits the number of bits of code that need to be examined to understand how the hardware is used.

> - if I've read this properly, then the Page
> object is responsible for mapping disk pages
> to memory, and therefore it must be capable of
> communicating with the disk driver? Under 2.b
> I can't seem to [completely] figure out how
> the Page object gains access to the disk services...

Yep. That's a mess for sure. Actually, the solution for that one isn't so bad. The page driver is really the driver for the object cache. You simply give it access to the entire object cache (i.e. memory map the object cache), and then implement a capability interface under which the disk driver sits waiting for the kernel to hand it the next page-in or page-out request. In a previous note to the eros-arch list I hinted at a system structure that would work this way.
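
A minimal sketch of that structure follows. It is a simulation under stated assumptions, not the EROS pager: the request tuple format, the PAGE_SIZE value, the dict standing in for the disk, and the DiskDriver class are all hypothetical. The driver shares the whole object cache (a bytearray here, where the real thing would be a memory mapping) and takes page-in and page-out requests from a queue that the kernel fills.

```python
import queue

PAGE_SIZE = 16  # tiny pages keep the example readable (an assumption)


class DiskDriver:
    """Out-of-kernel disk driver that has the whole object cache mapped."""

    def __init__(self, object_cache, requests, disk):
        self._cache = object_cache   # shared bytearray: the object cache
        self._requests = requests    # queue the kernel posts requests to
        self._disk = disk            # dict: disk page number -> bytes

    def serve_next(self):
        # Blocks until the kernel hands over the next request.
        op, disk_page, frame = self._requests.get()
        start = frame * PAGE_SIZE
        end = start + PAGE_SIZE
        if op == "page-in":
            self._cache[start:end] = self._disk[disk_page]
        elif op == "page-out":
            self._disk[disk_page] = bytes(self._cache[start:end])
        else:
            raise ValueError("unknown request: %r" % (op,))
```

The kernel side only needs to post tuples such as ("page-in", 3, 1), meaning "fill cache frame 1 from disk page 3", and the driver never needs any memory beyond the cache it was given.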

However, there is a nasty catch-22 hiding in the disk driver: how do you load the disk driver off the disk before you have the disk driver loaded off the disk? One answer: cheat and have the bootstrap loader prefetch a predefined range of stuff. But this just pushes additional complexity into the bootstrap loader, which is plenty complex already. Alternatively, you can cave in and keep the disk drivers in the kernel. The IRQ manager can then ask the kernel which IRQs are already allocated.
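
The second alternative might look roughly like the sketch below. Everything in it is hypothetical (the Kernel and IrqManager classes, the allocated_irqs query, the default of 16 interrupt lines); it only illustrates an IRQ manager that seeds its table from what the kernel has already claimed for its built-in disk drivers and refuses to hand those lines out again.

```python
class Kernel:
    """Holds the IRQs claimed by drivers that were kept in the kernel."""

    def __init__(self, reserved_irqs):
        self._reserved = frozenset(reserved_irqs)

    def allocated_irqs(self):
        return self._reserved


class IrqManager:
    """Out-of-kernel allocator that starts from the kernel's own claims."""

    def __init__(self, kernel, irq_count=16):  # 16 lines is an assumption
        self._irq_count = irq_count
        self._owners = {irq: "kernel" for irq in kernel.allocated_irqs()}

    def allocate(self, irq, owner):
        if not 0 <= irq < self._irq_count:
            raise ValueError("no such IRQ: %d" % irq)
        if irq in self._owners:
            raise PermissionError("IRQ %d belongs to %s"
                                  % (irq, self._owners[irq]))
        self._owners[irq] = owner

    def owner_of(self, irq):
        return self._owners.get(irq)
```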

Finally, note that the division between disks and other devices isn't as clean as one might like it to be. People really do rearrange their SCSI chains without rebooting. Access to SCSI devices and SCSI controllers, for example, gets kind of interesting, and SCSI is a multi-master bus, which makes things *really* entertaining. The really cute part is that DMA channel allocations can sometimes change as a result of the addition of new devices -- think what happened when you stuck those cool new speakers on the USB wire! Similar problems abound in intentionally removable devices like PCMCIA. For example, you might be willing to give up the sound card temporarily to run the ethernet PC card.

> P.S. I've been reading quite a bit about EROS
> for a while now... though I have a serious
> disability in not completely understanding the
> use of notation in the proof of confinement
> (rather, the chapter leading up to it [in
> Shapiro's dissertation]). Is there any way
> I could find a [freely available] reference or
> something similar to help me along?

You are a brave and intrepid soul if you are trying to slog through those chapters.

Regrettably there is no standard notation for such things. On the one hand, mathematicians and formal methods folks tend to define notation according to the problem at hand. On the other, there just haven't been enough successful proofs in this space for a "preferred" notation to have developed in the OS community. The proof chapters borrow heavily from information flow proofs in the programming languages world, but even there, there isn't much of a universal notation.

> Well, hopefully I've been clear enough about
> this first lot of things bothering me... and
> hopefully there's a reply or two, none too
> head-bashing...

Heck, everything *else* about this system is head bashing (or at least head scratching). Why should *this* reply be different?

     "On all other nights we bash our heads sitting up.  Why on this night
     are we bashing our heads on EROS?"

          -- Excerpted from the EROS Haggadah. :-)


Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Email: shapj@us.ibm.com
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595