EROS protection domains
Jonathan S. Shapiro
Tue, 9 Sep 1997 10:53:32 -0400
[Bill Arbaugh asked whether EROS used ring protection, and whether
root runs as ring 0. This is my reply, sent to the list in case
anyone else finds it useful.]
I'm unclear on one part of your question, but I think I can answer the
rest clearly enough. Apologies if this is more detail than you wanted.
Conceptually, EROS uses only ring 0 and ring 3. Nested rings are
actually a bug from the standpoint of security, because any service
used by an inner ring has to be brought into the ring that uses it,
even if the service per se is protection-neutral. Intel's
"conforming" segments don't really help.
As a convenience of implementation, the EROS kernel implements some
kernel processes that run in ring 1. These originally ran in ring 0,
but the exception frame for a ring 0 exception is quite different from
that of a ring 1 process, and it simplified matters for all process
interrupts (i.e. both in-kernel and user) to generate the same stack
frame. I have set the processor up to grant things in ring 1
essentially all of the privileges of ring 0. [This nit is why I said
"conceptually".]
Most of the kernel runs in ring 0, including drivers. I concluded
that exporting drivers gave essentially no marginal security for the
following reasons:
+ The ability to program physical DMA means you can do
anything; all DMA on this machine is physical.
+ Due to a variety of chip flaws, there are a disturbing
number of places where drivers must disable interrupts while
transferring data (e.g. during programmed I/O). This
implies that a driver has the authority to halt the system.
From a reliability standpoint, I might have done better to export the
drivers into separate address spaces. The complicating factor is that
everything gets to touch the object cache, which is most of main
memory.
Kernel size numbers:
  47,356  Kernel core function + trap handlers (.cxx + .hxx)
   2,384  Startup, low trap handling, misc HW setup (.S)
   6,567  Drivers for IDE, net cards, keyboard (.cxx + .hxx)
There is additional code for the kernel debugger. The above numbers
are bloated by being C++ (subtract 25%-30% to get C equivalent line
counts) and by a large per-file copyright (some files are small). The
assembler number will go up by a couple of hundred lines when I
re-implement the fast IPC code.
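
As a rough check on the arithmetic above, the line counts can be totalled and the suggested C++ discount applied. This is only a sketch: the dictionary labels are shorthand invented here, and it assumes the 25%-30% reduction applies to the .cxx/.hxx rows only, not to the assembler.

```python
# Source line counts quoted in the message.
LINE_COUNTS = {
    "core + trap handlers (.cxx + .hxx)": 47356,
    "startup, low-level traps, HW setup (.S)": 2384,
    "drivers (.cxx + .hxx)": 6567,
}

total = sum(LINE_COUNTS.values())

# Assumption: the C++ discount is applied to the .cxx/.hxx rows only.
cxx_total = sum(n for name, n in LINE_COUNTS.items() if ".cxx" in name)
asm_total = total - cxx_total

# "Subtract 25%-30%" means keeping 70%-75% of the C++ line count.
c_equiv_low = cxx_total * 70 // 100
c_equiv_high = cxx_total * 75 // 100

print(f"total lines:        {total}")
print(f"C++ lines:          {cxx_total}")
print(f"assembler lines:    {asm_total}")
print(f"C-equivalent range: {c_equiv_low} to {c_equiv_high}")
```

The per-file copyright overhead mentioned above is not modelled, since the message gives no figure for it.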
The total kernel size when compiled without the kernel debugger:
   text    data     bss     dec     hex filename
 131479   31169   54864  217512   351a8 eros.small
FYI, I consider this to be utterly bloated. About 30k of it is recent
code explosion due to a bad transaction handling macro (which will be
fixed shortly). The high flying modules are:
   2292       0       0    2292     8f4 mk_DomainTool.o
   2331       0       0    2331     91b UserContextInvoke.o
   2535      64    8192   10791    2a27 interrupt.o
   2548      30       0    2578     a12 pk_NodeKey.o
   2735     649      44    3428     d64 kern_ObjectCache.o
   2903     279       4    3186     c72 kern_Thread.o
   3855    1128       0    4983    1377 ide_ide.o
   4383    1032    2304    7719    1e27 kern_Persist.o
   4396     165       0    4561    11d1 pk_DomainKey.o
   4628     635       0    5263    148f kern_Invoke.o
   5519    1272       0    6791    1a87 ide_drive.o
   6065    1054       0    7119    1bcf net_3c509.o
   6977    1892       0    8869    22a5 net_3c59x.o
   6996    1069       0    8065    1f81 UserContext.o
  13711    3436      44   17191    4327 kern_Checkpoint.o
The 'ide_' and 'net_' files are drivers. The rest are mostly obvious.
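
To put the driver modules in perspective, the size figures above can be tabulated and summed. The sketch below simply re-enters the numbers from the listing (text, data, bss); the variable and function names are illustrative. It covers only the modules listed, so the driver share is a lower bound if other driver objects exist.

```python
# (text, data, bss) in bytes, copied from the size listing in the message.
KERNEL = (131479, 31169, 54864)  # eros.small

MODULES = {
    "mk_DomainTool.o": (2292, 0, 0),
    "UserContextInvoke.o": (2331, 0, 0),
    "interrupt.o": (2535, 64, 8192),
    "pk_NodeKey.o": (2548, 30, 0),
    "kern_ObjectCache.o": (2735, 649, 44),
    "kern_Thread.o": (2903, 279, 4),
    "ide_ide.o": (3855, 1128, 0),
    "kern_Persist.o": (4383, 1032, 2304),
    "pk_DomainKey.o": (4396, 165, 0),
    "kern_Invoke.o": (4628, 635, 0),
    "ide_drive.o": (5519, 1272, 0),
    "net_3c509.o": (6065, 1054, 0),
    "net_3c59x.o": (6977, 1892, 0),
    "UserContext.o": (6996, 1069, 0),
    "kern_Checkpoint.o": (13711, 3436, 44),
}


def dec(sizes):
    """Total size, as in the 'dec' column: text + data + bss."""
    return sum(sizes)


# The message identifies the 'ide_' and 'net_' files as drivers.
drivers = {n: s for n, s in MODULES.items() if n.startswith(("ide_", "net_"))}
driver_text = sum(s[0] for s in drivers.values())
driver_share = driver_text / KERNEL[0]

print(f"kernel total: {dec(KERNEL)} bytes")
print(f"driver text:  {driver_text} bytes ({driver_share:.1%} of kernel text)")
```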
You asked if 'root' is ring 0. I'm not sure I understand this
question. If by "root" you mean the UNIX administrative user id, then
the answer is no. The EROS kernel has no notion of user ID nor any
notion of a specially privileged user. Administrative authority is
conferred by granting appropriate capabilities to appropriate
applications, and then letting the administrators use those
applications.
The UNIX emulator will provide a notion of a root user, but this user
has no special privileges outside of the UNIX ``box.''
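
The idea that authority comes from held capabilities rather than from a user ID can be illustrated with a toy model. This is a sketch only: every class and method name here (Capability, DiskVolume, AdminTool) is hypothetical and does not correspond to any EROS interface, and Python cannot actually make references unforgeable the way a kernel can.

```python
class Capability:
    """Toy capability: a reference to an object plus the permitted operations."""

    def __init__(self, target, rights):
        self._target = target
        self._rights = frozenset(rights)

    def invoke(self, op, *args):
        # The only check is on the capability itself; no user ID is consulted.
        if op not in self._rights:
            raise PermissionError(f"capability does not convey '{op}'")
        return getattr(self._target, op)(*args)

    def restrict(self, rights):
        """Derive a weaker capability; rights can be dropped, never added."""
        return Capability(self._target, self._rights & frozenset(rights))


class DiskVolume:
    """Hypothetical object that an administrator might manage."""

    def __init__(self):
        self.formatted = False

    def read_status(self):
        return "formatted" if self.formatted else "raw"

    def format(self):
        self.formatted = True
        return "ok"


class AdminTool:
    """An application whose authority is exactly the capability it was given."""

    def __init__(self, volume_cap):
        self._volume = volume_cap

    def status(self):
        return self._volume.invoke("read_status")

    def reformat(self):
        return self._volume.invoke("format")


volume = DiskVolume()
full_cap = Capability(volume, {"read_status", "format"})
read_only_cap = full_cap.restrict({"read_status"})

admin_tool = AdminTool(full_cap)        # can format the volume
audit_tool = AdminTool(read_only_cap)   # same program, less authority

print(audit_tool.status())
print(admin_tool.reformat())
print(audit_tool.status())
```

In this model, two instances of the same tool differ in what they can do only because they were handed different capabilities; whoever runs them plays no part in the check.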
Did I understand your question correctly?
Two comments on root/ring0:
1. Just because the administrator is over you doesn't mean that they
should be able to compromise kernel security overall.
2. There may be distinct administrators for distinct environments.
I.e. one should not assume a single root user.
Also, note that EROS takes capabilities all the way to the disk, which
eliminates a problem of bootstrap. Mach, for example, is forced to
reconstruct descriptors on startup, which totally screws their
Please feel free to send any other questions you may have.