multithreaded kernel
Jonathan S. Shapiro
shap@cobra.cis.upenn.edu
Sun, 2 Jul 1995 14:03:56 -0400
Charlie writes:
Threads are a good idea. I don't know the difference between a
kernel thread and a process that happens to be in priv mode. The
latter should require no extra support.
In the Tandem system disks are run by processes. We considered this
in KeyKOS but thought the performance might suffer.
Charlie: now that you've had a chance to look at how Tandem does
things, do you still feel that a performance problem is unavoidable?
I had considered the performance issue, and initially I felt it was a
close call. Experience, however, tells me that if Charlie thinks
there's an issue I'd do well to consider the matter much more
carefully. :-)
With that in mind, I'm going to try to lay out my thinking in greater
detail, in the hope that if I have missed something one of you can
point it out. I've tried to clearly identify my assumptions to draw
them to your attention. I apologize in advance for the length of this
email.
Before I go on, I want to thank Bryan Ford. Bryan pointed out that
the kernel threads could be dispatched by the generic kernel process
scheduler by doing some minimal tricks with the DIB and allowing the
DIB to contain supervisor-mode processes. Given this, no special-case
kernel scheduler is necessary. As it happens, my scheduler code is
already intelligent about not blasting the mapping pointer
indiscriminately, so there turned out to be a natural place to put the
necessary hooks. Following his lead, I have now abandoned the notion
that there is any special user proxy thread.
Preamble:
In the current EROS design (which may not be a good one), there is a
test just before returning to user code to determine if any kernel
daemons need to be run. In the EROS design, kernel daemons are
responsible for I/O initiation, page ageing, etc. etc. Every kernel
daemon has an associated procedure, which is called at this point.
The daemon is required to return without blocking. The I/O task, for
example, may start up a number of I/O operations, but doesn't wait for
them to complete - the disk interrupt handlers collect error info and
mark the I/O daemon as needing to be run again.
In effect, then, what I have is a thread scheduler, but the threads
have no stack or registers between invocations. In the current
implementation, they simply run on the kernel stack. Because they are
not preempted, there is no need in general to save/restore registers.
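To make the current mechanism concrete, here is a minimal sketch in C
of the check made just before returning to user code. All names
(daemon_t, run_pending_daemons, io_daemon) are hypothetical and are
not taken from the EROS source; the sketch assumes a fixed table of
daemons and a per-daemon "needs to run" flag set by interrupt handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not taken from the EROS source. */
typedef struct daemon {
    const char *name;
    volatile bool needs_run;        /* set by interrupt handlers */
    void (*proc)(struct daemon *);  /* must return without blocking */
    int runs;                       /* bookkeeping for this example only */
} daemon_t;

/* Called just before returning to user code. Each pending daemon runs
 * to completion on the kernel stack; nothing is saved between calls.
 * Returns the number of daemons that were run. */
static int run_pending_daemons(daemon_t *table, size_t n)
{
    int ran = 0;
    for (size_t i = 0; i < n; i++) {
        daemon_t *d = &table[i];
        if (!d->needs_run)
            continue;
        assert(d->proc != NULL);
        d->needs_run = false;   /* clear first: an interrupt may re-mark it */
        d->proc(d);
        ran++;
    }
    return ran;
}

/* Example daemon body: start whatever I/O is ready, then return. */
static void io_daemon(daemon_t *self)
{
    self->runs++;
}
```

Because each procedure starts from its entry point on every call, any
progress it has made must live in explicit state outside the
procedure, which is where the driver state machines come from.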
Proposal:
Replace these daemon functions with first class threads.
The down side to the current design is that the daemons have to start
up from scratch every time they are invoked. Typically, it takes a
couple of procedure calls for them to get to the point where they can
do anything useful. The state machines in the disk drivers are
getting hideously complicated, and the resulting source code does not
lend itself to optimization by the compiler because of its branching
structure. Capturing the daemon state in a register save area plus
stack gives a certain amount of cache locality for the state data that
may not be present in the old design.
[ASSUMPTION] Collectively, these factors mean that performing a full
register save/restore, even on a 32 register machine, will take
something very close to the same amount of time that the current
mechanism takes.
If I tune the context switcher not to save those registers that were
already saved on the stack due to compiler conventions, I'm pretty
sure things come out about even. If they are within a small number of
cycles of each other, the resulting code clarity in the drivers
probably has some value.
If the register save/restore is roughly a wash, then the other cost to
consider is the cost of switching address spaces. If I had to do
this, I would wholeheartedly agree that the kernel threads would be
too expensive.
[ASSUMPTION] In the proposed design, kernel threads will be
"parasitic," by which I mean that they will run in supervisor mode in
the kernel address space (anyone know a proper term for this?). This
eliminates the cost of the address space change.
On architectures that lend themselves to fast space switching, the
kernel space may be a dedicated space. On other architectures, the
kernel needs to be mapped into all user address spaces, and the kernel
threads can run in any of these. The kernel scheduler must now be
aware of a new kind of process in order to know that the address space
pointer of these parasitic tasks should be specially handled. The
crucial point is that switching between the kernel and kernel thread
does not (in general) require an address space change or a TLB/cache
flush.
I wouldn't propose to put things like serial interrupt handlers (or
even drivers) in separate threads, but for things like the disk
driver, the migrator, the checkpointer, etc. etc. it makes some sense
to isolate them.
Initially, I thought that these daemons should run in supervisor mode.
Because of a problem with the way the x86 handles interrupts, Bryan
and I thought for a while that maybe they should run in user mode
after all, but I've come back to the view that they should (in
general) run in supervisor mode in a parasitic space.
The essential problem is that the kernel must now be structured to run
interrupt handlers on the stack of any kernel thread, and space for
the kernel thread stacks must be allocated accordingly. The solution
is to dedicate an interrupt handler stack and switch to it in the
interrupt handler, which makes as much sense as anything else I have
come up with. At the moment, the interrupt handler switches stacks
(courtesy of the hardware) when interrupting from user mode, but does
not do so when interrupting the kernel.
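The stack choice on interrupt entry might look like the sketch below.
It is only an illustration: on real hardware this decision is made in
the assembly-language entry stub, and the names (int_stack,
handler_stack), the stack size, and the treatment of nested
interrupts are all my assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INT_STACK_SIZE 4096u   /* assumed size, for illustration only */

static uint8_t int_stack[INT_STACK_SIZE];   /* the dedicated interrupt stack */

/* True if sp already lies within the interrupt stack. */
static bool on_int_stack(uintptr_t sp)
{
    uintptr_t base = (uintptr_t)int_stack;
    return sp > base && sp <= base + INT_STACK_SIZE;
}

/* Given the stack pointer at interrupt entry, return the stack pointer
 * the handler should run on. Stacks are assumed to grow downward. */
static uintptr_t handler_stack(uintptr_t sp)
{
    assert(sp != 0);
    if (on_int_stack(sp))
        return sp;   /* nested interrupt: continue on the interrupt stack */
    return (uintptr_t)int_stack + INT_STACK_SIZE;   /* switch to its top */
}
```

With this arrangement a kernel thread's stack only needs room for the
thread's own frames plus the small entry frame, not for the deepest
possible interrupt handler.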
A middle ground, which can only be used on the i386 family but
simplifies the interrupt handler logic, would be to run these programs
at ring 2. Ring 2 is considered supervisor for purposes of page
protection, and therefore preserves the parasitic address space
property. I'm reluctant to do this, since I suspect that it adds
overhead and yields a design that is less conceptually portable (the
interrupt code isn't portable, but perhaps the basic architectural
model that it follows should be).
An advantage to the separate interrupt stack is that there isn't a
one-to-one relationship between processes and interrupts. On the EISA
and PCI busses, interrupt lines can be shared, so it isn't as simple
as just scheduling the process that will field the interrupt.
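To illustrate why a shared line rules out a simple "interrupt N wakes
process N" mapping, here is a sketch of dispatch on a shared line.
The names (irq_line_t, irq_attach, irq_dispatch) and the limit of
four devices per line are hypothetical; the sketch assumes each
handler can ask its own device whether it raised the interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARED 4   /* assumed limit, for illustration only */

/* A handler returns true if its device had raised the interrupt. */
typedef bool (*irq_handler_t)(void *dev);

typedef struct irq_line {
    irq_handler_t handler[MAX_SHARED];
    void *dev[MAX_SHARED];
    size_t count;
} irq_line_t;

/* Register one more device on a (possibly shared) line. */
static bool irq_attach(irq_line_t *line, irq_handler_t h, void *dev)
{
    assert(line != NULL && h != NULL);
    if (line->count == MAX_SHARED)
        return false;
    line->handler[line->count] = h;
    line->dev[line->count] = dev;
    line->count++;
    return true;
}

/* Poll every handler on the line: the line alone does not say which
 * device interrupted, and more than one may be pending at once.
 * Returns the number of devices that claimed the interrupt. */
static size_t irq_dispatch(irq_line_t *line)
{
    size_t claimed = 0;
    for (size_t i = 0; i < line->count; i++)
        if (line->handler[i](line->dev[i]))
            claimed++;
    return claimed;
}

/* Example device: a "pending" flag, cleared when serviced. */
static bool fake_dev_handler(void *dev)
{
    bool *pending = dev;
    if (!*pending)
        return false;
    *pending = false;
    return true;
}
```

Each short handler would do little more than collect status and mark
the appropriate kernel thread as needing to run.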
A basic design assumption is that the interrupt handler must be
short, and that if it isn't it would be better designed as a kernel
thread.
Have I missed any issues in considering the costs involved? In
particular, is there some cost beyond the address space switch and
the register switch that I need to account for?
The major reason to consider this design is that Bryan planted a
design virus in my brain a few months back which has firmly taken
root. He suggested that it would be convenient, both for debugging
and for compatibility, to be able to use the BIOS to support those
devices that we didn't want to explicitly build drivers for. It makes
a lot of sense, but because the BIOS was never designed for
multitasking, it requires that the BiosDisk driver be implemented in a
process that the kernel can preempt and reschedule. Once the kernel
needs to support a second thread, it makes sense to ask if threads
shouldn't be adopted in general. Making the kernel multithreaded to
begin with lends itself to a multiprocessor implementation.
So... I'm still concerned about the performance issue, and I'd
appreciate any insights you all may have!