Re: Question from Bakin shapj@us.ibm.com
Tue, 6 Jul 1999 10:05:47 -0400

>So is it the case that on reboot the device drivers - presumably part of
>the kernel - reinitialize all external hardware to some known state? And
>then processes which talk to hardware, on discovering that their
>capability is now invalid, know that they need to recover which may
>include setting the hardware to other states? I guess part of the
>question may be are "device drivers" entirely kernel mode, or do they tend
>to be written as a kernel mode part and a process part, or what? In
>practice what kinds of device state are kept in the kernel and which in
>processes (checkpointed, and then verified?)

David:

Sorry for the delay -- I was out of town briefly.

You have it very close to correct. Actually, in normal circumstances, devices are reset as a consequence of resetting the bus that they sit on. The per-board BIOS ROMs are then executed to bring the device to a known initial state. As a rule, drivers assume that the BIOS ROM author was a bozo, and redo much of this work (better paranoid than hung, I suppose).

On some systems it is possible for a reset of this type to be reliably issued from software. On such systems, a warm boot can be executed without powering the machine down.

In EROS, it is the practice that at least the so-called "bottom half" of the driver generally lives in the kernel. This is because the bottom half needs access to I/O registers and DMA channels. It therefore effectively has access to all of physical memory, and must be trusted. While we could sensibly declare selected processes to be trusted and give them I/O register privileges, it is difficult for such processes to do virtual-to-physical address translation, and consequently it is difficult for them to set up DMA correctly on a physical DMA channel. A possibly feasible design is to give them read-only access to their own address space mapping table, but note that memory involved in DMA needs to be pinned, which suggests that a low-level DMA interface capability might be appropriate in any case to ensure that the single-level store cooperates properly. Alternatively, we might put in place a mechanism for working sets (already on the table) and rely on the driver to use it correctly.
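
To make the pinning point concrete, here is a rough sketch in C of what a low-level DMA interface might have to do: translate a virtual address, pin the page, and refuse to let the pager evict it while the transfer is outstanding. This is a toy model only. The page table layout and the names dma_pin, dma_unpin, and try_evict are all hypothetical and are not taken from EROS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    8u

/* Toy page table entry (hypothetical layout, not the EROS mapping table). */
typedef struct {
    uint32_t frame;      /* physical frame number */
    int      present;    /* page is currently resident */
    int      pin_count;  /* outstanding DMA transfers using this page */
} pte;

static pte page_table[NPAGES];

/* Make a virtual page resident in the given physical frame. */
static void map_page(uint32_t vpn, uint32_t frame) {
    page_table[vpn].frame = frame;
    page_table[vpn].present = 1;
    page_table[vpn].pin_count = 0;
}

/* Translate vaddr and pin its page. Returns 0 on success, -1 if unmapped. */
static int dma_pin(uint32_t vaddr, uint32_t *paddr_out) {
    uint32_t vpn = vaddr / PAGE_SIZE;
    assert(paddr_out != NULL);
    if (vpn >= NPAGES || !page_table[vpn].present)
        return -1;
    page_table[vpn].pin_count++;
    *paddr_out = page_table[vpn].frame * PAGE_SIZE + vaddr % PAGE_SIZE;
    return 0;
}

/* Release one pin once the transfer has completed. */
static void dma_unpin(uint32_t vaddr) {
    uint32_t vpn = vaddr / PAGE_SIZE;
    if (vpn < NPAGES && page_table[vpn].pin_count > 0)
        page_table[vpn].pin_count--;
}

/* The pager may evict only pages that are resident and not pinned. */
static int try_evict(uint32_t vpn) {
    if (vpn >= NPAGES || !page_table[vpn].present ||
        page_table[vpn].pin_count > 0)
        return -1;
    page_table[vpn].present = 0;
    return 0;
}
```

The design point is that translation and pinning happen in one step behind the interface, so the driver never holds a physical address for a page that the single-level store is still free to move.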

Back to your question:

On reboot, devices are reinitialized to a known state that is probably not consistent with the "upper half" of the driver. To allow the upper half to recognize that this has occurred, the capability to the device is rendered invalid. The upper half, on its next attempt to use this capability, discovers this problem, and it is the responsibility of the upper half to obtain a new capability to the lower half of the driver and do something rational with it.
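
The recovery sequence just described might be modeled roughly as follows. This is a sketch under assumptions, not the EROS mechanism: it uses a boot epoch counter to stand in for whatever actually renders the capability invalid, and every type and function name (device_cap, lower_half_invoke, upper_half_request, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a per-boot epoch stands in for capability invalidation. */
static uint32_t boot_epoch = 1;

typedef struct {
    uint32_t epoch;          /* boot epoch in which the capability was issued */
} device_cap;

typedef enum { DEV_OK, DEV_CAP_INVALID } dev_status;

/* Lower half: hand out a capability valid for the current boot only. */
static device_cap lower_half_get_cap(void) {
    device_cap c;
    c.epoch = boot_epoch;
    return c;
}

/* Lower half: refuse any capability issued before the last restart. */
static dev_status lower_half_invoke(const device_cap *cap) {
    return cap->epoch == boot_epoch ? DEV_OK : DEV_CAP_INVALID;
}

/* Simulated restart: the device is reset and old capabilities go stale. */
static void simulate_reboot(void) {
    boot_epoch++;
}

/* Upper half: persistent process state that survives the restart. */
typedef struct {
    device_cap cap;
    int configured_mode;     /* device state the upper half wants restored */
    int reinit_count;        /* how many times recovery has run */
} upper_half;

/* Use the capability; on failure, get a fresh one and redo device setup. */
static dev_status upper_half_request(upper_half *u) {
    dev_status s;
    assert(u != NULL);
    s = lower_half_invoke(&u->cap);
    if (s == DEV_CAP_INVALID) {
        u->cap = lower_half_get_cap();
        u->reinit_count++;   /* stands in for reprogramming configured_mode */
        s = lower_half_invoke(&u->cap);
    }
    return s;
}
```

Note that the upper half learns of the restart only when it next tries to use the device; nothing notifies it in advance.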

How much state is kept in the lower half depends a lot on the driver. For IDE and SCSI controllers, there is often state that can be written to the card but not read, and this state often must be tracked by the driver. For network interfaces, there is often a need to perform buffering. ALL of this state is LOST on restart. As to the rest, it depends entirely on how the particular driver has been designed. The original model was to have as little in the lower half driver as possible consistent with the needs of performance. On the 370 series, this was greatly simplified by the fact that all channel processors speak a uniform language, and that as a consequence only a limited number of lower half drivers are required: the console, the clock, and the channel lower half. On the x86, this is regrettably not the case, and it became clear a long time ago that in due course we will need a decent logic for autoconfiguration and download into the kernel of "trusted bottom half" logic.
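
As an illustration of the "written but not read" case, a lower half typically keeps a shadow copy of whatever it last wrote. The sketch below is a simplified model: the controller, its single mode register, and all function names are invented for the example and do not describe any particular IDE or SCSI part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical controller with one write-only mode register. */
typedef struct {
    uint8_t mode_reg;        /* hardware side; the driver cannot read it back */
} fake_controller;

/* Lower-half state: lives in the kernel and is lost on restart. */
typedef struct {
    fake_controller *hw;
    uint8_t shadow_mode;     /* last value written to mode_reg */
    int     shadow_valid;    /* 0 until the register has been written */
} lower_half_state;

static void hw_write_mode(fake_controller *hw, uint8_t v) {
    hw->mode_reg = v;
}

static void lh_init(lower_half_state *lh, fake_controller *hw) {
    assert(lh != NULL && hw != NULL);
    lh->hw = hw;
    lh->shadow_mode = 0;
    lh->shadow_valid = 0;
}

/* Write the register and remember what was written. */
static void lh_set_mode(lower_half_state *lh, uint8_t v) {
    hw_write_mode(lh->hw, v);
    lh->shadow_mode = v;
    lh->shadow_valid = 1;
}

/* Change selected bits using the shadow, since the hardware is unreadable. */
static void lh_set_mode_bits(lower_half_state *lh, uint8_t mask, uint8_t bits) {
    uint8_t base = lh->shadow_valid ? lh->shadow_mode : 0;
    lh_set_mode(lh, (uint8_t)((base & ~mask) | (bits & mask)));
}

/* Simulated restart: the device is reset and the shadow is gone. */
static void simulate_restart(lower_half_state *lh) {
    lh->hw->mode_reg = 0;
    lh->shadow_mode = 0;
    lh->shadow_valid = 0;
}
```

After a restart nothing in the lower half records what the device was told before, so anything worth restoring has to be reconstructed from state held by the upper half.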

Is any of this helpful?

Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Email: shapj@us.ibm.com
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595