Duplexing and OpenBSD drivers
Jonathan S. Shapiro
shap@eros.cis.upenn.edu
Fri, 27 Feb 1998 20:37:12 -0500
I'm in the process of trying to integrate the OpenBSD driver structure
into EROS as part of the TCP/IP protocol port. I decided that the
easiest way to accomplish the port was to integrate the device logic
into the EROS device structure.
My original reason for *not* doing this had to do with duplexed disk
I/O; I'd actually hoped to integrate the net drivers.
The problem with duplexed I/O lies in journaled writes and duplexed
reads. Standard writes proceed in parallel and can be treated as
independent I/Os without difficulty.
  jwrites: journaled writes of the same page to different ranges must
           not proceed in parallel.

  reads:   you ideally want the first disk that can accomplish the read
           to do it, but this requires a mutex lock across drivers that
           precludes issuing I/O programs to the controllers from within
           the interrupt logic. The OpenBSD drivers want to issue I/O
           from the interrupt logic for performance reasons.
Here are my tentative solutions for each. Reactions are greatly
appreciated.
JOURNALED WRITES:
First, prohibit range mount/dismount while any write is queued. This
seems like a good idea on general principles.
Second, encode in each write request the range table index of the
range to which it is currently being written. When the I/O completes,
the completion logic looks at the request type (jwrite) and attempts
to recycle the request to the next appropriate range. Since range
mounts and dismounts will not occur in the middle of this, there is no
worry about race conditions.
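
To make the recycling step concrete, here is a rough sketch in C. It is
not EROS or OpenBSD code: the range table layout, the writes_queued
counter, and every function name below are hypothetical, and the actual
hand-off of the request to a driver queue is left out. It only shows
that a jwrite visits one range at a time, and that mounts stay blocked
until the last copy has been written.

```c
#include <assert.h>
#include <stdbool.h>

#define NRANGE   8      /* hypothetical size of the range table */
#define NO_RANGE (-1)

enum req_type { REQ_READ, REQ_WRITE, REQ_JWRITE };

struct range {                 /* hypothetical range table entry */
    bool mounted;
    unsigned long start, end;  /* covers pages in [start, end) */
};

struct request {
    enum req_type type;
    unsigned long page;
    int range_ndx;             /* range currently being written */
};

struct range range_table[NRANGE];
int writes_queued;             /* number of writes not yet finished */

/* Mounts and dismounts are refused while any write is queued. */
bool can_change_mounts(void) { return writes_queued == 0; }

/* Next mounted range after index `after` that holds `page`. */
int next_range_for(unsigned long page, int after)
{
    for (int i = after + 1; i < NRANGE; i++) {
        const struct range *r = &range_table[i];
        if (r->mounted && page >= r->start && page < r->end)
            return i;
    }
    return NO_RANGE;
}

/* Queue a journaled write on the first range holding the page. */
bool jwrite_start(struct request *rq)
{
    assert(rq->type == REQ_JWRITE);
    rq->range_ndx = next_range_for(rq->page, NO_RANGE);
    if (rq->range_ndx == NO_RANGE)
        return false;
    writes_queued++;
    return true;
}

/* Completion logic: true if recycled to another range, false if done. */
bool jwrite_complete(struct request *rq)
{
    assert(rq->type == REQ_JWRITE);
    int next = next_range_for(rq->page, rq->range_ndx);
    if (next == NO_RANGE) {
        writes_queued--;       /* last copy written; request retires */
        return false;
    }
    rq->range_ndx = next;      /* caller requeues on this range */
    return true;
}
```

Because the table cannot change while writes_queued is nonzero, the scan
in next_range_for() needs no lock of its own in this sketch.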
DUPLEXED READS:
For duplexed ranges, keep an LRU bit. In addition, for each range,
keep track of the last N cylinders on which I/O operations have been
queued. When queueing a read request, do as follows:
  1. If the cylinder +/- N appears in some range's recently
     used list, queue the request there.
  2. Otherwise, queue the request on the LRU range and advance
     the LRU token to the next range in the (circular) duplex
     list.
  3. If a read fails, make note of this in a failure table, and
     do not requeue the read to the same range of the duplex the
     next time around.
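
The three steps above might look roughly like this in C. Again this is
only a sketch under assumptions: the structure names, the per-request
failure bitmask standing in for the failure table, and the constants
are all made up for illustration. The text uses N for both the history
length and the cylinder distance; the sketch keeps them as two
constants (NRECENT and WINDOW) so they can be tuned separately or set
equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NDUPLEX 2   /* hypothetical number of ranges in the duplex */
#define NRECENT 4   /* recently queued cylinders remembered per range */
#define WINDOW  2   /* hypothetical "cylinder +/- N" distance */

struct replica {
    long recent[NRECENT];  /* ring of recent cylinders, -1 = empty */
    int next_slot;
};

struct duplex {
    struct replica rep[NDUPLEX];
    int lru;               /* LRU token: index of the next range to use */
};

void duplex_init(struct duplex *d)
{
    d->lru = 0;
    for (int i = 0; i < NDUPLEX; i++) {
        d->rep[i].next_slot = 0;
        for (int j = 0; j < NRECENT; j++)
            d->rep[i].recent[j] = -1;
    }
}

static bool near_recent(const struct replica *r, long cyl)
{
    for (int j = 0; j < NRECENT; j++)
        if (r->recent[j] >= 0 && labs(r->recent[j] - cyl) <= WINDOW)
            return true;
    return false;
}

static void note_cylinder(struct replica *r, long cyl)
{
    r->recent[r->next_slot] = cyl;
    r->next_slot = (r->next_slot + 1) % NRECENT;
}

/* Pick a range for a read of `cyl`. Bit i of failed_mask is set if this
 * read already failed on range i. Returns -1 if every range has failed. */
int choose_replica(struct duplex *d, long cyl, unsigned failed_mask)
{
    int pick = -1;

    /* Step 1: prefer a range that recently worked near this cylinder. */
    for (int i = 0; i < NDUPLEX && pick < 0; i++)
        if (!(failed_mask & (1u << i)) && near_recent(&d->rep[i], cyl))
            pick = i;

    /* Step 2: otherwise take the LRU range and advance the token. */
    for (int k = 0; k < NDUPLEX && pick < 0; k++) {
        int i = (d->lru + k) % NDUPLEX;
        if (!(failed_mask & (1u << i))) {
            pick = i;
            d->lru = (i + 1) % NDUPLEX;
        }
    }

    if (pick >= 0)
        note_cylinder(&d->rep[pick], cyl);
    return pick;
}
```

Note that the choice is made entirely at queueing time from per-range
state, so nothing here has to be consulted when an I/O is initiated.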
This should have the effect of decently batching read requests
w.r.t. track while simultaneously spreading them across the ranges.
It's a whole lot simpler than my current design, and eliminates the
need for an inter-device mutex interlock at I/O initiation.
Reactions?
shap