duplexed I/O

Jonathan S. Shapiro jsshapiro@earthlink.net
Fri, 18 Sep 1998 10:03:39 -0400


I'd appreciate comments and reactions to the following.

The current EROS duplexed I/O subsystem is using some unusual data
structures that make it impossible to use existing UNIX drivers.

The specific problem has to do with read handling.  The current EROS
logic is to stick a read request on all drive queues that might
conceivably satisfy that read and take some extra measures to ensure
that two drives never attempt to read the same block simultaneously.
The implied logic changes prevent EROS from directly using existing
UNIX drivers.

[UNIX systems that do duplexing, by the way, rely on the presence of a 
per-process kernel stack to do so]

Most of the other differences in the driver logic between EROS and
UNIX are minor, and are readily fixed with minor alterations to the
driver code.  I'm therefore wondering if there might be a
straightforward strategy for spreading reads that would let me simply
queue one read in the first place.

The constraints as I see them are as follows:

1. Naively, we want to spread reads as much as possible.

2. We wish to restrict the spread so that multiple reads in the same
   track can be handled on the same head.

   [track rather than cylinder because head to head delay is about the 
    same as track to track on modern drives due to lack of vertical
    registration of the platter.]

3. On modern drives, the sector geometry changes according to
   cylinder, the drive won't tell you, and it's probably not worth the
   bother of determining it dynamically at the moment.


Given the above constraints I can see several possible strategies:

1. Location-division multiplexing:

   Divide disk ranges into bands whose size approximates a few
   cylinders and place the read on the n'th drive associated with the
   range, where n is computed as

      ((pageno relative to range) / (pages per band)) % ndrives

   Pro: damn simple, probably will help
   Con: subject to contention similar to cache conflicts
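
   As a rough sketch of the computation above (the names RangeInfo,
   pagesPerBand, nDrives and band_drive are hypothetical and are not
   taken from the EROS sources), the placement might look something
   like this in C:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-range description; not an actual EROS structure. */
typedef struct {
    uint32_t pagesPerBand;  /* band size: roughly a few cylinders' worth of pages */
    uint32_t nDrives;       /* number of drives in the duplex set for this range */
} RangeInfo;

/* Choose the drive for a read.  relPage is the page number relative
 * to the start of the range. */
uint32_t
band_drive(const RangeInfo *r, uint32_t relPage)
{
    assert(r->pagesPerBand > 0 && r->nDrives > 0);
    return (relPage / r->pagesPerBand) % r->nDrives;
}
```

   The mapping is fixed, so two hot bands that happen to map to the
   same drive always collide, which is the contention noted above.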

2. Time-windowed location-division multiplexing.

   Divide disk ranges into bands whose size approximates a few
   cylinders.  For each range, keep track of the last drive in the
   duplex set to get used.  

   Keep a cache of recently accessed (range, band, drive) tuples.
   When a new I/O fits within an existing recently used band, use that
   cache entry.  Otherwise, allocate a new cache entry whose drive is
   (lastdrive + 1) % (nDrives in range)

   Place the read on the drive specified by the cache entry.

   Under random load this will degrade to round-robin placement.  If
   the load really is random this is actually the right thing to do.

   Pro: doesn't have the same cache contention problems
   Con: modestly more complex
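
A minimal sketch of the second strategy follows, under several
assumptions that are mine rather than part of the proposal: the
state is kept per range (so the cached tuple reduces to band and
drive), the window is a small fixed-size table, and "recently" is
approximated by FIFO replacement rather than by a real time window.
All names (RangeState, BandEntry, WINDOW, windowed_drive) are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW 8  /* hypothetical: how many recently used bands to remember */

typedef struct {
    uint32_t band;   /* band number within the range */
    uint32_t drive;  /* drive this band was assigned to */
    int      valid;  /* nonzero once the slot has been filled */
} BandEntry;

/* Hypothetical per-range state; zero-initialize before first use. */
typedef struct {
    uint32_t  pagesPerBand;
    uint32_t  nDrives;
    uint32_t  lastDrive;       /* last drive handed out to a new band */
    uint32_t  next;            /* next slot to recycle (FIFO order) */
    BandEntry recent[WINDOW];
} RangeState;

/* Choose the drive for a read.  relPage is the page number relative
 * to the start of the range. */
uint32_t
windowed_drive(RangeState *rs, uint32_t relPage)
{
    assert(rs->pagesPerBand > 0 && rs->nDrives > 0);
    uint32_t band = relPage / rs->pagesPerBand;

    /* A recently used band keeps the drive it was given. */
    for (int i = 0; i < WINDOW; i++)
        if (rs->recent[i].valid && rs->recent[i].band == band)
            return rs->recent[i].drive;

    /* Otherwise take the next drive round-robin and remember the
     * choice, overwriting the oldest entry. */
    rs->lastDrive = (rs->lastDrive + 1) % rs->nDrives;
    BandEntry *e = &rs->recent[rs->next];
    rs->next = (rs->next + 1) % WINDOW;
    e->band  = band;
    e->drive = rs->lastDrive;
    e->valid = 1;
    return e->drive;
}
```

With a window this small the linear scan is probably cheap enough;
once every read misses the table, placement is plain round-robin,
which matches the random-load behavior described above.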

There are variants on the windowing metric, but these are the two
basic mechanisms that leap to mind.  Neither seems especially hard to
do.

Anyone have other suggestions?

Anyone have opinions on which is preferable?



shap