Comment on EROS/KeyKOS I/O

Jonathan S. Shapiro shap@eros.cis.upenn.edu
Fri, 23 May 1997 18:30:07 -0400


A long time ago, Norm told me that objects in the core table had a bit
indicating that they were "involved in I/O".  Since a write never
implies damage to the object, this presumably must have captured the
fact that there was a read going *in* to the object in question.

I subsequently ignored this comment, on the theory that if the read
was still pending I would block while attempting to allocate the
associated duplexed I/O structure.  To avoid grabbing an object while
I/O was still in progress to it I simply did all inbound I/O into a
raw page frame in core, and when I/O completed I then re-tagged the
frame according to its proper type (page, node pot, etc.).  This
differed from the KeyKOS approach, but had much the same end result.
Once the object was properly listed in core the I/O was guaranteed to
be complete.
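
A minimal sketch of the raw-frame arrangement described above, in C.
Every name here (CoreFrame, begin_read, complete_read, find_object,
the FT_* tags) is hypothetical and chosen for illustration; none of it
is taken from the EROS or KeyKOS sources.  The point it shows is that
a lookup by identity can only hit a frame whose read has finished.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame tags; FT_RAW_IO marks a frame with a read in flight. */
typedef enum { FT_FREE, FT_RAW_IO, FT_PAGE, FT_NODE_POT } FrameType;

typedef struct {
    FrameType type;  /* how the frame is currently listed in core */
    uint64_t  oid;   /* object identity; meaningful only once typed */
} CoreFrame;

#define NFRAMES 4
static CoreFrame core_table[NFRAMES];

/* Start an inbound read: claim a free frame and list it only as raw. */
static CoreFrame *begin_read(void)
{
    for (size_t i = 0; i < NFRAMES; i++) {
        if (core_table[i].type == FT_FREE) {
            core_table[i].type = FT_RAW_IO;
            return &core_table[i];
        }
    }
    return NULL;  /* no free frame available */
}

/* I/O completion: only now does the frame take on its proper type. */
static void complete_read(CoreFrame *f, FrameType type, uint64_t oid)
{
    assert(f->type == FT_RAW_IO);
    assert(type == FT_PAGE || type == FT_NODE_POT);
    f->oid = oid;
    f->type = type;
}

/* Lookup by identity matches typed frames only, so a hit implies
 * that no read into the frame is still pending. */
static CoreFrame *find_object(FrameType type, uint64_t oid)
{
    for (size_t i = 0; i < NFRAMES; i++) {
        if (core_table[i].type == type && core_table[i].oid == oid)
            return &core_table[i];
    }
    return NULL;
}
```

Between begin_read() and complete_read() the frame is invisible to
find_object(), which is exactly the property that later causes
trouble.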

This worked fine until about 5 minutes ago.

I have just encountered an interesting race condition that did not
arise in KeyKOS.  Because EROS retains knowledge of the locations of
objects in the checkpoint area as late as possible, it is possible for
the following sequence of events to happen:

	program A fetches object X from the checkpoint area, from a
	   generation which has migrated and whose space is therefore
	   subject to reclamation.
	program B allocates a new object in the log.  This causes log
	   reclamation, which happens to reclaim object X, which
	   happens to free a log frame (e.g. it's a page or the last
	   node in that log pot).
	we desire to simply zero the in-core image of this log frame,
	   because it is now known to be idle, but we must take care
	   to cancel any pending I/O.
	because the object involved in I/O is not listed in its
	   proper form, we cannot readily find it without
	   investigating the low level disk I/O request structures.

Once the I/O is in progress, the right thing to do is let it proceed.
Because the frame is not listed under its proper identity in the core
table map, it's necessary under my design to grot through the I/O
request table with interrupts disabled, only to discover that there
was an active I/O after all, in which case one must cautiously tiptoe
backwards.
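
To make the cost concrete, here is a rough sketch in C of what that
search looks like.  The request table layout, the irq_disable() and
irq_enable() stand-ins, and the assumption of at most one request per
log frame are all invented for illustration; the real structures
differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low level disk request record. */
typedef struct {
    bool     in_use;     /* slot holds a request */
    bool     active;     /* request has been handed to the device */
    uint64_t log_frame;  /* log location the read comes from */
} IoRequest;

#define NREQ 8
static IoRequest io_requests[NREQ];

/* Stand-ins for real interrupt masking; they only track nesting. */
static int irq_disable_depth;
static void irq_disable(void) { irq_disable_depth++; }
static void irq_enable(void)
{
    assert(irq_disable_depth > 0);
    irq_disable_depth--;
}

typedef enum {
    RECLAIM_IDLE,        /* no request found; safe to zero the frame */
    RECLAIM_CANCELLED,   /* queued request removed before it started */
    RECLAIM_BACKED_OFF   /* I/O already active; let it proceed */
} ReclaimResult;

/* Assumes at most one outstanding request per log frame. */
static ReclaimResult reclaim_log_frame(uint64_t log_frame)
{
    ReclaimResult result = RECLAIM_IDLE;

    irq_disable();  /* the whole scan runs with interrupts off */
    for (size_t i = 0; i < NREQ; i++) {
        IoRequest *r = &io_requests[i];
        if (!r->in_use || r->log_frame != log_frame)
            continue;
        if (r->active) {
            result = RECLAIM_BACKED_OFF;  /* tiptoe backwards */
        } else {
            r->in_use = false;            /* cancel the queued read */
            result = RECLAIM_CANCELLED;
        }
        break;
    }
    irq_enable();
    return result;
}
```

The reclaimer has to know the layout of the request table, and the
time spent with interrupts off grows with the size of that table.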

The resulting violation of encapsulation is more than a bit ugly.
Most importantly, it requires interrupts to be suspended too long for
comfort.

I am about to revert to the previous design, which is simpler and
avoids this problem by making the I/O-involved frame visible in the
right way from the instant the I/O has actually committed.  It will
also (someday) simplify multiprocessor arrangements.
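
A sketch in C of how such a design might look, with the frame listed
under its proper identity and carrying an "involved in I/O" bit.  All
names are hypothetical, and the "reclaimed" flag with its deferred
zeroing at completion is an assumption about one way to let the I/O
proceed, not a description of the actual EROS or KeyKOS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NFRAMES   4

typedef enum { FT_FREE, FT_PAGE, FT_NODE_POT } FrameType;

typedef struct {
    FrameType type;
    uint64_t  oid;
    bool      io_involved;  /* a read into this frame has committed */
    bool      reclaimed;    /* assumption: zero the frame at completion */
    uint8_t   data[PAGE_SIZE];
} CoreFrame;

static CoreFrame core_table[NFRAMES];

static CoreFrame *find_object(FrameType type, uint64_t oid)
{
    for (size_t i = 0; i < NFRAMES; i++) {
        if (core_table[i].type == type && core_table[i].oid == oid)
            return &core_table[i];
    }
    return NULL;
}

/* Commit a read: the frame is visible under its identity at once. */
static CoreFrame *commit_read(FrameType type, uint64_t oid)
{
    for (size_t i = 0; i < NFRAMES; i++) {
        CoreFrame *f = &core_table[i];
        if (f->type == FT_FREE) {
            f->type = type;
            f->oid = oid;
            f->io_involved = true;
            f->reclaimed = false;
            return f;
        }
    }
    return NULL;
}

/* Reclaimer: an ordinary core table lookup, no I/O request scan.
 * Returns true if nothing remains to be done for this object. */
static bool reclaim_object(FrameType type, uint64_t oid)
{
    CoreFrame *f = find_object(type, oid);
    if (f == NULL)
        return true;             /* not in core at all */
    if (f->io_involved) {
        f->reclaimed = true;     /* let the I/O proceed; zero later */
        return false;
    }
    memset(f->data, 0, sizeof f->data);
    return true;
}

/* I/O completion: clear the bit and honor any deferred reclaim. */
static void complete_read(CoreFrame *f)
{
    assert(f->io_involved);
    f->io_involved = false;
    if (f->reclaimed) {
        memset(f->data, 0, sizeof f->data);
        f->reclaimed = false;
    }
}
```

Here the reclaimer never touches the disk request structures, and the
only state it shares with the completion path is a pair of bits in the
frame itself.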


shap