Message Passing and Segment Faults

Jonathan Shapiro shap@viper.cis.upenn.edu
Thu, 8 Dec 94 14:41:03 -0500


You may have noticed that I've been relatively quiet for the past two
days.  Then again, maybe not.  I've been wrestling with a question,
I've been unable to come to a good answer, and I'd appreciate
reactions.  The following concerns DIMSUM.

In DIMSUM, segments are coherent.  Two holders of a segment key on
different machines are guaranteed to see a consistent view of the
data.   The machines may have different architectures, and
consequently different page sizes.

On low-bandwidth nets, it is important to avoid sending more data than
necessary.  On high-bandwidth nets, the data transfer is less
critical, but sending more data induces more core pressure.  For this
reason, the DIMSUM coherency architecture is built on something I call
core data frames.  A core data frame is the unit of transfer for
user-data related coherency.  All page frame sizes across the system
are an integral multiple of the core frame size.  A page frame is not
considered to be mappable on a machine unless all of its associated
core frames are on the same machine.
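
A minimal sketch of the two rules above, for illustration only.  It
assumes the core frame size is chosen as the greatest common divisor of
the page sizes in the system (the text only requires that every page
size be an integral multiple of it), and the names core_frame_size,
core_frames_of and is_mappable are hypothetical, not DIMSUM interfaces.

```python
from functools import reduce
from math import gcd


def core_frame_size(page_sizes):
    """Largest size that divides every page size (an assumed choice)."""
    return reduce(gcd, page_sizes)


def core_frames_of(page_index, page_size, core_size):
    """Core frame numbers covered by one page frame of the given size."""
    per_page = page_size // core_size
    first = page_index * per_page
    return range(first, first + per_page)


def is_mappable(page_index, page_size, core_size, resident):
    """A page frame is mappable only if all its core frames are local."""
    frames = core_frames_of(page_index, page_size, core_size)
    return all(cf in resident for cf in frames)


core = core_frame_size([4096, 16384])
resident = {0, 1, 3}  # core frames currently held on this machine
```

With core frame 2 missing, the 16K page covering core frames 0..3 is not
mappable, while the 4K page that is core frame 3 is.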

In a configuration involving 4K and 16K page sizes, the core data
frame would be 4K. If your 16K page machine and my 4K page machine
share a segment, I can fault a 4K chunk out of the middle of your 16K
page, and you will fault it back if you need to access some other 4K
chunk.  To get a sense of how this works, look at the subpage locking
technology on the original RS6000 MMU.
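
The back-and-forth in that example can be sketched as a toy model.  This
assumes just two machines and tracks only which machine holds each 4K
core frame; Machine and fault are hypothetical names, and no real
coherency protocol or messaging is modelled.

```python
CORE = 4096  # core data frame size for a 4K/16K configuration


class Machine:
    def __init__(self, name, page_size):
        self.name = name
        self.page_size = page_size
        self.frames = set()  # core frame numbers held locally

    def page_frames(self, offset):
        """Core frames making up the local page that contains offset."""
        per_page = self.page_size // CORE
        first = (offset // self.page_size) * per_page
        return range(first, first + per_page)

    def can_map(self, offset):
        return all(f in self.frames for f in self.page_frames(offset))


def fault(faulter, other, offset):
    """Pull the missing core frames of the faulting page from other."""
    moved = []
    for f in faulter.page_frames(offset):
        if f in other.frames:
            other.frames.remove(f)
            faulter.frames.add(f)
            moved.append(f)
    return moved


big = Machine("16K machine", 16384)
small = Machine("4K machine", 4096)
big.frames = {0, 1, 2, 3}            # big holds one whole 16K page

moved_a = fault(small, big, 0x1000)  # small takes the second 4K chunk
big_ok_after_a = big.can_map(0x2000)
moved_b = fault(big, small, 0x2000)  # big touches another chunk
```

Only one core frame moves in each direction, but the 16K machine loses
the ability to map its whole page in between.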

One of the places where KeyKOS gets a big win is its ability to
internally associate and disassociate a page frame from a segment.
When combined with the kernel's view of memory as cache, this allows
drivers to do last-minute core frame allocation, which has a
potentially substantial impact on core pressure.  I've been trying to
figure out how to do the same trick in DIMSUM, and I'm stumped.

Here's the KeyKOS behavior:

Imagine that a segment keeper is doing coherency between two images of
a segment.  When one image needs a page, the keeper lifts the page key
from the other image and plugs it in.  The actual content of the page
does not need to be transferred.
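
As a rough illustration of that behavior (not actual KeyKOS interfaces):
the sketch below models a page key as a Python object reference and a
segment image as a list of slots.  Page, SegmentImage and lift_page are
hypothetical names.

```python
class Page:
    def __init__(self, data):
        self.data = bytearray(data)


class SegmentImage:
    def __init__(self, nslots):
        self.slots = [None] * nslots  # each slot holds a page key or None


def lift_page(src, dst, slot):
    """Move the page key in slot from src to dst; no bytes are copied."""
    key = src.slots[slot]
    if key is None:
        raise LookupError("source image has no page in that slot")
    src.slots[slot] = None
    dst.slots[slot] = key
    return key


image_a = SegmentImage(4)
image_b = SegmentImage(4)
page = Page(b"hello")
image_a.slots[2] = page

lift_page(image_a, image_b, 2)
```

After the call the second image refers to the very same page object, so
the transfer cost is independent of the page size.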

The analogous DIMSUM problem appears in a more critical path:

Compare this with an ordinary DIMSUM page fault.  The kernel requests
data from the keeper, which turns to the disk driver.  The driver may be
remote, in which case it cannot readily do late frame allocation,
because core data frames may not correspond to page frames.  Further,
the segment, the keeper, and the disk driver may all have different
page frame sizes.

If core data frames are reified as nameable objects, then they can be
transferred between machines just fine.  The difficulty is that
ultimately they must be assembled into a page frame, which imposes
constraints of contiguity and placement in physical memory.
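
To make the constraint concrete, here is a small sketch of the check a
faulting machine would have to pass before it could map a page.  It
assumes "placement" means natural alignment of the page frame to its own
size, which is an assumption and not something stated above; the
function name forms_page_frame is hypothetical.

```python
def forms_page_frame(core_addrs, core_size, page_size):
    """True if the core frames, in page order, form one usable page frame.

    core_addrs holds the physical address of each core data frame.
    Assumed constraints: the right number of frames, a base address
    aligned to page_size, and strictly contiguous placement.
    """
    if len(core_addrs) * core_size != page_size:
        return False
    base = core_addrs[0]
    if base % page_size != 0:
        return False
    return all(addr == base + i * core_size
               for i, addr in enumerate(core_addrs))


good = forms_page_frame([0x8000, 0x9000, 0xA000, 0xB000], 4096, 16384)
swapped = forms_page_frame([0x8000, 0x9000, 0xB000, 0xA000], 4096, 16384)
misaligned = forms_page_frame([0x9000, 0xA000, 0xB000, 0xC000], 4096, 16384)
```

A party that allocates core frames without knowing page_size for the
faulting machine cannot guarantee this check succeeds, so a copy into a
suitable frame would be needed.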

At its heart, the only party who is in a position to define the
placement constraints on the allocated data frames is the party who
encountered the fault.  The frames cannot be named until they are
allocated, so there doesn't appear to be a good way to convey the
layout constraints to a remote disk driver.

I'm trying to come to a design that minimizes copies.  In the remote
case, the copies seem more or less inevitable because of data
alignment and placement constraints, but in the local case one would
sure like to avoid them.

Anybody got any good ideas?  The next best alternative is to make
everybody allocate frames on the assumption of the largest page size.


Jonathan