Heterogeneity
Bryan Ford
baford@schirf.cs.utah.edu
Mon, 05 Dec 94 22:55:58 MST
>I'm stumped about something. It's related to the ongoing memory
>object discussion. Actually, I've come to the conclusion that calling
>it a memory object was a really unfortunate choice of name. It
>implied that the only way to access such objects was by mapping them,
>which was never my intention. I'm going to try to call them data
>objects from now on. We can then distinguish between the semantics of
>data objects and the semantics of mappings of data objects.
Actually, I don't think the "memory" in "memory object" necessarily
means "mapped memory" - "memory" is really just a generic term for
storage. The fact that Mach memory objects have traditionally implied
mapping as opposed to other methods of access doesn't mean that the
term couldn't easily be generalized a little.
In fact, open file objects (file descriptors) in the Hurd are really
memory objects in exactly the sense that you're proposing: they support
both a mapped-memory external-pager interface and a conventional
Unix-style read/write interface.
>In any case, here is today's trivia challenge question:
>
>[...problems with different page sizes on different machines...]
I think you're making one big assumption (and a fairly common one) about
external paging interfaces that greatly aggravates the multiple page size
problem. Relaxing that assumption, I suspect, would make the problem
unimportant for typical uses of external paging interfaces, although it
wouldn't make it go away completely.
Basically, traditional external paging interfaces, including Mach's, take
the form of an asymmetrical "API" that the kernel "provides" to user code.
The external pager is the "master" and the kernel is the "slave". As such,
the "slave" is designed to be as low-level and controllable as possible.
As a byproduct, the "master" (external pager) needs to know a lot about the
"slave" (kernel), including the slave's page size, in order to make
anything work. In a heterogeneous system, the page size may vary from
slave to slave, which is the problem you're wrestling with.
But all the real external pager applications I've actually seen in
practice, including DSM, don't need the "ultimate controllability" that the
asymmetrical API-style pager interface provides. All they really need is a
two-way data coherency interface, which is conceptually symmetrical. In
other words, instead of the pager side saying, "map this page" or "unmap
this page" and the kernel side saying, "a fault happened here" or "evict
this page", they should both use a single, symmetrical interface with two
basic operations: "push" (take THAT!) and "pull" (gimmee gimmee). Hence I
call it a push/pull interface; we'll be trying out such an interface in
the new Mach VM system.
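A minimal sketch of what "symmetrical" means here, in Python for brevity.
This is not the actual interface spec: the names CoherencyPeer and Store
and the method signatures are hypothetical, chosen only to show that both
the kernel side and the pager side can implement the same two operations.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class CoherencyPeer(ABC):
    """Hypothetical symmetric interface implemented by BOTH sides."""

    @abstractmethod
    def pull(self, offset: int, length: int) -> Tuple[int, bytes]:
        """Ask this peer for the data in [offset, offset + length).

        Returns (start, data) describing what the peer handed back.
        """

    @abstractmethod
    def push(self, offset: int, data: bytes) -> None:
        """Hand data to this peer without being asked."""


class Store(CoherencyPeer):
    """Toy peer backed by a bytearray; stands in for kernel or pager."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)

    def pull(self, offset: int, length: int) -> Tuple[int, bytes]:
        return offset, bytes(self.mem[offset:offset + length])

    def push(self, offset: int, data: bytes) -> None:
        self.mem[offset:offset + len(data)] = data


# Either object can play either role; the calls look the same both ways.
pager = Store(16)
kernel = Store(16)
pager.push(0, b"hello")            # someone hands the pager some data
start, data = pager.pull(0, 5)     # "fault": the kernel side pulls it
kernel.push(start, data)           # and the data lands in the kernel's cache
```

One plausible reading of how the four asymmetric messages collapse (an
interpretation, not something the spec is known to state): a fault is the
kernel pulling from the pager, an eviction is the kernel pushing to the
pager, "map this page" is the pager pushing to the kernel, and a flush is
the pager pulling from the kernel.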
The pull operation is the most fundamental, and probably the most common.
It's a generic request for data or access permissions. (I actually have
the interface spec'd out quite a bit, but I won't burden the list with all
the gory details at this point.) The kernel/VMM makes a pull call to the
external pager when a page fault occurs, and the pager makes a pull call to
the kernel to retrieve modifications or invalidate parts of the kernel's
cache. The caller (puller) indicates the region for which the requested
data or access is required, typically corresponding to the caller's page
size, and the callee (pullee) is free to return data/access for a larger
region than requested, with a few restrictions. The upshot is that, if
the two parties have different minimum granularities, they automatically
wind up using the maximum of the two for maintaining coherency between
them. If one of the parties, say a DSM manager, communicates
simultaneously with many different VMMs with different page sizes,
the interaction among the different page sizes will be basically automatic:
a side effect of the sizes of blocks requested by the different VMMs when
they field page faults.
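The granularity argument can be worked through with a little arithmetic.
The sketch below assumes power-of-two page sizes and a callee that simply
widens every request to its own page boundaries; widen and pull_region are
hypothetical helper names, not part of any real interface.

```python
def widen(offset, length, page):
    """Round [offset, offset + length) outward to multiples of `page`.

    Returns (start, length) of the widened region.
    """
    start = (offset // page) * page
    end = -(-(offset + length) // page) * page  # ceiling division
    return start, end - start


def pull_region(fault_addr, caller_page, callee_page):
    """Region actually exchanged when the caller faults at fault_addr."""
    # The caller asks for the page containing the fault, in ITS page size.
    req_off, req_len = widen(fault_addr, 1, caller_page)
    # The callee is free to return more: it widens to its own granularity.
    return widen(req_off, req_len, callee_page)


# 4K caller talking to an 8K callee, and the reverse: both exchange 8K.
small_to_big = pull_region(0x3000, 4096, 8192)
big_to_small = pull_region(0x3000, 8192, 4096)
```

In both directions the exchanged region comes out as 8192 bytes starting
at 0x2000, i.e. the larger of the two page sizes, without either side
being told the other's page size. A manager with byte granularity serving
VMMs with 4K, 8K, and 16K pages would likewise see regions of exactly
those sizes, driven purely by what each VMM requests.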
This type of pager interface also makes supporting multiple page sizes
_simultaneously_ at one or both ends a much saner proposition, which as
I've mentioned is one of the things we're trying to support in the new VM
system. Finally, the symmetry of the interface should make stacking pagers
on top of each other much easier, by essentially halving the number of
interfaces that must be dealt with.
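To illustrate the stacking point: with a symmetric interface, a layer in
the middle of a stack implements and consumes the same two operations.
The sketch below is a toy under stated assumptions: Store and XorLayer are
hypothetical names, the XOR transform merely stands in for whatever a real
intermediate pager would do, and only requests arriving from above are
shown.

```python
class Store:
    """Toy bottom-of-stack peer that actually holds the bytes."""

    def __init__(self, size):
        self.mem = bytearray(size)

    def pull(self, offset, length):
        return bytes(self.mem[offset:offset + length])

    def push(self, offset, data):
        self.mem[offset:offset + len(data)] = data


class XorLayer:
    """Hypothetical stacked pager: same pull/push above and below."""

    def __init__(self, lower, key):
        self.lower = lower  # any object offering pull() and push()
        self.key = key      # single-byte key, 0..255

    def _xor(self, data):
        return bytes(b ^ self.key for b in data)

    def pull(self, offset, length):
        # Fetch from the layer below and undo the transform on the way up.
        return self._xor(self.lower.pull(offset, length))

    def push(self, offset, data):
        # Apply the transform and pass the data down.
        self.lower.push(offset, self._xor(data))


# Layers compose freely because each one looks like any other peer.
base = Store(8)
stack = XorLayer(XorLayer(base, 0x5A), 0x0F)
stack.push(0, b"abc")
```

With an asymmetric interface, the same layer would have to implement the
pager protocol toward the layer above it and the kernel protocol toward
the layer below it, which is two different interfaces instead of one.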
As I said before, this type of interface doesn't actually solve the
multiple page size problem, but for practical uses of external pager
interfaces it should make the problem ignorable: as long as each side obeys
the basic push/pull coherency interface, neither side really needs to know
the minimum granularity of the other side. The basic push/pull interface
could be extended with more restrictive operations to provide the "ultimate
controllability" of the master/slave API when that controllability is
really desired; but the extra complexity of that interface doesn't need to
be imposed on typical external pagers like device drivers or DSM managers.
So, does this address the problem you're thinking of, or am I way out in
left field? :-)
Bryan