NewSys working sets

Jonathan Shapiro shap@viper.cis.upenn.edu
Tue, 22 Nov 94 14:43:05 -0500


Since NewSys does not directly manage backing store (and hopefully
doesn't have swap management in the kernel), core page frame
allocation is an important issue.  The goals are conflicting:

    1. We want to ensure that allocating a core page frame is a very
       lightweight operation when one is available, and that
       availability of a frame is likely when needed.

       =>  The kernel must implement some primitive aging policy
           (probably LRU) to ensure that pages are likely to be
           available.

    2. We want to ensure that an uncooperative backing store manager
       cannot deadlock the system by refusing to remove dirty pages.

       =>  Page frame resources must be subject to some form of
           accounting. 

    3. We want to be able to make opportunistic use of idle page
       frames.

       =>  The resource allocation contract must be carefully stated
           to enable the kernel to make opportunistic use of page
	   frames.

    4. We want client applications that manage more than one working
       set to be able to dynamically alter allocations in order to
       perform load balancing strategies.

       =>  The containment relationship between page frames and
           working sets be carefully stated for efficiency and
	   flexibility.

       =>  Because we must run without paging, working set containment
           should be implemented with minimal book keeping overhead.

    5. We need to be able to pin pages in core, to ensure that certain
       key managers are not paged out (e.g. the boot process better
       not get paged out until it gets some kind of system started).

       =>  We must have the ability to view page frames as hard
           allocatable resources.

Did I miss any obvious implications in there?

Initially, I thought to follow the KeyKOS model of a page range key.
For the benefit of the non-KeyKOS crowd, a range key names a
contiguous range of pages.  I abandoned this because it cannot readily
satisfy objective (4) -- the transfer of a page frame from one working
set to another is challenging.

I then thought to borrow the space bank model, but an implementation
that does not rely on page contiguity has a LOT of overhead.  Carrying
around a bitmap for the whole system for every working set seems
excessive.

I then thought to borrow from the meter tree model using a tree of
frame allocation authority constraints.  This is very attractive, as
it allows nested working set size constraints, but what happens when a
page is freed isn't exactly obvious, and the n^2 tree walk to allocate
a page frame is disappointing.  In addition, it seemed to lead to
problems when pages were pinned (though I think I see a way out of
that by modifying the allocation rules).

I'm still debating about this, because there are some good ideas in
there.

Here is where I am at the moment.  I'm eager for your comments.


A Working Set Object is an authority to allocate page frames.  A
process can ONLY allocate page frames if it holds a capability to a
working set.  A working set object contains the following information:

	A start key for the working set manager

	A current allocation count, indicating the number of pages
	   that are currently allocated to this working set.

	A high water mark.  A page can be allocated from a working set
	   manager iff the current allocation count is below the high
	   water mark.

	A low water mark.  If the working set's current allocation
	   count is less than the low water mark, it is guaranteed
	   that page allocations from this working set will be prompt
	   (i.e. will not induce a context switch).

	A pin count.  A page can be pinned under a working set if the
	   pin count is less than the low water mark.

The kernel runs an ageing mechanism (generational LRU) on all pages,
and is free to reallocate stale pages unless they are pinned or dirty.
Under conditions of crisis, the kernel can steal a clean, unpinned
page from any working set object whose current allocation exceeds its
low water mark.

The sum of all low water marks in all working sets must be strictly
less than the number of page frames in the system.

A pinned page will be I/O hazarded ONLY in response to a user
initiated request, which implies that the kernel may not initiate
pageout on these page frames without first copying the page to a
hazardable frame.  Since in any case this won't free the frame, the
kernel probably shouldn't initiate pageout on pinned frames.

The kernel must ensure that there are always enough clean, unpinned
page frames to be able to promptly satisfy an allocation request when
the current allocation count is below the low water mark.  This
implies that the kernel must globally track the underallocation of low
water marks, and ensure that there are always enough clean pages to
satisfy the requests.

The low water mark of a working set cannot under any conditions be
reduced below the pin count.

The holder of two working set capabilities may transfer low water
authority from one working set to another, subject to the constraint
of the pin count rule.  The kernel implements a root working set
object, which is the source of all low water mark authorities.

The high water mark can be set to any value that is greater than or
equal to the low water mark.


This all works, but one would like to be able to subject the high
water mark to tree structured constraint.  The rules can be revised as
follows:

	Allow page allocation if the current allocation count is below
	the larger of (low water mark, high water mark).

	Allow high water mark to be below low water mark.

	Let high water mark be hierarchically constrained by keeping
	working sets in a tree and tracking allocation counts
	accordingly.

I think this is probably more complicated than it is worth, but I
still find the tree constraint seductive.


Reactions?



Jonathan