Small Spaces for the x86 Family

This note describes how we have adapted the ideas from Liedtke's small address space paper [1] for EROS.

1. The Small Spaces Idea

The basic idea behind small spaces relies on the availability of the x86 segmentation mechanism. Just as the EROS kernel is mapped in common in all address spaces, we can map a number of user regions into all address spaces. Access to these regions can be controlled with the segment registers. A process can first be allocated to one of these regions, and can be moved to a traditional address space if it is discovered not to fit in the so-called ``small space.''

Norm Hardy and I first did a design for small address spaces for EROS in 1991. This fell out of a larger conversation in which we were trying to figure out how to let an EROS process believe that it owned the full 4Gbyte virtual address space. Our technique was to have a region that the kernel lived in, but bounce this region around on demand when the process wanted those virtual addresses. In the end, I concluded that the complexity of the kernel bounce logic outweighed its benefit, and decided to settle for a 3 GByte address space for EROS processes on the x86. This, at least, is 1 GByte more than any of the major operating systems, and leaves enough spare room for kernel emulation within the target process address space if desired.

A side effect was that we realized that small spaces were possible. We did not at that time appreciate their performance impact, or we would have implemented them at the time. We did not publish this design. Since I had many other things to implement at the time, I set it aside intending to go back to it later. Jochen Liedtke arrived at the idea independently, and in 1995 put out a paper describing an implementation of it for L4. His performance numbers were very impressive, and I resolved to implement them in EROS as well.

The existence of small spaces can be discovered by means of the load segment limit operation, but we consider this a reasonable tradeoff. If it is helpful, the operating system can implement several sizes of small space (e.g. a large number of 128 Kbyte spaces and a smaller number of 1 Mbyte spaces). The current EROS implementation includes only a single size: 128 Kbyte.

There are several advantages to small spaces:

  • Many programs, most notably keepers, will fit within them.

  • A context switch into a small space or back to the most recently active ordinary space does not need to flush the TLB. This alone reduces the context switch cost by nearly 140 cycles in practice.

  • Cross-space copy is simplified by eliminating the need to walk the recipient address tables by hand.

  • Since EROS is intended to promote an architectural discipline of many small objects, a great many EROS domains fall within the small space size limit.

  • (Corollary) Small spaces substantially reduce the demand for page table page frames.

A minor disadvantage is that a process must (in our current implementation) rebuild it's mapping tables when it exceeds the small space size limits.

2. Planned Implementation

Much to my surprise, the implementation of small spaces looks to be very simple. The address map for the kernel was rearranged, and I corrected a number of places where assumptions about the previous virtual address had managed to creep into the code.

2.1. Changes to Address Mappings

In the EROS implementation of small spaces, the address space is divided into three regions:

+--------------+--------------+----------------+
| Large Space  | Small Spaces | Kernel Virtual |
|   (unique)   |   (shared)   |    (shared)    |
+--------------+--------------+----------------+
Small spaces and kernel space are shared across all virtual maps, enabling rapid transition to/from kernel and to/from small spaces.

2.1. Changes to the Process Structure

I added three new fields to the Process structure:

struct Process {
   ...
   uva_t   limit;      /* uppermost virtual address */
   uva_t   bias;       /* true VA = process VA + bias */
   PTE     *smallPTE;  /* true VA = process VA + bias */
};
While the limit information can be obtained by inquiring of the segment register itself, it is faster to load it from the Process structure. The bias is process specific, and must be applied to sent and received string addresses during string transfer in order for the kernel to reference the memory map in the correct place.

Previous to this change, the lower bound of string addresses and argument blocks was not checked by the fast-path IPC code. Since the uppermost pages of kernel virtual space were unmapped, arithmetic rollover guaranteed that an invalid lower bound would result in a page fault. Rather than place unmapped buffer regions between small spaces I elected to pack them densely and start checking the lower bounds.

Every EROS Process might potentially be a small (128 Kbyte) space. Since a 128 Kbyte space takes only 32 page tables to map, I decided to simply dedicate a special collection of page tables. These page tables are mapped into every process and also into the kernel-special address space. Every Process carries with it 32 dedicated PTE entries.

When the Process structure is first loaded, the smallPTE field is initialized to point to the first entry. If this field is nonzero, the associated process resides in a small space, and no page table root pointer is loaded on return to the process. The net effect is that the small space runs parasitically in all address spaces, which is what we wanted.

2.2. Changes to Context Switch

The new context switch logic must update the global descriptor table with the correct base and bound for the small space before switching into it. This amounts to rewriting four words, all of which prove to be in the same cache line.

2.3. Small Process Page Faults

When a process performs a page fault, the page fault code must now check to see if the smallPTE field is nonzero. If so, the faulting process is a small process, and should not attempt to share page tables. A different path is taken for address resolution that sidesteps the efficiencies of the main path, because in most cases a small process will have only a single Node or Page as its entire address space.

If the fault address exceeds the bound for this process, then one of three things must be true:

  • The reference is into kernel space, and is therefore illegal.

  • The address is bad, and a keeper must be invoked.

  • The address is valid, and the process must be transparently converted into a large space process.

We defer changing the process into a large address space until we discover a valid address outside the small space range. There is no sense converting a process that will shortly go away in any case.

2.2. Changes to IPC

The only other piece of code that needs to be touched is the fast and slow IPC paths. The slow IPC path needs to be updated to reflect the new mapping table traversal logic. The fast path needs to be taught a new string transfer method.

3. Results

This section will be filled in as the implementation progresses.


  1. Jochen Liedtke. Improved Address-Space Switching on Pentium Processors by Transparently Multiplexing User Address Spaces. GMD Technical Report No. 933. Further information about L3 and L4 can be found here.


Copyright 1998 by Jonathan Shapiro. All rights reserved. For terms of redistribution, see the GNU General Public License