Re: user-level networking
Bill Frantz (frantz@netcom.com)
Sun, 23 Aug 1998 22:01:00 -0800

At 8:07 PM -0800 8/20/98, Jonathan S. Shapiro wrote:
>I think I have finally come up with a satisfactory approach to moving
>network protocol handling into user land without an undue sacrifice in
>performance.
>
>Review of the problem:
>
>The problem lies only in *reading* packets, and stems from two issues:
>
> 1. Packets must be examined to determine their recipient. This
> typically requires making a temporary copy of the data.
>
> 2. When a packet arrives, the ethernet receiver must not be stalled
> if no process chances to be waiting for a packet at that
> instant. Between the cost of packet processing and the ever
> increasing incidence of back-to-back packets, there is a real
> need for the ethernet system to be able to grab packet receipt
> buffers quickly.
>
>We would like a solution that:
>
> + does not make redundant copies in the face of protection boundary
> crossings
> + does not cause clogs in the input FIFOs (problem 2 above).
> + requires the receiving application to pay for the packet storage.
>
>It just occurred to me that memory is getting much cheaper, given
>which the following solution seems doable:
>
>Classify packets into two types -- big and small.
>
>Small packets are <= XX (probably 128) bytes, and are collected into a
>network-specific small packet pool.
>
>Large packets are anything larger than XX bytes. When a large packet
>arrives, a new core page frame is allocated and the large packet is
>read into that page frame.
>
>An in-kernel packet classifier is now run. The output of the
>classifier is a tag corresponding to the packet fetch capability that
>can fetch this packet. The page frame is tagged with this value and
>stuck in a hash chain or some such thing. There is a distinguished
>tag value (probably zero) meaning ``not interested.''
>
>When a program wishes to receive packet(s), it must first establish a
>packet filter in the style of DPF. In exchange, it receives a
>capability that will let it read packets matching the pattern.
>
>Packet receive keys do not survive restart -- the input filter must be
>reestablished. Tag values are large (say: 64 bits), allowing them to
>be allocated without rollover within the expected service life of the
>CPU silicon.
>
>Using this key, the program can say ``fetch me a packet.'' To do so,
>it provides *two* data addresses:
>
> 1. An address into which small packets will be copied.
> 2. A valid page-aligned address for a large packet frame.
>
>In the large packet case, the packet is transferred by dorking the
>page table entries and shooting down the supplied virtual address in
>the TLB, reclaiming the in-core page frame for the page that the user
>specified. The packet is therefore moved with no copies, and the user
>application pays for the storage. (SmallTalk people: think
>``become''). In essence, the user-supplied page is swapped with the
>previously allocated driver page, conserving storage on both sides of
>the exchange.
>
>Reactions?
>
>shap

I like the general direction. However, the user should specify (1) an address for the small packets (as a receive byte string), and (2) a read-write page key for the magic data swap. The return code will tell where the data is.
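
To make the shape of that call concrete, here is a minimal sketch in Python. It is only a model, not EROS or KeyKOS code: the names Receiver, PageKey, Where and fetch are hypothetical, the 128-byte threshold is the "probably 128" from the quoted message, and the 4096-byte page size is an assumption. A page key is modeled as an object naming one frame, so the "magic data swap" becomes a plain exchange of references.

```python
from dataclasses import dataclass, field
from enum import Enum

SMALL_MAX = 128    # assumed threshold ("probably 128" in the quoted proposal)
PAGE_SIZE = 4096   # assumed page size


class Where(Enum):
    """Return code: tells the caller where the data ended up."""
    NONE = 0    # nothing pending
    SMALL = 1   # packet was copied into the caller's byte string
    LARGE = 2   # packet is in the frame now named by the caller's page key


@dataclass
class PageKey:
    """Hypothetical stand-in for a read-write page key: names one frame."""
    frame: bytearray


@dataclass
class Receiver:
    pending: list = field(default_factory=list)      # (data, length), arrival order
    free_frames: list = field(default_factory=list)  # frames handed back to the driver

    def fetch(self, small_buf: bytearray, page_key: PageKey):
        if len(small_buf) < SMALL_MAX:
            raise ValueError("small buffer must hold SMALL_MAX bytes")
        if not self.pending:
            return Where.NONE, 0
        data, length = self.pending.pop(0)
        if length <= SMALL_MAX:
            # Small packet: one copy into the receive byte string.
            small_buf[:length] = data[:length]
            return Where.SMALL, length
        # Large packet: swap frames instead of copying. The caller's old
        # frame goes to the driver, so storage is conserved on both sides.
        self.free_frames.append(page_key.frame)
        page_key.frame = data
        return Where.LARGE, length
```

The caller always passes both authorities and looks at the return code to learn which one was used. Nothing in the call depends on how the swap is carried out underneath.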

If the interface is designed this way, the operation is within the existing model, and can be emulated for testing etc. Once you have all the authorities passed in the standard manner, anything magic the kernel does is an optimization. Security reasoning can proceed as if another domain were operating the network, rather than special magic in the kernel.
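
As an illustration of the emulation point, the classification side of the quoted proposal can be modeled by an ordinary object with no kernel help at all. The sketch below is in Python and rests on several assumptions: NetDomain and its method names are hypothetical, a filter is a plain Python predicate rather than a compiled DPF filter, and a tag is an ordinary integer standing in for the packet fetch capability. Tag 0 is the distinguished "not interested" value, and tags come from a counter that is never reused.

```python
import itertools

NOT_INTERESTED = 0  # distinguished tag value from the quoted proposal


class NetDomain:
    """Ordinary user-level object standing in for the network path."""

    def __init__(self):
        self._next_tag = itertools.count(1)  # tags are never reused
        self._filters = {}                   # tag -> predicate over packet bytes
        self._queues = {}                    # tag -> packets waiting for that tag

    def install_filter(self, predicate):
        """Register a filter; the returned tag models the fetch capability."""
        tag = next(self._next_tag)
        self._filters[tag] = predicate
        self._queues[tag] = []
        return tag

    def classify(self, packet):
        """Return the tag of the first matching filter, else NOT_INTERESTED."""
        for tag, predicate in self._filters.items():
            if predicate(packet):
                return tag
        return NOT_INTERESTED

    def on_receive(self, packet):
        """Tag an arriving packet and queue it; unwanted packets are dropped."""
        tag = self.classify(packet)
        if tag != NOT_INTERESTED:
            self._queues[tag].append(packet)
        return tag

    def fetch(self, tag):
        """Return the next packet for this tag, or None if none is waiting."""
        queue = self._queues[tag]  # KeyError models an invalid capability
        return queue.pop(0) if queue else None
```

A test harness can drive on_receive with canned packets and check what each tag holder sees. A real implementation could replace the queues with tagged page frames without changing what the callers are allowed to do.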


Bill Frantz       | If hate must be my prison  | Periwinkle -- Consulting
(408)356-8506     | lock, then love must be    | 16345 Englewood Ave.
frantz@netcom.com | the key.     - Phil Ochs   | Los Gatos, CA 95032, USA