[E-Lang] An Attempted Restatement of Hydro's Semantics

Mark S. Miller markm@caplet.com
Sun, 01 Apr 2001 13:40:41 -0800


So far we've been talking mostly at the syntactic and API level -- what the 
collections operators should be and what operations they should map too.

I'd like to switch to semantics.  What kinds of collections should we have 
and what should their semantics be?  I've been studying Hydro through the 
much easier to understand abridged docs 
http://www.waterken.com/Hydro/2.0/doc/abridged-index.html while 
experimenting with Hydro in an Elmer window.  I should have done this long 
ago and I recommend it.  Starting in this message, I'll assume a basic 
familiarity with the (only!) eight types linked-to by the left-hand frame, 
and occasional other pages linked to from there -- mostly supertypes of 
these types. Tyler, thanks for the simplified view.  It helped a lot!

I'll start in this message by trying to abstract a high-level semantics from 
those immutable Hydro containers relevant to the present discussion (as well 
as trying to winnow down which cases are relevant). In upcoming messages, 
I'll suggest some changes to Hydro.  I'll proceed under the assumptions 
below until corrected.  So corrections appreciated even more than usual, as 
it's important to get this stuff straight in order to make progress from here.


                                       Mapoids and Setoids


There are two kinds of containers, distinguished by their answer to 
"isPairAssociative()".  For now, I'll call a pair-associative container a 
"mapoid", and a non-pair associative container a "setoid".  (These ugly 
words are mine, not Tyler's, and I hope they're only temporary.)

A mapoid consists of 
    1) the contents: a sequence of key-value pairs, 
    2) an enumeration order policy,
    3) a collision policy, 
    4) an optional key equivalence or ordering function, and 
    5) an optional value equivalence function.
The sequence order of #1 is the enumeration order.  Our requirement for a 
deterministic spec requires us to include enumeration order in the 
semantics.  This is why we can't use the conventional formulation: "a set of 
key-value pairs". (Although #1 could have been phrased as a set of triples: 
[enumeration position, key, value], but I'll avoid that view.)

A setoid consists of 
    1) the contents: a sequence of elements, 
    2) an enumeration order policy,
    3) a collision policy, 
    4) an optional element equivalence or ordering function. 
The sequence order of #1 is the enumeration order.  (#1 could have been
phrased as a set of pairs: [enumeration position, element], but I'll avoid 
that view.)

(Note that I'm also purposely avoiding the Hydro doc's merged view of setoid 
and mapoid -- that a setoid is a mapoid in which key and value are the same. 
My reason for avoiding this view will become clear in later email, but I 
hope my restatement is accurate despite this change of style.  We could 
also have said that a mapoid is a setoid where the setoid elements are of 
the form [key, value], but we'll avoid that view as well.)

When deriving a new container from one or more old containers, the left-hand 
container -- the message receiver -- determines the attributes of the new 
container unless stated otherwise in the operation.  This applies to all of 
the container attributes enumerated above.


                         Equivalence and Unsorted Collections


An equivalence function 
http://www.waterken.com/Hydro/2.0/doc/com/waterken/canal/Equivalence.html 
determines whether two elements are in the same equivalence class, which in 
this message I'll usually abbreviate as "x <=> y", even though this notation 
would normally imply equivalence according to x, rather than according to a 
separate equivalence function.  If this shorthand ever seems ambiguous, 
lemme know.

Collections that only need equivalence are unsorted collections.  The 
enumeration order of an unsorted collection is deterministically specified 
entirely by the enumeration order policy (see below).

The default equivalence function for unsorted collection for the E language 
programmer is E's "x == y".  For the Java/Hydro programmer, it's "x.equals(y)".


                                 Order and Sorted Collections


An ordering function 
http://www.waterken.com/Hydro/2.0/doc/com/waterken/canal/Order.html provides 
an irreflexive partial order -- a partial order in which, forall x, either x 
<=> x or x is incomparable to x.  (Btw, I verified with Dean that he had 
not actually encountered this notion previously.) 

Collections that need an ordering function are sorted collections.  I assume 
that presently they are fully-sorted -- that they may only contain elements 
that, taken together, form a full order according to the collection's 
ordering function.  This does not and should not require the ordering 
function itself to represent a full order.  To leave room open for other 
possibilities, I'll call these fully-sorted collections.

The enumeration order of a sorted mapoid is a full order consistent with the 
partial order of the keys.  The enumeration order of a sorted setoid is a 
full order consistent with the partial order of the elements.  The remaining 
sources of non-determinism are 

a) when x <=> y while x != y.  (When equivalence is looser than sameness.)
b) partially-ordered collections only: doesn't uniquely determine a full order
c) mapoid only: when a mapoid allows multiple associations for the 
   equivalent keys, the ordering function doesn't order these associations.

For both setoids and mapoids, the *remaining* nondeterminism must be 
absorbed by a deterministic deterministically specified enumeration order 
policy (see below).

The default equivalence function for sorted collections for the E language 
programmer is "x compareTo(y)".  For the Java/Hydro programmer, I assume 
it's the similar, where perhaps x and y must be of type PartiallyComparable 
http://www.waterken.com/Hydro/2.0/doc/com/waterken/PartiallyComparable.html .


                                 Enumeration Order Policy


Enumeration order policies are specified in terms of operations used to 
derive new collections from existing collections.  There are usually three 
players to keep track of, for which we use the following terminology:

    newCollection := existingCollection derivationOp(proposedContent)

Two enumeration order policies explicitly called out on the abridged Hydro 
pages are FIFO and LIFO.  Although these seem to be specific to unsorted 
setoids, I'll try to state these policies in a way that could be applied to 
unsorted mapoids as well.  I don't know if Hydro specifies a 
deterministic enumeration order policy for sorted collections, and at the 
moment I'm at a loss on how to state one.  But one is required.

* FIFO means that, in deriving a new container from an existing one, that 
the contents of the new container taken from the existing one appear first 
and in the same relative order.  When a derivation operation proposes new 
contents (necessarily in their own order), the contents of the new container 
taken from the proposed contents appear afterwards, and in the same relative 
order.

* With LIFO the contents taken from the proposed contents appear first in 
the new container, followed by the contents taken from the existing 
container.  The contents taken from the existing container appear in the 
same relative order.  Does the proposed contents appear in the same or the 
reversed relative order?

I assume FIFO is the default for unsorted collections, and that there is 
some sensible default for sorted collections.  In later email, I'll propose 
making these default choices the only choices -- I see no need to make this 
a parameterizable dimension.
.

                                         Collision Policy


The collision policy comes into play when deriving a new container from an existing container, where the derivation-operation arguments propose content that collide with the existing content.  A setoid collision occurs between an existing element x and a proposed element y iff x <=> y according to the equivalence function of the existing collection.  I'm not sure when a mapoid collision occurs between an existing association k1 => v1 and a proposed association k2 => v2.  Is it when k1 <=> k2, or when k1 <=> k2 && !(v1 <=> v2)?

The three collision policies are "multi", "replacing", "unique".

* "multi" simply says collisions aren't a conflict.  Put 'em both in the 
newly constructed collection.

* "replacing" says that proposed contents are included instead of the 
existing contents it collides with.

* "unique" says a collision throws an exception rather than constructing a 
new collection.  While I generally like the theory behind this kind of error 
detection, I don't think it warrants a separate collision policy, so I'll 
ignore this case.

- A replacing setoid is a "set"

- A multi setoid is a "multiset" (or "bag", "list", or "queue")

- A replacing mapoid is a "map"

- A multi mapoid is a "multimap".  Note that the precise semantics of a 
multimap depend on the question above.  With either answer, both k1 => v1 
and k1 => v2 can exist in the same multimap, when v1 and v2 are not 
equivalent.  What about when they are?  In any case, I don't think there's 
enough need for a multimap to justify keeping it.  


                                        Only Six Survive?


We'd then be left with set, multiset, and map; any of which may be sorted or 
FIFO unsorted.  If we can indeed boil things down to these 3*2 cases, that 
would be a great start.  I'll assume this in further email until I hear 
otherwise.

        Cheers,
        --MarkM