[E-Lang] An Attempted Restatement of Hydro's Semantics
Mark S. Miller
markm@caplet.com
Sun, 01 Apr 2001 13:40:41 -0800
So far we've been talking mostly at the syntactic and API level -- what the
collections operators should be and what operations they should map too.
I'd like to switch to semantics. What kinds of collections should we have
and what should their semantics be? I've been studying Hydro through the
much easier to understand abridged docs
http://www.waterken.com/Hydro/2.0/doc/abridged-index.html while
experimenting with Hydro in an Elmer window. I should have done this long
ago and I recommend it. Starting in this message, I'll assume a basic
familiarity with the (only!) eight types linked-to by the left-hand frame,
and occasional other pages linked to from there -- mostly supertypes of
these types. Tyler, thanks for the simplified view. It helped a lot!
I'll start in this message by trying to abstract a high-level semantics from
those immutable Hydro containers relevant to the present discussion (as well
as trying to winnow down which cases are relevant). In upcoming messages,
I'll suggest some changes to Hydro. I'll proceed under the assumptions
below until corrected. So corrections appreciated even more than usual, as
it's important to get this stuff straight in order to make progress from here.
Mapoids and Setoids
There are two kinds of containers, distinguished by their answer to
"isPairAssociative()". For now, I'll call a pair-associative container a
"mapoid", and a non-pair associative container a "setoid". (These ugly
words are mine, not Tyler's, and I hope they're only temporary.)
A mapoid consists of
1) the contents: a sequence of key-value pairs,
2) an enumeration order policy,
3) a collision policy,
4) an optional key equivalence or ordering function, and
5) an optional value equivalence function.
The sequence order of #1 is the enumeration order. Our requirement for a
deterministic spec requires us to include enumeration order in the
semantics. This is why we can't use the conventional formulation: "a set of
key-value pairs". (Although #1 could have been phrased as a set of triples:
[enumeration position, key, value], but I'll avoid that view.)
A setoid consists of
1) the contents: a sequence of elements,
2) an enumeration order policy,
3) a collision policy,
4) an optional element equivalence or ordering function.
The sequence order of #1 is the enumeration order. (#1 could have been
phrased as a set of pairs: [enumeration position, element], but I'll avoid
that view.)
(Note that I'm also purposely avoiding the Hydro doc's merged view of setoid
and mapoid -- that a setoid is a mapoid in which key and value are the same.
My reason for avoiding this view will become clear in later email, but I
hope my restatement is accurate despite this change of style. We could
also have said that a mapoid is a setoid where the setoid elements are of
the form [key, value], but we'll avoid that view as well.)
When deriving a new container from one or more old containers, the left-hand
container -- the message receiver -- determines the attributes of the new
container unless stated otherwise in the operation. This applies to all of
the container attributes enumerated above.
Equivalence and Unsorted Collections
An equivalence function
http://www.waterken.com/Hydro/2.0/doc/com/waterken/canal/Equivalence.html
determines whether two elements are in the same equivalence class, which in
this message I'll usually abbreviate as "x <=> y", even though this notation
would normally imply equivalence according to x, rather than according to a
separate equivalence function. If this shorthand ever seems ambiguous,
lemme know.
Collections that only need equivalence are unsorted collections. The
enumeration order of an unsorted collection is deterministically specified
entirely by the enumeration order policy (see below).
The default equivalence function for unsorted collection for the E language
programmer is E's "x == y". For the Java/Hydro programmer, it's "x.equals(y)".
Order and Sorted Collections
An ordering function
http://www.waterken.com/Hydro/2.0/doc/com/waterken/canal/Order.html provides
an irreflexive partial order -- a partial order in which, forall x, either x
<=> x or x is incomparable to x. (Btw, I verified with Dean that he had
not actually encountered this notion previously.)
Collections that need an ordering function are sorted collections. I assume
that presently they are fully-sorted -- that they may only contain elements
that, taken together, form a full order according to the collection's
ordering function. This does not and should not require the ordering
function itself to represent a full order. To leave room open for other
possibilities, I'll call these fully-sorted collections.
The enumeration order of a sorted mapoid is a full order consistent with the
partial order of the keys. The enumeration order of a sorted setoid is a
full order consistent with the partial order of the elements. The remaining
sources of non-determinism are
a) when x <=> y while x != y. (When equivalence is looser than sameness.)
b) partially-ordered collections only: doesn't uniquely determine a full order
c) mapoid only: when a mapoid allows multiple associations for the
equivalent keys, the ordering function doesn't order these associations.
For both setoids and mapoids, the *remaining* nondeterminism must be
absorbed by a deterministic deterministically specified enumeration order
policy (see below).
The default equivalence function for sorted collections for the E language
programmer is "x compareTo(y)". For the Java/Hydro programmer, I assume
it's the similar, where perhaps x and y must be of type PartiallyComparable
http://www.waterken.com/Hydro/2.0/doc/com/waterken/PartiallyComparable.html .
Enumeration Order Policy
Enumeration order policies are specified in terms of operations used to
derive new collections from existing collections. There are usually three
players to keep track of, for which we use the following terminology:
newCollection := existingCollection derivationOp(proposedContent)
Two enumeration order policies explicitly called out on the abridged Hydro
pages are FIFO and LIFO. Although these seem to be specific to unsorted
setoids, I'll try to state these policies in a way that could be applied to
unsorted mapoids as well. I don't know if Hydro specifies a
deterministic enumeration order policy for sorted collections, and at the
moment I'm at a loss on how to state one. But one is required.
* FIFO means that, in deriving a new container from an existing one, that
the contents of the new container taken from the existing one appear first
and in the same relative order. When a derivation operation proposes new
contents (necessarily in their own order), the contents of the new container
taken from the proposed contents appear afterwards, and in the same relative
order.
* With LIFO the contents taken from the proposed contents appear first in
the new container, followed by the contents taken from the existing
container. The contents taken from the existing container appear in the
same relative order. Does the proposed contents appear in the same or the
reversed relative order?
I assume FIFO is the default for unsorted collections, and that there is
some sensible default for sorted collections. In later email, I'll propose
making these default choices the only choices -- I see no need to make this
a parameterizable dimension.
.
Collision Policy
The collision policy comes into play when deriving a new container from an existing container, where the derivation-operation arguments propose content that collide with the existing content. A setoid collision occurs between an existing element x and a proposed element y iff x <=> y according to the equivalence function of the existing collection. I'm not sure when a mapoid collision occurs between an existing association k1 => v1 and a proposed association k2 => v2. Is it when k1 <=> k2, or when k1 <=> k2 && !(v1 <=> v2)?
The three collision policies are "multi", "replacing", "unique".
* "multi" simply says collisions aren't a conflict. Put 'em both in the
newly constructed collection.
* "replacing" says that proposed contents are included instead of the
existing contents it collides with.
* "unique" says a collision throws an exception rather than constructing a
new collection. While I generally like the theory behind this kind of error
detection, I don't think it warrants a separate collision policy, so I'll
ignore this case.
- A replacing setoid is a "set"
- A multi setoid is a "multiset" (or "bag", "list", or "queue")
- A replacing mapoid is a "map"
- A multi mapoid is a "multimap". Note that the precise semantics of a
multimap depend on the question above. With either answer, both k1 => v1
and k1 => v2 can exist in the same multimap, when v1 and v2 are not
equivalent. What about when they are? In any case, I don't think there's
enough need for a multimap to justify keeping it.
Only Six Survive?
We'd then be left with set, multiset, and map; any of which may be sorted or
FIFO unsorted. If we can indeed boil things down to these 3*2 cases, that
would be a great start. I'll assume this in further email until I hear
otherwise.
Cheers,
--MarkM