[E-Lang] An Attempted Restatement of Hydro's Semantics

Tyler Close tclose@oilspace.com
Mon, 02 Apr 2001 12:26:48 +0100


At 01:40 PM 4/1/01 -0800, Mark S. Miller wrote:
>I've been studying Hydro through the
>much easier to understand abridged docs
>http://www.waterken.com/Hydro/2.0/doc/abridged-index.html while
>experimenting with Hydro in an Elmer window.  I should have done this long
>ago and I recommend it.

I also agree. The E command line is a very useful tool for learning new 
APIs, as well as first cut debugging of new code. I often go straight to 
the E shell even before fully reading the Javadoc.

>   Starting in this message, I'll assume a basic
>familiarity with the (only!) eight types linked-to by the left-hand frame,
>and occasional other pages linked to from there -- mostly supertypes of
>these types. Tyler, thanks for the simplified view.  It helped a lot!

Thanks for asking me to do it. Hopefully this will mean more users.


>I'll start in this message by trying to abstract a high-level semantics from
>those immutable Hydro containers relevant to the present discussion (as well
>as trying to winnow down which cases are relevant). In upcoming messages,
>I'll suggest some changes to Hydro.  I'll proceed under the assumptions
>below until corrected.  So corrections appreciated even more than usual, as
>it's important to get this stuff straight in order to make progress from here.

Your understanding seems mostly correct. I'll do specifics where appropriate.

>                                        Mapoids and Setoids
>
>
>There are two kinds of containers, distinguished by their answer to
>"isPairAssociative()".  For now, I'll call a pair-associative container a
>"mapoid", and a non-pair associative container a "setoid".  (These ugly
>words are mine, not Tyler's, and I hope they're only temporary.)

In this message, you seem to be attempting to merge the concepts of a queue 
and a multiset. This is a mistake. A multiset can be used as if it were a 
LIFO queue (stack), but it is impossible to use it as a FIFO queue. The 
FIFO queue is an often used data structure that you can't get rid of.

There are also implementation reasons for using separate types. With a 
queue, you know that push / pop is the "primary" operation, whereas with a 
multiset, searching / inserting is the "primary" operation. It's possible 
to optimize out a lot of the inner data structure overhead in cases where 
the client code is only using these "primary" operations. Having the client 
code tell you at construction time what the "primary" operation will be is 
important.

I don't think you need the "-oid" names. I think it is reasonable to talk 
in terms of a queue, map or set.


>A mapoid consists of
>     1) the contents: a sequence of key-value pairs,
>     2) an enumeration order policy,
>     3) a collision policy,
>     4) an optional key equivalence or ordering function, and
>     5) an optional value equivalence function.

         6) an optional dynamic type checker for the key
         7) an optional dynamic type checker for the value

I'm not sure if 2) belongs in this list. The enumeration order policy is 
fixed for all types of map. Particular maps may differ on the other attributes.

>The sequence order of #1 is the enumeration order.  Our requirement for a
>deterministic spec requires us to include enumeration order in the
>semantics.  This is why we can't use the conventional formulation: "a set of
>key-value pairs". (Although #1 could have been phrased as a set of triples:
>[enumeration position, key, value], but I'll avoid that view.)
>
>A setoid consists of
>     1) the contents: a sequence of elements,
>     2) an enumeration order policy,
>     3) a collision policy,
>     4) an optional element equivalence or ordering function.

         5) an optional dynamic type checker for the value

>The sequence order of #1 is the enumeration order.  (#1 could have been
>phrased as a set of pairs: [enumeration position, element], but I'll avoid
>that view.)
>
>(Note that I'm also purposely avoiding the Hydro doc's merged view of setoid
>and mapoid -- that a setoid is a mapoid in which key and value are the same.
>My reason for avoiding this view will become clear in later email, but I
>hope my restatement is accurate despite this change of style.  We could
>also have said that a mapoid is a setoid where the setoid elements are of
>the form [key, value], but we'll avoid that view as well.)

I'm curious to find out what you've got cooking here. I've spent a lot of 
time thinking about this and am certain that this view of a set is the 
"right" one.


>When deriving a new container from one or more old containers, the left-hand
>container -- the message receiver -- determines the attributes of the new
>container unless stated otherwise in the operation.  This applies to all of
>the container attributes enumerated above.
>
>
>                          Equivalence and Unsorted Collections
>
>
>An equivalence function
>http://www.waterken.com/Hydro/2.0/doc/com/waterken/canal/Equivalence.html
>determines whether two elements are in the same equivalence class, which in
>this message I'll usually abbreviate as "x <=> y", even though this notation
>would normally imply equivalence according to x, rather than according to a
>separate equivalence function.  If this shorthand ever seems ambiguous,
>lemme know.
>
>Collections that only need equivalence are unsorted collections.  The
>enumeration order of an unsorted collection is deterministically specified
>entirely by the enumeration order policy (see below).
>
>The default equivalence function for unsorted collection for the E language
>programmer is E's "x == y".  For the Java/Hydro programmer, it's 
>"x.equals(y)".

I'll just note that I believe E's "x == y" is "x.equals(y) && 
THE_E_EQ.equals(x.getKeyEquivalence()) && 
THE_E_EQ.equals(x.getValueEquivalence())"



>                                  Order and Sorted Collections
>
>
>An ordering function
>http://www.waterken.com/Hydro/2.0/doc/com/waterken/canal/Order.html provides
>an irreflexive partial order -- a partial order in which, forall x, either x
><=> x or x is incomparable to x.  (Btw, I verified with Dean that he had
>not actually encountered this notion previously.)
>
>Collections that need an ordering function are sorted collections.  I assume
>that presently they are fully-sorted -- that they may only contain elements
>that, taken together, form a full order according to the collection's
>ordering function.  This does not and should not require the ordering
>function itself to represent a full order.  To leave room open for other
>possibilities, I'll call these fully-sorted collections.

I've tried to define the sorted collection interface such that it is a 
contract for a partially-sorted collection. The currently available 
implementations should be seen as an incomplete implementation of this 
interface (they throw an exception if they can't create a full order). My 
hope is that some Xanadu container will be able to implement the full 
contract. If/when someone does this implementation, it should be easy 
enough to sub it in via the Initializer interface, without affecting client 
code.


>The enumeration order of a sorted mapoid is a full order consistent with the
>partial order of the keys.  The enumeration order of a sorted setoid is a
>full order consistent with the partial order of the elements.  The remaining
>sources of non-determinism are
>
>a) when x <=> y while x != y.  (When equivalence is looser than sameness.)

This can only happen when x is not using the E default equivalence 
implementation, and x contains an element that is not "==" to an element in 
y. For example consider a case-insensitive set of strings:

         [ "MarkM" ] <=> [ "markm" ]

and

         [ "MarkM" ] != [ "markm" ]

I don't see why this behaviour is called "non-determinism".

>b) partially-ordered collections only: doesn't uniquely determine a full order
>c) mapoid only: when a mapoid allows multiple associations for the
>    equivalent keys, the ordering function doesn't order these associations.
>
>For both setoids and mapoids, the *remaining* nondeterminism must be
>absorbed by a deterministic deterministically specified enumeration order
>policy (see below).
>
>The default equivalence function for sorted collections for the E language
>programmer is "x compareTo(y)".  For the Java/Hydro programmer, I assume
>it's the similar, where perhaps x and y must be of type PartiallyComparable
>http://www.waterken.com/Hydro/2.0/doc/com/waterken/PartiallyComparable.html .
>
>
>                                  Enumeration Order Policy
>
>
>Enumeration order policies are specified in terms of operations used to
>derive new collections from existing collections.  There are usually three
>players to keep track of, for which we use the following terminology:
>
>     newCollection := existingCollection derivationOp(proposedContent)
>
>Two enumeration order policies explicitly called out on the abridged Hydro
>pages are FIFO and LIFO.  Although these seem to be specific to unsorted
>setoids,

to queues

>  I'll try to state these policies in a way that could be applied to
>unsorted mapoids as well.

I think it is impossible to implement FIFO and LIFO behaviour for 
associative collections. For sorted collections, this is true by 
definition. For unsorted collections, I have tried attacking the problem 
many times from many different angles and I just don't see how it's 
possible. The closest I got was the List class, but I can't figure out how 
to efficiently make it provide an immutable map interface. I'll also note 
that the techinique applied in the List implementation does not scale well. 
I can't extend that technique to large disk based collections.

>   I don't know if Hydro specifies a
>deterministic enumeration order policy for sorted collections, and at the
>moment I'm at a loss on how to state one.  But one is required.

Well, for the current full order implementations, the enumeration order 
should be obvious. The partial order contract that they incompletely 
implement would probably have to take into account the construction 
history, as unsorted collections do. This seems tricky to define until we 
have an actual implementation that we can refer to. You and Dean know the 
Xanadu implementation. What was it's contract?


>* FIFO means that, in deriving a new container from an existing one, that
>the contents of the new container taken from the existing one appear first
>and in the same relative order.  When a derivation operation proposes new
>contents (necessarily in their own order), the contents of the new container
>taken from the proposed contents appear afterwards, and in the same relative
>order.
>
>* With LIFO the contents taken from the proposed contents appear first in
>the new container, followed by the contents taken from the existing
>container.  The contents taken from the existing container appear in the
>same relative order.  Does the proposed contents appear in the same or the
>reversed relative order?

The reversed relative order. It is my belief that merging two collections 
should produce a result that is the same as if the elements from the added 
collection where each individually added to the reference collection.


>I assume FIFO is the default for unsorted collections, and that there is
>some sensible default for sorted collections.  In later email, I'll propose
>making these default choices the only choices -- I see no need to make this
>a parameterizable dimension.

LIFO is the only option for unsorted collections. FIFO is just not 
possible. Consider how the insert and pop operations have to interact in 
order to avoid shuffling the rest of the collection.

For sorted collections, the only implemented option is full order. If the 
collection is a "multi" collection, then elements with equivalent keys 
appear in FIFO order according to the construction history.


>                                          Collision Policy
>
>
>The collision policy comes into play when deriving a new container from an 
>existing container, where the derivation-operation arguments propose 
>content that collide with the existing content.  A setoid collision occurs 
>between an existing element x and a proposed element y iff x <=> y 
>according to the equivalence function of the existing collection.  I'm not 
>sure when a mapoid collision occurs between an existing association k1 => 
>v1 and a proposed association k2 => v2.  Is it when k1 <=> k2, or when k1 
><=> k2 && !(v1 <=> v2)?

Collisions in an associative collection are always in terms of the keys only.


>The three collision policies are "multi", "replacing", "unique".
>
>* "multi" simply says collisions aren't a conflict.  Put 'em both in the
>newly constructed collection.
>
>* "replacing" says that proposed contents are included instead of the
>existing contents it collides with.
>
>* "unique" says a collision throws an exception rather than constructing a
>new collection.  While I generally like the theory behind this kind of error
>detection, I don't think it warrants a separate collision policy, so I'll
>ignore this case.

You're wrong. In my code, I find that I use all three associative 
collection types with roughly the same frequency. For a recent example of a 
"unique" container, consider my recent SLS proposal. It is a unique map 
from a key to a string list. If a user attempts to register a duplicate key 
string (ie: public key hash), then an exception must be produced.


>- A replacing setoid is a "set"
>
>- A multi setoid is a "multiset" (or "bag", "list", or "queue")
>
>- A replacing mapoid is a "map"
>
>- A multi mapoid is a "multimap".  Note that the precise semantics of a
>multimap depend on the question above.  With either answer, both k1 => v1
>and k1 => v2 can exist in the same multimap, when v1 and v2 are not
>equivalent.  What about when they are?  In any case, I don't think there's
>enough need for a multimap to justify keeping it.

I also use the multimap a lot. An example of this can also be found by 
looking at an upcoming revision of my SLS proposal. Rather than just 
providing a string list, the SLS should provide a multimap of string type 
=> string. This will allow for the storage of heterogenous information in 
the SLS. For example, the string type might be "url", or "onion", or ...

Your terminology proposal pre-supposes some of your conclusions. I think 
this is a big mistake. Please stick to the terminology presented in the 
Hydro docs until you convince us that a reduced namespace is appropriate.


>                                         Only Six Survive?
>
>
>We'd then be left with set, multiset, and map; any of which may be sorted or
>FIFO unsorted.  If we can indeed boil things down to these 3*2 cases, that
>would be a great start.

I sympathize with your desire to reduce the number of variations in the 
container types, but I think it's a bad idea. I believe that they are all 
useful and that we can have consistent semantics across all of them. Users 
can then use whatever reduced subset of this they like. I suspect that most 
users will start out using only the "replace" variants of associative 
collections as well as the LIFO and FIFO queue types. As they get used to 
these and come to understand the differences with the other variants, 
they'll find that their code can be simplified by using a collection that 
is better matched to the operation that they are performing.

Tyler