[E-Lang] An Attempted Restatement of Hydro's Semantics
Mark S. Miller
markm@caplet.com
Mon, 02 Apr 2001 08:08:09 -0700
At 04:26 AM Monday 4/2/01, Tyler Close wrote:
>>A mapoid consists of
>> 1) the contents: a sequence of key-value pairs,
>> 2) an enumeration order policy,
>> 3) a collision policy,
>> 4) an optional key equivalence or ordering function, and
>> 5) an optional value equivalence function.
>
> 6) an optional dynamic type checker for the key
> 7) an optional dynamic type checker for the value
>
>I'm not sure if 2) belongs in this list. The enumeration order policy is
>fixed for all types of map. Particular maps may differ on the other attributes.
All cool.
I'm glad you mentioned #6 and #7. This takes us to your
not-necessarily-enumerable collection type, the Domain
http://www.waterken.com/Hydro/2.0/doc/com/waterken/canal/Domain.html . This
is much like, but different from, E's current regions (which are based on
Xanadu's regions). Yet another reconciliation we'll need to do, but another
day. Let's get through the enumerable collections first.
Relating their use for type checking to current E, this seems more Guard
like than Region like, so perhaps we have three concepts to reconcile. In
any case, it's good that you have flexible type checkers here. The current
E collections can only be asked to do an instanceof check against a Class
(the equivalent, I take it, of your InstanceOf Domain).
>>The sequence order of #1 is the enumeration order. Our requirement for a
>>deterministic spec requires us to include enumeration order in the
>>semantics. This is why we can't use the conventional formulation: "a set of
>>key-value pairs". (Although #1 could have been phrased as a set of triples:
>>[enumeration position, key, value], but I'll avoid that view.)
>>
>>A setoid consists of
>> 1) the contents: a sequence of elements,
>> 2) an enumeration order policy,
>> 3) a collision policy,
>> 4) an optional element equivalence or ordering function.
>
> 5) an optional dynamic type checker for the value
Also a Domain. Cool.
>>The default equivalence function for unsorted collection for the E language
>>programmer is E's "x == y". For the Java/Hydro programmer, it's "x.equals(y)".
>
>I'll just note that I believe E's "x == y" is "x.equals(y) &&
>THE_E_EQ.equals(x.getKeyEquivalence()) &&
>THE_E_EQ.equals(x.getValueEquivalence())"
Huh? I don't understand this at all.
>I've tried to define the sorted collection interface such that it is a
>contract for a partially-sorted collection. The currently available
>implementations should be seen as an incomplete implementation of this
>interface (they throw an exception if they can't create a full order). My
>hope is that some Xanadu container will be able to implement the full
>contract. If/when someone does this implementation, it should be easy enough
>to sub it in via the Initializer interface, without affecting client code.
So the spec is for partially sorted collections, with a temporary
implementation restriction that only for order are currently accepted.
Wonderful!
>>The enumeration order of a sorted mapoid is a full order consistent with the
>>partial order of the keys. The enumeration order of a sorted setoid is a
>>full order consistent with the partial order of the elements. The remaining
>>sources of non-determinism are
>>
>>a) when x <=> y while x != y. (When equivalence is looser than sameness.)
>
>This can only happen when x is not using the E default equivalence
>implementation, and x contains an element that is not "==" to an element in
>y. For example consider a case-insensitive set of strings:
Terminology confusion. I meant x and y both to be elements to be compared by the unshown ordering function, gotten from the unshown existing collection. Let me know if the following successfully rephrases what you're trying to say:
This can only happen when the existing collection, c1, is not using the E
default equivalence implementation, and c1 contains an element, x, that is
not "==" to an element, y, in the proposed contents, c2, even though x <=> y
according to c1's equivalence function. For example consider a
case-insensitive set of strings:
> [ "MarkM" ] <=> [ "markm" ]
>
>and
>
> [ "MarkM" ] != [ "markm" ]
>
>I don't see why this behaviour is called "non-determinism".
In the union of two such multi collections, even a full ordering function
would not determine the resulting order. You must use creation order to
deterministically decide whether the result is
[ "MarkM", "markm" ] or [ "markm", "MarkM" ].
>> I'll try to state these policies in a way that could be applied to
>>unsorted mapoids as well.
>
>I think it is impossible to implement FIFO and LIFO behaviour for
>associative collections.
I'm cool with that. For mapoids, I don't care about FIFO or LIFO, so long
as we've got determinism.
>> I don't know if Hydro specifies a
>>deterministic enumeration order policy for sorted collections, and at the
>>moment I'm at a loss on how to state one. But one is required.
>
>Well, for the current full order implementations, the enumeration order
>should be obvious.
Modulo the [ "MarkM", "markm" ] vs [ "markm", "MarkM" ] issue above.
>The partial order contract that they incompletely
>implement would probably have to take into account the construction history,
>as unsorted collections do.
As does the full order contract, as demonstrated above.
>This seems tricky to define until we have an
>actual implementation that we can refer to.
Agreed. At this point I only care that we're committed to defining it.
>You and Dean know the Xanadu implementation. What was it's contract?
Jeez, I don't remember. Dean, Chris, Ravi, anyone?
>>* With LIFO [...] Does the proposed contents appear in the same or the
>>reversed relative order?
>
>The reversed relative order. It is my belief that merging two collections
>should produce a result that is the same as if the elements from the added
>collection where each individually added to the reference collection.
If we keep LIFO, I'm happy with that decision.
>>I assume FIFO is the default for unsorted collections, and that there is
>>some sensible default for sorted collections. In later email, I'll propose
>>making these default choices the only choices -- I see no need to make this
>>a parameterizable dimension.
>
>LIFO is the only option for unsorted collections. FIFO is just not possible.
>Consider how the insert and pop operations have to interact in order to
>avoid shuffling the rest of the collection.
FIFO Queues are unsorted collections, so this statement seems contradictory.
I assume you mean specifically unsorted associative collections (unsorted
mapoids)? In any case, I don't understand this claim. Could you expand?
>For sorted collections, the only implemented option is full order. If the
>collection is a "multi" collection, then elements with equivalent keys
>appear in FIFO order according to the construction history.
Good!
>>[...] I'm not sure when a mapoid collision occurs between an existing association k1 => v1 and a proposed association k2 => v2. Is it when k1 <=> k2, or when k1 <=> k2 && !(v1 <=> v2)?
>
>Collisions in an associative collection are always in terms of the keys only.
Cool.
>>The three collision policies are "multi", "replacing", "unique".
>>
>>* "multi" simply says collisions aren't a conflict. Put 'em both in the
>>newly constructed collection.
>>
>>* "replacing" says that proposed contents are included instead of the
>>existing contents it collides with.
>>
>>* "unique" says a collision throws an exception rather than constructing a
>>new collection. While I generally like the theory behind this kind of error
>>detection, I don't think it warrants a separate collision policy, so I'll
>>ignore this case.
>
>You're wrong. In my code, I find that I use all three associative collection
>types with roughly the same frequency. For a recent example of a "unique"
>container, consider my recent SLS proposal. It is a unique map from a key to
>a string list. If a user attempts to register a duplicate key string (ie:
>public key hash), then an exception must be produced.
You might want to consider instead an immutable variant of what I do in E's
current mutable collections. All my update operations which might have a
conflict have an optional additional argument called "strict", which
defaults to false. If provided and true, then the update operation throws
an exception rather than replacing. The knowledge that a replace would be
an error is per-operation, not per-collection.
>>[...] I don't think there's enough need for a multimap to justify keeping it.
>
>I also use the multimap a lot. An example of this can also be found by
>looking at an upcoming revision of my SLS proposal. Rather than just
>providing a string list, the SLS should provide a multimap of string type =>
>string. This will allow for the storage of heterogenous information in the
>SLS. For example, the string type might be "url", or "onion", or ...
I find "a lot" surprising. I'm not surprised that there are occasional
examples like the SLS. However, if these are comparatively rare, I'd be
inclined to use a map whose values were sets, rather than learn a new
abstraction for a rare usage with no clear benefit. Sorta like Dean's
argument against rarely used syntax.
>> Only Six Survive?
>>
>>
>>We'd then be left with set, multiset, and map; any of which may be sorted or
>>FIFO unsorted. If we can indeed boil things down to these 3*2 cases, that
>>would be a great start.
>
>I sympathize with your desire to reduce the number of variations in the
>container types, but I think it's a bad idea. I believe that they are all
>useful and that we can have consistent semantics across all of them. Users
>can then use whatever reduced subset of this they like. I suspect that most
>users will start out using only the "replace" variants of associative
>collections as well as the LIFO and FIFO queue types. As they get used to
>these and come to understand the differences with the other variants,
>they'll find that their code can be simplified by using a collection that is
>better matched to the operation that they are performing.
Earlier, I raised the issue of how to better decouple the E and Hydro
efforts, while still enabling them to play together smoothly. E can already
call and be called by a never ending set of Java libraries. It's good to
keep library design separate from language design when we can, but a
language has a core set of data types where these issues meet.
I would like to define a reduced subset of Hydro, adequate for all core E
language collection needs. I would like the E language to specify only this
subset as its necessary collection API. Having the rest of Hydro available
as libraries is fine, just as it is fine for a zillion other libraries to be
available.
I continue to think that the core collection types may only need to be
unsorted
* map
* set
* list/multiset
* the list subtype String
sorted
* map
* set
* list/multiset
* Domains/Regions or something (not yet thought about)
On reflection, the 3 sorted types above probably don't need to be included
in this core subset, leaving us with 5. If we can define the API of the
core subset so it doesn't pull in others, ie, so it's self contained, then
we won't need to argue about the rest of Hydro in order to for E to shift
from its current collections to this subset of Hydro.
Cheers,
--MarkM