[E-Lang] Operators #2: Comparison

Dean Tribble tribble@e-dean.com
Sat, 07 Apr 2001 12:04:12 -0700


>When x and y, after coercions, are both members of a (type? field? group?
>help me out here folks) T that has its own defined "natural" order (which 
>must
>be an irreflexive partial order), then
>
>     x op__cmp(y)
>
>should return either an integer or a float64, which therefore must either be
>negative, zero, positive, or NaN, indicating that x < y, x <=> y, x > y, or
>x is incomparable to y, respectively, in this natural order.  (Note that NaN
>responds to all five zero-comparison queries with false.)

This abstraction keeps making me itch.  One of the red flags is the phrase, 
"either an integer or a number".  Another is the pervasive influence of NaN 
when I've never actually encountered one in many years of programming.  The 
third is the use of float64 at all, when lots of graphics, etc, profitably 
use float32.  We'll take up the float question separately.

This is E.  You already have a concept more general than NaN (albeit 
inspired by NaN) that seems like it could be used in its place: broken 
promises.  Use it!  :-)  Special functions can exist to extract any 
relevant NaN magic IEEE details for that handful of programmers that will 
*ever* care, but for the rest of us, NaN is just an exception.  This 
combination seems like it would simplify the semantics of E that 
programmers need to pay attention to, and still be better than just about 
any other programming language out there with regard to NaN and IEEE 
compliance.

Advantages:
- comparable things are *reflexively* partially ordered.
- the other 99.9999% of the code out no longer carries conceptual baggage 
from NaN and irreflexive objects.
- people are going to assume everything is reflexive, anyway.
- it is easier to explain (a Walnut opinion?)
- it simplifies operators, collections, regions, etc.
- it's more efficient and orthogonal
- it moves the complexity of NaNs to those who want to use them

>Specific Meanings:
>
>Scalars
>
>* On float64s: NaNs are incomparable to all float64s including themselves,
>and so are also irreflexive.
>"NaN op__cmp(f) -> NaN" and "f op__cmp(NaN) -> NaN".

e.g., NaN op__cmp(f) -> <Ref broken by <NotANumberException>>

>All floats other than NaN (including positive and negative infinity) are
>fully ordered as expected.  When f1 and f2 and not NaNs, then
>f1 op__cmp(f2) -> non-NaN according to this full order.

And with the above change, these return integers as usual.

>   Note the following
>two disagreements between E-equality (sameness) and equivalence among 
>float64s:
>
>     NaN == NaN  vs  NaN op__cmp(NaN) -> NaN, so !(NaN <=> NaN)

If you canonicalize the NaN broken promise, then the above holds, I 
believe, since a broken promise is == itself.

>Subset-ness
>
>* On Java classes (and by extension perhaps, on declared E types, should we
>ever declare such things, as we probably will): x <= y means that x is a
>declared subtype of y.  If y is a class, then x is y or a class that extends
>a class that's <= y.  If y is an interface, then x is y or x is a class or
>interface that extends or implements a class that's <= y.  x < y means that
>x is a declared strict subtype of y, that x < y && !(x <=> y).

Given that E's semantics are based on method names, this is a bad idea.  I 
could perfectly well have two classes that are subtypes from E's 
perspective, but unrelated in Java's subset graph.  There are theories of 
subtyping which could make sense of E's semantics, and they are *not* 
consistent with the above subtyping rules.  Unless there is a great need 
for this, don't do it.  (Though feel free to make an unrelated set of 
methods for asking the above question of Java classes.)

>* On regions: x <= y means that the set of members of region x is a subset
>of the set of members of region y.

If the ordering is different between two regions in the same coordinate 
space, are they different?  If not, it might make sense to consider that 
'<=' for aggregates take enumeration order into account.  That would enable 
more semantic similarity with Lists, if it could be done.

>* On sets: x <= y means x is a subset of y.

>Lists
>
>I think this issue breaks my attempt to unify lists and multisets.  Lists

Does it make it more than just theoretically bad?  Specifically, if lists 
are "multi-sets except for comparison" do things actually break?  will 
people actually be confused?  The answer may very well be yes, but it's 
worth asking :-)

I any case, it's worth asking for other aggregates whether they should also 
take order into account.  The unordered version of "<=" on an aggregate 
might be the variation you need to get through a comparison operator.

One other thing for String comparison:  a default ordering for Strings is 
only useful for sort-based indexing in collections.  Because of locale 
issues, the details of any default ordering are never the right thing 
(except for the programmers locale) for human presentation/searching, 
especially if you take case-sensitivity issues into account.  Therefore, 
most string tools parameterized with locale.

>Maps
>...
>conversations with Dean, I'm inclined toward the first answer: not to define
>it on Maps.  If you want to operate on the keys of the maps as sets, ask the
>maps for their keys.

I don't recall this particular detail, but I agree :-)