[E-Lang] Operators #2: Comparison

Mark S. Miller markm@caplet.com
Fri, 06 Apr 2001 18:16:12 -0700


On Python page http://www.python.org/doc/current/ref/customization.html , it 
seems the only relevant expansion name is "__cmp__".  This seems very 
similar to E's current expansion of "<", "<=", "<=>", ">=", and ">" 
involving "compareTo", except for something I don't understand.

In E currently:

    x < y             x compareTo(y) belowZero()
    x <= y            x compareTo(y) atMostZero()
    x <=> y           x compareTo(y) isZero()
    x >= y            x compareTo(y) atLeastZero()
    x > y             x compareTo(y) aboveZero()

How does Python expand these?  Presumably it uses "__cmp__" rather than 
"compareTo".  But instead of the five messages on the right, the only thing 
I could find is "__nonzero__", which doesn't help.


General Meaning:

When x and y, after coercions, are both members of a (type? field? group?  
help me out here folks) T that has its own defined "natural" order (which must 
be an irreflexive partial order), then

    x op__cmp(y)

should return either an integer or a float64, which therefore must either be 
negative, zero, positive, or NaN, indicating that x < y, x <=> y, x > y, or 
x is incomparable to y, respectively, in this natural order.  (Note that NaN 
responds to all five zero-comparison queries with false.)


Specific Meanings:

Scalars


* On integers and chars: the obvious meaning.  integers are fully ordered, so 
op__cmp can respond with an integer.

* On booleans: an error.  I don't think it's particularly clarifying to say 
that false < true, so op__cmp isn't defined on booleans.

* On float64s: NaNs are incomparable to all float64s including themselves, 
and so are also irreflexive.  
"NaN op__cmp(f) -> NaN" and "f op__cmp(NaN) -> NaN". 
All floats other than NaN (including positive and negative infinity) are 
fully ordered as expected.  When f1 and f2 and not NaNs, then 
f1 op__cmp(f2) -> non-NaN according to this full order.  Note the following 
two disagreements between E-equality (sameness) and equivalence among float64s:

    NaN == NaN  vs  NaN op__cmp(NaN) -> NaN, so !(NaN <=> NaN)

NaN is the same as itself, but is incomparable to itself.

    -0.0 != 0.0 vs -0.0 <=> 0.0

-0.0 is not the same as 0.0, but is equivalent to 0.0


Subset-ness


* On Java classes (and by extension perhaps, on declared E types, should we 
ever declare such things, as we probably will): x <= y means that x is a 
declared subtype of y.  If y is a class, then x is y or a class that extends 
a class that's <= y.  If y is an interface, then x is y or x is a class or 
interface that extends or implements a class that's <= y.  x < y means that 
x is a declared strict subtype of y, that x < y && !(x <=> y).

* On regions: x <= y means that the set of members of region x is a subset 
of the set of members of region y.

* On sets: x <= y means x is a subset of y.


Lists


I think this issue breaks my attempt to unify lists and multisets.  Lists 
should be ordered according to a lexicographic ordering of their elements.  
While this is pleasant in general, it's only compelling in one case, 
Strings, but here it seems we have an inescapable conflict.  For strings,
x <= y should be by lexicographic ordering on the character codes.  If Strings 
are also multisets of their characters, then we'd have a conflicting claim 
on the meaning of x <= y -- that the characters in x are a subset of the 
characters in y.  Not only is this a conflict, but the latter meaning is 
sufficiently strange that I hereby withdraw my suggested unification of list 
and multiset.

So, on lists: x op__cmp(y) compares them lexicographically -- compares 
their elements pairwise left to right until it encounters the first pair 
that aren't equivalent.  The answer to the overall compareTo is negative, 
positive of NaN, exactly as the first non-equivalence is one of these.  If 
the two lists are of equal length and everywhere pairwise equivalent, then 
the two lists are equivalent.  (When the only disagreement is length, we 
generalize the answer from String.  What is it?)


Maps


It is plausible either to not define op__cmp on Maps, or to define it 
according to the keys of the Map considered as a set.  After some voice 
conversations with Dean, I'm inclined toward the first answer: not to define 
it on Maps.  If you want to operate on the keys of the maps as sets, ask the 
maps for their keys.