[E-Lang] Operators #2: Comparison
Mark S. Miller
markm@caplet.com
Sat, 07 Apr 2001 13:31:54 -0700
--=====================_172616577==_.ALT
Content-Type: text/plain; charset="us-ascii"
At 12:04 PM Saturday 4/7/01, Dean Tribble wrote:
>> x op__cmp(y)
>>
>>should return either an integer or a float64, which therefore must either be
>>negative, zero, positive, or NaN, indicating that x < y, x <=> y, x > y, or
>>x is incomparable to y, respectively, in this natural order. (Note that NaN
>>responds to all five zero-comparison queries with false.)
>
>This abstraction keeps making me itch. One of the red flags is the phrase,
>"either an integer or a number".
Actually, the phrase is "an integer or a float64". I would be happy to say
that the result must be a float64, but this is expensive for fully ordered
types. A looser requirement might say that it returns something that acts
either like either negative, zero, positive or NaN in response to the five
zero comparison messages, and to the message "isNaN()". But I think I'd
rather be more specific -- I see nothing to be gained by the looser spec.
>Another is the pervasive influence of NaN
>when I've never actually encountered one in many years of programming. The
>third is the use of float64 at all, when lots of graphics, etc, profitably
>use float32. We'll take up the float question separately.
Just briefly. We can't not support float64s. float32s just don't have
enough precision to get by, and float128s are usually overkill and not
widely supported. I also really don't want to have more than one floating
point type.
High speed floating point code should be written in some lower level
language like C or Java and linked in to E. The key is that an E list of
float64s be an unboxed "array of double", so this low level code can read
and produce these lists without needless overhead. If the low level code
wants to operate on arrays of float32s instead, we could accommodate this
without introducing a new type, but I'm inclined to make the low level code
convert at the interface until we find this to be a problem.
See http://www.erights.org/enative/fatpointers.html#boxingData
>This is E. You already have a concept more general than NaN (albeit
>inspired by NaN) that seems like it could be used in its place: broken
>promises. Use it! :-)
As I've already discussed with Dean verbally, we've been over this issue
before on e-lang, and I'm not inclined to revisit it. It's a difficult
issue, and what we have works. Does anyone have thread roots of that
previous discussion handy?
>Special functions can exist to extract any relevant
>NaN magic IEEE details for that handful of programmers that will *ever*
>care, but for the rest of us, NaN is just an exception. This combination
>seems like it would simplify the semantics of E that programmers need to pay
>attention to, and still be better than just about any other programming
>language out there with regard to NaN and IEEE compliance.
Huh? It would be worse than Java. In Java scalar floating point
arithmetic, NaNs are incomparable with NaNs, and 0.0 is equivalent to -0.0.
>Advantages:
>- comparable things are *reflexively* partially ordered.
Partial order of any variety still requires "x op__cmp(y)" to return
something that's not negative, zero, or positive.
>>Subset-ness
>>
>>* On Java classes (and by extension perhaps, on declared E types, should we
>>ever declare such things, as we probably will): x <= y means that x is a
>>declared subtype of y. If y is a class, then x is y or a class that extends
>>a class that's <= y. If y is an interface, then x is y or x is a class or
>>interface that extends or implements a class that's <= y. x < y means that
>>x is a declared strict subtype of y, that x < y && !(x <=> y).
>
>Given that E's semantics are based on method names, this is a bad idea. I
>could perfectly well have two classes that are subtypes from E's
>perspective, but unrelated in Java's subset graph. There are theories of
>subtyping which could make sense of E's semantics, and they are *not*
>consistent with the above subtyping rules. Unless there is a great need for
>this, don't do it. (Though feel free to make an unrelated set of methods
>for asking the above question of Java classes.)
Agreed. See recent response to Zooko.
>>* On regions: x <= y means that the set of members of region x is a subset
>>of the set of members of region y.
>
>If the ordering is different between two regions in the same coordinate
>space, are they different? If not, it might make sense to consider that
>'<=' for aggregates take enumeration order into account. That would enable
>more semantic similarity with Lists, if it could be done.
Regions, not being enumerable, don't need a parameterizable order, so I
didn't define them to have one. I think this may be different than Xanadu
regions, but I like it better. A coordinate space may very well still have
orderings associated with it, but these need not be a property of a region.
In any case, if it were, then for two regions r1 and r2 that contained the
same set of positions but with different orders:
r1 == r2 -> false
r1 <=> r2 -> true
ie, they're not the same, but they are equivalent (in their natural order,
which is subset).
>>* On sets: x <= y means x is a subset of y.
>
>>Lists
>>
>>I think this issue breaks my attempt to unify lists and multisets. Lists
>
>Does it make it more than just theoretically bad? Specifically, if lists
>are "multi-sets except for comparison" do things actually break? will
>people actually be confused? The answer may very well be yes, but it's
>worth asking :-)
>
>I any case, it's worth asking for other aggregates whether they should also
>take order into account. The unordered version of "<=" on an aggregate
>might be the variation you need to get through a comparison operator.
All good and urgent questions.
>One other thing for String comparison: a default ordering for Strings is
>only useful for sort-based indexing in collections. Because of locale
>issues, the details of any default ordering are never the right thing
>(except for the programmers locale) for human presentation/searching,
>especially if you take case-sensitivity issues into account. Therefore,
>most string tools parameterized with locale.
I disagree. People use the default order all the time because it's often
good enough, and they are understandably too lazy to learn about locales and
localization. For example, I think "ls" just uses the default order, and
people don't scream too much.
Cheers,
--MarkM
--=====================_172616577==_.ALT
Content-Type: text/html; charset="us-ascii"
At 12:04 PM Saturday 4/7/01, Dean Tribble wrote:
x op__cmp(y)
should return either an integer or a float64, which therefore must either
be
negative, zero, positive, or NaN, indicating that x < y, x <=>
y, x > y, or
x is incomparable to y, respectively, in this natural order. (Note
that NaN
responds to all five zero-comparison queries with
false.)
This abstraction keeps making me itch. One of the red flags is the
phrase,
"either an integer or a number".
Actually, the phrase is "an integer or a float64". I
would be happy to say
that the result must be a float64, but this is expensive for fully
ordered
types. A looser requirement might say that it returns something
that acts
either like either negative, zero, positive or NaN in response to the
five
zero comparison messages, and to the message "isNaN()".
But I think I'd
rather be more specific -- I see nothing to be gained by the looser
spec.
Another is the pervasive influence
of NaN
when I've never actually encountered one in many years of
programming. The
third is the use of float64 at all, when lots of graphics, etc,
profitably
use float32. We'll take up the float question
separately.
Just briefly. We can't not support float64s. float32s just
don't have
enough precision to get by, and float128s are usually overkill and not
widely supported. I also really don't want to have more than one
floating
point type.
High speed floating point code should be written in some lower level
language like C or Java and linked in to E. The key is that an E
list of
float64s be an unboxed "array of double", so this low level
code can read
and produce these lists without needless overhead. If the low level
code
wants to operate on arrays of float32s instead, we could accommodate this
without introducing a new type, but I'm inclined to make the low level
code
convert at the interface until we find this to be a problem.
See
http://www.erights.org/enative/fatpointers.html#boxingData
This is E. You already
have a concept more general than NaN (albeit
inspired by NaN) that seems like it could be used in its place: broken
promises. Use it! :-)
As I've already discussed with Dean verbally, we've been over this issue
before on e-lang, and I'm not inclined to revisit it. It's a
difficult
issue, and what we have works. Does anyone have thread roots of
that
previous discussion handy?
Special functions can exist to
extract any relevant
NaN magic IEEE details for that handful of programmers that will *ever*
care, but for the rest of us, NaN is just an exception. This
combination
seems like it would simplify the semantics of E that programmers need to
pay
attention to, and still be better than just about any other programming
language out there with regard to NaN and IEEE
compliance.
Huh? It would be worse than Java. In Java scalar floating
point
arithmetic, NaNs are incomparable with NaNs, and 0.0 is equivalent to
-0.0.
Advantages:
- comparable things are *reflexively* partially
ordered.
Partial order of any variety still requires "x op__cmp(y)" to
return
something that's not negative, zero, or positive.
Subset-ness
* On Java classes (and by extension perhaps, on declared E types, should
we
ever declare such things, as we probably will): x <= y means that x is
a
declared subtype of y. If y is a class, then x is y or a class that
extends
a class that's <= y. If y is an interface, then x is y or x is a
class or
interface that extends or implements a class that's <= y. x <
y means that
x is a declared strict subtype of y, that x < y && !(x
<=> y).
Given that E's semantics are based on method names, this is a bad
idea. I
could perfectly well have two classes that are subtypes from E's
perspective, but unrelated in Java's subset graph. There are
theories of
subtyping which could make sense of E's semantics, and they are *not*
consistent with the above subtyping rules. Unless there is a great
need for
this, don't do it. (Though feel free to make an unrelated set of
methods
for asking the above question of Java classes.)
Agreed. See recent response to Zooko.
*
On regions: x <= y means that the set of members of region x is a
subset
of the set of members of region y.
If the ordering is different between two regions in the same coordinate
space, are they different? If not, it might make sense to consider
that
'<=' for aggregates take enumeration order into account. That
would enable
more semantic similarity with Lists, if it could be
done.
Regions, not being enumerable, don't need a parameterizable order, so I
didn't define them to have one. I think this may be different than
Xanadu
regions, but I like it better. A coordinate space may very well
still have
orderings associated with it, but these need not be a property of a
region.
In any case, if it were, then for two regions r1 and r2 that contained
the
same set of positions but with different orders:
r1 == r2 -> false
r1 <=> r2 -> true
ie, they're not the same, but they are equivalent (in their natural
order,
which is subset).
*
On sets: x <= y means x is a subset of y.
Lists
I think this issue breaks my attempt to unify lists and multisets.
Lists
Does it make it more than just theoretically bad? Specifically, if
lists
are "multi-sets except for comparison" do things actually
break? will
people actually be confused? The answer may very well be yes, but
it's
worth asking :-)
I any case, it's worth asking for other aggregates whether they should
also
take order into account. The unordered version of "<="
on an aggregate
might be the variation you need to get through a comparison
operator.
All good and urgent questions.
One other thing for String
comparison: a default ordering for Strings is
only useful for sort-based indexing in collections. Because of
locale
issues, the details of any default ordering are never the right thing
(except for the programmers locale) for human presentation/searching,
especially if you take case-sensitivity issues into account.
Therefore,
most string tools parameterized with locale.
I disagree. People use the default order all the time because it's
often
good enough, and they are understandably too lazy to learn about locales
and
localization. For example, I think "ls" just uses the
default order, and
people don't scream too much.
Cheers,
--MarkM
--=====================_172616577==_.ALT--