Happy Happy Joy Joy (was: On to Hydro)
Jonathan S. Shapiro
shap@eros-os.org
Wed, 23 Aug 2000 13:16:39 -0400
> However, allowing arbitrary choice of comparison means that it's not
> necessarily as easy as adding a counter when you want to allow multiple
> instances of the "same" value in a collection. In particular, since "same"
> doesn't necessarily mean "identical object" once you allow for the choice
> of comparison operator...
Nice catch, Dan. I had not considered this.
In this case, though, I think the resolution is straightforward. There are
two distinct (and readily separated in practice) questions that must be
resolved:
1. Sort order: do two objects occupy the same "position" in the collating
sequence?
2. Identity: are two objects interchangeable for purposes of what needs to
be stored in the collection.
The first problem is purely a matter of choosing the comparison operator.
Whatever it may be, the equality test used by the comparison operator
defines the collating sequence position. A side observation here is that
certain collection implementations are greatly simplified if the collection
implementation can freely assume that a < b => b > a. This follows from the
total ordering requirement, but may not be true for a given comparator
implementation. The specification should say that the comparison operator
must obey this for all input values.
The second issue is much much trickier. Identity can be done on pointer
value, so that's not so bad. Equality raises the following design issue:
If two items are equal w.r.t. the collating sequence, and if they are
"equal" with respect to the object equality test, but not identical, should
multiple copies be stored by the collection?
The answer depends on the application, but I claim that the default is that
a multivalued collection should store distinct copies for each instance. The
reasoning is as follows:
Consider two instances a, b that are currently equal but not identical, both
contained in a collection. There are operations on a that might change it's
value. If the object remains in the collection (i.e. I did a lookup but not
a remove), there is an inevitable problem, and either the specification for
the collection must be careful or there must be some form of notification to
the collection (self valueChanged?). I don't like the notification model,
but some people do.
However, if I have removed the object a from the collection and subsequently
perform an operation on a that changes its value, I should be entitled by
default to assume that this has no effect on the future behavior of the
collection or its remaining members. It follows from this that operations on
a must not change the value of "b" and that "b" therefore must be a distinct
member of the collection in at least some applications. I think that this
should be the default behavior.
Finally, when a multicollection holds multiple objects that collate to the
same position in the sort order, the order in which these objects are
returned or iterated should be undefined. In particular, given three
insertions followed by three removals there should be no ordering promise
about the order in which the values are returned. Further, a lookup operator
may return *any* of the equal-collated objects. It should be guaranteed that
the iterator shall traverse each object exactly once (ignoring the case
where object is removed before being reached by the iterator, which is
another gooey mess).
It should further be specified that in the absence of an intervening
modification to the collection, two "lookup" operations that are handed the
same argument should always return the same value. This should *not* be
required if there is an intervening mutation, as the mutation may (e.g.)
induce a rebalancing that makes it more favorable to return some other
element in the same collating position.
A last comment:
Please bear in mind that I am NOT arguing that ordered collections should in
general be multivalued (new tem by analogy to multiset). I am merely trying
to set forth my expectations about how a multivalued collection, if
implemented, should behave when multiple equal objects are inserted/removed.