Strings and Method Names
Ka-Ping Yee
ping@lfw.org
Thu, 22 Oct 1998 02:04:59 -0700 (PDT)
> I decided to break with Java conventions and allow both literal strings and
> quasi-strings to span multiple lines. I liked this in Tcl.
Interesting. Potentially dangerous, as it can sometimes yield
far-flung syntax errors -- but then i guess the same is true of
unmatched parentheses or braces. By the way, i hope the E
interpreter is smart enough to look for more input when you
leave a paren or bracket unclosed on a line (Python does this).
Wayne's STIPPLE language also had the neat idea of expecting a
continuation when an operator was found dangling at the end of
a line. These two things together might almost entirely obviate
the need for icky \-continuations.
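As a rough illustration of the "look for more input" behaviour in Python itself, the standard `codeop` module can tell an incomplete statement apart from a broken one. This is only a sketch: the helper name `needs_more_input` is made up here, and it covers the unclosed-paren case only, not the STIPPLE-style dangling-operator rule.

```python
import codeop

def needs_more_input(source):
    """Return True if `source` is incomplete rather than wrong."""
    try:
        # compile_command returns None when the text could still become
        # valid once more lines arrive (e.g. an unclosed parenthesis).
        return codeop.compile_command(source) is None
    except (SyntaxError, ValueError, OverflowError):
        # Definitely invalid: reading more input would not help.
        return False

open_paren = needs_more_input("total = (1 +")
complete = needs_more_input("total = (1 + 2)")
```

An interactive loop could keep prompting for continuation lines for as long as this helper returns True.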
> >Makes for easy editing of help messages and self-documentation
> >inside code.
>
> I understand about help. How does it aid self-documentation?
It helps self-documentation if, as in Python, the documentation
becomes truly attached to the running code, so that documentation
can be extracted from a module with zero parsing effort. The
introspection hooks allow the debugger (written in Python) to
examine the call stack and the local variables and signatures
(including argument names) of all active functions, and the
documentation lets you look up the description of a function
instantly. By having both types of information available to
running code, and a few conventions about the "doc strings",
you can do fun stuff like walk around the inheritance tree,
trace module dependencies, etc. and generate nice descriptive
reports. I've written several scripts (now fairly well-used at
PARC and at ILM) that provide nice Web displays of Python code
or debugging info this way. No parsing required.
(The Python convention is that if a string literal sits by itself
as the first thing in a class, function, or module, then it gets
saved by the interpreter as the "__doc__" attribute of that class,
function, or module, and can later be retrieved as, for example,
"module.__doc__" by anybody. There is a general consensus that
the first line of such strings should be a terse description of
the thing, and subsequent lines can go into more detail. The
complaint that the doc strings waste memory and interpretation
time was addressed in Python 1.5, where you can specify "optimize"
flags: -O deletes assertions from the bytecode, and -OO deletes
docstrings as well.)
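A minimal sketch of the doc-string convention described above, assuming a present-day Python where the `inspect` module offers `getdoc` and `signature` (neither is claimed to match the scripts mentioned in this message). The function `area` and the helper `summarize` are invented for illustration.

```python
import inspect

def area(width, height):
    """Return the area of a rectangle.

    Both arguments are plain numbers; no units are assumed.
    """
    return width * height

def summarize(func):
    """Build a one-line report from a function's signature and docstring."""
    doc = inspect.getdoc(func) or ""
    # By convention the first line is the terse description.
    first_line = doc.split("\n")[0]
    return f"{func.__name__}{inspect.signature(func)}: {first_line}"

report = summarize(area)
print(report)
```

Nothing here parses source text; both the argument names and the description come from the live function object.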
> >I'm going to risk offense by repeating myself, and more strongly
> >emphasize that Vector ought to mean vector in the linear algebra
> >sense. If you take that word away, what name are the scientists
> >going to use for a vector when they implement math/geometry
> >libraries?
>
> I didn't take the word away, java.util.Vector did. I wish it weren't so.
Hold on just a second here. You're creating a new and wonderful
language, E. I see why it needs to *appeal* to those familiar
with Java. But it doesn't have to *be* Java. So what if there's
a badly-named thing called "java.util.Vector" that looks like an
array? Elevating it to the status of "the" one true Vector would
only worsen the confusion. It's a "java.util.Vector", not a Vector,
and doesn't need to be the name of the *default*, basic, mutable-
sequence type in the language. Javoids can go use java.util.Vector
if that makes them feel more at home, but what we're talking about
is E's very *own* mutable sequence type.
The java.util.Vector interface, which exposes things like its
capacity and the ability to directly manipulate it, seems far
from perfect. Let's not cast it in stone. If we do, will Java
programmers expect Tuples to accept the same messages too?
A lot of things changed between C++ and Java. Perhaps the
similar appearance of the syntax was what seduced all those C++
programmers. But Java did away with -> and introduced 'import'
and 'interface' and all kinds of stuff. They totally replaced C
arrays and no one complained, because the alternative was easier.
I don't think we need to worry about making E sequences look
exactly like a particular kind of Java array because people will
quickly get used to it just being easier in E.
New marketing jingle:
"It's just easier in E." (syntax)
and:
"It's just safer in E." (capabilities)
etc.
Or how about,
"Sure, i could try doing this in Java.
But it's just *right* in E." (security)
> >> >What are the "standard tuple" and "standard mapping" interfaces?
>
> Are you suggesting "maps" in addition to the Java-standard "containsKey" or
> instead of it?
I would be happiest, again, if there was just one method, "maps"...
but if it is essential to satisfy Java programmers (i.e., if i lose
the preceding debate, see above) and they really want "containsKey",
then so be it. I am so fond of shortness that i am extremely tempted
to express my emotional preference for having both over having only
"containsKey" -- but if it really gets to that point you should probably
make me lose that argument in favour of the TOOWTDI principle.
> > each: okay, but i like a verb better: how about "iterate"?
>
> "pairs"?
A noun just doesn't work for me, because the argument in this
case is going to be a function. (E style question: if object
names are nouns but method names are verbs, should a function-like
object be named with a noun too?) Regardless of whether the
function name is a noun or a verb, though, the fact that the
argument is a function -- and that this method probably returns
nothing -- makes me really want the method name to be a verb.
If i saw a dictionary method named "pairs", i would expect to
receive a list of (key, value) pairs by calling it.
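To make the naming point concrete, here is a small Python sketch. The class `Table` and both method names are hypothetical stand-ins, not E's actual interface: the noun-named method hands back data, while the verb-named method takes a function and returns nothing.

```python
class Table:
    """Hypothetical mapping wrapper used only to contrast the two names."""

    def __init__(self, data):
        self._data = dict(data)

    def pairs(self):
        # Noun: returns a list of (key, value) pairs.
        return list(self._data.items())

    def iterate(self, func):
        # Verb: calls func on each pair and returns nothing.
        for key, value in self._data.items():
            func(key, value)

seen = []
table = Table({"x": 3, "y": 5})
result = table.iterate(lambda key, value: seen.append((key, value)))
```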
> > elements: if we're calling them keys and values, why not values()?
> > also note elements() is confusing if sets are maps:
> > "set elements()" will get you [null, null, null, ...]
> > (I know, set keys() doesn't read that clearly either, but
> > at least it might prevent one common misunderstanding.)
>
> In this case, the potential confusion you point out is sufficiently likely
> that I'm willing to go with "values" *instead of* the Java-standard
> "elements".
Tyler had an interesting idea in the other message about a
set being a mapping from the elements to the elements themselves.
Did you have a specific reason for choosing that they should all
map to null?
> > ? items: Python dictionaries have an "items" method which can
> > be very handy -- they produce the list of (key, value)
> > pairs. Although Python doesn't do it, it may be useful
> > to turn a list into (index, value) pairs too. Not sure
> > if "items" is the best name for this, but it's not awful.
> > Maybe "itemize".
>
> Actually, I don't think I've ever used "keys" or "elements"/"values" from
> E, since "each"/"iterate"/"pairs" is so convenient (and the for-loop sugar
> makes it even more convenient). Since "keys" is 1) sufficient, 2)
> non-confusing, 3) named as a Javoid would expect, how about we just keep
> "keys" and drop both "elements" and "items"?
I'm not so sure about dropping both "elements/values" and "items".
One problem which i commonly solve using these methods is to sort
a dictionary on its values -- i'll zipper the values together with
the keys, ask the resulting list to sort, and then iterate.
    pairs = map(None, dict.values(), dict.keys())
    pairs.sort()
    for value, key in pairs:
        ...
I think i would be okay with dropping "items" if it's also very
easy to do this kind of stuff using the appropriate E construct.
(In Python, mapping 'None' onto sequences is the kludge for
zipping them together instead of applying a function.)
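For readers on a current Python, where `map(None, ...)` is no longer accepted, the same zip-and-sort trick can be sketched with `zip` and `sorted`. The dictionary `ages` is made-up sample data.

```python
ages = {"carol": 41, "alice": 30, "bob": 25}

# Zipper the values together with the keys, then sort on the values.
pairs = sorted(zip(ages.values(), ages.keys()))
by_value = [key for value, key in pairs]

# Equivalent route through the (key, value) "items" view.
by_items = [key for key, value in sorted(ages.items(), key=lambda item: item[1])]
```

Either form depends on having "values" or "items" available, which is the reason for hesitating to drop both.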
> >How do you deal with iterating over mutable things? Or do you
> >just outlaw that? (Seems like a decent answer, i suppose.)
>
> I hadn't thought of that. Currently all iteration operations implicitly
> iterate over a snapshot of the collection, and the implementation of
> mappings and tables knows not to actually make a copy unless needed
> (copy-on-write).
That sounds like a pretty good answer to me, too. It seems like
a reasonable assumption, since the only thing you would want to
do if you did have a mutable sequence is get a snapshot anyway --
forcing the author to say so seems a little pedantic if there isn't
really any other possibility.
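For comparison, Python does make the author ask for the snapshot explicitly. The sketch below uses made-up data: iterating over the live dictionary while deleting from it raises an error, whereas iterating over a copied list of its items works.

```python
counts = {"a": 1, "b": 2}
live_failed = False
try:
    for name in counts:
        del counts[name]        # mutating the dict being iterated
except RuntimeError:
    # Python refuses to continue once the dict changes size.
    live_failed = True

inventory = {"apples": 3, "pears": 0, "plums": 7}
# list(...) takes an explicit snapshot, so deleting is safe.
for name, count in list(inventory.items()):
    if count == 0:
        del inventory[name]
```

Unlike the copy-on-write scheme quoted above, `list(...)` always copies, whether or not the loop mutates anything.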
> "size". I will banish "length". (I agree with Dean that "count" would be
> better in the absence of Java conventions.)
"count" is cool, but i am just happy that you picked one. Hooray!
> >one possible way to make it go down easier is a Miranda
> >method that made asRepr call toString if you didn't supply your
> >own definition.
>
> Instead, the miranda method should be like
>
> to asRepr { `<${self toString}>` }
>
> except that the angle brackets should be replaced by something that'll give
> a guaranteed immediate parsing error (assuming that the asRepr output up to
> this point was well formed). Otherwise, your cut-and-paste use could
> accidentally seem to work.
Oh, yes, that's a good idea. It should either work totally, or
make sure it doesn't work at all.
> >> ? define PointMaker(x, y) {
> >> > define point {
> >> > to getX {x}
> >> > to getY {y}
> >> > }
> >> > }
> >> # value: <PointMaker>
> >>
> >> ? define pt := PointMaker(3, 5)
> >> # <point>
> >
> >I quite like this. I think this would lead to the convention
> >that the defining occurrence of a behaviour would capitalize
> >the behaviour name, in this case "point".
>
> I'd like to reserve "Point" to name a type-object that describes the
> protocol spoken by objects like pt.
Oh. Well, this brings us back to the other, bigger issue about
types and makers. Would you consider having "Point" be the name of
the maker and "PointType" be the name of the type?
And to avoid name clashes during upgrades, don't you think it
might be better to include some sort of scope-name-path to qualify
the behaviour name?
See, the one (admittedly aesthetic) problem i have with the above
is that the methods inside "point" must now refer to the object as
"point" instead of a widely-established conventional name like "self".
If behaviours were more fully qualified, then we could say:
    define Point(x, y) {
        define type := PointType    # how is this to be taken care of?
        define self {
            to getX {x}
            to getY {y}
            to getType {type}
        }
    }
and get, for example,
    ? define pt := Point(3, 5)
    # <Point>
if we further added the convention that the last element of
the behaviour path is not printed (as it will usually be "self",
and ought to be enclosed in at least one other scope level if
we expect to ever see its likeness again).
With a little fudging perhaps we could then get your purse
to have a default representation of "<Mint.Purse>" or something
like that.
Rrr. I just realized that
    if (pt isA PointType)
looks pretty silly. Well, maybe then it is a better idea for
the type to be called "Point" after all, and then for the miranda
method to do something like '<' + self getType getName + '>'.
(I was about to say just "self getType"... but presumably
"self getType toString" would look like "<Type 'Point'>".)
But then again, is it really "isA" we mean here? To be precise
in our terminology, it is perhaps better to say
"pt is a member of the Point type".
I mean, on its own, it does seem to make sense that the
Point type should be called "PointType" rather than just "Point",
viz.:
    Point >= 3dPoint            # huh??
    PointType >= 3dPointType    # now read ">=" as "contains"
    3dPointType <= PointType    # now read "<=" as "is derived from"
We had better be careful with exactly what we call just "Point",
given that we now have types and makers *and* instances to deal with.
> >... You may need to create a different
> >quasiparser for repr substitution if we proceed this way.
>
> I'm glad you found this early. You're right about the implied need, but
> I'm not willing to pay this cost. "asRepr" therefore goes on a long list
> of good ideas that need to be more fully worked out before they can be
> adopted.
Hmm? This didn't seem all that expensive to me. I was just
thinking of a substituter that knew it was supposed to call
"asRepr" instead of "toString" on the things it was going to
substitute in.
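The idea of a substituter that differs only in which conversion it calls can be sketched in Python, with `str` and `repr` standing in for "toString" and "asRepr". The helper `substitute` and the sample values are assumptions for illustration, not a description of E's quasiparsers.

```python
from string import Template

def substitute(template, values, convert=str):
    """Fill $name holes after passing each value through `convert`."""
    converted = {name: convert(value) for name, value in values.items()}
    return Template(template).substitute(converted)

values = {"name": "ping", "count": 3}
as_text = substitute("$name has $count", values)                # str-style
as_repr = substitute("$name has $count", values, convert=repr)  # repr-style
print(as_text)
print(as_repr)
```

The template handling is shared; only the conversion function changes between the two calls.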
> Like most good things, I got this from Norm.
Sound bite! Sound bite! :)
> >... (Wait a second, though... aren't underscores part of
> >identifiers? Oh, no, they can't be, because of pattern ignore.
> >But are underscores permitted within identifiers?)
>
> They are used in identifiers, in the ignore pattern, and in "_/". What's
> the problem?
Sorry, i was mumbling. I meant, for the tokenizer's sake,
presumably you have to outlaw underscores as the first character
of any identifier. Then it is safe to allow underscores in the
middle of an identifier.
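A sketch of the rule as stated here, written as a Python regular expression. The exact character set is an assumption; E's real tokenizer may well differ.

```python
import re

# A letter first, then any mix of letters, digits, and underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_identifier(text):
    """True if `text` is an identifier under the no-leading-underscore rule."""
    return IDENTIFIER.fullmatch(text) is not None

checks = {name: is_identifier(name) for name in ("get_x", "_hidden", "_")}
```

Under this rule a token that starts with an underscore can never be mistaken for an identifier, which leaves the ignore pattern and "_/" unambiguous.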
> >... i am trying to avoid having too many different ways
> >to say the same thing. We'll get people who prefer to write
> >
> > to get(index)
> >
> >and people who prefer to write
> >
> > to [](index)
>
> (actually, it's
>
> to [index]
> )
Ooooh! Gee. It's very cute (especially given that juxtaposition
is your method calling operator)... but really, this would create
havoc when naming the methods in a debugger, for example... if
the method is really named "get", then searching for "get" in
your code won't find you the method.
> Ok, I get it, and I think I like it. The goal E currently satisfies, which
> this violates, is that a programmer familiar with the operator spelling of
> one of these can do everything they want without needing to learn the
> identifier spelling. I think I'll take the simpler grammar in exchange.
> Thanks!
Cool! Another victory for TOOWTDI, as you call it. I don't
think i've seen this acronym anywhere else, so you get credit
for coining it. That also means you are now faced with the
burden of deciding how it should be pronounced. Phhttbhthtt!! :)
!ping