Build Problems Resolved
Mark S. Miller
markm@caplet.com
Sun, 05 Sep 1999 14:55:01 -0700
At 11:59 AM 9/2/99 , mzukowski@bco.com wrote:
>Without having studied the code, it sounds like you are building parse
>trees, in which case you have an exact correspondence between your trees and
>what you parsed and in what order. This is very convienent, especially for
>your example. Given E's design it seems well suited to that representation,
>but other less friendly languages can benefit greatly from an AST
>representation which is not so directly related to the parse. But that also
>means that I couldn't mix quasi-pattern parsing of text with trees because
>one representation isn't equivalent to the other. I can't be in a parser
>rule parsing text and then hit a subtree object and know what to do with it.
>The only way to get it right would be to unparse the subtree into text and
>let the parser do its thing. Actually the key information that is missing
>is which decision the parser made in order to choose which alternative to
>parse. In some cases there may not have been any alternatives. In other
>cases it might be possible to cache that information directly in the tree.
>But then you also have to worry about side effects of parsing such as scope
>state, etc. I may be worrying about a problem that doesn't need solved
>though.
I think you may be assuming I have a deeper understanding of parsing issues
than I do. I've read this paragraph three time and I think I might know
what you're talking about, but I'm not sure. If you could step through one
concrete example in some detail, that would be immensely clarifying. Feel
free to make up a grammar that creates the problem you are worried about.
>Now onto the fun stuff. ANTLR's lexers are LL which makes them easy to
>reason about since they compile into recursive descent routines instead of
>DFAs. A simple-text matching quasi-pattern like `a@{x}d@y` would translate
>into ANTLR syntax as:
>
>AnonymousRule:
>options {greedy=false;}
>"a" x:(.)* "d" y:(.)* ;
>
>And following the rx quasi-pattern matcher, we could have anltr patterns
>like:
>
>`format (@from:ID) into (@result ID) using (@fmt FormatString)`
>which is trivially translated to (assuming rules ID, WS and FormatString are
>already defined from a lexer-grammar)
>
>"format" (WS)? from:ID (WS)? "into" (WS)? result:ID (WS)? "using" (WS)?
>fmt:FormatString
I think I'm getting it. Very very cool. I take it that there either
should have been colons after 'result' and 'fmt', or there shouldn't have
been a colon after 'from'?
>Why not `format @{from:ID} into @{result:ID} using @{FormatString}` ? I
>guess because you need everything inside of @{} to be only E patterns...
>Which means you need another kind of escape mechanism such as parenthesis
>for the "other" language like rx or antlr. And this because you want to
>implicitly quote the strings inbetween the embedded stuff.
Exactly right. However the colon question turns out, the quasi-string as
you wrote it will translate correctly:
"format (@{0}:ID) into (@{1} ID) using (@{2} FormatString)"
>When we get to trees with antlr there is no nice text representation and I
>wonder about the utility of an antlr tree quasi-pattern parser, although I
>can definately see the advantage of an antlr tree quasi-literal for the
>building of new trees.
The switch/match example used for symbolic differentiation parallels well
the Prolog idiom of multiple clauses of a predicate, each of which has a
head that tries the unify the incoming data with a different tree
pattern. Not only does it dispatch to the matching case, but in so doing
it extracts the subtrees representing the remaining parts of our ignorance
about the data (remaining after we know which case matched). This has
simply proven to be a very useful idiom, and I would expect it to be here
as well.
In any case, this seems like an excellent plan for embedding ANTLR in E
smoothly. It would be great if someone did this (hint hint).
>I also wonder about real "parsing" of tokenized text. Seems like maybe
>overkill for these quasi-parsers.
?
>Along the lines of the ending comments of your Quasi-Literal overview about
>Hypercard, I've been playing with something related but tangential. For the
>interesting problem of writing code generators I've been using a combination
>of noweb and Python. Quasi-literals are cool for small amounts of output
>text, but noweb is really nice for whole programs. For instance
>"<<something>>=" defines a chunk named "something", and "<<something>>"
>references that chunk and causes it to be substituted in:
How is end-of-chunk delimited?
I think you're exactly right that my textual quasi's are primarily useful
when the expression or pattern is relatively small.
What are your thoughts about how noweb & E might play together?
Cheers,
--MarkM