Build Problems Resolved

by way of "Mark S. Miller" <markm@caplet.com> mzukowski@bco.com
Thu, 02 Sep 1999 11:59:07 -0700


[Jonathan, any idea why he might be running into this problem?  I just
noticed that his posts are missing from the archive as well.  --MarkM]

P.S.  Could you cross post to elang list?  Every time I copy
e-lang@eros-os.org it gets rejected because of not finding the host, and I'm
not in a position at this big company to figure out what's wrong where.

Thanks,

Monty

 > >How exactly to you match trees?  Do you first compile the
 > match expression
 > >into a tree and then compare the results?
 >
 > That happens to be how I do it, but it doesn't have to be how
 > I do it.  An
 > observably identical quasi-pattern parser could instead produce a
 > MatchMaker containing code for a decision tree compiled from
 > the original
 > string.  Think about compiling Prolog clause-heads to Warren Abstract
 > Machine instructions.  Similarly, a ValueMaker could contain code for
 > generating a tree, without itself containing a tree.  The
 > analogy here
 > might be Prolog clause-bodies.
 >
 > >How do you distinguish between parents and children?
 >
 > I don't understand.

Without having studied the code, it sounds like you are building parse
trees, in which case you have an exact correspondence between your trees and
what you parsed and in what order.  This is very convienent, especially for
your example.  Given E's design it seems well suited to that representation,
but other less friendly languages can benefit greatly from an AST
representation which is not so directly related to the parse.  But that also
means that I couldn't mix quasi-pattern parsing of text with trees because
one representation isn't equivalent to the other.  I can't be in a parser
rule parsing text and then hit a subtree object and know what to do with it.
The only way to get it right would be to unparse the subtree into text and
let the parser do its thing.  Actually the key information that is missing
is which decision the parser made in order to choose which alternative to
parse.  In some cases there may not have been any alternatives.  In other
cases it might be possible to cache that information directly in the tree.
But then you also have to worry about side effects of parsing such as scope
state, etc.  I may be worrying about a problem that doesn't need solved
though.

 > >How do you distinguish between parents and children?
 >
 > I don't understand.

ANTLR includes a facility for tree-parsing, assuming LISP like
parent-sibling trees where each node responds to getChild() and getSibling()
messages.  To write a tree parser you need to denote the difference.  Your
approach to trees builds two trees and then matches them, letting the parser
build the trees.  It's just two different approaches.

Now onto the fun stuff.  ANTLR's lexers are LL which makes them easy to
reason about since they compile into recursive descent routines instead of
DFAs.  A simple-text matching quasi-pattern like `a@{x}d@y` would translate
into ANTLR syntax as:

AnonymousRule:
options {greedy=false;}
"a" x:(.)* "d" y:(.)* ;

And following the rx quasi-pattern matcher, we could have anltr patterns
like:

`format (@from:ID) into (@result ID) using (@fmt FormatString)`
which is trivially translated to (assuming rules ID, WS and FormatString are
already defined from a lexer-grammar)

"format" (WS)? from:ID (WS)? "into" (WS)? result:ID (WS)? "using" (WS)?
fmt:FormatString

Why not `format @{from:ID} into @{result:ID} using @{FormatString}` ?  I
guess because you need everything inside of @{} to be only E patterns...
Which means you need another kind of escape mechanism such as parenthesis
for the "other" language like rx or antlr.  And this because you want to
implicitly quote the strings inbetween the embedded stuff.

When we get to trees with antlr there is no nice text representation and I
wonder about the utility of an antlr tree quasi-pattern parser, although I
can definately see the advantage of an antlr tree quasi-literal for the
building of new trees.

I also wonder about real "parsing" of tokenized text.  Seems like maybe
overkill for these quasi-parsers.

Along the lines of the ending comments of your Quasi-Literal overview about
Hypercard, I've been playing with something related but tangential.  For the
interesting problem of writing code generators I've been using a combination
of noweb and Python.  Quasi-literals are cool for small amounts of output
text, but noweb is really nice for whole programs.  For instance
"<<something>>=" defines a chunk named "something", and "<<something>>"
references that chunk and causes it to be substituted in:

<<test>>=
@SysInclude { fontdefs	} # font definitions
@SysInclude { langdefs	} # language definitions
@SysInclude { tbl	} # Table definitions
@SysInclude { bsf	} # BasicSetup package
@Use { @BasicSetup }

<<font>>
{lines 1.2fx nohyphen} @Break
{0.0 0.0 0.0 setrgbcolor} @SetColor

10.5i @High 8.0i @Wide
English @Language
90d @Rotate
{
	@CD {	
		<<lout object>>
	}
}

@
A single @ on a line starts a "documentation chunk" which won't make it into
"tangled" code but will be typeset in the literate program "woven" for human
readers.  Following us is another chunk named "font"

<<font>>=
{Times Base 8p} @Font

The nice thing about noweb is that it preserves indentation, so when <<lout
object>> is expanded, every line will be prefixed with the indentation in
front of the reference to <<lout object>>.  And of course I said to myself,
why should I limit myself to textual representation?  I wrote a noweb parser
and "tangler" (in ANTLR of course!) and embedded JPython.  If a chunk name
is not defined in the noweb document, then it is evaluated in the JPython
interpreter's namespace.  If a chunk name ends with "exec", then I take that
to mean that the chunk is not a text template but rather JPython code to
execute.

This has turned into a very fun way to program code generators.  It makes it
especially easy to create a test piece of code, get it to work, then factor
out the parts that need to be code instead of constants, and break the whole
generated routine into easily digestable chunks--which is what literate
programming was meant for.

See my "Noweb Parser" at http://www.antlr.org/showcase.html for code, but
email me if you start using it because I have fixed a few bugs.

Monty