Distribute Parse Trees, Not Bytecode
mzukowski@bco.com
Wed, 23 Jun 1999 10:18:18 -0700
Chip wrote:
>[+] A good parse tree representation would also let you keep comments
>and source text formatting information.
I've written a couple of source-to-source translators and would recommend
keeping your "discarded" lexical elements such as whitespace, comments,
parentheses, etc., completely unattached to the tree. In fact I highly
recommend storing the program as an Abstract Syntax Tree (AST) and not a
"parse tree", which would by definition derive its tree structure from the
non-terminals in your grammar. But I think you are really talking about
ASTs already when you say a "good" parse tree representation.
To do any real analysis you will want an AST, which gives you an
easy-to-manipulate structure uncluttered by things already implied in the
tree structure, such as closing braces and parens. To reconstruct the source
you will want comments, etc. Keep those outside of the tree in a separate
table. As you walk the tree you look in the table for anything that should
come in between two tree nodes. Perhaps you would also want to package in
some secondary data structures that everybody will want, like a symbol
table. How much should go in? Where's your space/time tradeoff? In my
mind you want the first basic passes done, like parsing and getting symbols
into the table, but don't go as far as removing information needed to
reconstruct the source.
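
As a rough illustration of the side-table idea, here is a minimal sketch in
Python for a toy expression language (integers, + and *, parentheses, and
/* */ comments). Everything in it is an assumption made for the example, not
something from an existing tool: the names Num, BinOp, lex, Parser and
unparse, and the choice to key the table by the sequence number of the
significant token that follows the discarded text.

```python
import re
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    pos: int    # sequence number of this token among significant tokens
    text: str

@dataclass
class BinOp:
    pos: int    # sequence number of the operator token
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, BinOp]

# "skip" covers everything the AST does not keep: whitespace, comments, parens.
TOKEN = re.compile(r"(?P<num>\d+)|(?P<op>[+*])|(?P<skip>\s+|/\*.*?\*/|[()])", re.S)

def lex(src):
    """Return (stream, table). stream feeds the parser; table[i] is the text
    discarded just before significant token i (table[n] is trailing text)."""
    stream, table, pending, pos, i = [], {}, "", 0, 0
    while i < len(src):
        m = TOKEN.match(src, i)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {i}")
        text = m.group()
        if m.lastgroup == "skip":
            pending += text
            if text in ("(", ")"):      # the parser still needs to see parens
                stream.append((text, text, None))
        else:
            table[pos] = pending
            stream.append((m.lastgroup, text, pos))
            pending, pos = "", pos + 1
        i = m.end()
    table[pos] = pending
    return stream, table

class Parser:
    """Recursive descent: expr := term ('+' term)*, term := atom ('*' atom)*."""
    def __init__(self, stream):
        self.stream, self.i = stream, 0
    def peek(self):
        return self.stream[self.i][1] if self.i < len(self.stream) else None
    def next(self):
        tok = self.stream[self.i]
        self.i += 1
        return tok
    def expr(self):
        node = self.term()
        while self.peek() == "+":
            _, op, pos = self.next()
            node = BinOp(pos, op, node, self.term())
        return node
    def term(self):
        node = self.atom()
        while self.peek() == "*":
            _, op, pos = self.next()
            node = BinOp(pos, op, node, self.atom())
        return node
    def atom(self):
        kind, text, pos = self.next()
        if kind == "num":
            return Num(pos, text)
        if text == "(":
            node = self.expr()
            if self.next()[1] != ")":
                raise SyntaxError("expected ')'")
            return node             # no paren node: the nesting implies it
        raise SyntaxError(f"unexpected token {text!r}")

def walk(node):
    """In-order walk, which visits significant tokens in source order."""
    if isinstance(node, Num):
        yield node.pos, node.text
    else:
        yield from walk(node.left)
        yield node.pos, node.op
        yield from walk(node.right)

def unparse(tree, table):
    """Rebuild the source by looking up what came between tree nodes."""
    out, last = [], -1
    for pos, text in walk(tree):
        out.append(table.get(pos, ""))
        out.append(text)
        last = pos
    out.append(table.get(last + 1, ""))
    return "".join(out)

src = "(1 + 2) /* sum */ * 3\n"
stream, table = lex(src)
tree = Parser(stream).expr()
print(unparse(tree, table) == src)
```

The tree here holds only numbers and operators, while the parentheses, the
comment and the whitespace live in the table. Note that this simple keying
only works while the tree is unchanged; a translator that rewrites the tree
would presumably need a sturdier way to tie table entries to nodes.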
Monty