Parse trees v. ASTs

Chip Morningstar chip@communities.com
Wed, 8 Sep 1999 14:13:46 -0700 (PDT)


>Yeah, a lot of this is just standard parsing stuff.  I'm not as familiar
>with YACC to compare the two tools.
>
>> What redudancy is eliminated? As far as I can see that's the parse
>> tree anybody would generate.
>
>If you are building trees by hand, then yes, that's what you would want to
>do.  A "parse tree" is actually the parse represented as a tree with no
>hints as to what to leave out.  Strangely enough Python uses this kind of
>tree.  Sounds like E is not doing that from your description.

YACC makes you do all the work yourself, except for the actual
parsing. Each production has associated with it a block of "action"
code which you provide. This code is called when the production is
reduced. It can return a value for the production (which value can be
anything you want) and in turn has the values of the terminal and
(previously reduced) non-terminal symbols of the production as
parameters (values of terminal symbols are provided by the lexer,
which you also have to write). If I recall correctly, in the E
parser the typical production action consists of a constructor call to
create a syntax tree node of the appropriate class.

The downside of this is you have to do a lot of work yourself. The
upside is that it is extremely simple and extremely flexible (and the
work that's left to you is typically very easy, though often quite
tedious). After writing numerous compilers and transformation
utilities based on YACC over several years I noticed I was coding the
same flavor of stuff over and over again and finally ended up creating
a package of C macros called YaccHelper to do most of the scutwork;
of course, this package is of no value when programming in Java.

>> Where's the simplification? Or have we actually been using 
>> ASTs all along
>> and just calling them "parse-trees" out of sloppiness?
>
>Sounds like you have been using ASTs.  How do you manipulate your trees?  Do
>you do it by hand or have a grammar for them?  Do you have heterogenous
>nodes with a different class for each language construct or do you use a
>generic node class to have homogenous nodes?  Just curious.

We use a generic base parse tree node class with a subclass for each
major syntactic construct. Like I said, I just thought everybody did
this.