Compiling E: Phases of Transformations
Mark S. Miller
markm@caplet.com
Wed, 09 Aug 2000 12:31:54 -0700
At 07:26 AM 8/9/00, Monty Zukowski wrote:
>You don't really need to know the parse nodes, just how a span in one file maps
>to a span in another file, since the error message will be printed in terms of
>the file span, not the parse node.
But what is the interface to the E virtual machine by which E computation
loads a Kernel-E program? If it is a flat array of text or bytes, then we
could provide such a span-to-span mapping. But we would still need to
figure out how to pass this information between stages of the compiler so
that this mapping can be generated, for the virtual machine to load.
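For concreteness, here is a minimal sketch of what a span-to-span mapping might look like if the loading interface were flat text. It is in Python purely for illustration. The SpanMap class, the half-open character-offset spans, and the "innermost enclosing span wins" lookup rule are all my assumptions, not anything E defines.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets


@dataclass
class SpanMap:
    """Hypothetical mapping from spans of generated text to spans of source text."""
    entries: List[Tuple[Span, Span]]  # (generated span, source span)

    def lookup(self, offset: int) -> Optional[Span]:
        """Source span of the innermost generated span containing offset, if any."""
        hits = [(g, s) for g, s in self.entries if g[0] <= offset < g[1]]
        if not hits:
            return None
        _, src = min(hits, key=lambda h: h[0][1] - h[0][0])
        return src


# Source "x + y" expanded (hypothetically) to the generated text "x.add(y)".
smap = SpanMap([((0, 1), (0, 1)), ((6, 7), (4, 5)), ((0, 8), (0, 5))])
```

An error reported at offset 6 of the generated text (the "y") maps back to (4, 5). An offset inside the synthesized "add" has no span of its own, so it falls back to the whole expression, (0, 5).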
Instead, I assume the virtual machine loads a Kernel-E parse-tree itself,
rather than an encoding of one, in which case we need to provide a mapping
from parse-nodes to original source spans. I see only three ways to provide
such a mapping:
1) encode it in the parse-nodes.
2) separately provide a mapping from parse-node-identity to source-spans
3) separately provide a mapping from parse-tree navigation path (turning
directions) to source-span.
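Option #1 might look like the following Python sketch, where all names are hypothetical. A pass-by-copy node is an immutable value that carries its source span as an extra field. Because the span is ordinary data, it survives copying. It is also invisible in the printed form.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Span = Tuple[int, int]  # half-open [start, end) offsets into the original source


@dataclass(frozen=True)
class Call:
    """Hypothetical Kernel-E call node; 'span' is option #1's encoded source position."""
    receiver: str
    verb: str
    arg: str
    span: Optional[Span] = None

    def to_text(self) -> str:
        # The printed form has no place for the span.
        return f"{self.receiver}.{self.verb}({self.arg})"


a = Call("x", "add", "y", span=(0, 5))
b = Call("x", "add", "y", span=(40, 45))
```

Here a.to_text() and b.to_text() are both "x.add(y)", yet a != b. The two trees differ in information that the printed text cannot show, which is the objection to #1 discussed below.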
#2 requires that parse-nodes have identity, which conflicts with making them
pass-by-copy.
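To see the conflict, consider a Python sketch of option #2 in which the side table is keyed by node identity. Node and the table are hypothetical. A Python class that does not define __eq__ hashes by identity. A deep copy, which is what passing a node by copy amounts to, is therefore a different key and finds nothing.

```python
import copy
from typing import Dict, Tuple

Span = Tuple[int, int]


class Node:
    """Hypothetical parse node; no __eq__/__hash__, so dict keys use identity."""

    def __init__(self, label: str, *children: "Node") -> None:
        self.label = label
        self.children = list(children)


root = Node("call", Node("x"), Node("y"))

# Option #2: source spans kept on the side, keyed by node identity.
spans: Dict[Node, Span] = {root: (0, 5), root.children[0]: (0, 1)}

# Passing the tree by copy produces structurally equal but distinct objects.
copied = copy.deepcopy(root)
```

spans[root] is (0, 5), but copied is not a key of spans at all. The mapping is lost as soon as the tree crosses a pass-by-copy boundary.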
#3 (which I hadn't previously thought of) is the equivalent for trees of
using spans in text as the keys of a mapping. It would work, but seems to
be a lot of trouble.
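Here is a Python sketch of option #3, again with hypothetical names. Trees are plain immutable tuples, and a path is the sequence of child indices taken from the root. A lookup falls back to the deepest recorded ancestor. Because the keys are paths rather than identities, equal copies of the tree work just as well.

```python
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]
Path = Tuple[int, ...]  # child indices from the root: the "turning directions"
# Hypothetical tree encoding: (label, child0, child1, ...), all plain tuples.
Tree = tuple


def subtree(tree: Tree, path: Path) -> Tree:
    """Follow the turning directions down from the root."""
    for i in path:
        tree = tree[1 + i]  # index 0 holds the label
    return tree


def source_span(spans: Dict[Path, Span], path: Path) -> Optional[Span]:
    """Span recorded for path, or for its deepest recorded ancestor."""
    for k in range(len(path), -1, -1):
        if path[:k] in spans:
            return spans[path[:k]]
    return None


tree: Tree = ("call", ("x",), ("add",), ("y",))
spans: Dict[Path, Span] = {(): (0, 5), (0,): (0, 1), (2,): (4, 5)}
```

subtree(tree, (2,)) is ("y",), and source_span(spans, (2,)) is (4, 5). The synthesized ("add",) at path (1,) falls back to the root's span. The trouble is that every transformation that restructures the tree must also rewrite the paths in this table.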
The known problem with #1 is that it encodes information in the parse trees
that doesn't correspond to a printed form. In light of the previous
discussion of E's expansion of quasi-expressions, and the tentative
conclusion that text + span-to-span mapping is the right interface there,
perhaps the right answer to this objection is to provide a parse-tree-printing
operation that generates both text and a span-to-span mapping that maps from
this printed text to the original sources.
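Below is a minimal Python sketch of such a printing operation. For illustration, I assume that each node still records its original source span internally, as in option #1. The printer then returns that information on the side instead of in the tree. Name, Call, print_with_map, and the printed syntax are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Span = Tuple[int, int]  # half-open [start, end) offsets


@dataclass(frozen=True)
class Name:
    name: str
    src: Span  # span in the original source


@dataclass(frozen=True)
class Call:
    receiver: "Expr"
    verb: str
    arg: "Expr"
    src: Span


Expr = Union[Name, Call]


def print_with_map(expr: Expr) -> Tuple[str, List[Tuple[Span, Span]]]:
    """Print expr, returning the text and a (printed span, source span) list."""
    out: List[str] = []
    mapping: List[Tuple[Span, Span]] = []
    pos = 0  # current offset in the printed text

    def emit(s: str) -> None:
        nonlocal pos
        out.append(s)
        pos += len(s)

    def walk(e: Expr) -> None:
        start = pos
        if isinstance(e, Name):
            emit(e.name)
        else:
            walk(e.receiver)
            emit(f".{e.verb}(")
            walk(e.arg)
            emit(")")
        mapping.append(((start, pos), e.src))

    walk(expr)
    return "".join(out), mapping


# Source "x + y", already expanded to a call tree.
tree = Call(Name("x", (0, 1)), "add", Name("y", (4, 5)), (0, 5))
text, smap = print_with_map(tree)
```

text is "x.add(y)" and smap is [((0, 1), (0, 1)), ((6, 7), (4, 5)), ((0, 8), (0, 5))]. That is a text-plus-mapping pair of the kind a text-loading virtual machine could accept.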
Then, to re-answer the last email, the decision about what stage the
debugger should display as source becomes a decision about which map we
throw away.
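Here is a Python sketch of the "which map do we throw away" idea. It assumes that each compilation stage hands the next stage its text plus a span map back to the previous stage's text. The stage texts and spans below are invented. Following every map reaches the original source, while dropping the earliest maps stops the debugger at an intermediate stage. For simplicity, a span is mapped by its start offset only.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]
SpanMap = List[Tuple[Span, Span]]  # (span in this stage, span in the previous stage)


def lookup(m: SpanMap, offset: int) -> Optional[Span]:
    """Previous-stage span for the innermost span of m containing offset."""
    hits = [(g, s) for g, s in m if g[0] <= offset < g[1]]
    if not hits:
        return None
    return min(hits, key=lambda h: h[0][1] - h[0][0])[1]


def resolve(maps: List[SpanMap], offset: int, keep: int) -> Optional[Span]:
    """Map an offset in the last stage back through the last `keep` maps.

    The earlier maps are the ones "thrown away"; keep=len(maps) reaches the source.
    """
    span: Optional[Span] = (offset, offset + 1)
    for m in reversed(maps[len(maps) - keep:]):
        if span is None:
            return None
        span = lookup(m, span[0])
    return span


# Stage 0: "x + y"   Stage 1: "x.add(y)"   Stage 2: "def r := x.add(y)" (hypothetical)
map1: SpanMap = [((0, 1), (0, 1)), ((6, 7), (4, 5)), ((0, 8), (0, 5))]
map2: SpanMap = [((9, 10), (0, 1)), ((15, 16), (6, 7)), ((9, 17), (0, 8))]
maps = [map1, map2]
```

Take offset 15, the "y" in stage 2. resolve(maps, 15, 2) is (4, 5) in the original source, while resolve(maps, 15, 1) stops at (6, 7) in the expanded text.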
Were the loading interface to the virtual machine text rather than trees, it
could be the pair of text and mapping that a Kernel-E tree would print.
Parse-trees would no longer have "unprintable" information.
>You know, this has an interesting case. Grammar constructs get generated into
>loops and conditionals, and to understand antlr's generated code you want to
>look at the generated code but know also what the grammar element that generated
>it was. Hmm, this actually is probably best solved by comments instead of some
>weird nested line directive scheme.
Oh god, comments! Another option!
Actually, using the previous quasi-parser example, I think we can disqualify
comments for the same reason we disqualified #line-like directives. It is
unreasonably burdensome to require every quasi-grammar to understand some
one commenting convention shared by all quasi-grammars.
Better to put the mapping on the side, in a separate argument to the
compiler, than to impose this burden on all quasi-grammars. Making them all
recognize ${<integer>} and such was bad enough.
How do you feel about the "which mapping do you throw away" approach for
determining whether an ANTLR user debugs at the grammar level or the parser
implementation level?
>Also interesting is that generating .class files would go against one of antlr's
>main philosophies--to generate human readable code. Although I guess you should
>be able to generate .class files and .java files simultaneously if desired.
How do you like the suggestion, in earlier email, of generating Java source
plus a Java-source-span to grammar-source-span mapping, and then giving the
ANTLR user the option to use this mapping to post-process the *.class file?
The decision about which level to debug at becomes the decision about
whether to post-process the *.class file.
This approach requires no changes to or knowledge of the insides of any Java
compilers. Often, it can also be used with stock Java debuggers, without
any changes or much knowledge.
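As a rough Python sketch of the post-processing step, assume the Java-source-to-grammar-source span mapping has already been collapsed to a line-to-line mapping. Line numbers are what a class file's LineNumberTable and SourceFile attributes record. The model below is deliberately simplified and hypothetical. It does not read or write real class-file bytes. It only shows the rewrite a post-processor would apply to each method's line entries and to the source-file name. The file names and line numbers are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (start_pc, line) pairs, modeled on a method's LineNumberTable entries.
LineTable = List[Tuple[int, int]]


@dataclass
class DebugInfo:
    """Simplified, hypothetical view of a class file's debugging information."""
    source_file: str
    line_tables: Dict[str, LineTable] = field(default_factory=dict)  # per method


def retarget(info: DebugInfo, grammar_file: str,
             java_to_grammar: Dict[int, int]) -> DebugInfo:
    """Point the debug info at the grammar.

    Unmapped lines keep their Java line number (a simplification).
    """
    tables = {
        method: [(pc, java_to_grammar.get(line, line)) for pc, line in table]
        for method, table in info.line_tables.items()
    }
    return DebugInfo(grammar_file, tables)


java_info = DebugInfo("ExprParser.java", {"expr": [(0, 120), (7, 121), (15, 130)]})
grammar_info = retarget(java_info, "expr.g", {120: 12, 121: 12, 130: 14})
```

grammar_info.line_tables["expr"] is [(0, 12), (7, 12), (15, 14)]. Debugging at the parser-implementation level simply means skipping this step and keeping java_info.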
Cheers,
--MarkM