Compiling E: Phases of Transformations

Mark S. Miller
Wed, 09 Aug 2000 12:31:54 -0700

At 07:26 AM 8/9/00 , Monty Zukowski wrote:
>You don't really need to know the parse nodes, just how a span in one file maps
>to a span in another file, since the error message will be printed in terms of
>the file span, not the parse node.

But what is the interface to the E virtual machine by which E computation 
loads a Kernel-E program?  If it is a flat array of text or bytes, then we 
could provide such a span-to-span mapping.  But we would still need to 
figure out how to pass this information between stages of the compiler so 
that this mapping can be generated, for the virtual machine to load.

Instead, I assume the virtual machine loads a Kernel-E parse-tree itself, 
rather than an encoding of such, in which case we need to provide a mapping 
from parse-nodes to original source spans.  I only see three ways to provide 
such a mapping:

1) encode it in the parse-nodes.
2) separately provide a mapping from parse-node-identity to source-spans
3) separately provide a mapping from parse-tree navigation path (turning 
     directions) to source-span.

#2 requires that parse-nodes have identity, which conflicts with making them 

#3 (which I hadn't previously thought of) is the equivalent for trees of 
using spans in text as the keys of a mapping.  It would work, but seems to 
be a lot of trouble.
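
To make #3 concrete, here is a minimal sketch in Python (not E), assuming a hypothetical `Node` type with no identity and no source information, and a path written as the tuple of child indices followed from the root. All names here are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical representations, chosen only for this sketch.
Span = Tuple[int, int]   # (start_offset, end_offset), end exclusive
Path = Tuple[int, ...]   # child indices taken from the root down to a node


@dataclass(frozen=True)
class Node:
    """A parse node compared by value: a tag and children, nothing else."""
    tag: str
    children: Tuple["Node", ...] = ()


def node_at(root: Node, path: Path) -> Node:
    """Follow the turning directions in `path` from the root."""
    node = root
    for index in path:
        node = node.children[index]
    return node


# Parse tree for the source text "x + x". The two leaves are equal by
# value, so only their paths tell them apart.
leaf = Node("x")
tree = Node("add", (leaf, leaf))

# The side table of option #3: navigation path -> original source span.
spans_by_path: Dict[Path, Span] = {
    (): (0, 5),    # the whole expression
    (0,): (0, 1),  # left operand
    (1,): (4, 5),  # right operand
}
```

The trouble shows up as soon as a transformation rearranges the tree: every path below the change has to be recomputed, which is the tree analogue of text offsets shifting after an edit.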

The known problem with #1 is that it encodes information in the parse trees 
that doesn't correspond to a printed form.  In light of the previous 
discussion of E's expansion of quasi-expressions, and the tentative 
conclusion that text + span-to-span mapping is the right interface there, 
perhaps the right answer to this objection is to provide a parse-tree-printing 
operation that generates both text and a span-to-span mapping that maps from 
this printed text to the original sources.  
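
A rough sketch of such a printing operation, in Python rather than E. It assumes a hypothetical `Node` whose optional `origin` field holds the original source span (option #1), and a trivial printer that joins tokens with single spaces. The `a += 1` expansion is only an invented example, not actual Kernel-E.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]            # (start_offset, end_offset), end exclusive
SpanMap = List[Tuple[Span, Span]]  # (span in printed text, span in original)


@dataclass(frozen=True)
class Node:
    text: str                          # token text for a leaf, unused otherwise
    children: Tuple["Node", ...] = ()
    origin: Optional[Span] = None      # original source span, if known


def _print(node: Node, out: List[str], mapping: SpanMap, pos: int) -> int:
    """Append node's text to `out`; return the offset just past it."""
    start = pos
    if not node.children:
        out.append(node.text)
        pos += len(node.text)
    else:
        for i, child in enumerate(node.children):
            if i > 0:
                out.append(" ")
                pos += 1
            pos = _print(child, out, mapping, pos)
    if node.origin is not None:
        mapping.append(((start, pos), node.origin))
    return pos


def print_with_map(root: Node) -> Tuple[str, SpanMap]:
    """Return the printed text plus a printed-span to original-span map."""
    out: List[str] = []
    mapping: SpanMap = []
    _print(root, out, mapping, 0)
    return "".join(out), mapping


# Invented example: original text "a += 1" expanded to "a := a + 1".
expanded = Node("", origin=(0, 6), children=(
    Node("a", origin=(0, 1)),
    Node(":=", origin=(2, 4)),
    Node("a", origin=(0, 1)),
    Node("+", origin=(2, 4)),
    Node("1", origin=(5, 6)),
))
text, mapping = print_with_map(expanded)
```

Once printed this way, the span information lives entirely in `mapping`, and `text` is ordinary text with nothing unprintable left in it.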

Then, to re-answer the last email, the decision about what stage the 
debugger should display as source becomes a decision about which map we 
throw away.
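
One way to read "which map we throw away", sketched in Python under the assumption that each compiler stage yields one span-to-span map back to the stage before it. The stage names, the sample spans, and the smallest-enclosing-span lookup rule are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]             # (start_offset, end_offset), end exclusive
SpanMap = List[Tuple[Span, Span]]  # (span in later stage, span in earlier stage)


def lookup(mapping: SpanMap, span: Span) -> Optional[Span]:
    """Return the target of the smallest mapped span that encloses `span`."""
    best = None
    for src, dst in mapping:
        if src[0] <= span[0] and span[1] <= src[1]:
            if best is None or (src[1] - src[0]) < (best[0][1] - best[0][0]):
                best = (src, dst)
    return None if best is None else best[1]


def compose(later: SpanMap, earlier: SpanMap) -> SpanMap:
    """Chain two maps so the later stage maps straight to the earliest one."""
    composed: SpanMap = []
    for src, mid in later:
        dst = lookup(earlier, mid)
        if dst is not None:
            composed.append((src, dst))
    return composed


# Hypothetical maps for a two-step pipeline: source -> expanded -> kernel.
kernel_to_expanded: SpanMap = [((0, 10), (0, 6)), ((5, 6), (0, 1))]
expanded_to_source: SpanMap = [((0, 6), (0, 20)), ((0, 1), (3, 4))]

# Keep both maps: the debugger reports positions in the original source.
kernel_to_source = compose(kernel_to_expanded, expanded_to_source)

# Throw away expanded_to_source: the debugger shows the expanded text
# as "source" and uses kernel_to_expanded on its own.
```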

Were the loading interface to the virtual machine text rather than trees, it 
could be the pair of text and mapping that a Kernel-E tree would print.  
Parse-trees would no longer have "unprintable" information.

>You know, this has an interesting case.  Grammar constructs get generated into
>loops and conditionals, and to understand antlr's generated code you want to
>look at the generated code but know also what the grammar element that generated
>it was.  Hmm, this actually is probably best solved by comments instead of some
>weird nested line directive scheme.

Oh god, comments!  Another option!

Actually, using the previous quasi-parser example, I think we can disqualify 
comments for the same reason we disqualified #line-like directives.  It is 
an unreasonable burden to require the definition of every quasi-grammar to 
understand one universal (among quasi-grammars) commenting convention.  
Better to put the mapping on the side, in a separate argument to the 
compiler, than to impose this burden on all quasi-grammars.  Making them all 
recognize ${<integer>} and such was bad enough.

How do you feel about the "which mapping do you throw away" approach for 
determining whether an ANTLR user debugs at the grammar level or the parser 
implementation level?

>Also interesting is that generating .class files would go against one of antlr's
>main philosophies--to generate human readable code.  Although I guess you should
>be able to generate .class files and .java files simultaneously if desired.

How do you like the suggestion, in earlier email, of generating Java source 
plus a Java-source-span to grammar-source-span mapping, and then giving the 
ANTLR user the option to use this mapping to post-process the *.class file?  
The decision about which level to debug at becomes the decision about 
whether to post-process the *.class file.

This approach requires no changes to or knowledge of the insides of any Java 
compilers.  Often, it can also be used with stock Java debuggers, without 
any changes or much knowledge.
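
The core of that post-processing step might look like the following Python sketch. It assumes the line-number entries have already been read out of the *.class file as (bytecode offset, Java line) pairs, and that the span mapping has been reduced to a line-to-line dictionary, since a class file's line table records lines rather than spans. Reading and writing the class file itself is left out, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# (bytecode_offset, source_line) pairs, as in a method's line-number table.
LineTable = List[Tuple[int, int]]


def remap_line_table(table: LineTable,
                     java_to_grammar: Dict[int, int]) -> LineTable:
    """Rewrite Java line numbers as grammar line numbers.

    Entries whose Java line has no grammar origin are dropped, so those
    instructions are attributed to the preceding grammar line. Consecutive
    entries that land on the same grammar line are collapsed into one.
    """
    remapped: LineTable = []
    for pc, java_line in table:
        grammar_line = java_to_grammar.get(java_line)
        if grammar_line is None:
            continue
        if remapped and remapped[-1][1] == grammar_line:
            continue
        remapped.append((pc, grammar_line))
    return remapped


# Invented data: Java lines 10 and 11 come from grammar line 3, Java line
# 13 from grammar line 4, and Java line 12 is boilerplate with no origin.
java_table: LineTable = [(0, 10), (4, 11), (9, 12), (15, 13)]
java_to_grammar = {10: 3, 11: 3, 13: 4}
grammar_table = remap_line_table(java_table, java_to_grammar)
```

Skipping this step leaves the original table in place, which is the "debug at the parser implementation level" choice.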