[e-cvs] cvs commit: e/doc/elang/quasi index.html like-ellipses.html

markm@eros.cs.jhu.edu markm@eros.cs.jhu.edu
Sat, 22 Dec 2001 23:49:31 -0500


markm       01/12/22 23:49:31

  Added:       doc/elang/quasi index.html like-ellipses.html
  Log:
  start on separate doc section on quasis

Revision  Changes    Path
1.1                  e/doc/elang/quasi/index.html

Index: index.html
===================================================================
 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<!--last modified on Saturday, October 03, 1998 04:19 PM -->
 


 
Title Bar Title
 






 

ERights Home  
Prev x Next

Big Title


John McCarthy has said: "XML is just S-Expressions, only ten times as verbose." This comparison is too kind to XML. What XML, S-Expressions, Antlr ASTs (abstract syntax trees), and Prolog Term trees have in common is that they are simple(*) notations (or surface syntaxes) for simple(*) "practically universal" trees of symbols. These trees are "practically universal" in that many other forms of symbolic data can be mapped into any of these trees.

The clearest and most compelling case is the Antlr AST -- these are used to represent the result of parsing anything that can be parsed according to a BNF grammar. This demonstrates that one such universal data structures may have a great multitude of surface syntaxes.

Although the Antlr documentation also calls these S-Expressions, and although the Antlr notation for writing them looks like S-Expressions, semantically they are actually most similar to Prolog Term trees. Since Antlr is the system we most desire to interoperate with, as of this release we introduce E's Prolog-like Term trees, with conversions back and forth to Antlr ASTs.

In an E Term, a functor is a tag or a literal data element. A literal data element is either a literal character, integer, floating point number, or string, written the same way, and meaning the same thing, as these literal data elements do in E. (This is the same as they do in Java, except that integers are precision unlimited, and floating point numbers are always double precision.) A Term always consists of a functor and zero or more arguments. These arguments are its children, which transitively always form a finite tree. The leaves of the tree are simply Terms with no arguments.

 
 
 
1.1 e/doc/elang/quasi/like-ellipses.html Index: like-ellipses.html =================================================================== <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <!--last modified on Saturday, October 03, 1998 04:19 PM --> The Power of Scheme's Ellipses

 

ERights Home  
Prev x Next

The Power of Ellipses
Borrowed from Scheme


Like regular expressions, our quasi-literal term-tree patterns and expressions have quantifiers. These quanitifiers annotate the pattern or expression to its left.

  • '?' means optional -- zero or one occurrence.
  • '+' means one or more occurrences.
  • '*' means one or more occurrences.

These quantifiers normally just annotate patterns. Why do we also apply them to expressions? In order to incorporate the expressiveness of Scheme's "..." notation. This is best explained by transposing Scheme's example into term trees:

The text in the boxes below is from the Scheme FAQ.

5.6. How are ellipses (...) "counted" during macro expansion?

Section 4.3.2 of R5RS states

Pattern variables that occur in subpatterns followed by one or more instances of the identifier ... are allowed only in subtemplates that are followed by as many instances of .... They are replaced in the output by all of the subforms they match in the input, distributed as indicated. It is an error if the output cannot be built up as specified.

It is important to understand what is meant by "followed" in the above. Take for example the pattern
(a (b c ...) (d e ...) ...)
What matters is how many ellipsis structurally follow a variable rather than lexically. The variables a, b, c, d and e are followed by three, three, three, two and two ellipses respectively, but struturally they are followed by none, none, one, one and two ellipses respectively. The difference between the structural and lexical ellipsis counts arises because ellipses only apply to the pattern/template immediatetly preceeding them. Thus, the above pattern can supply the input for the template
((a c ...) (b d ...) (e ...) ...)
The structural ellipsis count of the template variables matches that of the pattern, whereas the lexical template count is completely different.

Likewise for term trees of course. For us, the corresponding example pattern might be

term`[[@a, @c*], [@b, @d*], [@e*]*]`

or, as an expression, with '$'s rather than '@'s. Note that the square brackes are just syntactic sugar for a term with "tuple" for a functor name. So the above is equivalent to:

term`tuple(tuple(@a, @c*), tuple(@b, @d*), tuple(@e*)*)`

 

One issue that is not addressed by the R5RS explanation of macro expansion is how ellipses templates are expanded when they combine variables that matched input of different size. The canonical example for this is

(define-syntax foo
  (syntax-rules ()
    ((foo (a ...) (b ...)) '((a b) ...))))
(foo (1 2) (3 4 5)) ;=> ?
In different Schemes this is either an error or produces the result '((1 3) (2 4)), i.e. the "oversized" matches are automatically shortend as needed.

When these variables are different sizes, since this is a likely indicator of a bug, and because its easily repaired when it's intentional, E considers this case to be an error.

? def term`foo([@a*],[@b*])` := term`foo([1,2],[3,4,5])`
# value: term`foo([1, 2],
#                 [3, 4, 5])`
 
? a
# value: [term`1`, term`2`]
 
? b
# value: [term`3`, term`4`, term`5`]
 
? term`[[$a,$b]*]`
# problem: Failed: Inconsistent shape: 2 vs 3

So let's shorten b and see what success looks like

? def b2 := b(0,2)
# value: [term`3`, term`4`]
 
? term`[[$a,$b2]*]`
# value: term`[[1, 3],
#              [2, 4]]`

But when the variable is too flat, so to speak (insufficient dimensions or rank), E takes the same permissive attitute as shown by the following Scheme example.

Some Schemes relax the above "matching ellipses count" requirement by allowing allowing template variables to be followed by at least as many ellipses as their corresponding pattern variables. The resulting expansion repeats the matched input. Thus

(define-syntax foo
  (syntax-rules ()
    ((foo (a ...) (b ...) ...) '(((a b) ...) ...))))
(foo (1 2) (3 4) (5 6 7)) ;=> '(((1 3) (1 4)) ((2 5) (2 6) (2 7)))
produces the desired result. Note however that in any ellipses sub-template there must be at least one template variable for which the ellipsis count is exactly the same as in the pattern, otherwise the macro expander would not be able to determine how many times to repeat the matched input of the template variables with ellipsis counts higher than in the pattern.

? def term`foo([@a*], [@b*]*)` := term`foo([1,2], [3,4], [5,6,7])`
# value: term`foo([1, 2],
#                 [3, 4],
#                 [5, 6, 7])`
 
? a
# value: [term`1`, term`2`]
 
? b
# value: [[term`3`, term`4`], [term`5`, term`6`, term`7`]]
 
? term`[[[$a,$b]*]*]`
# value: term`[[[1, 3],
#               [1, 4]],
#              [[2, 5],
#               [2, 6],
#               [2, 7]]]`

Since literal data is repeated as many times as necessary, variables that are too flat, like a above, are treated in a way that's intermediate between the treatment of literal data and the treatment of variables, like b above, of adequate dimensionality.

What are some other good examples of the power of Scheme's "..." that we should also use here?

Acknowledgements

Who should we acknowledge for Scheme's "..." system?

Matthias Radestock, the editor of the FAQ we quote from above.

Dean Tribble, who suggested incorporating the power of Scheme's "..." into the quasi-literal handing of Term trees.