[e-lang] Somewhat offtopic: Extensible Term Language

Mark S. Miller markm at cs.jhu.edu
Sun Feb 5 22:29:19 EST 2006


Constantine Plotnikov wrote:
> As I have written long time ago, I’m currently working on open
> source project that has a goal to create a language definition
> framework that can be used as textual DSL construction kit.
> The framework is currently named Extensible Term Language
> (ETL).


Hi Constantine,

Given what you're working on, you may be interested in E's draft schema language 
and its two surface syntaxes.

Below, I include first the schema.schema file, which defines the tree-grammar 
of the ASTs of E's schema language. This language has two surface syntaxes: 
term.y defines the surface syntax that schema.schema itself is written in, and 
grammar.y defines a surface syntax suitable for expressing grammars over 
sequences (see comments below).
Following this file, I show an E shell session in which I parse schema.schema 
into a term-tree which represents its AST, and which conforms to the 
tree-grammar it specifies.

I wrote all this ages ago, with the fantasy that someday someone would build a 
parser generator to generate parsers from these grammar descriptions. I'm 
ecstatic to report that Dean Tribble is now making progress on a Packrat-like 
parser generator, written in E and generating E, able to process a useful 
subset of the schema language below. Hopefully, this will enable E to 
self-host its own parser. Eventually it should be practical to use this 
language to generate quasi-parsers as well.


Contents of src/bin/resources/quasiliteral/schema/schema.schema
------------------------------------------------------------------------------
# Copyright 2004 Hewlett Packard, Inc. under the terms of the MIT X license
# found at http://www.opensource.org/licenses/mit-license.html ...............

# ? def makeSchema := <import:org.quasiliteral.schema.makeSchema>
# ? def <schema> := <resource:org/quasiliteral/schema/>
# ? makeSchema.parseSchema(<schema:schema.schema>.getTwine())

# The schema corresponding to term.y as processed by the quasiMetaBuilder.
# <p>
# This describes AST info describing the result of parsing input in the
# grammar defined by term.y. Various other interesting schemas are subsets
# of the schema presented here:
# <p>
# Schemas themselves, such as this one, are written in the subset of this
# schema without <tt>action</tt> or <tt>&lt;hole&gt;</tt>.
# <p>
# Most schemas are written in the context-free subset of that, without
# <tt>firstChoice</tt>, <tt>not</tt>, or <tt>interleave</tt>. <ul>
# <li><tt>firstChoice</tt> is <i>prioritized choice</i> from Parsing
#     Expression Grammars.
# <li><tt>not</tt> forms a <i>syntactic predicate</i> from Parsing Expression
#     Grammars.
# <li><tt>interleave</tt> is an unordered analog of <tt>seq</tt>, from
#     Relax-NG
# </ul>
# <p>
# Grammars are the subset of schemas meant for processing sequences of symbols
# rather than trees of symbols. They can be written in the language of
# grammar.y. When processed by the quasiMetaBuilder, they are described by the
# subset of this schema where <tt>&lt;term&gt; ::= &lt;functor&gt;;</tt>.
# The syntax of the grammar.y language is essentially the corresponding
# subset of the term.y language, with the additional changes that it uses
# juxtaposition where term.y uses ',', and that juxtaposition binds much more
# tightly than ','.
# <p>
# Actual grammar definitions may use holes and actions to express the
# transformation of a low-level syntax into "semantics", i.e., a high level
# syntax.
# <p>
# The actual output of the quasiMetaBuilder is a term tree, which is itself
# in a small subset of the term.y language, and therefore corresponds to a
# small subset of this schema, which we list separately in term.schema.
# <p>
# The language in which term-tree quasiliterals are expressed is the variant
# of this grammar whose start symbol is <tt>&lt;rhs&gt;</tt> rather than
# <tt>&lt;schema&gt;</tt>. It therefore does not include
# <tt>&lt;schema&gt;</tt>, <tt>&lt;production&gt;</tt>, or
# <tt>&lt;lhs&gt;</tt>, since they are not reachable starting at
# <tt>&lt;rhs&gt;</tt>. Of the reachable elements, some are not yet
# implemented (such as <tt>interleave</tt>), and may never be.

# @author Mark S. Miller

<schema>     ::= schema(<production>+);
<production> ::= production(<lhs>, <rhs>);      # <lhs> ::= <rhs>;
<lhs>        ::= tag(.String.);
<rhs>        ::= <term>
              |   onlyChoice(<rhs>, <rhs>+)      # <rhs> | <rhs> | ...
              |   firstChoice(<rhs>, <rhs>+)     # <rhs> / <rhs> / ...
              |   not(<rhs>)                     # !<rhs>
              |   optional(<rhs>)                # <rhs>?
              |   oneOrMore(<rhs>)               # <rhs>+
              |   zeroOrMore(<rhs>)              # <rhs>*
              |   any                            # .
              |   interleave(<rhs>, <rhs>+)      # <rhs> & <rhs> & ...
              |   seq(<rhs>, <rhs>+)             # <rhs> , <rhs> , ...
              |   action(<rhs>, <rhs>)           # <rhs> -> <rhs>
              |   empty;                         # ()
<term>       ::= term(<functor>, <rhs>)         # <functor>(<rhs>)
              |   <functor>;
<functor>    ::= tag(.String.)                  # foo
              |   tag(.String., <hole>)          # foo@x
              |   <hole>                         # @x
              |   data(<literal>)                # "foo"
              |   range(<literal>, <literal>);   # 'a'..'z'
<hole>       ::= dollarHole(.int. | .String.)   # $x | ${3}
              |   atHole(.int. | .String.);      # @x | @{3}
<literal>    ::= .char. | .int. | .float64. | .String.;
-----------------------------------------------------------------------
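
To make the difference between the context-free subset and the PEG-style
operators concrete, here is a minimal Python sketch of firstChoice, not, seq,
any, and zeroOrMore over character sequences. It is only an illustration: the
function names and the "return the new position, or None on failure"
convention are assumptions, not part of E or of the parser generator
mentioned above.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the position after the match,
# or None if it does not match at pos. (Convention assumed for this sketch.)
Parser = Callable[[str, int], Optional[int]]

def data(lit: str) -> Parser:
    """Match a literal string, in the spirit of data("foo")."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(lit) if text.startswith(lit, pos) else None
    return parse

def any_() -> Parser:
    """Match any single symbol, like '.' in the schema."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) else None
    return parse

def seq(*parts: Parser) -> Parser:
    """Match each part in order, like <rhs> , <rhs> , ..."""
    def parse(text: str, pos: int) -> Optional[int]:
        cur: Optional[int] = pos
        for part in parts:
            cur = part(text, cur)
            if cur is None:
                return None
        return cur
    return parse

def first_choice(*alts: Parser) -> Parser:
    """Prioritized choice: commit to the first alternative that matches."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parse

def not_(inner: Parser) -> Parser:
    """Syntactic predicate: succeed, consuming nothing, iff inner fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if inner(text, pos) is None else None
    return parse

def zero_or_more(inner: Parser) -> Parser:
    """Greedy repetition, like <rhs>*; stops if inner makes no progress."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            end = inner(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse

# Prioritized choice never revisits a later alternative once an earlier
# one has matched, so the order of alternatives matters.
a_then_ab = seq(first_choice(data("a"), data("ab")), data("c"))
ab_then_a = seq(first_choice(data("ab"), data("a")), data("c"))

# "Everything up to the next double quote", using not as a lookahead.
string_body = zero_or_more(seq(not_(data('"')), any_()))
```

With an unordered choice (onlyChoice), "abc" would be accepted by either
ordering of the alternatives. Here a_then_ab rejects "abc", because
firstChoice commits to "a" and the following "c" then fails to match "b",
while ab_then_a accepts it.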


? def makeSchema := <import:org.quasiliteral.schema.makeSchema>
# value: <makeSchema>

? def <schema> := <resource:org/quasiliteral/schema/>

? makeSchema.parseSchema(<schema:schema.schema>.getTwine())
# value: term`schema(production(tag("<schema>"),
#                               term(tag("schema"),
#                                    oneOrMore(tag("<production>")))),
#                    production(tag("<production>"),
#                               term(tag("production"),
#                                    seq(tag("<lhs>"),
#                                        tag("<rhs>")))),
#                    production(tag("<lhs>"),
#                               term(tag("tag"),
#                                    tag(".String."))),
#                    production(tag("<rhs>"),
#                               onlyChoice(tag("<term>"),
#                                          term(tag("onlyChoice"),
#                                               seq(tag("<rhs>"),
#                                                   oneOrMore(tag("<rhs>")))),
#                                          term(tag("firstChoice"),
#                                               seq(tag("<rhs>"),
#                                                   oneOrMore(tag("<rhs>")))),
#                                          term(tag("not"),
#                                               tag("<rhs>")),
#                                          term(tag("optional"),
#                                               tag("<rhs>")),
#                                          term(tag("oneOrMore"),
#                                               tag("<rhs>")),
#                                          term(tag("zeroOrMore"),
#                                               tag("<rhs>")),
#                                          tag("any"),
#                                          term(tag("interleave"),
#                                               seq(tag("<rhs>"),
#                                                   oneOrMore(tag("<rhs>")))),
#                                          term(tag("seq"),
#                                               seq(tag("<rhs>"),
#                                                   oneOrMore(tag("<rhs>")))),
#                                          term(tag("action"),
#                                               seq(tag("<rhs>"),
#                                                   tag("<rhs>"))),
#                                          tag("empty"))),
#                    production(tag("<term>"),
#                               onlyChoice(term(tag("term"),
#                                               seq(tag("<functor>"),
#                                                   tag("<rhs>"))),
#                                          tag("<functor>"))),
#                    production(tag("<functor>"),
#                               onlyChoice(term(tag("tag"),
#                                               tag(".String.")),
#                                          term(tag("tag"),
#                                               seq(tag(".String."),
#                                                   tag("<hole>"))),
#                                          tag("<hole>"),
#                                          term(tag("data"),
#                                               tag("<literal>")),
#                                          term(tag("range"),
#                                               seq(tag("<literal>"),
#                                                   tag("<literal>"))))),
#                    production(tag("<hole>"),
#                               onlyChoice(term(tag("dollarHole"),
#                                               onlyChoice(tag(".int."),
#                                                          tag(".String."))),
#                                          term(tag("atHole"),
#                                               onlyChoice(tag(".int."),
#                                                          tag(".String."))))),
#                    production(tag("<literal>"),
#                               onlyChoice(tag(".char."),
#                                          tag(".int."),
#                                          tag(".float64."),
#                                          tag(".String."))))`
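
The encoding visible in the output above can be mimicked in a few lines of
Python: a bare functor becomes tag(...), a functor with one argument becomes
term(tag(...), arg), and a functor with several arguments wraps them in
seq(...). This is only a sketch of the pattern as it appears in the
transcript. The Term class and helper names are hypothetical, this is not
the quasiMetaBuilder API, and string escaping is ignored.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Term:
    """A term-tree node: a functor name plus zero or more arguments."""
    functor: str
    args: Tuple[Union["Term", str], ...] = ()

    def __str__(self) -> str:
        if not self.args:
            return self.functor
        shown = ", ".join(
            f'"{arg}"' if isinstance(arg, str) else str(arg)
            for arg in self.args
        )
        return f"{self.functor}({shown})"

def tag(name: str) -> Term:
    return Term("tag", (name,))

def seq(*parts: Term) -> Term:
    return Term("seq", parts)

def one_or_more(rhs: Term) -> Term:
    return Term("oneOrMore", (rhs,))

def term(functor_name: str, *args: Term) -> Term:
    """Encode functor(args...) the way the transcript above shows it."""
    if not args:
        return tag(functor_name)                 # e.g. any  ->  tag("any")
    body = args[0] if len(args) == 1 else seq(*args)
    return Term("term", (tag(functor_name), body))

def production(lhs: str, rhs: Term) -> Term:
    return Term("production", (tag(lhs), rhs))

# <schema>     ::= schema(<production>+);
schema_rule = production("<schema>",
                         term("schema", one_or_more(tag("<production>"))))

# <production> ::= production(<lhs>, <rhs>);
production_rule = production("<production>",
                             term("production", tag("<lhs>"), tag("<rhs>")))

print(schema_rule)
print(production_rule)
```

Apart from line layout, the two printed terms correspond to the first two
production(...) entries in the session output.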


-- 
Text by me above is hereby placed in the public domain

     Cheers,
     --MarkM


