[e-lang] Somewhat offtopic: Extensible Term Language
Mark S. Miller
markm at cs.jhu.edu
Sun Feb 5 22:29:19 EST 2006
Constantine Plotnikov wrote:
> As I wrote a long time ago, I'm currently working on an open
> source project whose goal is to create a language definition
> framework that can be used as a textual DSL construction kit.
> The framework is currently named Extensible Term Language
> (ETL).
Hi Constantine,
Given what you're working on, you may be interested in E's draft schema language
and its two surface syntaxes.
Below, I include first the schema.schema file, which defines the tree-grammar
of the ASTs of E's schema language. This language has two surface syntaxes:
term.y defines the surface syntax that schema.schema itself is written in, and
grammar.y defines a surface syntax suitable for expressing grammars over
sequences (see comments below).
Following this file, I show an E shell session in which I parse schema.schema
into a term-tree which represents its AST, and which conforms to the
tree-grammar it specifies.
I wrote all this ages ago, with the fantasy that someday someone would build a
parser generator to generate parsers from these grammar descriptions. I'm
ecstatic to report that Dean Tribble is now making progress on a Packrat-like
parser generator, written in E and generating E, able to process a useful
subset of the schema language below. Hopefully, this will enable E to
self-host its own parser. Eventually it should be practical to use this
language to generate quasi-parsers as well.
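For readers unfamiliar with the technique, the following is a rough Python sketch of packrat-style matching. It is an illustration only, not Dean's generator and not E code. The tuple encoding, the function name `peg_match`, and the sample `GRAMMAR` are assumptions; the node names merely echo `firstChoice`, `not`, `seq`, `oneOrMore`, `data`, and `range` from the schema shown below.

```python
def peg_match(grammar, start, text):
    """Return the end index of a match of rule `start` at position 0, or None."""
    memo = {}  # (rule name, position) -> result; this cache is the packrat part

    def parse(expr, pos):
        kind = expr[0]
        if kind == "ref":
            key = (expr[1], pos)
            if key not in memo:
                memo[key] = parse(grammar[expr[1]], pos)
            return memo[key]
        if kind == "data":          # literal string
            return pos + len(expr[1]) if text.startswith(expr[1], pos) else None
        if kind == "range":         # single character in an inclusive range
            ok = pos < len(text) and expr[1] <= text[pos] <= expr[2]
            return pos + 1 if ok else None
        if kind == "seq":
            for part in expr[1:]:
                pos = parse(part, pos)
                if pos is None:
                    return None
            return pos
        if kind == "firstChoice":   # prioritized choice: first success wins
            for alt in expr[1:]:
                end = parse(alt, pos)
                if end is not None:
                    return end
            return None
        if kind == "not":           # syntactic predicate: consumes nothing
            return pos if parse(expr[1], pos) is None else None
        if kind == "oneOrMore":     # greedy repetition, no backtracking
            end = parse(expr[1], pos)
            if end is None:
                return None
            while True:
                nxt = parse(expr[1], end)
                if nxt is None or nxt == end:
                    return end
                end = nxt
        raise ValueError("unsupported node kind: %r" % (kind,))

    return parse(("ref", start), 0)


# Hypothetical sample grammar: identifiers that are not the keywords "if" or "in".
GRAMMAR = {
    "letter": ("range", "a", "z"),
    "keyword": ("seq",
                ("firstChoice", ("data", "if"), ("data", "in")),
                ("not", ("ref", "letter"))),
    "ident": ("seq", ("not", ("ref", "keyword")), ("oneOrMore", ("ref", "letter"))),
}

for word in ("iffy", "if", "index"):
    print(word, peg_match(GRAMMAR, "ident", word))
```

With this grammar, "iffy" and "index" match in full, while "if" is rejected by the `not` predicate. Because `firstChoice` commits to its first successful alternative, `("firstChoice", ("data", "a"), ("data", "ab"))` applied to "ab" consumes only "a"; an unordered `onlyChoice` would leave both outcomes open.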
Contents of src/bin/resources/quasiliteral/schema/schema.schema
------------------------------------------------------------------------------
# Copyright 2004 Hewlett Packard, Inc. under the terms of the MIT X license
# found at http://www.opensource.org/licenses/mit-license.html ...............
# ? def makeSchema := <import:org.quasiliteral.schema.makeSchema>
# ? def <schema> := <resource:org/quasiliteral/schema/>
# ? makeSchema.parseSchema(<schema:schema.schema>.getTwine())
# The schema corresponding to term.y as processed by the quasiMetaBuilder.
# <p>
# This describes AST info describing the result of parsing input in the
# grammar defined by term.y. Various other interesting schemas are subsets
# of the schema presented here:
# <p>
# Schemas themselves, such as this one, are written in the subset of this
# schema without <tt>action</tt> or <tt><hole></tt>.
# <p>
# Most schemas are written in the context-free subset of that, without
# <tt>firstChoice</tt>, <tt>not</tt>, or <tt>interleave</tt>. <ul>
# <li><tt>firstChoice</tt> is <i>prioritized choice</i> from Parsing
# Expression Grammars.
# <li><tt>not</tt> forms a <i>syntactic predicate</i> from Parsing Expression
# Grammars.
# <li><tt>interleave</tt> is an unordered analog of <tt>seq</tt>, from
# Relax-NG
# </ul>
# <p>
# Grammars are the subset of schemas meant for processing sequences of symbols
# rather than trees of symbols. They can be written in the language of
# grammar.y. When processed by the quasiMetaBuilder, they are described by the
# subset of this schema where <tt><term> ::= <functor>;</tt>.
# The syntax of the grammar.y language is essentially the corresponding
# subset of the term.y language, with the additional changes that it uses
# juxtaposition where term.y uses ',', and that juxtaposition binds much more
# tightly than ','.
# <p>
# Actual grammar definitions may use holes and actions to express the
# transformation of a low-level syntax into "semantics", i.e., a high level
# syntax.
# <p>
# The actual output of the quasiMetaBuilder is a term tree, which is itself
# in a small subset of the term.y language, and therefore corresponds to a
# small subset of this schema, which we list separately in term.schema.
# <p>
# The language in which term-tree quasiliterals are expressed is the variant
# of this grammar whose start symbol is <tt><rhs></tt> rather than
# <tt><schema></tt>. It therefore does not include
# <tt><schema></tt>, <tt><production></tt>, or
# <tt><lhs></tt>, since they are not reachable starting at
# <tt><rhs></tt>. Of the reachable elements, some are not yet
# implemented (such as <tt>interleave</tt>), and may never be.
# @author Mark S. Miller
<schema>     ::= schema(<production>+);
<production> ::= production(<lhs>, <rhs>);     # <lhs> ::= <rhs>;
<lhs>        ::= tag(.String.);
<rhs>        ::= <term>
             |   onlyChoice(<rhs>, <rhs>+)     # <rhs> | <rhs> | ...
             |   firstChoice(<rhs>, <rhs>+)    # <rhs> / <rhs> / ...
             |   not(<rhs>)                    # !<rhs>
             |   optional(<rhs>)               # <rhs>?
             |   oneOrMore(<rhs>)              # <rhs>+
             |   zeroOrMore(<rhs>)             # <rhs>*
             |   any                           # .
             |   interleave(<rhs>, <rhs>+)     # <rhs> & <rhs> & ...
             |   seq(<rhs>, <rhs>+)            # <rhs> , <rhs> , ...
             |   action(<rhs>, <rhs>)          # <rhs> -> <rhs>
             |   empty;                        # ()
<term>       ::= term(<functor>, <rhs>)        # <functor>(<rhs>)
             |   <functor>;
<functor>    ::= tag(.String.)                 # foo
             |   tag(.String., <hole>)         # foo@x
             |   <hole>                        # @x
             |   data(<literal>)               # "foo"
             |   range(<literal>, <literal>);  # 'a'..'z'
<hole>       ::= dollarHole(.int. | .String.)  # $x | ${3}
             |   atHole(.int. | .String.);     # @x | @{3}
<literal>    ::= .char. | .int. | .float64. | .String.;
-----------------------------------------------------------------------
? def makeSchema := <import:org.quasiliteral.schema.makeSchema>
# value: <makeSchema>
? def <schema> := <resource:org/quasiliteral/schema/>
? makeSchema.parseSchema(<schema:schema.schema>.getTwine())
# value: term`schema(production(tag("<schema>"),
#                               term(tag("schema"),
#                                    oneOrMore(tag("<production>")))),
#                    production(tag("<production>"),
#                               term(tag("production"),
#                                    seq(tag("<lhs>"),
#                                        tag("<rhs>")))),
#                    production(tag("<lhs>"),
#                               term(tag("tag"),
#                                    tag(".String."))),
#                    production(tag("<rhs>"),
#                               onlyChoice(tag("<term>"),
#                                          term(tag("onlyChoice"),
#                                               seq(tag("<rhs>"),
#                                                   oneOrMore(tag("<rhs>")))),
#                                          term(tag("firstChoice"),
#                                               seq(tag("<rhs>"),
#                                                   oneOrMore(tag("<rhs>")))),
#                                          term(tag("not"),
#                                               tag("<rhs>")),
#                                          term(tag("optional"),
#                                               tag("<rhs>")),
#                                          term(tag("oneOrMore"),
#                                               tag("<rhs>")),
#                                          term(tag("zeroOrMore"),
#                                               tag("<rhs>")),
#                                          tag("any"),
#                                          term(tag("interleave"),
#                                               seq(tag("<rhs>"),
#                                                   oneOrMore(tag("<rhs>")))),
#                                          term(tag("seq"),
#                                               seq(tag("<rhs>"),
#                                                   oneOrMore(tag("<rhs>")))),
#                                          term(tag("action"),
#                                               seq(tag("<rhs>"),
#                                                   tag("<rhs>"))),
#                                          tag("empty"))),
#                    production(tag("<term>"),
#                               onlyChoice(term(tag("term"),
#                                               seq(tag("<functor>"),
#                                                   tag("<rhs>"))),
#                                          tag("<functor>"))),
#                    production(tag("<functor>"),
#                               onlyChoice(term(tag("tag"),
#                                               tag(".String.")),
#                                          term(tag("tag"),
#                                               seq(tag(".String."),
#                                                   tag("<hole>"))),
#                                          tag("<hole>"),
#                                          term(tag("data"),
#                                               tag("<literal>")),
#                                          term(tag("range"),
#                                               seq(tag("<literal>"),
#                                                   tag("<literal>"))))),
#                    production(tag("<hole>"),
#                               onlyChoice(term(tag("dollarHole"),
#                                               onlyChoice(tag(".int."),
#                                                          tag(".String."))),
#                                          term(tag("atHole"),
#                                               onlyChoice(tag(".int."),
#                                                          tag(".String."))))),
#                    production(tag("<literal>"),
#                               onlyChoice(tag(".char."),
#                                          tag(".int."),
#                                          tag(".float64."),
#                                          tag(".String."))))`
--
Text by me above is hereby placed in the public domain
Cheers,
--MarkM