[e-lang]
Simplifying abstract syntaxes (was: On kernel-E, operators,
and properties)
Mark Miller
markm at cs.jhu.edu
Tue Jan 4 01:20:51 EST 2005
Mark Miller wrote:
> By way of context, the E abstract syntax allows any arbitrary string as a
> message selector. When the message selector is an E identifier, the E
> concrete syntax allows the selector to be written without the quotes.
> However, an E language processor that starts with ASTs would not need to
> know or care about, for example, the Unicode tables saying what is an
> identifier character.
At http://www.eros-os.org/pipermail/e-lang/2004-April/009821.html
Dean Tribble wrote:
> I'm learning C# and noticed that it has a similar but more
> general mechanism: @"blah blah" is the identifier <blah blah> and can be
> used anywhere an identifier can be used. This provides the ability to
> access libraries from other languages, etc.
The particular choice above doesn't work for us, because of our conflicting
use of '@'. But yes, I'd like to remove such distinctions from the E abstract
syntax everywhere, and from the Term-tree syntax as well. Proposal: Instead of
saying that an E variable-name may only be an identifier, define it as
<variable-name> ::= <ident>
| '::' <ident>
| '::' <literal-string>
;
As of 0.8.33p, these productions can be enabled by
'pragma.enable("noun-string")'. The reason for using '::' is that the
experimental property access syntax already uses '::<property-name>', where
<property-name> can be an identifier or a literal string.
As has previously been discussed, this expands as shown below:
? pragma.enable("dot-props")
? interp::expand := true
# value: true
? interp::expand := false
# expansion: interp.__getPropertySlot("expand").setValue(\
# def ares__3 := false)
# ares__3
# value: false
Ignoring the return value, and given that interp doesn't override the Miranda
__getPropertySlot/1 method, this expansion is equivalent to
interp.setExpand(false)
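
To make the equivalence concrete, here is a rough Python model of the expansion. It is only a sketch under assumptions: the PropertySlot, Miranda and Interp classes and the assign_property helper are hypothetical names, the method is spelled getPropertySlot because Python would mangle a leading double underscore, and the rule that the default slot maps "expand" to getExpand/setExpand is inferred from the example above rather than taken from the E implementation.

```python
class PropertySlot:
    """Hypothetical stand-in for the slot returned by __getPropertySlot."""

    def __init__(self, target, name):
        self._target = target
        # Assumed naming rule: "expand" -> "Expand", giving getExpand/setExpand.
        self._suffix = name[:1].upper() + name[1:]

    def getValue(self):
        return getattr(self._target, "get" + self._suffix)()

    def setValue(self, value):
        getattr(self._target, "set" + self._suffix)(value)


class Miranda:
    """Hypothetical base class supplying the default (Miranda) behavior."""

    def getPropertySlot(self, name):
        return PropertySlot(self, name)


class Interp(Miranda):
    """Toy receiver with an ordinary getter/setter pair."""

    def __init__(self):
        self._expand = False

    def getExpand(self):
        return self._expand

    def setExpand(self, flag):
        self._expand = flag


def assign_property(receiver, name, value):
    """Mirror the expansion of `receiver::name := value`."""
    receiver.getPropertySlot(name).setValue(value)
    return value  # plays the role of ares__3 in the expansion
```

In this model, assign_property(interp, "expand", False) ends up calling interp.setExpand(False) unless a subclass overrides getPropertySlot, which is the case the text sets aside.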
So, the intuition is to think of '::' as a naming-path separator. An initial
'::' begins a naming path, and an initial name on a naming path is always a
variable name in the current scope. Even if we never accept the dot-props
suggestion into the language definition, the '::' syntax for introducing
non-identifier variable-names isn't much worse than anything else I could
think of.
A silly example of doing arithmetic in a more Scheme-like style:
? pragma.enable("noun-string")
? def ::"+"(a, b) :any { return a + b }
# value: <+>
? ::"+"(3,4)
# value: 7
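
The proposed <variable-name> productions can also be sketched as a small recognizer that maps a concrete spelling to its abstract name. This is a Python illustration, not E's parser: parse_variable_name is a hypothetical helper, Python's str.isidentifier() stands in for E's notion of <ident>, and the string-literal handling only understands backslash escapes in their simplest form.

```python
import re

# A double-quoted literal with simple backslash escapes (an approximation).
_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_variable_name(text):
    """Return the abstract variable name for a concrete spelling.

    Raises ValueError if the text matches none of the three productions.
    """
    prefixed = text.startswith("::")
    body = text[2:] if prefixed else text
    if body.isidentifier():
        return body  # <ident> or '::' <ident>
    match = _STRING.fullmatch(body)
    if match and prefixed:
        # '::' <literal-string>: drop the backslash from each escape.
        return re.sub(r"\\(.)", r"\1", match.group(1))
    raise ValueError("not a variable name: %r" % text)
```

Under these assumptions, 'foo' and '::foo' both name the variable foo, '::"+"' names the variable whose abstract name is the one-character string +, and a bare '"+"' is rejected because it would read as a string literal.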
The current Term-tree abstract grammar, or "infoset" as w3c folks like to say,
is documented at <http://www.erights.org/elang/quasi/terms/term-spec.html>. A
"Tag" (<http://www.erights.org/elang/quasi/terms/term-spec.html#Tag>) is
conceptually a list of segments, where the current definition of segment
depends on Unicode tables in a fashion similar to the way that the definition
of Java and current E identifiers do. I propose the same kind of fix as above:
* The abstract syntax for segment should be an arbitrary string of Unicode
characters. A Tag would then be a list of arbitrary strings. The abstract
Term-tree syntax would then be independent of complex Unicode distinctions.
* The concrete syntax would depend on Unicode character tables so that only a
segment that was an identifier, for some suitable notion of identifier, could
be written without quotes.
* For consistency with E, we go to '::' rather than ':' as the segment separator.
* To distinguish a Tag from a literal String, an initial segment, if it's
quoted, must be preceded by a '::'. In terms of the current grammar, perhaps
<Tag> ::= (<ident> | <special>) ('::' <segment>)*
| '::' <segment> ('::' <segment>)*
;
<segment> ::= <ident> | <special> | <String> ;
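
As a rough check on the proposed Tag grammar, the following Python sketch splits a concrete Tag into its list of abstract segment strings. It is illustrative only: parse_tag is a hypothetical name, the <special> alternative is omitted, the identifier pattern is Python's regex notion of a word rather than the Term-tree definition, and the XHTML namespace URI in the usage note is just sample data.

```python
import re

# Either a quoted <String> (group 1) or a bare identifier (group 2).
_SEGMENT = re.compile(r'"((?:[^"\\]|\\.)*)"|([^\W\d]\w*)')


def parse_tag(text):
    """Return the list of segments of a Tag, or raise ValueError."""
    segments = []
    pos = 0
    quoted_ok = False  # an unprefixed initial segment must be bare
    if text.startswith("::"):
        pos = 2
        quoted_ok = True
    while True:
        match = _SEGMENT.match(text, pos)
        if not match:
            raise ValueError("expected a segment at offset %d" % pos)
        quoted, bare = match.groups()
        if quoted is not None:
            if not quoted_ok:
                raise ValueError("a quoted initial segment needs a leading '::'")
            segments.append(re.sub(r"\\(.)", r"\1", quoted))
        else:
            segments.append(bare)
        pos = match.end()
        if pos == len(text):
            return segments
        if not text.startswith("::", pos):
            raise ValueError("expected '::' at offset %d" % pos)
        pos += 2
        quoted_ok = True  # later segments may always be quoted
```

With this sketch, 'x::y' yields the segments x and y, and '::"http://www.w3.org/1999/xhtml"::p' yields a URI segment followed by p, while the old single-colon spelling 'x:y' and an unprefixed '"foo"' are both rejected.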
This would give us the opportunity to revive an appealing old proposal:
Quasi-JSON back from the dead
At http://www.eros-os.org/pipermail/e-lang/2004-September/010074.html
Mark Miller wrote:
> Note: I hadn't realized till writing this response that the term-tree
> grammar is already so close to accepting JSON as a subset. [...]
> if I changed from "=" to ":" [...] then, ignoring [other]
> annoying Unicode issues, JSON would indeed be a syntactic subset of the
> term-tree language. [...] If we did this, then
> we could probably dispense with creating a separate JSON quasi-parser.
> [...] If no one objects, the next release of E
>
> * will accept either '=' or ':' in the term-tree grammar and quasi-grammar as
> synonyms,
> * '=' will be deprecated,
> * the default pretty printer will be changed to print using ':' rather
> than '='.
>
> Leaving aside the annoying Unicode issues, E's term-trees will then be a
> proper superset of JSON, and no separate parser will be needed.
>
> The previous examples will then work, with ':' rather than '='. This means
> you'll be able to process JSON data using quasi-literal JSON expressions and
> patterns immediately.
>
> In some later release, I hope to retire the soon-to-be-deprecated '='.
At http://www.eros-os.org/pipermail/e-lang/2004-September/010077.html
Kevin Reid wrote:
> I object.
>
> Currently, term`x:y` is a term with the tag <x:y>. If this syntax
> change were introduced, then term`x: y` would be surprisingly different
> from term`x:y` (currently, the former is a syntax error).
>
> Disallowing ':' in term tags would break the TermL embedding of XML as
> described in <http://www.erights.org/data/terml/embeddings.html>, which
> I have implemented in some of my projects.
>
> Specifically, something other than ':' would need to be used to
> represent separation between the XML namespace prefix or URI and the
> local name. This would make the embedding farther from XML, and also
> require choosing a character which can be used in TermL tags but is not
> a character which might appear in an XML Name. (An escaping syntax
> could be used instead, but it would be far more complex than the
> current embedding.)
So this proposal does constitute a form of escape syntax. In addition, URI
strings used as segments would always need to be quoted.
So how objectionable is it?
--
Text by me above is hereby placed in the public domain
Cheers,
--MarkM