[E-Lang] Concerning XML docs
Mark S. Miller
markm@caplet.com
Wed, 19 Sep 2001 02:02:55 -0400
At 09:09 PM Tuesday 9/18/01, Chip Morningstar wrote:
>[...] I am not familiar with MinML and what its
>(or XML's) role in the E environment is supposed to be [...] What I don't understand is
>what [XML's] relationship to E is.[...]
>What I don't see is where the locus of data interchange is that these issues
>become relevant to us.
The relevance of XML in Jonathan's recent postings is purely for CapIDL. If
I recall, Jonathan cross posted because he knew that there were strong
opinions about XML among us e-langers, as there proved to be.
As far as E goes, I had wanted to make my peace with XML from early on,
precisely for the marketing/mindshare reasons Jonathan explains. I
initially went in naively thinking that XML can't be as horrible as it seems
at first blush, and that surely an accommodation could be found.
To answer your question, I had wanted to use XML as the default universal
parse tree representation to parse other data into, much as ANTLR uses its
AST class. (ANTLR's ASTs resemble S-Expressions or Prolog term
trees.) http://www.erights.org/elang/grammar/quasi-xml.html explains how I
wanted to fit it into E -- to use quasi-literal patterns and expressions to
easily manipulate data in a Prolog/Perl/XSLT match-bind-substitute style.
And to be able to do so for any data that could be quasi-parsed into a
quasi-literal form of whatever my universal parse tree representation would be.
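To make the match-bind-substitute idea concrete, here is a minimal sketch in Python rather than E. It is not E's quasi-literal syntax and not ANTLR's AST class. The `Hole` class, the `match` and `substitute` helpers, and the nested-tuple tree encoding are all assumptions invented for illustration.

```python
from dataclasses import dataclass

# Terms are nested tuples of the form (tag, child, child, ...); leaves are strings.
# A Hole in a pattern binds whatever subtree sits at that position.
# A Hole in a template is replaced by the subtree bound to its name.

@dataclass(frozen=True)
class Hole:
    name: str

def match(pattern, term, bindings=None):
    """Return a dict of bindings if term fits pattern, else None."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, Hole):
        # A hole that is already bound must see an equal subtree again.
        if pattern.name in bindings and bindings[pattern.name] != term:
            return None
        bindings[pattern.name] = term
        return bindings
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(pattern) != len(term):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            if match(sub_pattern, sub_term, bindings) is None:
                return None
        return bindings
    return bindings if pattern == term else None

def substitute(template, bindings):
    """Build a new term from template, filling each hole from bindings."""
    if isinstance(template, Hole):
        return bindings[template.name]
    if isinstance(template, tuple):
        return tuple(substitute(part, bindings) for part in template)
    return template

term = ("person", ("name", "Alice"), ("email", "alice@example.org"))
pattern = ("person", ("name", Hole("n")), ("email", Hole("e")))
template = ("contact", Hole("e"), Hole("n"))

found = match(pattern, term)
result = substitute(template, found)
print(result)  # ('contact', 'alice@example.org', 'Alice')
```

A quasi-parser would let the pattern and template be written in the concrete syntax of the data itself, with the holes embedded in that text. The tuples above stand in for whatever tree such a parser would produce.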
I had chosen Minimal-XML for this in order to strike a balance between
sanity and politics. Since then, two things have changed my mind. 1) The
SML-DEV group http://groups.yahoo.com/group/sml-dev cannot be stirred into
acting as a vocal independent standards group advocating Minimal-XML as an
alternative. I could go on about this, but I'll let my publicly archived
arguments with them, and their more recent interests in discussing anything
but Minimal-XML, speak for themselves. 2) XML Schema pushed me over the
edge, and I think it will be rapidly pushing others as well. This is no
longer a standard to compromise with, but one to ignore.
I'm planning instead to simply use ANTLR's ASTs, as that's the path of least
resistance if I'm to turn ANTLR into a quasi-parser generator.
So the only remaining relevance XML has for E is that there's a lot of stuff
in XML, and, until the backlash starts, this will get worse before it gets
better. Some of these projects will be ones we want to interoperate with in
some form. We know of one already -- CapIDL. Fortunately, despite
Jonathan's commitment to full XML support by CapIDL comments, E's use of
CapIDL (needed to bind to EROS) can ignore those comments. If we use
Jonathan's parser to parse CapIDL and write out a representation of the
parse tree, Jonathan agrees that this will be in a very narrow subset of
XML, probably Minimal-XML + a little bit. Alternatively, and probably
better anyway, is to just parse it ourselves based on a common CapIDL yacc
grammar.
But what about the fatal question: "Is E compatible with XML?" The answer
is, sure. Just as E is compatible with JPEG. It would be hard for a
Turing-universal language not to be compatible with a data format. Also,
all the standard Java libraries for manipulating XML or JPEG should be
invokable from E, given only that they don't violate capability discipline,
as seems likely.
Actually, we may eventually do better than that. There is probably already
an ANTLR grammar for XML that parses XML into ANTLR ASTs. Assuming we can
turn that into a quasi-parser, then we could accept XML input, but use ANTLR
ASTs to represent them rather than DOM trees. In any case, this possibility
no longer makes my pulse race, and no one should hold their breath.
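As a rough illustration of reading XML into a plain term tree instead of a DOM, the sketch below uses Python's standard `xml.etree.ElementTree` parser as a stand-in for an ANTLR grammar. It assumes a restricted input with no attributes and no mixed content, and it silently ignores anything outside that subset. The `to_term` and `to_sexpr` helpers are hypothetical names.

```python
import xml.etree.ElementTree as ET

def to_term(elem):
    """Convert an element into a nested tuple (tag, child, ...).

    Assumes no attributes and no mixed content: an element has either
    child elements or text, never both.
    """
    children = [to_term(child) for child in elem]
    if children:
        return (elem.tag, *children)
    text = (elem.text or "").strip()
    return (elem.tag, text) if text else (elem.tag,)

def to_sexpr(term):
    """Render a term as S-Expression text, quoting string leaves."""
    if isinstance(term, str):
        return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "(" + " ".join([term[0]] + [to_sexpr(t) for t in term[1:]]) + ")"

doc = "<person><name>Alice</name><email>alice@example.org</email></person>"
tree = to_term(ET.fromstring(doc))
print(tree)            # ('person', ('name', 'Alice'), ('email', 'alice@example.org'))
print(to_sexpr(tree))  # (person (name "Alice") (email "alice@example.org"))
```

Once the input is in this form, nothing downstream needs to know it started life as XML.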
So, since XML really has no remaining role in E per se, I'd like to make
this our last cross posted message on this topic, except of course for
messages that actually do have to do with E. For messages that have to do
with XML in general, CapIDL in general, or XML for CapIDL, let's take these
exclusively to the CapIDL list.
But I'll leave you with the following. Someone on the SML-DEV list asked
"Anybody for writing what XML should have been?" My reply
http://groups.yahoo.com/group/sml-dev/message/4867 was:
It seems to me that there are three distinct kinds of jobs that have
been smushed together into XML. Sometimes such merging of
functionality results in synergies, as when PL/1 smushed together
features from Fortran, Cobol, Algol, and Lisp, creating a mess. But
a mess suggesting potential synergies between these elements that
inspired many clean descendants like C and Pascal. In particular,
combining heap-allocated pointer structures from Lisp with
the struct/record concept from (believe it or not) Cobol was very
powerful, and was one of the steps to objects.
The best we can hope for from XML at this point is for it to become
the PL/1 of textual data representation. By merging these functions,
perhaps someone will be inspired by some synergy I don't see and
create something both new and valuable. Frankly, I doubt it. I
think XML is simply an irredeemable incoherent mess, and the three
distinct jobs remain better done separately by the three distinct
tools that have traditionally done these jobs. Though, I think, XML
can suggest some enhancements to these tools. The three tools?
1) Attributed text. This is the job traditionally associated with a
number of text formats, including FrameMaker's, rtf, and others.
HTML has clearly taken over this world, and if one wants to be
compatible with something, HTML, not XML, is clearly the huge
installed base to be compatible with.
2) S-Expressions. As John McCarthy (creator of Lisp) says "XML is
just S-Expressions, only ten times as verbose". (And, I'd add, about
one hundred times as complex.) As I've said on this list before, if
you need to say you're "XML compatible", and there are many marketing
reasons to do this, Minimal-XML is exciting *because* it removes all
the extraneous crap from XML, leaving just the S-Expressions and the
compatibility. The one cool thing XML does add to traditional
S-Expressions (that Minimal-XML, wisely at this point, leaves out) is
the notion of grammars over S-Expressions. But S-Expressions in
ANTLR http://www.antlr.org/ do this tree grammar thing much better
than XML does, and do it over actual Lisp-like S-Expressions. For
those that don't need XML compatibility, I recommend ANTLR.
3) Object serialization. Both Java and CORBA have created well known
binary serialization formats. Java's is unnecessarily complex, and
has the problem that it's perceived to be language specific (it's
not). CORBA's understandably is crippled, being part of CORBA.
Worse, both are defined only as binary formats. Key to XML's
marketing success is that it's a textual format, and one can
therefore use text in books as an example of the encoding. The world
needs a good language-neutral flexible abstract object serialization
format with two concrete syntaxes: an efficient binary one, and a
readable/editable textual one. Unlike XML, XML/SOAP, or YAML, it
should represent arbitrary graphs straightforwardly. There should be
a full fidelity converter in each direction between these formats. I
suspect such serialization systems already exist, and that some are
good, but none are yet widely known. In today's world, where XML
compatibility is such a crushing issue, I doubt any will become
widely known. But perhaps.
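To illustrate what "represent arbitrary graphs straightforwardly" asks of a textual format, here is a minimal sketch, not any existing serialization system. It assumes a graph built from Python dicts (with string keys), lists, and scalars, and it borrows JSON only as a convenient carrier for the text. The `encode` and `decode` names and the node-table layout are invented for this example, and the binary syntax is not shown.

```python
import json

def encode(root):
    """Flatten a graph into a table of nodes; containers refer to each
    other by table index, so sharing and cycles survive."""
    ids = {}    # id(container) -> index in table
    table = []

    def visit(obj):
        if isinstance(obj, (dict, list)):
            key = id(obj)
            if key not in ids:
                ids[key] = len(table)
                table.append(None)  # reserve the slot first so cycles terminate
                if isinstance(obj, dict):
                    node = {"dict": {k: visit(v) for k, v in obj.items()}}
                else:
                    node = {"list": [visit(v) for v in obj]}
                table[ids[key]] = node
            return {"ref": ids[key]}
        return {"val": obj}

    return json.dumps({"root": visit(root), "nodes": table})

def decode(text):
    """Rebuild the graph: create empty containers first, then fill them."""
    data = json.loads(text)
    nodes = data["nodes"]
    shells = [{} if "dict" in node else [] for node in nodes]

    def resolve(entry):
        return shells[entry["ref"]] if "ref" in entry else entry["val"]

    for shell, node in zip(shells, nodes):
        if "dict" in node:
            for k, v in node["dict"].items():
                shell[k] = resolve(v)
        else:
            shell.extend(resolve(v) for v in node["list"])
    return resolve(data["root"])

# Two objects that point at each other, one of them reachable twice.
a = {"name": "a"}
b = {"name": "b", "peer": a}
a["peer"] = b
root = {"first": a, "second": b, "again": a}

copy = decode(encode(root))
print(copy["first"] is copy["again"])           # True: sharing preserved
print(copy["first"]["peer"] is copy["second"])  # True: cycle preserved
```

A tree-shaped encoding would have to either duplicate `a` or loop forever on the cycle. The index table is what lets the text describe a graph rather than a tree.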
Cheers,
--MarkM