[E-Lang] Concerning XML docs

Mark S. Miller markm@caplet.com
Wed, 19 Sep 2001 02:02:55 -0400


At 09:09 PM Tuesday 9/18/01, Chip Morningstar wrote:
>[...] I am not familiar with MinML and what its
>(or XML's) role in the E environment is supposed to be [...] What I don't understand is
>what [XML's] relationship to E is.[...]
>What I don't see is where the locus of data interchange is that these issues
>become relevant to us.

The relevance of XML in Jonathan's recent postings is purely for CapIDL.  If 
I recall, Jonathan cross posted because he knew that there were strong 
opinions about XML among us e-langers, as there proved to be.

As far as E goes, I had wanted to make my peace with XML from early on, 
precisely for the marketing/mindshare reasons Jonathan explains.  I 
initially went in naively thinking that XML can't be as horrible as it seems 
on first blush, and that surely an accommodation could be found.

To answer your question, I had wanted to use XML as the default universal 
parse tree representation to parse other data into, much as ANTLR uses its 
AST class.  (ANTLR's ASTs resemble S-Expressions or Prolog term 
trees.)  http://www.erights.org/elang/grammar/quasi-xml.html explains how I 
wanted to fit it into E -- to use quasi-literal patterns and expressions to 
easily manipulate data in a Prolog/Perl/XSLT match-bind-substitute style.  
And to be able to do so for any data that could be quasi-parsed into a 
quasi-literal form of whatever my universal parse tree representation would be.
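
To make the match-bind-substitute style concrete, here is a minimal sketch 
in Python rather than E.  It is not E's quasi-literal syntax and not ANTLR's 
AST API; the nested tuples standing in for parse trees, and the names Hole, 
match, and substitute, are hypothetical and chosen only for illustration.

```python
class Hole:
    """A named hole in a pattern or template (hypothetical helper)."""
    def __init__(self, name):
        self.name = name


def match(pattern, tree, bindings=None):
    """Return a dict of hole bindings if tree matches pattern, else None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, Hole):
        if pattern.name in bindings:
            # A repeated hole must match the same subtree each time.
            return bindings if bindings[pattern.name] == tree else None
        extended = dict(bindings)
        extended[pattern.name] = tree
        return extended
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple) or len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            bindings = match(sub_pattern, sub_tree, bindings)
            if bindings is None:
                return None
        return bindings
    # Leaves (tags, literals) must be equal.
    return bindings if pattern == tree else None


def substitute(template, bindings):
    """Build a new tree by filling each hole in template from bindings."""
    if isinstance(template, Hole):
        return bindings[template.name]
    if isinstance(template, tuple):
        return tuple(substitute(part, bindings) for part in template)
    return template


# Match a tree, bind its two operands, then substitute them in swapped order.
tree = ("add", ("num", "1"), ("num", "2"))
found = match(("add", Hole("x"), Hole("y")), tree)
swapped = substitute(("add", Hole("y"), Hole("x")), found)
```

A quasi-parser would produce the pattern and template trees from source 
text with embedded holes, instead of having them written out by hand as 
above.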

I had chosen Minimal-XML for this in order to strike a balance between 
sanity and politics.  Since then, two things have changed my mind.  1) The 
SML-DEV group http://groups.yahoo.com/group/sml-dev cannot be stirred into 
acting as a vocal independent standards group advocating Minimal-XML as an 
alternative.  I could go on about this, but I'll let my publicly archived 
arguments with them, and their more recent interests in discussing anything 
but Minimal-XML, speak for themselves.  2) XML Schema pushed me over the 
edge, and I think it will be rapidly pushing others as well.  This is no 
longer a standard to compromise with, but one to ignore.  

I'm planning instead to simply use ANTLR's ASTs, as that's the path of least 
resistance if I'm to turn ANTLR into a quasi-parser generator.

So the only remaining relevance XML has for E is that there's a lot of stuff 
in XML, and, until the backlash starts, this will get worse before it gets 
better.  Some of these projects will be ones we want to interoperate with in 
some form.  We know of one already -- CapIDL.  Fortunately, despite 
Jonathan's commitment to full XML support by CapIDL comments, E's use of 
CapIDL (needed to bind to EROS) can ignore those comments.  If we use 
Jonathan's parser to parse CapIDL and write out a representation of the 
parse tree, Jonathan agrees that this will be in a very narrow subset of 
XML, probably Minimal-XML + a little bit.  Alternatively, and probably 
better anyway, we could just parse it ourselves based on a common CapIDL 
yacc grammar.

But what about the fatal question: "Is E compatible with XML?"  The answer 
is, sure.  Just as E is compatible with JPEG.  It would be hard for a 
Turing-universal language not to be compatible with a data format.  Also, 
all the standard Java libraries for manipulating XML or JPEG should be 
invokable from E, given only that they don't violate capability discipline, 
as seems likely.

Actually, we may eventually do better than that.  There is probably already 
an ANTLR grammar for XML that parses XML into ANTLR ASTs.  Assuming we can 
turn that into a quasi-parser, then we could accept XML input, but use ANTLR 
ASTs to represent them rather than DOM trees.  In any case, this possibility 
no longer makes my pulse race, and no one should hold their breath.

So, since XML really has no remaining role in E per se, I'd like to make 
this our last cross posted message on this topic, except of course for 
messages that actually do have to do with E.  For messages that have to do 
with XML in general, CapIDL in general, or XML for CapIDL, let's take these 
exclusively to the CapIDL list.

But I'll leave you with the following.  Someone on the SML-DEV list asked 
"Anybody for writing what XML should have been?"  My reply 
http://groups.yahoo.com/group/sml-dev/message/4867 was:



It seems to me that there are three distinct kinds of jobs that have
been smushed together into XML.  Sometimes such merging of
functionality results in synergies, as when PL/1 smushed together
features from Fortran, Cobol, Algol, and Lisp, creating a mess.  But
a mess suggesting potential synergies between these elements that
inspired many clean descendants like C and Pascal.  In particular,
combining heap-allocated pointer structures from Lisp with the
struct/record concept from (believe it or not) Cobol was very
powerful, and was one of the steps to objects. 

The best we can hope for from XML at this point is for it to become
the PL/1 of textual data representation.  By merging these functions,
perhaps someone will be inspired by some synergy I don't see and
create something both new and valuable.  Frankly, I doubt it.  I
think XML is simply an irredeemable incoherent mess, and the three
distinct jobs remain better done separately by the three distinct
tools that have traditionally done these jobs.  Though, I think, XML
can suggest some enhancements to these tools.  The three tools? 

1) Attributed text.  This is the job traditionally associated with a
number of text formats, including FrameMaker's, rtf, and others.
HTML has clearly taken over this world, and if one wants to be
compatible with something, HTML, not XML, is clearly the huge
installed base to be compatible with. 

2) S-Expressions.  As John McCarthy (creator of Lisp) says "XML is
just S-Expressions, only ten times as verbose".  (And, I'd add, about
one hundred times as complex.)  As I've said on this list before, if
you need to say you're "XML compatible", and there are many marketing
reasons to do this, Minimal-XML is exciting *because* it removes all
the extraneous crap from XML, leaving just the S-Expressions and the
compatibility.  The one cool thing XML does add to traditional
S-Expressions (that Minimal-XML, wisely at this point, leaves out) is
the notion of grammars over S-Expressions.  But S-Expressions in
ANTLR http://www.antlr.org/ do this tree grammar thing much better
than XML does, and do it over actual Lisp-like S-Expressions.  For
those that don't need XML compatibility, I recommend ANTLR. 

3) Object serialization.  Both Java and CORBA have created well known
binary serialization formats.  Java's is unnecessarily complex, and
has the problem that it's perceived to be language specific (it's
not).  CORBA's is understandably crippled, being part of CORBA.
Worse, both are defined only as binary formats.  Key to XML's
marketing success is that it's a textual format, and one can
therefore use text in books as an example of the encoding.  The world
needs a good language-neutral flexible abstract object serialization
format with two concrete syntaxes: an efficient binary one, and a
readable/editable textual one.  Unlike XML, XML/SOAP, or YAML, it
should represent arbitrary graphs straightforwardly.  There should be
a full fidelity converter in each direction between these formats.  I
suspect such serialization systems already exist, and that some are
good, but none are yet widely known.  In today's world, where XML
compatibility is such a crushing issue, I doubt any will become
widely known.  But perhaps.
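
As a rough illustration of point 3, the sketch below shows one way a
textual encoding can represent arbitrary graphs, including shared and
cyclic references, by numbering container nodes and referring to them
by index.  It is an assumption-laden toy, not any existing format: the
"ref"/"val"/"nodes" layout and the names dump_graph and load_graph are
invented here, it only handles Python lists, dicts, and JSON scalars,
and it shows only the textual syntax, not a binary one.

```python
import json


def dump_graph(root):
    """Serialize lists/dicts/scalars to text, preserving sharing and cycles."""
    ids = {}    # id(container) -> node index
    nodes = []  # node index -> encoded container body

    def encode(obj):
        if isinstance(obj, (list, dict)):
            key = id(obj)
            if key not in ids:
                ids[key] = len(nodes)
                nodes.append(None)  # reserve the slot before recursing
                if isinstance(obj, list):
                    body = {"list": [encode(x) for x in obj]}
                else:
                    body = {"dict": [[k, encode(v)] for k, v in obj.items()]}
                nodes[ids[key]] = body
            return {"ref": ids[key]}
        return {"val": obj}

    return json.dumps({"root": encode(root), "nodes": nodes})


def load_graph(text):
    """Rebuild the graph written by dump_graph."""
    data = json.loads(text)
    # First create empty containers so references can point at them.
    objs = [[] if "list" in node else {} for node in data["nodes"]]

    def decode(enc):
        return objs[enc["ref"]] if "ref" in enc else enc["val"]

    # Then fill each container, resolving references by index.
    for obj, node in zip(objs, data["nodes"]):
        if "list" in node:
            obj.extend(decode(x) for x in node["list"])
        else:
            for key, value in node["dict"]:
                obj[key] = decode(value)
    return decode(data["root"])


# A graph with one shared list and a cycle back to the root.
shared = [1, 2]
root = {"a": shared, "b": shared}
root["self"] = root
copy = load_graph(dump_graph(root))
```

After the round trip, copy["a"] and copy["b"] are the same list object
and copy["self"] is copy itself, which a purely tree-shaped encoding
would not preserve without extra conventions.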



        Cheers,
        --MarkM