[E-Lang] stl-0.8.9k: Quasi-Minimal-XML

Mark S. Miller markm@caplet.com
Fri, 19 Jan 2001 22:15:32 -0800


Look at 
http://www.erights.org/javadoc/org/erights/xml/qdom/package-tree.html while 
reading this note.

We now have a very simple DOM tree for representing Minimal-XML, derived 
from the w3c standard DOM tree, but with everything thrown out except the 
parts relevant to representing Minimal-XML as an immutable tree.  These are 
represented by the class Node, and its two subclasses Element and Text.  
These seem close to done to me.  Once they are enhanced to support the 
visitor pattern well, I intend to retire the E parse tree classes (the 
implementors of ENode in 
http://www.erights.org/javadoc/org/erights/e/elang/evm/package-tree.html ), 
and instead use these DOM tree classes according to the Kernel-E encoding in 
http://www.erights.org/elang/kernel/kernel-e-0.8.9e.dtd .
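
As a rough illustration of what an immutable Node/Element/Text tree with visitor support could look like, here is a minimal Java sketch. It is not the actual org.erights.xml.qdom API: the Visitor interface, the accept method, the field names, and the Printer class are all hypothetical names chosen for the example.

```java
import java.util.List;

public class MiniDomSketch {
    /** Hypothetical visitor interface: one method per concrete node kind. */
    interface Visitor<R> {
        R visitElement(Element e);
        R visitText(Text t);
    }

    /** Base class of the immutable tree. */
    static abstract class Node {
        abstract <R> R accept(Visitor<R> v);
    }

    /** An element holds a tag name and children only (no attributes). */
    static final class Element extends Node {
        final String tag;
        final List<Node> children;

        Element(String tag, Node... children) {
            this.tag = tag;
            this.children = List.of(children); // unmodifiable copy
        }

        <R> R accept(Visitor<R> v) { return v.visitElement(this); }
    }

    /** A run of character data. */
    static final class Text extends Node {
        final String data;

        Text(String data) { this.data = data; }

        <R> R accept(Visitor<R> v) { return v.visitText(this); }
    }

    /** Example visitor: serializes the tree with explicit start and end tags. */
    static final class Printer implements Visitor<String> {
        public String visitElement(Element e) {
            StringBuilder sb = new StringBuilder("<" + e.tag + ">");
            for (Node child : e.children) {
                sb.append(child.accept(this));
            }
            return sb.append("</" + e.tag + ">").toString();
        }

        public String visitText(Text t) {
            // Escape the characters that would otherwise be read as markup.
            return t.data.replace("&", "&amp;")
                         .replace("<", "&lt;")
                         .replace(">", "&gt;");
        }
    }

    static String print(Node n) { return n.accept(new Printer()); }

    public static void main(String[] args) {
        Node tree = new Element("power",
                new Element("var", new Text("x")),
                new Element("num", new Text("2")));
        System.out.println(print(tree));
    }
}
```

Because the nodes never change after construction, a visitor that transforms the tree would return new nodes rather than mutate old ones, which is one reason the visitor pattern fits an immutable tree well.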

Much rougher are the classes for representing Minimal-XML quasi-literal 
expressions and patterns.  These are represented by QuasiContent and its 
subclasses, with XMLQuasiParser being the quasi-parser that generates these 
QuasiContent objects.  Don't take the details here too seriously; I intend 
to revise these substantially before they'll be worth examining in detail.  
However, they're already able to do the "Example: Symbolic Differentiation in 
XML Notation" in Step 1 of 
http://www.erights.org/elang/grammar/quasi-xml.html .  While I haven't 
actually tried it (sorry), from eyeballing it I think the only problem 
should be empty-element tags like

    <power/>

that will need to be rewritten as

    <power></power>

as the former isn't in Minimal-XML.
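
The rewrite described above can be done mechanically. The following Java sketch is one way to do it, under the assumption that the input has no attributes, comments, or CDATA sections and that element names are plain ASCII. The class and method names (EmptyTagExpander, expand) are made up for this example.

```java
import java.util.regex.Pattern;

public class EmptyTagExpander {
    // Matches an attribute-free empty-element tag such as <power/> or <power />.
    // The name pattern is a simplified, ASCII-only stand-in for XML names.
    private static final Pattern EMPTY_TAG =
            Pattern.compile("<([A-Za-z_][A-Za-z0-9_.-]*)\\s*/>");

    /** Rewrites every <name/> in the input as <name></name>. */
    static String expand(String xml) {
        return EMPTY_TAG.matcher(xml).replaceAll("<$1></$1>");
    }

    public static void main(String[] args) {
        System.out.println(expand("<times><power/><num>2</num></times>"));
    }
}
```

Tags that already have an explicit end tag are left untouched, so running expand over text that is already in the longer form is harmless.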


This whole exercise has made me question once again whether Minimal-XML is 
too minimal.  If the purpose is only politically correct s-expressions, then 
I think Minimal-XML is just about exactly right.  If the purpose is to be 
able to process anyone else's XML content, then it doesn't work at all.  Of 
course, once we seek to do the latter, we're on a slippery slope to full 
XML, which is more than I can stand.  And the last thing I want is to roll 
my own non-standard subset of XML.

Into this thankless situation pops Canonical-XML 
http://www.w3.org/TR/2001/PR-xml-c14n-20010119 , which is what happens to 
XML after a lot of the crap has been processed and reduced to a smaller 
number of constructs, and canonicalization rules are applied to get rid of 
differences that shouldn't make a difference.  Coincidentally, its 
motivation is security -- the canonicalization rules ensure that two XML 
sources which one would think of as representing the same XML structure 
canonicalize to the exact same sequence of characters, so that they can have 
the same cryptographic hash.  This is designed to support XML-Signature 
http://www.w3.org/Signature/ , which solves a problem that CapCert will need 
to solve anyway.

Although Canonical-XML is still distastefully large, it is, unlike full XML, 
well below my gag threshold.  The important question which I don't have the 
expertise to answer: Is the information thrown away by canonicalization 
information needed by most of the content out there?  As long as 
Canonical-XML can represent much of the content E programmers will need to 
manipulate, even if it can't represent some, it may well be the sweet spot 
in this awful tradeoff space.


        Cheers,
        --MarkM