[E-Lang] stl-0.8.9k: Quasi-Minimal-XML
Mark S. Miller
Fri, 19 Jan 2001 22:15:32 -0800
We now have a very simple DOM tree for representing Minimal-XML, derived
from the W3C standard DOM tree, but with everything thrown out except the
parts relevant to representing Minimal-XML as an immutable tree. These are
represented by the class Node, and its two subclasses Element and Text.
These seem close to done to me. Once they are enhanced to support the
visitor pattern well, I intend to retire the E parse tree classes (the
implementors of ENode), and instead use these DOM tree classes according
to the Kernel-E encoding.
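
For concreteness, here's a minimal Java sketch of what such an immutable
tree might look like. The names Node, Element, and Text come from the
classes described above; the visitor interface and the constructor details
are my assumptions, since the post doesn't show them:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical visitor hook of the kind the post says is still to come.
    interface NodeVisitor<R> {
        R visitElement(Element e);
        R visitText(Text t);
    }

    abstract class Node {
        abstract <R> R accept(NodeVisitor<R> v);
    }

    // Minimal-XML elements carry only a tag name and child nodes; no
    // attributes, comments, or processing instructions survive the cut.
    final class Element extends Node {
        final String tagName;
        final List<Node> children;

        Element(String tagName, List<Node> children) {
            this.tagName = tagName;
            // Defensive copy keeps the tree immutable.
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }
        <R> R accept(NodeVisitor<R> v) { return v.visitElement(this); }
    }

    // Text nodes hold character data only.
    final class Text extends Node {
        final String data;
        Text(String data) { this.data = data; }
        <R> R accept(NodeVisitor<R> v) { return v.visitText(this); }
    }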
Much rougher are the classes for representing Minimal-XML quasi-literal
expressions and patterns. These are represented by QuasiContent and its
subclasses, with XMLQuasiParser being the quasi-parser that generates these
QuasiContent objects. Don't take the details here too seriously; I intend
to revise these substantially before they'll be worth examining in detail.
However, they're already able to do the "Example: Symbolic Differentiation in
XML Notation" in Step 1 of
http://www.erights.org/elang/grammar/quasi-xml.html . While I haven't
actually tried it (sorry), when eyeballing it I think the only problem
should be self-contained Elements like <tag/> that will need to be
rewritten as <tag></tag>, as the former isn't in Minimal-XML.
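
The post doesn't show QuasiContent's interface, so everything below beyond
the names QuasiContent and XMLQuasiParser is my guess: a template tree with
numbered value holes, where substitution yields the plain Node tree sketched
earlier. The pattern-matching side of quasi-literals is omitted here.

    import java.util.ArrayList;
    import java.util.List;

    // Guessed shape: a template that substitution turns into a Node tree.
    abstract class QuasiContent {
        abstract Node substitute(Object[] args);
    }

    // Literal character data in the template.
    final class QuasiText extends QuasiContent {
        final String data;
        QuasiText(String data) { this.data = data; }
        Node substitute(Object[] args) { return new Text(data); }
    }

    // A numbered value hole in the template.
    final class QuasiHole extends QuasiContent {
        final int index;
        QuasiHole(int index) { this.index = index; }
        Node substitute(Object[] args) {
            Object arg = args[index];
            // Splice Nodes in directly; coerce anything else to Text.
            return (arg instanceof Node) ? (Node) arg : new Text(arg.toString());
        }
    }

    // A literal element whose children may themselves contain holes.
    final class QuasiElement extends QuasiContent {
        final String tagName;
        final QuasiContent[] children;
        QuasiElement(String tagName, QuasiContent[] children) {
            this.tagName = tagName;
            this.children = children;
        }
        Node substitute(Object[] args) {
            List<Node> kids = new ArrayList<>();
            for (QuasiContent child : children) {
                kids.add(child.substitute(args));
            }
            return new Element(tagName, kids);
        }
    }

Under these assumptions, an E expression like xml`<double>$n</double>`
would parse to a QuasiElement holding one QuasiHole, and substituting the
bound value of n would produce the corresponding Element tree.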
This whole exercise has made me question once again whether Minimal-XML is
too minimal. If the purpose is only politically correct s-expressions, then
I think Minimal-XML is just about exactly right. If the purpose is to be
able to process anyone else's XML content, then it doesn't work at all. Of
course, once we seek to do the latter, we're on a slippery slope to full
XML, which is more than I can stand. And the last thing I want is to roll
my own non-standard subset of XML.
Into this thankless situation pops Canonical-XML
http://www.w3.org/TR/2001/PR-xml-c14n-20010119 , which is what happens to
XML after a lot of the crap has been processed and reduced to a smaller
number of constructs, and canonicalization rules are applied to get rid of
differences that shouldn't make a difference. Coincidentally, its
motivation is security -- the canonicalization rules ensure that two XML
sources which one would think of as representing the same XML structure
canonicalize to the exact same sequence of characters, so that they can have
the same cryptographic hash. This is designed to support XML-Signature
http://www.w3.org/Signature/ , which solves a problem that CapCert will need
to solve anyway.
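
To make the hashing point concrete, here's a toy Java illustration, not
the real C14N algorithm, which covers namespaces, escaping, whitespace, and
much more. Only one canonicalization rule is modeled: attributes (which
Minimal-XML drops but Canonical-XML keeps) are emitted in a fixed sorted
order, so two sources that differ only in attribute order hash identically.

    import java.security.MessageDigest;
    import java.util.Arrays;

    public class C14nToy {
        // Emit a start tag with attributes in sorted order, so that
        // order differences in the source can't reach the hash.
        static String startTag(String tagName, String[][] attrs) {
            String[][] sorted = attrs.clone();
            Arrays.sort(sorted, (a, b) -> a[0].compareTo(b[0]));
            StringBuilder sb = new StringBuilder("<").append(tagName);
            for (String[] attr : sorted) {
                sb.append(' ').append(attr[0])
                  .append("=\"").append(attr[1]).append('"');
            }
            return sb.append('>').toString();
        }

        public static void main(String[] args) throws Exception {
            // Two sources a reader would consider the same element:
            String a = startTag("cert", new String[][] {
                {"issuer", "alice"}, {"subject", "bob"}});
            String b = startTag("cert", new String[][] {
                {"subject", "bob"}, {"issuer", "alice"}});

            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] ha = md.digest(a.getBytes("UTF-8"));
            byte[] hb = md.digest(b.getBytes("UTF-8"));  // digest() resets md
            System.out.println(a.equals(b));             // true
            System.out.println(Arrays.equals(ha, hb));   // true: same hash
        }
    }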
Although Canonical-XML is still distastefully large, unlike full XML it's
well below my gag threshold. The important question, which I don't have the
expertise to answer: Is the information thrown away by canonicalization
information needed by most of the content out there? As long as
Canonical-XML can represent much of the content E programmers will need to
manipulate, even if it can't represent some, it may well be the sweet spot
in this awful tradeoff space.