[E-Lang] Minimal-XML (was: Draft Kernel-E DTD & Sketch of translation to debuggable Java)

Jonathan S. Shapiro shap@eros-os.org
Mon, 2 Oct 2000 08:46:02 -0400


> > > Does this mean you have found something readable by the XML folks that
> > > explains these distinctions?  And that you found their distinctions
> > > meaningful?
> >
> >That's a joke, right?
>
> No it's not.  But the fact that you thought it might be indicates I've been
> reading the wrong things.

Actually, it meant that I agreed with you completely. There is a terrible
lack of good documentation in the community. As an aside, I'd note that
this is why DSSSL died in favor of CSS even though DSSSL is by far the more
powerful tool. The XML community has the "it shouldn't be a programming
language" bug in a big way.

> I have found the following "slice" useful:
> >
> >     1. tags describe the semantics of the information
> >     2. attributes describe the format of the information.
>
> A wonderful example.  I've never before come across that simple & clarifying
> statement.  Where did you find it?

It's my own.

Here's another that just occurred to me. This may be less useful, but what
the heck:

Tags describe syntax. Think of the tags as encoding the grammar and the
lexemes of the language being written. This should get you most of the way
there for data-oriented XML. You then use the attributes for checking, for
unique IDs, etc.
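
A minimal sketch of that split, in Python with the standard-library
xml.etree.ElementTree. The element names (add, var, int) and the id
attribute are invented for illustration and are not taken from the Kernel-E
DTD. The element nesting alone carries the grammar of "x + 1"; the
attributes are used only for a uniqueness check.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-oriented XML: element nesting mirrors the grammar of
# the expression "x + 1"; the id attributes exist only for checking.
DOC = """
<add id="n1">
    <var id="n2">x</var>
    <int id="n3">1</int>
</add>
"""

def duplicate_ids(root):
    """Return the sorted list of id values that occur more than once."""
    seen, dups = set(), set()
    for elem in root.iter():
        ident = elem.get("id")
        if ident is None:
            continue
        if ident in seen:
            dups.add(ident)
        seen.add(ident)
    return sorted(dups)

def to_source(elem):
    """Rebuild the expression text from the element structure alone."""
    if elem.tag == "add":
        left, right = list(elem)
        return f"{to_source(left)} + {to_source(right)}"
    # Leaf elements (var, int) carry the lexeme as their text.
    return elem.text.strip()

root = ET.fromstring(DOC)
```

Note that to_source never looks at an attribute, and duplicate_ids never
looks at a tag name: the two concerns stay separate.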

Actually, it's blurrier to understand in the context of document
applications. In that domain, I find that it helps to imagine that I am a
cruddy language designer who does not understand the power of context, and
therefore felt compelled to embed a lot of semantics in the grammar.

Finally, one historical note may help in deciphering things. A lot of the
earlier work, such as the Davenport Group DTD (which later became the
DocBook DTD) was done under the assumption that all of this stuff would be
written and read by programs (not people), and was designed by people who
didn't really understand about parenthesization in parsers. An example is
that DocBook specifies both

    <s2>    for a level-2 section header
    <s2Heading>    heading for same

The DTD requires that <s2Heading> appear as the first element of an <s2> if
it is present [aside: don't hold me to the tag names -- I'm replaying the
design issue from memory]. Thus, it would be completely unambiguous and
easier to use if there were simply a <heading> tag and one wrote:

    <s2>
        <heading>Some title</heading>
        ... content of section ...
        <s3>
            <heading>third level section</heading>
            ....
        </s3>
    </s2>

I have heard an argument that the more explicit tags somehow simplified
parsing. Some probing revealed that the ever-so-insistent designer didn't
understand how to do tokenization, which didn't really surprise me after
seeing the design.
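
To make the point concrete, here is a rough sketch in Python (standard-library
xml.etree.ElementTree) of recovering each heading's level purely from context.
It uses the from-memory tag names above and assumes, for illustration only,
that section elements are named "s" followed by their level. The function name
headings_with_level is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# The single-<heading> form from the example above, trimmed to its structure.
DOC = """
<s2>
    <heading>Some title</heading>
    <s3>
        <heading>third level section</heading>
    </s3>
</s2>
"""

def headings_with_level(elem):
    """Yield (level, title) pairs; the level comes from the enclosing section."""
    for child in elem:
        if child.tag == "heading":
            # The parent's tag name ("s2", "s3", ...) supplies the level,
            # so the heading tag itself does not need to encode it.
            yield int(elem.tag[1:]), child.text.strip()
        else:
            # Any other child is assumed to be a nested section.
            yield from headings_with_level(child)

root = ET.fromstring(DOC)
result = list(headings_with_level(root))
```

Nothing here needs a distinct heading tag per level; the parser's own
nesting already carries that information.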

Note that XML forces explicit nesting here, which is (IMHO) one of its few
significant flaws.

shap