[E-Lang] Amusement: proof that XML and lisp forms are interchangeable

Jonathan S. Shapiro shap@eros-os.org
Fri, 21 Sep 2001 18:27:06 -0400


This is purely for your amusement, and not an attempt at anything serious.
If you do not have a strong stomach, do not read on. The idea came up in a
discussion with David Braun, and I couldn't resist giving it a try.

The enclosed XSLT transformer will take an arbitrary well-formed XML input
document and turn it into an equivalent (if ugly, but it's a quick hack)
lisp-style list that can be processed using modern processing tools, such as
any scheme implementation after about 1974. Given this initial
transformation, all of the current processing tools provided by W3C can be
implemented by a better-than-average undergraduate in roughly one
(caffeine-assisted) weekend. Proof of this assertion is left as an exercise
for the student.
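
For readers without an XSLT processor handy, the shape of the output can be sketched in a few lines of Python. This is only an illustration, not the transformer itself: the function names are made up, attribute values are emitted as quoted strings (an assumption about the intended output), and ElementTree drops comments and rewrites namespace-prefixed names to {uri}local form, so the sketch is only faithful for documents without namespaces.

```python
import xml.etree.ElementTree as ET


def quote(s: str) -> str:
    """Escape backslash and double quote for use in a scheme string literal."""
    # Backslashes first, so the ones added for quotes are not doubled again.
    return s.replace("\\", "\\\\").replace('"', '\\"')


def to_sexp(elem: ET.Element) -> str:
    """Render one element as (elem name attrs... children...)."""
    parts = [f'(attr {k} "{quote(v)}")' for k, v in elem.attrib.items()]
    if elem.text:
        parts.append(f'(text "{quote(elem.text)}")')
    for child in elem:
        parts.append(to_sexp(child))
        # Text that follows a child element lives in child.tail.
        if child.tail:
            parts.append(f'(text "{quote(child.tail)}")')
    return f"(elem {elem.tag} " + "".join(parts) + ")"


def xml_to_sexp(source: str) -> str:
    """Wrap the document element in a quoted top-level list."""
    return "'(" + to_sexp(ET.fromstring(source)) + ")"


print(xml_to_sexp('<a x="1">hi<b/></a>'))
# prints: '((elem a (attr x "1")(text "hi")(elem b )))
```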

It's been a long time since I programmed in scheme, so it's conceivable that
I missed some necessary escaping -- like the XML ':' namespace separator,
which has significance in scheme atoms.
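
One way to sidestep the ':' question is to never emit a prefixed name as a single atom. The Python sketch below splits the name and emits a small list instead; the `qname` tag and the function name are hypothetical and do not appear in the transformer.

```python
def name_to_datum(qname: str) -> str:
    """Render an XML name as a scheme datum that contains no ':'."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # Unprefixed names pass through unchanged.
        return qname
    # Prefixed names become a two-element list tagged with 'qname'.
    return f"(qname {prefix} {local})"


print(name_to_datum("xsl:template"))
# prints: (qname xsl template)
```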

Since entities in the XML input are clobbered by the time the transformer
gets them, there is little you can do about those, but if you know the set
of entities in advance you can modify the transformer to turn them into
elements using an inline doctype subset and thus capture them. For entities
undefined (due to non-inclusion of the DTD) a straightforward further string
transformation hack could be used to extract them as lisp forms.
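
One possible reading of the string transformation hack, sketched in Python: rewrite each undefined entity reference into an element before the document reaches the XML parser, so that the transformer sees it as ordinary markup. The `entity` element name and the function name are assumptions for illustration. The sketch is deliberately crude and would also rewrite references inside attribute values and CDATA sections, where an element is not legal.

```python
import re

# The five entities every XML parser predefines; these are left alone.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Matches a named entity reference such as &nbsp; (but not &#160; or &#xA0;).
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")


def entities_to_elements(source: str) -> str:
    """Rewrite unknown &name; references as <entity name="name"/>."""
    def repl(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)
        return f'<entity name="{name}"/>'
    return ENTITY_REF.sub(repl, source)


print(entities_to_elements("<p>a&nbsp;b &amp; c</p>"))
# prints: <p>a<entity name="nbsp"/>b &amp; c</p>
```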

shap

<?xml version="1.0" encoding="UTF-8"?>
<!--
 Transformer from arbitrary well-formed XML input to a lisp-style list.
-->

<!DOCTYPE xsl:stylesheet [
  <!ENTITY nbsp " ">
]>

<xsl:stylesheet
  version ="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/TR/REC-html40">

  <xsl:output method="text" indent="yes"/>

  <!-- The cure for XML in one transform: -->

  <xsl:template match="/">
    <xsl:text>'(</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>)</xsl:text>
  </xsl:template>

  <xsl:template match="*">
    <xsl:text>(elem </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates/>
    <xsl:text>)</xsl:text>
  </xsl:template>

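  <!-- Attribute nodes have no children, so the value is written out
       directly as a quoted scheme string -->
  <xsl:template match="@*">
    <xsl:text>(attr </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text> "</xsl:text>
    <xsl:call-template name="quote-the-text">
      <xsl:with-param name="text" select="."/>
    </xsl:call-template>
    <xsl:text>")</xsl:text>
  </xsl:template>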

  <!-- Following piece of incredible ugliness outputs an XML text node
       as a quoted scheme string -->
  <xsl:template match="text()" priority="2">
    <xsl:text>(text "</xsl:text>
    <xsl:call-template name="quote-the-text">
      <xsl:with-param name="text" select="."/>
    </xsl:call-template>
    <xsl:text>")</xsl:text>
  </xsl:template>

  <!-- Named template: outputs $text with every \ and " escaped for use
       inside a scheme string -->
  <xsl:template name="quote-the-text">
    <xsl:param name="text" select="."/>
    <xsl:variable name="dquote" select="'&quot;'"/>
    <xsl:choose>
      <!-- First need to escape embedded backslash characters, as
           these are the scheme escape character -->
      <xsl:when test="contains($text, '\')">
        <!-- recurse on the first part to process the " characters too -->
        <xsl:call-template name="quote-the-text">
          <xsl:with-param name="text" select="substring-before($text, '\')"/>
        </xsl:call-template>
        <!-- output escaped backslash character -->
        <xsl:text>\\</xsl:text>
        <!-- recurse on the second part -->
        <xsl:call-template name="quote-the-text">
          <xsl:with-param name="text" select="substring-after($text, '\')"/>
        </xsl:call-template>
      </xsl:when>

      <!-- If text contains the " character, escape it with a backslash.
           Note that if the text makes it this far into the choose, we
           know that it does not contain a \ character, and we need not
           recursively process it for one. -->
      <xsl:when test="contains($text, $dquote)">
        <!-- output the part up to the quote character -->
        <xsl:value-of select="substring-before($text, $dquote)"/>
        <!-- output escaped quote character -->
        <xsl:text>\"</xsl:text>
        <!-- recurse on the remainder, which may hold more " characters -->
        <xsl:call-template name="quote-the-text">
          <xsl:with-param name="text" select="substring-after($text, $dquote)"/>
        </xsl:call-template>
      </xsl:when>

      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>