[e-lang] XML library: requirements gathering

Kevin Reid kpreid at mac.com
Sat Jan 16 17:11:01 PST 2010


Thomas Leonard:

I'm sketching out an XML library. I've got it parsing document  
fragments inside an XML quasiliteral (xml`<a>foo</a> <b/> c`). I'm  
also experimenting with providing XPath as the primary means of  
descending into trees.

I'm currently implementing it by wrapping DOM trees with an immutable  
interface, as this seems like both the simplest path and one which  
minimizes the amount of (currently slow) E code executed in the  
high-repeat-count paths.
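
To make the wrapping idea concrete, here is a rough Java-side sketch of a read-only view over a DOM node. It is an illustration only, not the library's actual code: the class name ImmutableXmlNode and its methods are hypothetical, and it assumes the wrapped DOM is never handed out to callers, so nothing outside the wrapper can mutate it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical name; not part of E or of any existing library.
final class ImmutableXmlNode {
    // The DOM node is private and never returned, so callers cannot mutate it.
    private final Node node;

    private ImmutableXmlNode(Node node) {
        this.node = node;
    }

    /** Parses a well-formed document and wraps its root element. */
    static ImmutableXmlNode parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new ImmutableXmlNode(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    String name() {
        return node.getNodeName();
    }

    String text() {
        return node.getTextContent();
    }

    /** Returns read-only wrappers for the child nodes, in document order. */
    List<ImmutableXmlNode> children() {
        NodeList kids = node.getChildNodes();
        List<ImmutableXmlNode> out = new ArrayList<>();
        for (int i = 0; i < kids.getLength(); i++) {
            out.add(new ImmutableXmlNode(kids.item(i)));
        }
        return Collections.unmodifiableList(out);
    }

    public static void main(String[] args) {
        ImmutableXmlNode b = parse("<b><a/>text</b>");
        System.out.println(b.name() + " has " + b.children().size() + " children");
    }
}
```

Because all the tree walking happens in Java, E code would only ever see the small wrapper interface.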

I don't yet have quasi value holes or pattern holes, so you can't use  
`` syntax to construct or to pattern-match XML, and there also aren't  
any methods to actually get text content out of the tree.


I chose not to go the TermL-XML-embedding path because I would have to  
write much additional code to make it as *accurate* as I want this  
library to be. However, in the future I imagine the internal  
representation of this library being replaced with Term-trees and the  
objects being wrappers around Terms instead of around DOM.


Before I proceed further, I think we should construct a list of design  
goals, particularly your immediate requirements, so as to make sure  
this library becomes useful.

Here's the list I sort of have in mind:


* A data type representing immutable (sub-)trees of XML documents. The  
tree should preserve all information in the XML Infoset.

* XML fragments can be written as quasiliterals in the program.

* These fragments can have quasi-value-holes so as to compose XML  
documents. That is:
   def foo := xml`<a/>`
   def bar := xml`<b>$foo</b>`
results in bar having the value
   xml`<b><a/></b>`

* It is possible to construct a customized XML quasiparser with a  
given set of namespace declarations.

* There are means to traverse and pattern-match XML trees. Currently I  
have two plans in mind for this:
   1. XPath expressions can be used as subscripts. Example:
     ? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
     # value: [xml`xyz`, xml`bar`]
   (This is a working-right-now example.)

   2. Pattern matching, as offered currently by term-trees:
     def xml`<input type="text" name="@name" value="@value">` := elem

   (However, pattern-matching style raises issues of adding syntax and
   semantics for repetitions, as well as don't-care vs. strict matching
   of additional attributes, elements, and text.)

These two styles can be usefully combined:
   for xml`<html:input type="text" name="@name" value="@value">`
         in form[xpath`//html:input`] {
       map[name] := value
   }

* An XML fragment consisting solely of text should coerce to a String  
and vice versa.

* There are straightforward, text-encoding-correct ways to read and  
write XML documents (that is, convert between XML trees and strings,  
byte arrays, character streams, and binary streams).
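
As a rough illustration of the XPath and reading/writing goals above, the following Java sketch uses only the standard JAXP APIs that a DOM-backed implementation could sit on. The class name XmlSketch, its method names, and the synthetic <frag> wrapper element (used because a bare fragment is not a well-formed document) are assumptions made for this example, not part of the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration only.
final class XmlSketch {
    /** Parses raw bytes; the parser takes the encoding from the BOM or XML declaration. */
    static Document read(byte[] bytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Evaluates an XPath 1.0 expression and returns the text of each matched node. */
    static List<String> select(Node context, String expr) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, context, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                out.add(hits.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    /** Serializes to bytes in the named encoding; the XML declaration records it. */
    static byte[] write(Document doc, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        // The fragment from the XPath example, wrapped in a synthetic root element.
        Document frag = read("<frag><a>xyz</a><b>foo</b><a>bar</a></frag>"
                .getBytes(StandardCharsets.UTF_8));
        System.out.println(select(frag.getDocumentElement(), "a/text()"));

        // A Latin-1 document re-encoded as UTF-8 without corrupting the text.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document again = read(write(read(latin1), "UTF-8"));
        System.out.println(again.getDocumentElement().getTextContent());
    }
}
```

Working on bytes rather than pre-decoded strings lets the parser and serializer handle the declared encoding, which is the main point of the text-encoding-correct requirement.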


All of what I've listed so far is either already implemented or  
seems reasonably straightforward (given that we're running in Java),  
except for (a) implementing quasi-holes (without writing a whole new  
augmented-XML parser) and (b) the pattern-matching facility, which  
would be a good bit of nontrivial from-scratch design and code.


So, tell me what *you* think you need.

-- 
Kevin Reid                                  <http://switchb.org/kpreid/>





