[e-lang] XML library: requirements gathering
Kevin Reid
kpreid at mac.com
Sat Jan 16 17:11:01 PST 2010
Thomas Leonard:
I'm sketching out an XML library. I've got it parsing document
fragments inside an XML quasiliteral (xml`<a>foo</a> <b/> c`). I'm
also experimenting with providing XPath as the primary means of
descending into trees.
I'm currently implementing it by wrapping DOM trees with an immutable
interface, as this seems both the simplest approach and the one that
minimizes the amount of (currently slow) E code executed in the
high-repeat-count paths.
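As a rough illustration of the "immutable wrapper over DOM" approach, here is a minimal Java sketch. The class name `XmlNode` and its methods are hypothetical and are not this library's interface. The point is only that the wrapper keeps its DOM node private and wraps every node it hands back, so callers never get a reference through which they could mutate the tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical read-only view of a DOM subtree (not the library's actual API). */
final class XmlNode {
    // The wrapped node is never returned to callers, so nothing outside this
    // class can mutate the tree it belongs to.
    private final Node node;

    private XmlNode(Node node) {
        this.node = node;
    }

    /** Parses a complete document; the DOM is created here and held only by wrappers. */
    static XmlNode parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return new XmlNode(doc.getDocumentElement());
    }

    /** Wraps a node owned by other code, deep-copying it first so that later
        mutation of the original cannot show through this view. */
    static XmlNode copyOf(Node external) {
        return new XmlNode(external.cloneNode(true));
    }

    String name() {
        return node.getNodeName();
    }

    /** Each child is wrapped as well, and the returned list is unmodifiable. */
    List<XmlNode> children() {
        NodeList kids = node.getChildNodes();
        List<XmlNode> out = new ArrayList<>();
        for (int i = 0; i < kids.getLength(); i++) {
            out.add(new XmlNode(kids.item(i)));
        }
        return Collections.unmodifiableList(out);
    }

    /** Concatenated text of this subtree (DOM's textContent). */
    String text() {
        return node.getTextContent();
    }
}
```

For `<a>foo<b/></a>`, `children()` yields a text node and a `b` element, and `text()` yields `"foo"`.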
I don't yet have quasi-value-holes or quasi-pattern-holes, so you can't
use `` syntax to construct or to pattern-match XML, and there also aren't
yet any methods to actually get text content out of the tree.
I chose not to go the TermL-XML-embedding path because I would have to
write much additional code to make it as *accurate* as I want this
library to be. However, in the future I imagine the internal
representation of this library being replaced with Term-trees and the
objects being wrappers around Terms instead of around DOM.
Before I proceed further, I think we should construct a list of design
goals, particularly your immediate requirements, so as to make sure
this library becomes useful.
Here's the list I sort of have in mind:
* A data type representing immutable (sub-)trees of XML documents. The
tree should preserve all information in the XML Infoset.
* XML fragments can be written as quasiliterals in the program.
* These fragments can have quasi-value-holes so as to compose XML
documents. That is:
    def foo := xml`<a/>`
    def bar := xml`<b>$foo</b>`
results in bar having the value
    xml`<b><a/></b>`
* It is possible to construct a customized XML quasiparser with a
given set of namespace declarations.
* There are means to traverse and pattern-match XML trees. Currently I
have two plans in mind for this:
1. XPath expressions can be used as subscripts. Example:
       ? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
       # value: [xml`xyz`, xml`bar`]
   (This example works right now.)
2. Pattern matching, as currently offered by term-trees:
       def xml`<input type="text" name="@name" value="@value"/>` := elem
   (However, the pattern-matching style raises the issues of adding
   syntax and semantics for repetition, and of choosing between
   don't-care and strict matching of additional attributes, elements,
   and text.)
These two styles can be usefully combined:
    for xml`<html:input type="text" name="@name" value="@value"/>`
        in form[xpath`//html:input`] {
        map[name] := value
    }
* An XML fragment consisting solely of text should coerce to a String
and vice versa.
* There are straightforward, text-encoding-correct ways to read and
write XML documents (that is, convert between XML trees and strings,
byte arrays, character streams, and binary streams).
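The XPath-as-subscript idea maps fairly directly onto the standard `javax.xml.xpath` API. The following Java sketch reproduces the `a/text()` example from item 1 above. The class and method names are hypothetical. The sketch wraps the fragment in a synthetic root element, because a fragment with several top-level elements is not a well-formed document. That root name is an assumption and could clash with real content. To keep things short, it returns plain strings rather than XML values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: "fragment[xpath`...`]" approximated with JAXP. */
final class XPathSubscript {
    /** Evaluates xpath against the fragment and returns each match's text content. */
    static List<String> select(String fragment, String xpath) throws Exception {
        // Wrap the fragment in a synthetic root (an assumption of this sketch)
        // so that several top-level nodes still form a well-formed document.
        String wrapped = "<fragment-root>" + fragment + "</fragment-root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));
        XPath xp = XPathFactory.newInstance().newXPath();
        // Evaluate relative to the synthetic root, so "a" means a top-level <a>.
        NodeList hits = (NodeList) xp.evaluate(
                xpath, doc.getDocumentElement(), XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            out.add(hits.item(i).getTextContent());
        }
        return out;
    }
}
```

With the fragment `<a>xyz</a><b>foo</b><a>bar</a>` and the expression `a/text()`, this returns `["xyz", "bar"]`, matching the E example. Namespace prefixes such as `html:` would additionally need a `NamespaceContext` and a namespace-aware parser.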
Everything I've listed so far is either already implemented or seems
reasonably straightforward (given that we're running on Java), except
for (a) implementing quasi-holes without writing a whole new
augmented-XML parser, and (b) the pattern-matching facility, which
would require a good deal of nontrivial from-scratch design and code.
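One reason reading and writing look straightforward on Java is that the JAXP parser already works out the character encoding of a byte stream from the byte-order mark, the XML declaration, or the UTF-8 default. In the other direction, a `Transformer` writes an XML declaration naming the encoding it actually uses. Here is a minimal sketch of a byte-level round trip. The class `XmlBytes` is hypothetical and is not this library's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Hypothetical byte-level read/write helpers built on standard JAXP. */
final class XmlBytes {
    /** Parses raw bytes; the parser, not the caller, determines the encoding. */
    static Document read(byte[] bytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
    }

    /** Serializes to bytes in the given encoding; the emitted XML declaration
        names that same encoding, so the output can be read back correctly. */
    static byte[] write(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }
}
```

A document declared as ISO-8859-1 can thus be read, rewritten as UTF-8, and read again without the caller ever choosing a decoder by hand.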
So, tell me what *you* think you need.
--
Kevin Reid <http://switchb.org/kpreid/>