[e-lang] XML library: requirements gathering
Thomas Leonard
talex5 at gmail.com
Sun Jan 17 11:03:19 PST 2010
2010/1/17 Kevin Reid <kpreid at mac.com>:
> Thomas Leonard:
>
> I'm sketching out an XML library. I've got it parsing document
> fragments inside an XML quasiliteral (xml`<a>foo</a> <b/> c`). I'm
> also experimenting with providing XPath as the primary means of
> descending into trees.
>
> I'm currently implementing it by wrapping DOM trees with an immutable
> interface, as this seems like both the simplest path and one which
> minimizes the amount of (currently slow) E code executed in the high-
> repeat-count paths.
Makes sense. We'd probably want methods to get the underlying Document
(or a copy, more likely) so we can pass it to existing Java code and
wrap the results again.
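A minimal sketch of that copy-out, wrap-again round trip. It is written in Python with ElementTree purely as a stand-in for the Java DOM case. The names ImmutableXml, to_element, and legacy_rename are hypothetical and not part of any E or Java API:

```python
import copy
import xml.etree.ElementTree as ET


class ImmutableXml:
    """Hypothetical read-only wrapper around a private element tree."""

    def __init__(self, element):
        # Copy on the way in, so later changes by the caller cannot leak in.
        self._root = copy.deepcopy(element)

    def to_element(self):
        # Copy on the way out, so tree-mutating code cannot alter the wrapper.
        return copy.deepcopy(self._root)

    def serialize(self):
        return ET.tostring(self._root, encoding="unicode")


def legacy_rename(element):
    """Stands in for existing code that mutates the tree it is given."""
    element.tag = "renamed"
    return element


wrapped = ImmutableXml(ET.fromstring("<a>foo</a>"))
result = ImmutableXml(legacy_rename(wrapped.to_element()))
```

Handing out a copy costs a full tree copy per call, but it means the wrapped value stays unchanged whatever the existing code does with the tree it receives.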
[...]
> Here's the list I sort of have in mind:
>
> * A data type representing immutable (sub-)trees of XML documents. The
> tree should preserve all information in the XML Infoset.
>
> * XML fragments can be written as quasiliterals in the program.
>
> * These fragments can have quasi-value-holes so as to compose XML
> documents. That is:
>   def foo := xml`<a/>`
>   def bar := xml`<b>$foo</b>`
> results in bar having the value
>   xml`<b><a/></b>`
> .
Sounds good. How do namespaces combine? Sometimes you have a lot of
child elements using the same namespace, and it's handy if that ends up
as a single namespace declaration on the root element. Are you
planning to preserve prefixes in any way (or just auto-number them as
xmlns:n0, xmlns:n1, etc.)? In the past I've used a scheme where we keep
the prefix mappings as hints when parsing the XML and use them if
possible when serialising. Multiple prefixes for the same namespace
are combined into one where possible, and new prefixes are created
where there are conflicts, so that all the mappings end up defined on
the root element.
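A rough sketch of one way such a prefix-hint scheme could work, in Python for illustration only. The function name assign_root_prefixes, the (prefix, uri) hint format, the n0/n1 fallback naming, and the example.com URI are all assumptions rather than a description of any existing implementation:

```python
def assign_root_prefixes(hints):
    """Choose one prefix per namespace URI, to be declared on the root.

    hints: (prefix, uri) pairs seen while parsing, in document order.
    An empty or None prefix (a default namespace) counts as "no hint".
    Returns a dict mapping each URI to its chosen prefix.
    """
    uri_to_prefix = {}
    used = set()
    pending = []  # URIs whose hinted prefix was missing or already taken

    for prefix, uri in hints:
        if uri in uri_to_prefix:
            continue  # another prefix for the same URI: merge into the first
        if prefix and prefix not in used:
            uri_to_prefix[uri] = prefix
            used.add(prefix)
        elif uri not in pending:
            pending.append(uri)

    counter = 0
    for uri in pending:
        if uri in uri_to_prefix:
            continue  # a later hint for this URI turned out to be usable
        while f"n{counter}" in used:
            counter += 1
        generated = f"n{counter}"
        uri_to_prefix[uri] = generated
        used.add(generated)

    return uri_to_prefix


XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
OTHER = "http://example.com/other"  # made-up URI for the conflict case

hints = [("html", XHTML), ("h", XHTML), ("html", OTHER), ("svg", SVG)]
root_decls = assign_root_prefixes(hints)
```

Here the two XHTML prefixes collapse to "html", the conflicting use of "html" for the other namespace falls back to a generated "n0", and "svg" is kept as hinted.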
> * It is possible to construct a customized XML quasiparser with a
> given set of namespace declarations.
>
> * There are means to traverse and pattern-match XML trees. Currently I
> have two plans in mind for this:
> 1. XPath expressions can be used as subscripts. Example:
>      ? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
>      # value: [xml`xyz`, xml`bar`]
>    (This is a working-right-now example.)
>
> 2. Pattern matching, as offered currently by term-trees:
>      def xml`<input type="text" name="@name" value="@value">` := elem
Both this and XPath would be very useful. Would this work?
  def xml`<@elementName @attrs*>@content</@elementName>` := ...
When matching on XML I mostly want to ignore unmatched attributes, but
I guess adding @_* would be OK, to stay consistent with the rest of E.
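To make the loose-versus-strict distinction concrete, here is a small Python sketch of attribute matching with capture holes. Hole, match_attrs, and the allow_extra flag are hypothetical names chosen for illustration. Passing allow_extra=True plays roughly the role that a trailing @_* would play in the pattern:

```python
import xml.etree.ElementTree as ET


class Hole:
    """Marks a pattern attribute whose value is captured (like @name)."""

    def __init__(self, name):
        self.name = name


def match_attrs(elem, tag, pattern, allow_extra):
    """Return a dict of captured values, or None if elem does not match."""
    if elem.tag != tag:
        return None
    bindings = {}
    for attr, want in pattern.items():
        if attr not in elem.attrib:
            return None  # every attribute named in the pattern is required
        have = elem.attrib[attr]
        if isinstance(want, Hole):
            bindings[want.name] = have
        elif have != want:
            return None  # literal attribute value differs
    extra = set(elem.attrib) - set(pattern)
    if extra and not allow_extra:
        return None  # strict mode rejects attributes the pattern omits
    return bindings


elem = ET.fromstring('<input type="text" name="q" value="hello" id="search"/>')
pattern = {"type": "text", "name": Hole("name"), "value": Hole("value")}

loose = match_attrs(elem, "input", pattern, allow_extra=True)
strict = match_attrs(elem, "input", pattern, allow_extra=False)
```

With the extra id attribute present, the loose match yields the bindings for name and value, while the strict match fails and returns None.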
> (However, pattern-matching style raises issues of adding syntax and
> semantics for repetitions, as well as don't-care vs. strict matching
> of additional attributes, elements, and text.)
>
> These two styles can be usefully combined:
>   for xml`<html:input type="text" name="@name" value="@value">`
>       in form[xpath`//html:input`] {
>     map[name] := value
>   }
>
> * An XML fragment consisting solely of text should coerce to a String
> and vice versa.
>
> * There are straightforward, text-encoding-correct ways to read and
> write XML documents (that is, convert between XML trees and strings,
> byte arrays, character streams, and binary streams).
>
> All of what I've listed so far is either already implemented or
> seeming reasonably straightforward (given that we're running in Java)
> except for (a) implementing quasi-holes (without writing a whole new
> augmented-XML parser) and (b) the pattern matching facility, which
> would be a good bit of nontrivial from-scratch design and code.
>
> So, tell me what *you* think you need.
Sounds like just what we need. Thanks!
--
Dr Thomas Leonard ROX desktop / Zero Install
GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA BD8E 0713 3F96 CA74 D8BA