daw at cs.berkeley.edu
Sat Feb 9 20:56:29 EST 2008
I like where this is going.
Let me propose one conceptual model that's a bit different from
what's found in that document. Given a string like
"SELECT * FROM Table WHERE id=$id"
we'd parse this using the same grammar as will be used to interpret it,
leaving a hole in the parse tree where the '$id' appears. Then we'd
fill in that hole with a lexical constant tree node whose value is given
by the value of $id. That gives us a complete parse tree. Now we'd
convert the parse tree back to text ("unparse" it), and that's the
result of the substitution. This assumes that your tree nodes have an
"unparse" operation that correctly performs all the needed escaping.
The idea is that each type of node knows how to unparse itself.
This gives you something very much like a SQL PreparedStatement, except
that we get to use '$id' instead of '&1'.
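
To make the model concrete, here is a minimal Python sketch of the
parse / fill-the-hole / unparse cycle. It is an illustration only, under
loud assumptions: the "grammar" is a toy that accepts just the one query
shape shown above, and the names Hole, Const, Select, parse and
substitute are hypothetical, not taken from the document under
discussion. The string quoting follows standard SQL (double any embedded
single quote); real dialects may need different rules.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Hole:
    """Placeholder left in the parse tree where '$name' appeared."""
    name: str

    def unparse(self) -> str:
        raise ValueError(f"hole ${self.name} was never filled")


@dataclass
class Const:
    """Lexical constant node; it knows how to escape itself."""
    value: Union[int, str]

    def unparse(self) -> str:
        if isinstance(self.value, int):
            return str(self.value)
        # Assumption: standard SQL string literal, quotes doubled.
        return "'" + self.value.replace("'", "''") + "'"


@dataclass
class Select:
    """Toy tree for: SELECT * FROM <table> WHERE <column>=<operand>."""
    table: str
    column: str
    operand: Union[Hole, Const]

    def unparse(self) -> str:
        return (f"SELECT * FROM {self.table} "
                f"WHERE {self.column}={self.operand.unparse()}")


# Toy grammar: stands in for the real SQL grammar.
_TEMPLATE = re.compile(r"SELECT \* FROM (\w+) WHERE (\w+)=\$(\w+)")


def parse(template: str) -> Select:
    """Parse the template, leaving a Hole where '$name' appears."""
    m = _TEMPLATE.fullmatch(template)
    if m is None:
        raise ValueError("template is outside the toy grammar")
    table, column, hole_name = m.groups()
    return Select(table, column, Hole(hole_name))


def substitute(template: str, **values: Union[int, str]) -> str:
    """Fill the hole with a constant node, then unparse the tree."""
    tree = parse(template)
    tree.operand = Const(values[tree.operand.name])
    return tree.unparse()
```

Calling substitute("SELECT * FROM Table WHERE id=$id", id="1' OR '1'='1")
yields a query in which the whole value is a single string literal,
because the escaping is done by the constant node rather than by the
caller.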
I'd suggest the above as at least a conceptual model to compare against.
I'm aware that, for performance reasons, it sounds like you don't want
to fully parse these strings. But it seems like a good thought question
is: Will your proposed technique lead to the same behavior as the above
conceptual model? Every deviation from the above conceptual model seems
like it may be worth some careful thought to understand its consequences.
One question is whether you can approximate this using just lexical
analysis. I'm not sure how to tell whether lexical analysis will be
enough to give you all the context needed to know how to escape properly.
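
As a thought experiment, a lexical-only approximation for HTML might
look like the Python sketch below. Everything in it is an assumption
made for illustration (the three-state scanner, the names context_at and
fill, and the choice to reject holes in any position the scanner cannot
classify); it is not the technique from the document under discussion.
It only distinguishes text content from double-quoted attribute values,
and it knows nothing about comments, single-quoted or unquoted
attributes, script or style elements, or URL-valued attributes, which is
exactly the kind of missing context the question above is about.

```python
import html
import re

_HOLE = re.compile(r"\$(\w+)")


def context_at(prefix: str) -> str:
    """Classify the position at the end of prefix by scanning characters.

    Returns "text", "tag", or "attr" (inside a double-quoted attribute).
    """
    state = "text"
    for ch in prefix:
        if state == "text":
            if ch == "<":
                state = "tag"
        elif state == "tag":
            if ch == ">":
                state = "text"
            elif ch == '"':
                state = "attr"
        elif state == "attr":
            if ch == '"':
                state = "tag"
    return state


def fill(template: str, **values: object) -> str:
    """Substitute each $name, escaping according to its lexical context."""
    out = []
    pos = 0
    for m in _HOLE.finditer(template):
        out.append(template[pos:m.start()])
        ctx = context_at(template[:m.start()])
        value = str(values[m.group(1)])
        if ctx == "text":
            out.append(html.escape(value, quote=False))
        elif ctx == "attr":
            out.append(html.escape(value, quote=True))
        else:
            # The scanner cannot tell how to escape here, so refuse.
            raise ValueError(f"${m.group(1)} appears in an unsupported context")
        pos = m.end()
    out.append(template[pos:])
    return "".join(out)
```

For example, fill('<a title="$t">$body</a>', t='x" onclick="evil',
body='<b>') escapes the quote characters in the attribute value and the
angle brackets in the text, while a template such as '<a $x>' is
rejected outright.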
Also, one potential failure mode: If the lexical grammar you use deviates
in any way from how the browser will interpret the HTML, then that might
create security holes. So it seems worth thinking about how to ensure
that your grammar will stay exactly in sync with every browser's
interpretation of HTML.
Anyway, I like the idea described at that URL.