[e-lang] IPv6 addresses / uriGetter encoding
Thomas Leonard
tal at it-innovation.soton.ac.uk
Wed Jun 23 03:39:37 PDT 2010
On Tue, 2010-06-22 at 08:49 -0700, Mark S. Miller wrote:
> On Tue, Jun 22, 2010 at 5:34 AM, Thomas Leonard
> <tal at it-innovation.soton.ac.uk> wrote:
> E doesn't work with IPv6 addresses. Here's a patch:
>
> http://gitorious.org/repo-roscidus/it-innovation/commit/5cda14106e59be94cda1aa013ee6422458025768
>
> This also adds "[]" to the list of acceptable URI characters.
>
> Hi Thomas, I notice that this patch refers to RFC 2732 for the new
> IPv6 friendly URI syntax. Web searching, I find
> <http://tools.ietf.org/html/rfc3986> claims it obsoletes RFC 2732. I
> have no idea if there are any relevant differences. I've never looked
> at either of these RFCs before. Any idea what this is about?
Don't know, sorry.
> By the way, I found this surprising:
>
> $ ls
> a|b
>
> $ rune
> ? <file:a|b>.getText()
> # problem: <FileNotFoundException: .../a:b (No such file or directory)>
>
> ? <file>["a|b"].getText()
> # problem: <FileNotFoundException: .../a:b (No such file or directory)>
>
> ? for name in <file:.>.list()
> { println(<file>[name].getText()) }
> # problem: <FileNotFoundException: .../a/a:b (No such file or directory)>
>
> Why does it do that?
>
> IIRC, it was a misguided attempt on my part to be compatible with some
> stupid IE6 file: URL behavior. Please feel free to fix. Thanks.
Where should the expansion be done? The lexer only permits valid URL
characters (with a few exceptions), so the assumption is that this
string is the encoded form (e.g. %20 for space). But:
* XXX An open question is whether normalize/1 should also normalize
* '%<hex><hex>' to the encoded character. Currently this is
* not done.
Either the lexer should undo the encoding and the actual getter's get/1
method should expect an unencoded string, or the lexer should leave it
alone and the getter should expect an encoded string. But:
? <file:%20>.getText()
# problem: <FileNotFoundException: /home/tal/%20 (No such file or directory)>
? <file: >.getText()
# syntax error:
# file: >.getText()
It seems obvious that <file:someDir>["%20"] should interpret this as a
file called "%20", not a file called " ". The case for <file>["%20"] is
less clear (I guess this is why Kevin wants to use separate objects).
Also:
? <http>["//localhost:8000/ HTTP/1.0"].getText()
generates:
GET / HTTP/1.0 HTTP/1.1
That seems wrong, and slightly unsafe.
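To make the problem concrete, here is a minimal sketch in Python (not E, and not the actual <http> getter code). The two function names are hypothetical; the only real API used is urllib.parse.quote. It assumes the getter receives the unencoded path "/ HTTP/1.0" and pastes it into the request line.

```python
from urllib.parse import quote

def request_line_naive(path):
    # Hypothetical: paste the caller's string into the request line unchanged.
    # A space in the path ends the request target early.
    return f"GET {path} HTTP/1.1"

def request_line_escaped(path):
    # Hypothetical: percent-encode everything except '/' before building
    # the request line, so a space can no longer split the target.
    return f"GET {quote(path, safe='/')} HTTP/1.1"

if __name__ == "__main__":
    path = "/ HTTP/1.0"
    print(request_line_naive(path))    # GET / HTTP/1.0 HTTP/1.1
    print(request_line_escaped(path))  # GET /%20HTTP/1.0 HTTP/1.1
```

With escaping, the server sees a single (odd-looking) path rather than a caller-chosen protocol version.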
So, I guess the clean thing is for the lexer to expand %XX, for <http>
to re-escape the string for HTTP, and for <file> to adjust it for the
filesystem on Windows (as it's doing already).
So:
<file:%20> == <file>[" "]
<file:a|b> == <file>["a|b"] == the file a:b (on Windows)
<file:a\b> == <file>["a\b"] == the file a/b (on Linux)
<captp://*aaa@[::1]:123/bbb> == <captp>["//*aaa@[::1]:123/bbb"]
<http://localhost/%20> == <http>["//localhost/ "] == a
GET http://localhost/%20 HTTP/1.1
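
A rough sketch of that division of labour, in Python rather than E. All three function names are hypothetical, and the file mapping only models the two cases listed above; it is not what the existing E code does line for line.

```python
from urllib.parse import quote, unquote

def lex_uri_body(encoded):
    # Hypothetical lexer step: expand %XX once, so get/1 always
    # receives the unencoded string.
    return unquote(encoded)

def file_path(name, windows):
    # Hypothetical <file> getter: takes the unencoded name and applies
    # only the platform adjustments from the list above.
    if windows:
        return name.replace("|", ":")
    return name.replace("\\", "/")

def http_request_line(body):
    # Hypothetical <http> getter: re-escape the unencoded string for
    # the wire, leaving URI delimiters such as '/', ':' and '[]' alone.
    return "GET http:" + quote(body, safe="/:[]@") + " HTTP/1.1"

if __name__ == "__main__":
    print(repr(lex_uri_body("%20")))
    print(file_path(lex_uri_body("a|b"), windows=True))
    print(http_request_line(lex_uri_body("//localhost/%20")))
```

Under these assumptions the literal form and the get/1 form agree, because both go through the same unencoded string before any getter-specific escaping.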
Does that sound sensible?
--
Dr Thomas Leonard
IT Innovation Centre
2 Venture Road
Southampton
Hampshire SO16 7NP
Tel: +44 0 23 8076 0834
Fax: +44 0 23 8076 0833
mailto:tal at it-innovation.soton.ac.uk
http://www.it-innovation.soton.ac.uk