[e-lang] IPv6 addresses / uriGetter encoding

Thomas Leonard tal at it-innovation.soton.ac.uk
Wed Jun 23 03:39:37 PDT 2010


On Tue, 2010-06-22 at 08:49 -0700, Mark S. Miller wrote:
> On Tue, Jun 22, 2010 at 5:34 AM, Thomas Leonard
> <tal at it-innovation.soton.ac.uk> wrote:
>         E doesn't work with IPv6 addresses. Here's a patch:
>         
>         http://gitorious.org/repo-roscidus/it-innovation/commit/5cda14106e59be94cda1aa013ee6422458025768
>         
>         This also adds "[]" to the list of acceptable URI characters.
> 
> Hi Thomas, I notice that this patch refers to RFC 2732 for the new
> IPv6 friendly URI syntax. Web searching, I find
> <http://tools.ietf.org/html/rfc3986> claims it obsoletes RFC 2732. I
> have no idea if there are any relevant differences. I've never looked
> at either of these RFCs before. Any idea what this is about?

Don't know, sorry.
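For what it's worth, both RFCs wrap an IPv6 literal host in square brackets. That is what lets a parser tell the port separator apart from the colons inside the address. Below is a minimal Python sketch using Python's own urllib.parse (an illustration of the syntax only, not E's URI code) of how the captp URI used later in this message splits:

```python
from urllib.parse import urlsplit

# The captp URI from the proposal later in this message.
uri = "captp://*aaa@[::1]:123/bbb"
parts = urlsplit(uri)

# The brackets delimit the IPv6 literal, so the ':' after ']' can only
# be the port separator, not part of the address.
print(parts.username)  # *aaa
print(parts.hostname)  # ::1  (brackets stripped)
print(parts.port)      # 123
print(parts.path)      # /bbb
```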

>         By the way, I found this surprising:
>         
>         $ ls
>         a|b
>         
>         $ rune
>         ? <file:a|b>.getText()
>         # problem: <FileNotFoundException: .../a:b (No such file or directory)>
>         
>         ? <file>["a|b"].getText()
>         # problem: <FileNotFoundException: .../a:b (No such file or directory)>
>         
>         ? for name in <file:.>.list()
>         { println(<file>[name].getText()) }
>         # problem: <FileNotFoundException: .../a/a:b (No such file or directory)>
>         
>         Why does it do that?
> 
> IIRC, it was a misguided attempt on my part to be compatible with some
> stupid IE6 file: URL behavior. Please feel free to fix. Thanks.
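The transcript is consistent with a blanket '|' to ':' substitution on every file: name. Legacy file: URLs did sometimes write a DOS drive as "C|" instead of "C:". The following Python sketch contrasts the blanket rule with a narrower one that only rewrites a leading drive-letter prefix. Both functions are hypothetical illustrations, not E's source:

```python
import re

def legacy_file_name(name: str) -> str:
    """Hypothetical: reproduces the behaviour in the transcript,
    where every '|' in a file: name becomes ':'."""
    return name.replace("|", ":")

# A single drive letter followed by '|', optionally after one leading '/',
# and followed by '/' or the end of the string.
_DRIVE_PIPE = re.compile(r"^(/?)([A-Za-z])\|(?=/|$)")

def drive_letter_only(name: str) -> str:
    """Hypothetical narrower alternative: only a legacy 'X|' drive prefix
    is rewritten to 'X:'; any other '|' is left alone."""
    return _DRIVE_PIPE.sub(r"\1\2:", name)

print(legacy_file_name("a|b"))       # a:b      (the surprising case)
print(drive_letter_only("a|b"))      # a|b      (left alone)
print(drive_letter_only("/C|/tmp"))  # /C:/tmp  (legacy drive form)
```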

Where should the %XX expansion be done? The lexer only permits valid URL
characters (with a few exceptions), so the assumption is that this
string is the encoded form (e.g. %20 for space). But:

     * XXX An open question is whether normalize/1 should also normalize
     * '%<hex><hex>' to the encoded character. Currently this is
     * not done.

Either the lexer should undo the encoding and the actual getter's get/1
method should expect an unencoded string, or the lexer should leave it
alone and the getter should expect an encoded string. But:

? <file:%20>.getText()
# problem: <FileNotFoundException: /home/tal/%20 (No such file or directory)>

? <file: >.getText()
# syntax error: 
#   file: >.getText()

It seems obvious that <file:someDir>["%20"] should refer to a file
called "%20", not a file called " ". The case for <file>["%20"] is
less clear (I guess this is why Kevin wants to use separate objects).
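The "lexer decodes, getter takes names literally" option can be modeled in a few lines. This is a Python sketch under that assumption. The helpers lex_uri_literal and file_get are hypothetical stand-ins for the E lexer and the <file> getter's get/1, and /home/tal is taken from the transcript above:

```python
import posixpath
from urllib.parse import unquote

def lex_uri_literal(body: str) -> str:
    """Hypothetical lexer step: decode %XX exactly once, so getters
    always receive unencoded names."""
    return unquote(body)

def file_get(base: str, name: str) -> str:
    """Hypothetical file getter: the name is used literally, no decoding."""
    return posixpath.join(base, name)

# <file:%20> passes through the lexer, so it names a file called " ".
print(repr(file_get("/home/tal", lex_uri_literal("%20"))))  # '/home/tal/ '

# <file:someDir>["%20"] bypasses the lexer: the file is literally "%20".
print(file_get("someDir", "%20"))  # someDir/%20

# Decoding only once matters: "%2520" must become "%20", not " ".
print(lex_uri_literal("%2520"))  # %20
```

The key design point is that exactly one layer decodes. If both the lexer and the getter called unquote, a literal "%2520" would silently collapse to a space.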

Also:

? <http>["//localhost:8000/ HTTP/1.0"].getText()

generates the request line:

GET / HTTP/1.0 HTTP/1.1

That seems wrong, and slightly unsafe.
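The problem is that the path is spliced into the request line verbatim. The following Python sketch uses hypothetical helper names and mimics the observed behaviour rather than E's code. It contrasts that with re-escaping the path before building the line. It assumes the path has already been decoded once, so quoting it does not double-encode:

```python
from urllib.parse import quote

def naive_request_line(path: str) -> str:
    # Mimics the behaviour observed above: path inserted verbatim.
    return f"GET {path} HTTP/1.1"

def escaped_request_line(path: str) -> str:
    # Re-escapes the decoded path for HTTP; '/' stays a path separator.
    return f"GET {quote(path, safe='/')} HTTP/1.1"

path = "/ HTTP/1.0"  # path part of <http>["//localhost:8000/ HTTP/1.0"]
print(naive_request_line(path))    # GET / HTTP/1.0 HTTP/1.1
print(escaped_request_line(path))  # GET /%20HTTP/1.0 HTTP/1.1

# CR/LF in a path would be worse (header injection); escaping neutralizes it.
print(escaped_request_line("/a\r\nX-Injected: 1"))
# GET /a%0D%0AX-Injected%3A%201 HTTP/1.1
```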

So, I guess the clean approach is for the lexer to decode %XX, for
<http> to re-escape the string for HTTP, and for <file> to adapt it for
the filesystem on Windows (as it already does).

So:

<file:%20>  ==  <file>[" "]
<file:a|b>  ==  <file>["a|b"] == the file a:b (on Windows)
<file:a\b>  ==  <file>["a\b"] == the file a/b (on Linux)

<captp://*aaa@[::1]:123/bbb> == <captp>["//*aaa@[::1]:123/bbb"]

<http://localhost/%20>  == <http>["//localhost/ "] == a request line of
  GET http://localhost/%20 HTTP/1.1
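The equivalences above can be checked mechanically under the proposed split: the lexer decodes %XX once, get/1 receives the decoded string, and <http> re-escapes it for the wire. This is a Python sketch with hypothetical helpers getter_arg and http_target. It leaves out the platform-specific <file> mapping:

```python
from urllib.parse import quote, unquote, urlsplit

def getter_arg(literal_body: str) -> str:
    """Hypothetical: the string the lexer would pass to get/1 for a
    URI literal <scheme:body>, after decoding %XX once."""
    return unquote(literal_body)

def http_target(arg: str) -> str:
    """Hypothetical <http> getter: take the path of the decoded string
    and re-escape it for the request line."""
    return quote(urlsplit("http:" + arg).path, safe="/")

# <file:%20> == <file>[" "]
assert getter_arg("%20") == " "
# <captp://*aaa@[::1]:123/bbb> == <captp>["//*aaa@[::1]:123/bbb"]
assert getter_arg("//*aaa@[::1]:123/bbb") == "//*aaa@[::1]:123/bbb"
# <http://localhost/%20> == <http>["//localhost/ "], sent as /%20
assert getter_arg("//localhost/%20") == "//localhost/ "
assert http_target(getter_arg("//localhost/%20")) == "/%20"
```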

Does that sound sensible?


-- 
Dr Thomas Leonard
IT Innovation Centre
2 Venture Road
Southampton
Hampshire SO16 7NP

Tel: +44 0 23 8076 0834
Fax: +44 0 23 8076 0833
mailto:tal at it-innovation.soton.ac.uk
http://www.it-innovation.soton.ac.uk 


