[e-lang] Design issue (and proposed change): indentation in literals

Kevin Reid kpreid at mac.com
Thu Jan 15 18:48:35 CST 2009


-----------------------------------------------------------------
Introduction

   ? def makeTermParser := <import:org.quasiliteral.term.makeTermParser>
   > null

The TermL and E prettyprinters fail to handle the interaction of  
newlines and indentation, as demonstrated by these test cases:

   ? (def x := term`{data: ["a\nb"]}`) == makeTermParser(x.asText())
   # value: false

   ? (def x := e`{"a\nb"}`) == e__quasiParser(x.asText())
   # value: false

In both cases, the problem is that string literals are printed with  
unescaped newlines, and the automatic indentation inserts  
inappropriate whitespace after those newlines.
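
To make the failure concrete, here is a minimal sketch in Python (not E, and not the actual prettyprinter code). The function indent_block is hypothetical; it stands in for any printer that indents by inserting whitespace after every newline without knowing whether that newline is inside a literal.

```python
def indent_block(source: str, prefix: str = "    ") -> str:
    # Hypothetical naive indenter: add the prefix after every newline,
    # whether or not that newline sits inside a string literal.
    return source.replace("\n", "\n" + prefix)


# Printed source whose string literal contains a raw (unescaped) newline.
printed = '{"a\nb"}'
indented = indent_block(printed)

# Strip the surrounding {" and "} to get at the text of the literal.
original_value = printed[2:-2]
reparsed_value = indented[2:-2]

# The indentation has leaked into the literal, so the two values differ.
print(repr(original_value), repr(reparsed_value))
```

Reading the indented text back yields a different string, which is the same kind of mismatch the two test cases above report.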


-----------------------------------------------------------------
Also, quasis


While no E implementation yet prints E source with un-expanded  
quasiliterals included, it should happen eventually, and this problem  
is particularly bad for that, as quasiliterals often have multiline  
syntaxes, which would be much uglier if presented as a single line.

   def dialog := JPanel`
     $icon    $message.X.Y >       >
     $blank.Y v            >       >
     v        $blank2.X    $cancel $ok
   `

   def template := `
     <html>
       <title>$title</title>
     </html>
   `

   def tree := term`root(branch(leaf,
                                leaf),
                         branch(leaf))`


-----------------------------------------------------------------
A catalog of solutions to the original problem


1. print newlines in literals as "...\n..." rather than "...
..."
   Advantages: Does not introduce new syntax. Printed source is  
entirely indentation-insensitive.
   Disadvantages: Makes long multiline literals, especially  
quasiliterals, less readable.


2. inhibit automatic indent while printing literals.
   Advantages: Does not introduce new syntax. Long multiline literals  
are presented closer to un-quoted.
   Disadvantages: Printed source is not indentation-insensitive.


3. Introduce syntax which distinguishes block indentation from  
whitespace in literals.
   Advantages: The new syntax can be used for indentation-insensitive  
multiline literals in human-written source.
   Disadvantages: Possible noise in the source code. Strange new  
syntax. Lots of possibilities for what that syntax is.
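
For comparison, options 1 and 2 can be sketched in Python (not E). All names here are hypothetical, and the "pieces" representation (text paired with an is-literal flag) is an assumption made for illustration; a real printer would get that information from its syntax tree.

```python
def indent_block(source, prefix="    "):
    # Naive indenter: add the prefix after every newline it sees.
    return source.replace("\n", "\n" + prefix)


def quote_escaped(value):
    # Option 1: print newlines as the two-character escape, so the
    # printed literal never contains a raw newline.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped.replace("\n", "\\n") + '"'


def indent_outside_literals(pieces, prefix="    "):
    # Option 2: indent code text, but copy literal text through untouched.
    out = []
    for text, is_literal in pieces:
        out.append(text if is_literal else indent_block(text, prefix))
    return "".join(out)


# Option 1: the naive indenter has nothing to damage.
option1 = indent_block("{" + quote_escaped("a\nb") + "}")

# Option 2: the literal survives, but its second line sits at column 0,
# so re-indenting the output by hand would change its value.
option2 = indent_outside_literals(
    [("{\n", False), ('"a\nb"', True), ("\n}", False)]
)
```

The second result shows why option 2 is not indentation-insensitive: the literal is only correct as long as nobody touches the whitespace in front of its continuation line.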


The third option, suitably implemented, would also have the advantage  
that the indentation of the source code would not "leak" into the text  
the quasiparsers see. It is the one I like the most, but I have no  
*good* ideas for syntax to implement it. Here are some examples.

(The "leak" is especially bad when the quasiquote is a plain-text or  
otherwise whitespace-preserving template; see the HTML generated by  
E-on-JavaScript for some examples of the consequences.)

(Please note that we need a solution for both string literals and  
quasiliterals.)


-----------------------------------------------------------------
A catalog of syntaxes for option 3


3a. In C, adjacent string literals are concatenated, leading to the  
pattern
   "abc\n"
   "  def\n"
   "ghi"
This is simple and unambiguous, but requires syntax on every line, and  
explicit insertions of \n. Also, in E the above is already defined to  
be a sequence expression, so this is not suitable for borrowing.

3b. In Haskell, arbitrary whitespace may be surrounded by backslashes  
to ignore it:
   "abc\n\
   \  def\n\
   \ghi"
This is essentially similar to the C solution, except that it occurs  
within one string literal. This could be adapted for E string  
literals, but would be impractical for quasiquotes, since they require  
$\ for all escapes.

3c. Borrowing the convention used in English text where a quotation  
may be continued by omitting the closing quotation mark at the end of  
a paragraph:
   "abc
   "  def
   "ghi"
This is lightweight, and requires no special marker at the ends of  
lines, but it would change the meaning of existing multiline literals.  
(It would probably give some syntax highlighters grief, too, but  
decent ones should take a pattern of "closing quote or end of line  
ends a string literal" and that would do.)

3d. We could introduce a continuation marker of some sort, which means  
"discard preceding whitespace".
   "abc
  \|  def
  \|ghi"
This has the disadvantage of being visually heavyweight, but I can't  
think of anything else against it. If you want to write a multiline  
formatted literal which denotes the string "abcdefghi", you can do this:
   "abc\
  \|def\
  \|ghi"
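
A minimal sketch of how a lexer might decode option 3d, written in Python (not E). The function name decode_marked is hypothetical, and two rules are assumptions read off the examples above: whitespace followed by \| at the start of a continuation line is discarded, and a backslash at the end of a line suppresses the newline. Lines without a marker are kept verbatim, and escaped backslashes are not handled.

```python
MARKER = "\\|"  # the two characters backslash and vertical bar


def decode_marked(body):
    # body is the raw text between the quotes, including source indentation.
    lines = body.split("\n")
    result = lines[0]
    for line in lines[1:]:
        stripped = line.lstrip(" \t")
        if stripped.startswith(MARKER):
            # Discard the leading whitespace and the marker itself.
            line = stripped[len(MARKER):]
        if result.endswith("\\"):
            # Assumed rule: backslash-newline joins the two lines.
            result = result[:-1] + line
        else:
            result += "\n" + line
    return result


first = decode_marked(r'''abc
  \|  def
  \|ghi''')

second = decode_marked(r'''abc\
  \|def\
  \|ghi''')
```

Under these assumptions the first literal denotes three lines with "def" indented by two spaces, and the second denotes "abcdefghi".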

3e. A refinement of the previous is to use the discard-whitespace  
marker once:
   "abc
  \|  def
    ghi"
This works particularly well with the vertically-aligned-beside-the-head  
style of indentation, which is common in Lisp and Haskell (I don't know  
a common name for it):
   def tree := term`root(branch(leaf,
                  \|            leaf),
                         branch(leaf))`

However, it requires baking into the lexer the idea that "\|" is "the  
same width" as two spaces. (It also may not cooperate nicely with  
editors doing automatic indentation with tabs.)

There is precedent: Haskell uses the width of whitespace vs.  
characters in order to accomplish its indentation-determines-syntax  
"layout rule", and I haven't heard any complaints about this being  
fragile. (Python is simpler, in that it only cares about levels of  
whitespace indentation.)

This plan is less general than Haskell, in that it only ever compares  
"xxx\|" to "xxx  ", where xxx is a particular string of whitespace  
characters, so I think that it is not too dangerous to build this into  
the lexer.

It can also subsume option 3d, by permitting \| on any continuation  
line, not just the second; this could be used when needed (cranky  
editors; future funkiness introduced by Unicode; display in a  
proportional font where "\|" really isn't the same width as "  ".)
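
Here is a rough Python sketch (not E) of the rule option 3e would bake into the lexer. The name decode_margin is hypothetical, and the details are assumptions: a \| marker sets a margin equal to the width of the whitespace before it plus two, later unmarked lines have that many columns discarded, and only spaces (not tabs) count as indentation.

```python
MARKER = "\\|"  # the two characters backslash and vertical bar


def decode_margin(body):
    # body is the raw text between the quotes, including source indentation.
    lines = body.split("\n")
    out = [lines[0]]
    margin = None  # columns to discard, set by the most recent marker
    for line in lines[1:]:
        stripped = line.lstrip(" ")
        if stripped.startswith(MARKER):
            # "xxx\|" is treated as the same width as "xxx  ".
            margin = (len(line) - len(stripped)) + len(MARKER)
            out.append(stripped[len(MARKER):])
        elif margin is not None:
            if line[:margin].strip(" "):
                raise ValueError("line is indented less than the margin")
            out.append(line[margin:])
        else:
            out.append(line)  # no marker seen yet: keep the line verbatim
    return "\n".join(out)


simple = decode_margin("abc\n  \\|  def\n    ghi")

tree = decode_margin(
    "root(branch(leaf,\n"
    + " " * 18 + "\\|" + " " * 12 + "leaf),\n"
    + " " * 25 + "branch(leaf))"
)
```

Because a marker on any later line simply resets the margin, the same loop also accepts the every-line style of option 3d.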


-----------------------------------------------------------------
Conclusion


As you can tell from the fact that I stopped generating options and  
wrote a lot about it, I like option 3e, and if I were in charge I would  
proceed to make it part of the E syntax. However, it isn't obviously  
ideal, so I ask for discussion: what should we do about indentation in  
literals?

-- 
Kevin Reid                            <http://homepage.mac.com/kpreid/>



