Jacaranda Language Specification, draft 0.3 "Inside every large problem is a small problem struggling to get out." -- C. A. Hoare Introduction ============ This document is draft version 0.3 of a specification for the Jacaranda programming language. Jacaranda is an object-capability [Miller2006] sublanguage of JavaScript. To be more precise, Define language A to be an *attribute-verifiable sublanguage* of language B if-and-only-if: - there exists an attribute grammar [Knuth1968, Knuth1971, Swierstra2005] for A whose underlying grammar recognises a subset of texts recognised by B, and that deterministically computes an 'errors' attribute (whose value is a set of error indicators) on the start symbol; and the semantics of executing a text in A are: - if the text is not recognised by A's underlying grammar or if the 'errors' attribute on the start symbol is non-empty, report these errors. Otherwise, execute the program according to the specification of language B, in a given execution environment. By *ES3F*, we mean the programming language specified by ECMAScript, 3rd edition, subject to corrections for known specification errors in that standard as described in the "ES3 -> ES3F" section below. Jacaranda is an attribute-verifiable sublanguage of ES3F. (ES3F also supports syntax for declaring «const» variables compatible with proposed ES3.1 and ES4, and an optional improvement to the semantics of the GetValue operation that is also similar to an ES4 proposal. In future we could specify Jacaranda as a sublanguage of ES3.1 or ES-Harmony, but the current drafts do not include all of the fixes we need.) In fact, the underlying lexical and syntactic grammars used to specify Jacaranda are (modulo a couple of known bugs in this draft) identical to those of ES3F; the only difference is the addition of rules to compute attributes. Note that _any_ implementation of language B that satisfies B's specification when applied to programs in the A sublanguage, is sufficient to execute a verified A program. In the case of Jacaranda, the sublanguage avoids many known bugs in commonly used JavaScript implementations, and so verified code does not necessarily need to be run on a fully correct implementation of ES3F. By using this definition of subsetting, Jacaranda differs from some other proposals for security-focussed sublanguages of ECMAScript, such as Caja and Cajita [Miller2008] or FBJS [Facebook]. There are many advantages of specifying and implementing a safe sublanguage by attribute-verification rather than by rewriting: - In implementations of the languages mentioned above, the code is rewritten to a form that can introduce additional run-time errors (or other changes in behaviour, called "gotchas" in the Caja/Cajita specification), that would not occur when running it directly as ECMAScript. In Jacaranda, there are necessarily no changes in the possible run-time behaviour of a verified program relative to the same program in ES3F. There are cases where an existing JavaScript program might have to be modified in such a way that additional run-time errors could occur -- for example some property accesses may have to be replaced by function calls, which can throw exceptions -- but these can only happen where there are explicit changes in the source code. - A verifier is easier to understand and has fewer sources of potential error, both in the specification and implementation, than a rewriter. To understand a rewriter, it is necessary to keep in mind the semantics of both the source and target languages, and the run-time library that the translated code depends on, and be convinced that: - the source language semantics are preserved by the translation; - no source program is translated in a way that will lead to the translated code breaking the intended security properties. To specify and implement a verified sublanguage, on the other hand, it is only necessary to define how the language is restricted syntactically; then the security properties of the sublanguage can be understood independently. If the restriction is done by an attribute grammar, there is little room for ambiguity in the specification of the verifier. The semantics are already defined by the original language specification, modulo bugs and ambiguities in that specification. A specification that defines a rewriting into another language is likely to be more complicated, and give a result that is less directly useful and understandable to most users of the sublanguage. (Jacaranda also requires a run-time library written in ES3F, but that library is relatively small and of low implementation complexity. In fact it is mostly a set of utility functions which, although they are written in ES3F and therefore have to be trusted, are each very short and can be understood independently.) - As mentioned above, for Jacaranda the underlying lexical and syntactic grammars on which attributes are computed are exactly as specified by ES3F (and almost exactly as specified by ECMA-262). This means that: - the subsetting relation between Jacaranda and ES3F holds by construction; - it is much easier to see that all possible cases have been accounted for in the specification, since they are all enumerated by an existing grammar; - an existing ECMAScript parser can be used, provided that it follows the ES3F grammar closely enough and enforces it strictly; - it is possible to parse any syntactically valid ECMAScript program (even an invalid ECMAScript or JavaScript program if the parser can recover from errors) and report deviations from the Jacaranda sublanguage in one pass, rather than requiring the programmer to fix each error before the rest of the program can be parsed. - It is easy to perform white-box testing of a verifier based on an attribute grammar, because the intermediate results are deterministic and do not depend on any implementation choices. The attribute grammar specifies precisely what tree of attribute values are to be computed for any given input. So a single white-box test suite can be run against all implementations, or implementations can be compared against each other for "fuzzed" input , and any differences in the resulting attribute tree (not just differences in the overall pass/fail outcome of verification) can be treated as bugs. - In general an attribute grammar has *synthesized* and *inherited* attributes. The synthesized attributes of a node depend only on the attributes and contents of its immediate child nodes; while the inherited attributes of a node depend only on its parent node (and on its other attributes, possibly). A grammar that has only synthesized attributes is called *S-attributed*. Such grammars admit a verifier implementation that is particularly simple and efficient in time and memory: programs can be parsed and verified in a single pass, by either a "top-down" or "bottom-up" style of parser, without necessarily constructing any syntax tree explicitly. The Jacaranda attribute grammar is "almost" S-attributed, in the sense that it has only one inherited attribute ('DependOnNewSemantics'), which is determined by the first token of the source text, and has the same value in all nodes. In practice this case is as easy to implement as an S-attributed grammar; it requires just a single post-order traversal of the parse tree. If an explicit parse tree representation is required (e.g. for a code generation framework), the representation could also ensure that only error-free subtrees are constructed, since the attributes for any subtree can be computed independently if the expected value of the 'DependOnNewSemantics' attribute is given. Attribute computation can be implemented using the "semantic actions" supported by most existing general-purpose parser systems, and is also straightforward to implement in an ad-hoc recursive descent parser. - There are no difficulties in reverifying a program that has already been verified. A non-idempotent rewriter, on the other hand, cannot usefully process a code fragment twice -- even if the result is correct, it will be too inefficient. (This also means that it would be possible to apply N-version programming, with verifiers written using different languages and/or parser libraries by independent teams of programmers: a simple harness could run all versions concurrently and if they differ, reject the input program and report a bug. Of course this will only directly catch implementation rather than specification bugs, and it only applies to the verifier; the JavaScript implementation(s) used to execute verified programs still need to be tested more conventionally.) - Avoiding rewriting means that less inefficiency is introduced into the code that is being run. While there is some overhead due to changing property accesses to function calls, this is explicitly visible in the source code, and can be avoided in many cases (especially in the 'DependOnNewSemantics' mode, as explained later). - Debugging is simplified because it is the original code that is being debugged. Although it is possible in principle to provide a debugger with a mapping between lines of the input code and its rewritten form, in our experience this often doesn't work very well (particularly when the ordering of code is changed, as it is by many of the Caja rewriting rules). Despite these differences, Jacaranda is strongly influenced by Caja and Cajita, and by ADsafe [Crockford], which is also a static sublanguage of ES3 that could in principle be specified by an attribute grammar. (Differences between Jacaranda and ADsafe are out of scope for this document, but in general Jacaranda allows a qualitatively larger sublanguage of ECMAScript than ADsafe. Also, Jacaranda is an object-capability language with per-object encapsulation, whereas ADsafe only provides encapsulation between module instances.) We encountered the idea of using attribute grammars for validation (as opposed to their more common uses in assigning semantics or for code generation) from Meredith Patterson and Len Sassaman [PS2008]. The earliest reference we can find to a sublanguage of an existing programming language being defined in terms of an attribute grammar, is a 1977 paper defining a sublanguage of Algol 68 [Simonet1977]. Edgar Irons did much of the early work that led to the development of attribute grammars [Irons1963]. [Irons1963] Edgar T. Irons, Towards more versatile mechanical translators, Proceedings of Symposia in Applied Mathematics 15, pages 41-50. American Mathematical Society, 1963. [Knuth1968] Donald Knuth, Semantics of Context-Free Languages, California Institute of Technology, Theory of Computing Systems, Volume 2 Number 2. Springer, New York, June 1968. [Knuth1971] Donald Knuth, Semantics of Context-Free Languages: Correction, Stanford University, Theory of Computing Systems, Volume 5 Number 2. Springer, New York, June 1971. [Simonet1977] M. Simonet, An attribute description of a subset of Algol 68, Proceedings of the Strathclyde ALGOL 68 conference, Glasgow, Scotland, pages 129-137, 1977. [Swierstra2005] Wouter Swierstra, Why Attribute Grammars Matter, The Monad.Reader, Issue Four, 01-07-05. [PS2008] Meredith L. Patterson and Len Sassaman, Validation by Parse Tree Comparison Extended to Context-Sensitive Grammars, In preparation, 2008. [Miller2006] Mark S. Miller, Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control, PhD thesis, Johns-Hopkins University, May 2006. [Miller2008] Mark S. Miller, Caja draft specfication, Latest version available from [Crockford] Douglas Crockford, ADsafe specification, [Facebook] FBJS - Facebook Developers Wiki, Conventions =========== Text in a subsection following "Rationale:" is non-normative; that is, it does not impose any conformance requirements on Jacaranda implementations or programs. The same programming language would be specified if this text were removed. Text surrounded by [# #] is a non-normative editing note. Definitions of technical terms used in this specification should be taken from the first of the following documents that provides a definition for the term: 1. This document. 2. ECMA-262, 3rd edition. 3. The Unicode Standard, version 5.1.0. Defining occurrences of terms in this document are shown like this: *term*. A *string* is a sequence of zero or more UTF-16 code units (not necessarily well-formed). In this specification, strings are often presented in single quotes; this has the same meaning, including use of \u escape sequences, as the corresponding string literal in ES3F. Terms applied to strings such as *starts with*, *ends with*, etc. are defined in terms of this code unit sequence, regardless of whether it is a well-formed UTF-16 encoding. For example, '\uD800__' is considered to end in '__'. Code fragments are given in double angle quotation marks: « ». In general, these fragments should be interpreted as ES3F code; they may or may not be valid in the Jacaranda sublanguage. In code fragments, \ stands for itself. Grammar productions are given in single angle brackets: < >. An optional occurrence of a production is shown as < >opt. \u escapes are used in grammar productions to mean the corresponding unescaped code unit. (These conventions are different to those in ECMA-262 because in plain text, we cannot distinguish terminals from nonterminals by font.) Productions that do not correspond to an ECMA-262 production are named with an initial lowercase letter. There are three grammars used in this specification: - In the lexical grammar, :: is used to separate a nonterminal from its definition. The lexical grammar is an S-attribution of the ES3F lexical grammar. The input is a string, and the output is a sequence of tokens, each of which is a string that may have attributes. (ECMA-262 specifies some characteristics of the ES3 lexical grammar only informally. It is possible to use the attribute grammar formalism to specify the required tokenisation very precisely. We do not do this in the current draft because it would add apparent complexity to the Jacaranda specification that would only be concerned with clarifying ES3F, not with defining how Jacaranda is a sublanguage of ES3F.) - In the syntactic grammar, : is used to separate a nonterminal from its definition. The syntactic grammar is technically an LR-attribution of the ES3F syntactic grammar (it would be an S-attribution without the 'DependOnNewSemantics' attribute). Its input is the token sequence obtained from the lexical grammar, and its principal output is an 'errors' attribute giving the set of errors (if any), other than parse errors, that would prevent the source text from being a valid Jacaranda program. - In the string grammar, ::: is used to separate a nonterminal from its definition. The input to the string grammar is a sequence of 16-bit code units. It is a regular grammar that computes no attributes and is only used to recognise classes of strings. The regexp literal grammar in ES3F has no equivalent in Jacaranda, which does not allow regexp literals. Lexical and syntactic grammar productions that do not require an explicit attribution rule but are directly used by a production that does, are shown with a reference to the ECMA-262 section number. The symbol ε in a production means the empty input sequence. The syntactic grammar can be parsed unambiguously, independently of attribute computation. That is, while attribute values affect whether errors result from a given parse, it is never necessary to compute attributes in order to choose between different parse trees. (In a small number of cases, a node must be reparsed against a more specific production than in the ES3 grammar.) The metalanguage used to define attribution rules has values of the following types: - true and false - arbitrary-precision decimal numbers - the undefined attribute value - finite sets of other attribute values - finite sequences of other attribute values - finite mappings from attribute names to attribute values. A parse tree node is represented as a finite mapping of its attributes, with a 'text' attribute holding the corresponding source text. A convention is used to present attribution rules more concisely, by making some "boilerplate" attribute computation implicit: a) if a production has exactly one child node, then all of its attributes are the same as those of the child node; b) otherwise, if no explicit rule is given to compute each of the attributes 'freelyUsed', 'freelyAssigned', 'allVars', 'constVars', 'topLevelVars', 'usedProperties', 'aliasesForThis', 'fatal', 'errors', and 'suppressed', then the attribute is computed by taking the union of the corresponding attribute values on all child nodes; c) for boolean-valued attributes (with names starting 'is'), if no explicit rule is given then the value is false. d) for other attributes, if no explicit rule is given then the value is undefined. The attributes mentioned in b) are all set-valued, so that taking the union of their values for all children of a node is well-defined. The convention is to be used for _all_ syntactic and lexical grammar productions, including those that are used by ES3F but not explicitly mentioned in this document. Notation and auxiliary functions used in attribution rules: // comment a comment that is non-normative and has no effect N.foo the value of attribute 'foo' of node N N is

true iff N.text matches production

(grammatically, ignoring attributes) N isnot

not(N is

) not(b) true if b is false; otherwise false { x <- A | p(x) } the subset of values x taken from A for which p(x) UNION S the union of all sets that are members of S, or undefined if any of the members are undefined A U B UNION {A, B} A \ B the set difference { x <- A | x notin B } S ++ T the concatenation of sequence S followed by T A U= B define A to be the union of its boilerplate value as described above, and B S startswith T true if S is not undefined and sequence S is a prefix of T; otherwise false X == Y true if X and Y are identical values; otherwise false cond ? X : Y if cond is true then X, otherwise Y max(X, Y) X > Y ? X : Y min(X, Y) X < Y ? X : Y reparse N as

else Action reparse N as production

, possibly binding additional variables in the attribution rule. If N cannot be parsed as

, perform Action instead of the remainder of the rule. noErrors(N) = (N.errors U N.fatal) == {} // the following definition uses Haskell-style pattern matching on sequences: checkNoMultipleDeclaration([]) = {} checkNoMultipleDeclaration(x:xs) = { error(id, 'multiple declaration') | id <- x.allVars intersect UNION { y.allVars | y <- xs } } U checkNoMultipleDeclaration(xs) check(cond, arg, msg) = cond ? {} : {error(arg, msg)} error(arg, msg) = representation of an error with auxiliary information 'arg' (an identifier or node) and message 'msg' NumberToString(x) = the result of applying ToString as defined in ECMA-262 section 9.8.1 to "the number value for x" as defined in ECMA-262 section 8.5 [# other notation should be fairly self-evident, but spell it out more explicitly in a future draft #] Any conformance requirements specified in the normative text are intended to be redundant with, i.e. specify precisely the same restrictions as, the Jacaranda lexical and syntactic grammars. If this is not the case then it indicates an ambiguity in the specification that needs to be clarified. [# FIXME: the distinction between fatal and ordinary errors is not reflected in the prose descriptions. An error must be fatal if it affects lexing, or if it relates to a variable that could be hoisted outside of an unevaluated expression. (For the next draft, consider removing this distinction as discussed in the rationale for group EXPRESSIONS.) #] The descriptions of Jacaranda restrictions relative to ES3F are given in groups, each with an uppercase name. Attribution rules given in one group may also impose constraints needed to implement the rules in other groups, in which case cross-references are given in most cases. Rationale: Analogous to type declarations, some degree of redundancy in a specification is useful in order to provide a better chance of detecting specification errors. This is the reason for providing informal descriptions that are redundant with the attribute grammar. Conformance =========== The terms *SHALL*, *SHALL NOT* or *SHALL ONLY* are used to describe constraints on a conforming implementation of Jacaranda. The terms *MUST*, *MUST NOT* or *MUST ONLY* are used to describe static constraints on a module definition (see below). This indirectly specifies that a conforming implementation of Jacaranda SHALL reject module definitions that fail to satisfy such a constraint. The terms *SHOULD*, *SHOULD NOT* or *SHOULD ONLY* are used to describe a recommendation for Jacaranda code to follow, in order to mimimise the risk of potential incompatibilities with future versions of this specification or implementations of ECMAScript. It has no formal consequence for conformance. For any constraint that is described as implementation-defined, a conforming implementation SHALL make a consistent choice and SHALL document which choice is made. An *ES3F interpreter* is an instance of an ES3F implementation conforming to ECMA-262 3rd edition amended as described in the section "ES3 -> ES3F". A *context* is an ES3F execution environment that has copies of the standard built-in objects independent of other contexts in the same ES3F interpreter. Contexts may share other objects. (In typical web browser implementations of JavaScript/ECMAScript, there is usually one context for each frame or iframe.) A *visible side effect* is an effect that changes the state of one or more objects in an ES3F interpreter in a way that is observable to other code running in that interpreter by overt means (that is, without making use of timing of events where this timing is nondeterministic). Memory usage is not considered to be a visible side effect. Jacaranda programs are organised into *modules*. Typically, these are independently developed units of source code. A *module definition* is a string, representing Jacaranda source code intended to define a module. An *internal module representation* is a representation of a module as a collection of values in a given ES3F interpreter. An internal module representation includes a *module object*, which is a deeply immutable object with the properties specified in the module definition. An instance of a Jacaranda module is called a *caplet*. Each caplet will have a *powerbox* object defined by a specific region of its module code. The creator of a caplet must provide an *environment* and a *powersource*. The environment specifies *imports* that will be in scope throughout the caplet's module code. The powersource is an object that will only be initially available to the caplet's powerbox. The caplet will have no overt authority other than that granted via its environment and its powersource, and the ability to perform pure computation. (It will also have access to the module object, but that object provides no additional authority.) A *module name* is a string, representing the name of a module. A *Jacaranda interpreter* is an ES3F interpreter with the following additional characteristics: - an interpreter has a per-context state consisting of a mapping from module names, to internal module representations of modules that have been accepted in that context. - it is possible to create a context with an initially empty mapping. - it is possible to submit a module definition to a given context, with the following effect: - if the module definition violates any of the constraints given in sections "Module Definitions" and "Static Constraints" below, a *static rejection error* SHALL be reported. - the module definition may specify a variant of the Jacaranda language that is unsupported by the implementation. In that case, an *unsupported Jacaranda variant error* SHALL be reported. - if the module definition does not give a 'name' property that is different from any module name already in the mapping, a *module name clash error* SHALL be reported. - the implementation may, for any or no reason, report an *implementation failure* when an attempt is made to submit a module. Such a failure SHALL be distinguishable from other errors. - otherwise, a mapping from the module's name to its internal module representation SHALL be added to the context's state. In any of these cases, there SHALL be no other visible side effects of submitting the module definition. - it is possible to create a caplet by *instantiating* a named module, providing an environment and a powersource as discussed above. - the *actual import environment* of the caplet is the provided environment, augmented with the objects specified in the "Module Run-time Environment" section, the latter taking priority over the former; then intersected with the module's import set as defined in group MODULE_DEFINITION. - if an import named in the caplet's 'imports' list is not available in the import environment, an *unavailable import error* SHALL be reported. - the module MUST have a 'powerbox' property that holds a function (callable) value -- if not then a *missing powerbox property error* SHALL be reported. - the implementation may, for any or no reason, report an *implementation failure* when an attempt is made to instantiate a caplet. Such a failure SHALL be distinguishable from other errors. - the function given by the 'powerbox' property of the module SHALL be called, with the given powersource object and the module object as parameters in that order. The bindings available in the caplet's outer lexical scope SHALL be given by its actual import environment. - the function given by the 'powerbox' property may interact with the environment of the Jacaranda implementation in order to set up *event handlers* which cause functions created by the module to regain control at some later point in time. The specifications of such event-handling features are implementation-defined. In cases where the above description says that an error SHALL occur as a result of submitting or instantiating a module, other visible side effects SHALL NOT occur. At any time, and for any or no reason, a Jacaranda implementation may *lock* a given context, in which case it SHALL NOT visibly run any further code in that context. Other aspects of the interaction between a Jacaranda interpreter and its environment are implementation-defined. (One possibility is that a *container* is written in unrestricted ES3F code to control this interaction.) Rationale: The idea of using different conformance terms to indicate constraints on the implementation vs constraints on programs is borrowed from W3C specifications. The term "lock" is named after the LOCK utility in the Incompatible Timesharing System: . The allowance for implementation failures makes Jacaranda particularly easy to implement: #include int main(int argc, char **argv) { puts("Implementation failure.\n"); return 1; } More seriously, there are good reasons to allow nondeterministic failures, given that a practical Jacaranda implementation has no control over what memory or execution resources are available, and machines with unbounded resources do not exist. The requirement for these failures to be distinguishable from other errors is actually quite strong. ES3 -> ES3F =========== Apply the errata given in . Change and related productions to allow «const» to occur in place of «var». («const» is a in ES3, so it already cannot occur as an identifier.) The detailed changes to these productions are given in the DECLARATIONS group, but are mentioned here because they extend the language rather than restricting it. Section 2 Conformance - Replace A conforming implementation of this International standard shall interpret characters in conformance with the Unicode Standard, Version 2.1 or later, and ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding form, implementation level 3. If the adopted ISO/IEC 10646-1 subset is not otherwise specified, it is presumed to be the BMP subset, collection 300. If the adopted encoding form is not otherwise specified, it presumed to be the UTF-16 encoding form. with A conforming implementation of ES3F SHALL interpret characters in conformance with the Unicode standard, version 5.1.0 or later (see section 6 for further detail). - The following paragraph does not apply to ES3F: A conforming implementation of ECMAScript is permitted to provide additional types, values, objects, properties, and functions beyond those described in this specification. In particular, a conforming implementation of ECMAScript is permitted to provide properties not described in this specification, and values for those properties, for objects that are described in this specification. Rationale: Permission to provide arbitrary implementation-defined extensions is not appropriate for a security-focussed language. In the case of ES3F, some kinds of extension can make it impossible to define a secure sublanguage by static restriction. For more detail see the constraints on extensions given in the changes to sections 8.6.2, 15, and 16 below. Section 3 References - Replace the reference to Unicode version 2.0 with references to Unicode version 5.1.0. Rationale: The conformance requirements for Unicode have been significantly tightened since version 2.0. In particular, there were security- relevant changes at around version 3.2. Note that this does not require support for characters added in later versions. Section 5.1.4 The Syntactic Grammar - Replace The syntactic grammar as presented in sections 0, 0, 0 and 0 is actually not a complete account of which token sequences are accepted as correct ECMAScript programs. with The syntactic grammar as presented in sections 11, 12, 13 and 14 is actually not a complete account of which token sequences are accepted as correct ES3F programs. Section 6 Source Text - Replace the section before NOTE 1 with: ES3F source text is represented as a sequence of code units in the UTF-16 transformation format of Unicode, version 5.1.0 or later. The text SHOULD be in Unicode Normalization Form C as described in Unicode Stanard Annex #15. An ES3F implementation SHALL NOT perform any normalisation of source text itself (but may warn if the text is not in Unicode Normalization Form C). This does not preclude an ES3F source text from being stored or transmitted in other character encodings, but the process by which these encodings are converted to UTF-16 is outside the scope of this document. Throughout ECMA-262 3rd edition, the phrase "code point" and the word "character" are misused to refer to a UTF-16 code unit. In string literals, regular expression literals and identifiers, any code unit may also be expressed as a Unicode escape sequence consisting of six characters, namely \u plus four hexadecimal digits. Within a non-token comment, such an escape sequence is effectively ignored as part of the comment. Within a string literal, the Unicode escape sequence contributes one code unit to the value of the literal. Within an identifier, the escape sequence contributes one code unit to the identifier. Section 7 Lexical Conventions - Insert after the first paragraph: Note that tokens include «/», «/=», «/*const*/», and «/*fallthru*/», even though these are not included in the production. Section 7.1 Unicode Format Control Characters - This section does not apply to ES3F; in particular, Format Control characters SHALL NOT be "removed from the source text before applying the lexical grammar." Rationale: Format Control characters are not allowed anywhere in a Jacaranda source text (see group LEX_CODE_UNITS). If they were removed by the ES3F lexical grammar, then this rule could not be enforced. Section 8.5 The Number Type - Replace See the descriptions of the ToInt32 and ToUint32 operators in sections 0 and 0, respectively. with See the descriptions of the ToInt32 and ToUint32 operators in sections 9.5 and 9.6, respectively. Section 8.6.2 Internal Properties and Methods - Replace Host objects may implement these internal methods with any implementation-dependent behaviour, or it may be that a host object implements only some internal methods and not others. with A *Jacaranda-accessible property* is a property that has an accessible name as defined in group PROPERTY_NAMES. A value is *built-in-reachable* iff it is reachable in the sense of garbage collection terminology, given that only the properties listed in ES3F_BUILTINS of the global object of any context, or the Jacaranda-accessible properties of any other reachable object can be obtained. ES3F_BUILTINS = { 'Array', 'Boolean', 'Date', 'Math', 'Number', 'RegExp', 'String', 'Error', 'EvalError', 'SyntaxError', 'TypeError', 'RangeError', 'ReferenceError', 'URIError', 'Infinity', 'NaN', 'undefined', 'decodeURI', 'decodeURIComponent', 'encodeURI', 'encodeURIComponent', 'isFinite', 'isNaN', 'parseFloat', 'parseInt' } An ES3F implementation SHALL NOT add Jacaranda-accessible properties other than those specified in ECMA-262 to an object that is at any time built-in-reachable, except as described in Section 15 below. Host objects may implement the [[Get]], [[Put]], [[CanPut]], and [[HasProperty]] internal methods with implementation-dependent behaviour when they are used to access properties that are not Jacaranda-accessible. Host objects may also implement internal methods with implementation-dependent behaviour in other cases subject to the following constraint: for any given host object, either - the object is never built-in-reachable, or - the object has behaviour for all of its internal methods that is equivalent to that of some possible non-host object. [# Are these constraints strong enough to rule out oddball objects that would prevent us from implementing the Jacaranda run-time securely? Should [[Get]], [[Put]], [[CanPut]], and [[HasProperty]] be further restricted? Is "reachable in the sense of garbage collection terminology" clear or precise enough? The problem here is that this is the ES3F spec, but we care mainly about whether it will be possible to prevent an oddball object from being reachable in Jacaranda (hence the restriction to accessible properties). #] Section 8.6.2.6 [[DefaultValue]] (hint) - In steps 3 and 7 of each of the algorithms given for [[DefaultValue]], it is specified to "call the [[Call]] method of" a particular object, even though that object is not guaranteed to implement a [[Call]] method. Modify these steps to specify that if the object does not implement [[Call]], a TypeError exception SHALL be thrown. Rationale: Throwing a TypeError appears to be the behaviour of at least SpiderMonkey and JScript, tested using «String({toString: {}})». [# Test other implementations. #] Section 8.7 The Reference Type - Replace the last sentence of section 8.7 with: The abstract operations GetValue and PutValue are used in this specification to operate on references. The definition of GetValue is dependent on a 'NewSemantics' flag associated with each lexical scope. An ES3F implementation is not required to support the change to the semantics of GetValue (relative to ES3) described here. If it does not, then it SHALL NOT define any variable called 'useNewSemantics' in the initial global environment of a new context, and the NewSemantics flag for all lexical scopes SHALL be false. If it does support this change, then: - it SHALL define a ReadOnly variable called 'useNewSemantics' in the global environment of each context that is set permanently to true. - if a has the «useNewSemantics;» as its first statement, then the NewSemantics flag for that function scope, including all nested functions, SHALL be true. - the NewSemantics flag for the global scope, and for functions not included in the preceding point, SHALL be false. All uses of the GetValue operation in ECMA-262, except for the use in section 11.2.3 (Function Calls), should be considered to pass as a second argument, the value of the NewSemantics flag for the lexical scope of the construct being evaluated. A function object created by step 6 of the GetValue operation is called a *bound function object*. If more than one bound function object is created for a given function obtained as the result of step 4, then they are considered to have equated function bodies in the sense of section 13.1.1. (This means that any pair of such function objects may, at the implementation's discretion, be joined.) - Replace section 8.7.1 with: GetValue (V, NewSemantics) 1. If Type(V) is not Reference, return V. 2. Call GetBase(V). 3. If Result(2) is null, throw a ReferenceError exception. 4. Call the [[Get]] method of Result(2), passing GetPropertyName(V) for the property name. 5. If NewSemantics is false or if Result(4) is not a function object with a body that refers to «this», then return Result(4). 6. Create a new function object as specified in 13.2, such that the new function acts as if its «this» is always bound to Result(2), but otherwise acts identically to the function given by Result(4). 7. Return Result(6). Rationale: The motivation for changing the semantics of GetValue is explained in the rationale for group EXPRESSIONS. (Briefly, it allows removing restrictions on how the result of a property access can be used.) The 'useNewSemantics;' syntax has the following characteristics: - by default, it will throw a ReferenceError in ES3, and so ES3F code that depends on the new semantics will not fail silently. - if a programmer wishes to deliberately suppress this ReferenceError, they can write the following: if (typeof useNewSemantics === 'undefined') useNewSemantics = false; - it is possible for the Jacaranda run-time to suppress the ReferenceError for Jacaranda modules (by defining useNewSemantics in an import environment) without doing so for other ES3F code. - it is possible to test easily whether a given implementation supports the new semantics (whether or not the ReferenceError has been suppressed as described above). When GetValue creates a new function object with bound «this», it does not automatically cache this function object in the original property. That would not be difficult to specify, but it would not be a transparent optimization, because: a) the base object B (Result(2)) may be used as a prototype for some other object, O. In that case, a lookup of the property in O could find the cached function that is bound to B, when it should find a function with unbound «this», that will be later bound to O. b) suppose that an unbound function is found in some prototype P of B, and later the property is mutated in some object in B's prototype chain, up to and including P. Meanwhile a bound version of the original function has been cached in B. For the caching to be transparent, subsequent lookups should find the mutated property value, not the cached value. There is no obvious way to ensure that the cache is invalidated in this case. c) code that is not using the new semantics should not be able to see cached values. These problems are not insoluble, but in combination they make the caching optimization too complicated to be worthwhile. However, the fact that the bound functions created from a given function are equated allows an implementation to use "hash consing" of bound functions (i.e. maintain a table mapping (baseObject, unboundFunction) pairs to the corresponding bound function). This approach would be a transparent optimization. On the other hand, we suspect that function binding will be a sufficiently rare operation that it is not necessary to cache bound functions at all. [# Consider having some way to test whether a given function has the NewSemantics flag set. Would that be needed/helpful for the Jacaranda runtime? #] Section 11.2.3 Function Calls - In step 3 of the algorithm, replace 3. Call GetValue(Result(1)). with 3. Call GetValue(Result(1), false). Rationale: Without this change the semantics would be the same, but the algorithm would unnecessarily specify the generation of a bound function for every call to a function that refers to «this». - Add after the NOTE: An ES3F implementation SHALL NOT return a value of type Reference as a result of calling a function that is built-in-reachable. Section 11.8.5 The Abstract Relational Comparison Algorithm [# Consider changing the algorithm so that it always returns true or false, which is apparently the behaviour of all common JavaScript implementations. #] Section 15 Native ECMAScript Objects - Replace Unless otherwise specified in the description of a particular function, if a function or constructor described in this section is given more arguments than the function is specified to allow, the behaviour of the function or constructor is undefined. In particular, an implementation is permitted (but not required) to throw a TypeError exception in this case. NOTE Implementations that add additional capabilities to the set of built-in functions are encouraged to do so by adding new functions rather than adding new parameters to existing functions. with Unless otherwise specified in the description of a particular function, if a function or constructor described in this section is given more arguments than the function is specified to allow, then the function SHALL either ignore the extra arguments, or throw a TypeError exception. It is implementation-defined (per function) which of these alternatives is chosen. ES3F adds the following properties not defined by ES3: Array.prototype: every, filter, forEach, indexOf, lastIndexOf, map, reduce, reduceRight, some Date: toISOString which are specified as described in and . [# FIXME: these are not stable references #] Other than the above, implementations are not encouraged to add additional capabilities to the set of built-in functions, either by adding new functions or by adding new parameters to existing functions. Rationale: A object-capability language must be much more conservative in any additions to libraries that are built-in or implicitly imported, since these libraries should provide no significant authority (other than pure computation). Any vendor-specific extensions should be made in libraries that are not implicitly imported. Array.prototype.{every, filter, forEach, map, some} have a 'thisObject' argument that is bound to 'this' in each call. For an explanation of why it is safe for these methods to be usable from Jacaranda (even though Function.prototype.{call, apply, bind, bindAsEventListener} must not be), see the rationale for the PROPERTY_NAMES group. - Delete the following: In section 15.7.4.5, If the toFixed method is called with more than one argument, then the behaviour is undefined (see clause 15). In section 15.7.4.6, If the toExponential method is called with more than one argument, then the behaviour is undefined (see clause 15). In section 15.7.4.7, If the toPrecision method is called with more than one argument, then the behaviour is undefined (see clause 15). Rationale: A specification of a safe language cannot allow undefined behaviour in any situation. Since this is a change in the specification of run-time behaviour, it must be changed in ES3F rather than Jacaranda. Section 15.3.4.2 Function.prototype.toString() - Replace An implementation-dependent representation of the function is returned. This representation has the syntax of a FunctionDeclaration. Note in particular that the use and placement of white space, line terminators, and semicolons within the representation string is implementation-dependent. with The string value "function () {...}" is returned. Rationale: See the discussion of function opaqueness under 'toSource' in group PROPERTY_NAMES. Section 16 Errors - Replace * An implementation may provide additional types, values, objects, properties, and functions beyond those described in this specification. This may cause constructs (such as looking up a variable in the global scope) to have implementation-defined behaviour instead of throwing an error (such as ReferenceError). with * An implementation may provide additional values in the global scope beyond those described in this specification, subject to the constraints described in sections 8.6.2 and 15. This may cause looking up a variable in the global scope that is not described in this specification to have implementation-defined behaviour instead of throwing an error (such as ReferenceError). Rationale: The Jacaranda run-time will only rely on properties of the global scope specified in ES3F_BUILTINS. Jacaranda code will not be able to access other properties added to the global scope. Note that this section also includes the following quite scary provision: * An implementation may extend program and regular expression syntax. To permit this, all operations (such as calling eval, using a regular expression literal, or using the Function or RegExp constructor) that are allowed to throw SyntaxError are permitted to exhibit implementation-defined behaviour instead of throwing SyntaxError when they encounter an implementation-defined extension to the program or regular expression syntax. However, we don't need to delete this provision, because: - we are relying on the verifier to catch any code that is syntactically invalid in ES3F; - Jacaranda code cannot call eval or use the Function constructor; - RegExp can be tamed to verify that no regexp strings and flags use extensions. (In practice it must be tamed anyway, because it is broken in SpiderMonkey, as discussed in the rationale for group EXPRESSIONS.) This text must not be interpreted to allow the behaviour of ES3F extensions to ES3 (const and useNewSemantics) to be implementation-defined. Annex B Compatibility - The additional syntax specified in section B.1 SHALL NOT be part of the ES3F grammar. - An ES3F program SHOULD NOT rely on the presence of any of the additional global functions, or properties of String and Date, specified in sections B.2.1 to B.2.6. Lexical Grammar =============== The lexical grammar of Jacaranda is that of ES3F with the additional constraints listed below. LEX_CODE_UNITS: Source code units, line terminators, and whitespace MUST ONLY be used as follows: :: { fatal U= check(c in , c, 'invalid code unit') } :: \u0009 // exclude ASCII controls except TAB, \u000A // LF, \u000D // CR. \u0020-\u007E // \u007F-\u009F controls \u00A0-\u00AC // \u00AD :Cf: \u00AE-\u05FF // \u0600-\u0603 :Cf: \u0604-\u06DC // \u06DD :Cf: \u06DE-\u070E // \u070F :Cf: \u0710-\u17B3 // \u17B4-\u17B5 :Cf: \u17B6-\u200A // \u200B-\u200F :Cf:; \u200C-\u200F !Safari !IE6 \u2010-\u2027 // \u2028-\u2029 !Cajita; \u202A-\u202E :Cf: !Safari !IE6 \u202F-\u205F // \u2060-\u2064 :Cf:; \u2065-\u2069 !Cajita; \u206A-\u206F :Cf: !Safari !IE6 \u2070-\uD7FF // \uD800-\uDFFF UTF-16 surrogates \uE000-\uFDCF // \uFDD0-\uFDEF noncharacters \uFDF0-\uFEFE // \uFEFF (BOM) :Cf: !Safari !IE6 \uFF00-\uFFEF // \uFFF0-\uFFF8 !Safari !Firefox; // \uFFF9-\uFFFB :Cf: // \uFFFC (object replacement character) // \uFFFD (replacement character) // \uFFFE-\uFFFF noncharacters :: \u0020-\u007E :: \u000A \u000D :: \u0009 \u0020 { fatal U= check(c == '\u0009' or c == '\u0020', c, 'invalid whitespace code unit') } Note that in ECMA-262 sections 7.2 and 7.3, both the tables and the grammar must be changed. This rule does not restrict the set of code units that may be represented by \u escapes. This rule does not affect the set of whitespace characters used in the specification of ToNumber (ECMA-262 section 9.3.1). Rationale: Start with \u0000-\uFFFF, and - remove code units that are not in XML 1.1's or that are in XML 1.1's (see below) - remove code units corresponding to BMP characters that are in category :Cf: (Format Control) in Unicode 5.1.1: (this is a superset of category :Cf: in all previous Unicode versions) - remove code units corresponding to BMP characters that are in category :Zs: (Whitespace) in Unicode 5.1.1: other than \u0009 and \u0020. - remove code units that are disallowed by Cajita according to . - remove \uFFFC and \uFFFD. and from XML 1.1 are used just because that is a slightly less arbitrary choice than us deciding on our own what control characters should be restricted. Disallowing :Cf: characters is in principle the wrong thing for proper internationalisation support. However, we eventually concluded that it's the only way to guarantee consistent and secure behaviour across browsers, which in this case trumps internationalisation. (The right thing would be to allow :Cf: characters other than \uFEFF only in string literals and comments, and to allow \uFEFF [byte order mark] only between tokens. Relaxing the rules later is easier than tightening them.) Whitespace is restricted to US-ASCII tab and space characters. There is no reason to internationalise whitespace used to separate tokens in source code. (Other space characters are still allowed in strings.) Also, the fact that vertical tab and form feed were treated as whitespace rather than as line separators was potentially confusing; Jacaranda disallows them. LEX_COMMENT_FIRST_CHARACTER: If the first code unit in a after «/*» is not «*» or \u0020, or if the first code unit in a after «//» is not «/» or \u0020, then all code units in the comment MUST be printable US-ASCII (that is, [\u0020-\u007E]). The necessary grammar change is given in group LEX_COMMENT_KEYWORDS. Rationale: This prevents an attack relevent to adversarial code review, where a comment that looks like «/*const*/» does not actually enforce constness, and similarly for other comments used by JSLint, for example. It also prevents the first character in a comment from being a non-US-ASCII character that is stripped on parsing, which might plausibly result in the LEX_COMMENT_NO_ATSIGN_EXTENSIONS group being bypassed. LEX_COMMENT_NO_ATSIGN_EXTENSIONS: The first character in any comment after «/*» or «//» MUST NOT be «@». The necessary grammar change is given in group LEX_COMMENT_KEYWORDS. Rationale: This may trigger non-ES3F-compliant extensions (for example or ). ADsafe rejects «@cc» anywhere in a comment. This is sufficient to prevent enabling of JScript conditional compilation, but not the Venkman debugger extensions. Caja and Cajita do not need to reject «@» in comments, because the output of cajoling does not include any comments. See . LEX_COMMENT_KEYWORDS: «/*const*/» and «/*fallthru*/» are treated as tokens. (Other comments are not tokens.) They MUST ONLY appear where specified in the Jacaranda grammar. Note the lack of spaces before and after «const» or «fallthru». Also note that only these exact spellings are recognised; for example, «/*\u0063onst*/» is not recognised as a «/*const*/» token, even though «\u0063» would represent «c» within a string literal. The lexical grammar changes resulting from this and groups LEX_COMMENT_FIRST_CHARACTER and LEX_COMMENT_NO_ATSIGN_EXTENSIONS are: :: ε :: * / opt * / { fatal U= check(right is , right, 'comment does not start with * or space and is not printable ASCII') } * opt * / \u0020 opt * / :: but not * or \u0020 or @ :: see ECMA-262 section 7.4 :: see ECMA-262 section 7.4 :: see EMCA-262 section 7.4 :: see ECMA-262 section 7.4 :: { fatal U= check(right is , right, 'comment does not start with / or space and is not printable ASCII') } / opt \u0020 opt :: :: but not or / or \u0020 or @ :: see ECMA-262 section 7.4 :: see ECMA-262 section 7.4 Rationale: /*const*/ and /*fallthru*/ must be ES3 comments in order for Jacaranda to remain a sublanguage of ES3F. (They are lexed as multiline comments, but should then be retained as tokens, while other comment tokens are thrown away.) See group ASSIGNABLE_EXPRESSIONS for a description of the use of /*const*/ (it is essentially the same as 'const' as implemented in some current browsers and proposed for ES3.1/ES-Harmony, but with static rejection instead of silent failure). See also . LEX_STRING_LITERALS: A MUST ONLY use one of the escape characters defined in below. The following attribution rules are used to compute the value of a string literal, which is needed for other rules. They correspond exactly to the procedure in ECMA-262 section 7.8.4, expressed in the same attribute grammar notion as the rest of this specification. The grammar has been altered to not require lookahead in the production. [# FIXME: change it back. This change is a good idea but should go in ES3.1, not here. #] :: " " { SV = ds.SV } ' ' { SV = ss.SV } :: ε { SV = "" } { SV = [c.CV] ++ rest.SV } \u005C 0 // no initial decimal { SV = ['\u0000'] ++ rest.SV } :: ε { SV = "" } { SV = [c.CV] ++ rest.SV } \u005C 0 { SV = ['\u0000'] ++ rest.SV } :: but not " or \u005C or { CV = c } \u005C { CV = e.CV } :: but not " or \u005C or or { CV = c } \u005C { CV = e.CV } :: ε { SV = "" } { SV = [c.CV] ++ rest.SV } :: ε { SV = "" } { SV = [c.CV] ++ rest.SV } \u005C 0 { SV = ['\u0000'] ++ rest.SV } :: but not ' or \u005C or { CV = c } \u005C { CV = e.CV } :: but not ' or \u005C or or { CV = c } \u005C { CV = e.CV } { CV = ce.CV } { CV = he.CV } { CV = ue.CV } { CV = se.CV } :: b { CV = '\u0008' } t { CV = '\u0009' } n { CV = '\u000A' } f { CV = '\u000C' } r { CV = '\u000D' } " { CV = '\u0022' } ' { CV = '\u0027' } / { CV = '\u002F' } \u005C { CV = '\u005C' } v { fatal U= error(se, '\v escape') } :: x { CV = [a.MV*16 + b.MV] } :: u { CV = [a.MV*4096 + b.MV*256 + c.MV*16 + d.MV] } :: 0 { MV = 0 } 1 { MV = 1 } 2 { MV = 2 } 3 { MV = 3 } 4 { MV = 4 } 5 { MV = 5 } 6 { MV = 6 } 7 { MV = 7 } 8 { MV = 8 } 9 { MV = 9 } A { MV = 10 } B { MV = 11 } C { MV = 12 } D { MV = 13 } E { MV = 14 } F { MV = 15 } a { MV = 10 } b { MV = 11 } c { MV = 12 } d { MV = 13 } e { MV = 14 } f { MV = 15 } Rationale: JScript does not implement «\v» (it treats it as «v»). This could allow lexer confusion attacks. Unlike Caja, we cannot rewrite it to a different escape. There is no reason to allow «\» followed by an escape character that does not have a defined meaning. However, we must allow «\/» in order to remain a superset of JSON. (Unfortunately, JSON does not allow «\'», otherwise JSON and Jacaranda would support identical syntax for double-quoted strings. We considered disallowing «\'» in double-quoted strings, but that would create an inconsistency between single- and double-quoting.) LEX_NUMERIC_LITERALS: A MUST NOT have more then 20 significant digits. The following attribute definitions are used to compute the value of a numeric literal, which is needed for other rules. They correspond exactly to the procedure in ECMA-262 section 7.8.3, expressed in the same attribute grammar notion as the rest of this specification. :: . { MV = dil.MV; fatal U= check(dil.sig <= 20, dl, '> 20 significant digits in decimal literal') } . { reparse (dil ++ dd) as else { fatal U= error(dl, 'cannot happen') }; MV = dil.MV + dd.MV*10^-(dd.n); fatal U= check(digits.sig <= 20, dl, '> 20 significant digits in decimal literal') } . { MV = dil.MV*10^(ep.MV); fatal U= check(dil.sig <= 20, dl, '> 20 significant digits in decimal literal') } . { reparse (dil ++ dd) as else { fatal U= error(dl, 'cannot happen') }; MV = (dil.MV + dd.MV*10^-(dd.n))*10^(ep.MV); fatal U= check(digits.sig <= 20, dl, '> 20 significant digits in decimal literal') } . { MV = dd.MV*10^-(dd.n); fatal U= check(dd.sig <= 20, dl, '> 20 significant digits in decimal literal') } . { MV = dd.MV*10^(ep.MV - dd.n); fatal U= check(dd.sig <= 20, dl, '> 20 significant digits in decimal literal') } { MV = dd.MV*10^(ep.MV); fatal U= check(dil.sig <= 20, dl, '> 20 significant digits in decimal literal') } :: { MV = d.MV; n = 1 } { MV = nzd*10^(dd.n) + dd.MV; n = dd.n + 1 } :: 0 { MV = 0; sig = 0; n = 1 } { sig = 1; n = 1 } 0 { MV = rest.MV; sig = rest.sig; n = rest.n + 1 } { MV = nzd.MV*10^(rest.n) + rest.MV; sig = rest.sig + 1; n = rest.n + 1 } :: 0 { MV = 0 } :: 1 { MV = 1 } 2 { MV = 2 } 3 { MV = 3 } 4 { MV = 4 } 5 { MV = 5 } 6 { MV = 6 } 7 { MV = 7 } 8 { MV = 8 } 9 { MV = 9 } :: { MV = si.MV } :: one of e E :: + { MV = dd.MV } - { MV = -(dd.MV) } :: 0x { MV = hd.MV } 0X { MV = hd.MV } { MV = hil.MV*16 + hd.MV } : see group LEX_STRING_LITERALS Rationale: ECMA-262 section 7.8.3 says that: Once the exact MV for a numeric literal has been determined, it is then rounded to a value of the Number type. If the MV is 0, then the rounded value is +0; otherwise, the rounded value must be /the/ number value for the MV (in the sense defined in 8.5), unless the literal is a and the literal has more than 20 significant digits, in which case the number value may be either the number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th significant digit position. Therefore decimal literals with more than 20 significant digits cannot be used to obtain any additional precision. (Note that 20 significant digits are more than sufficient to round-trip any IEEE double value.) This restriction is also necessary in order to enforce that properties in object literals are unique. The issue here is that if the conversion to an IEEE double is nondeterministic, then the verifier may fail to detect names that should be treated as colliding (in groups OBJECT_LITERALS and MODULE_DEFINITION). LEX_NO_REGEXP_LITERALS: Regexp literals MUST NOT be used. :: / / { fatal U= error(re, 'regexp literal') } Rationale: ES3 requires that each regexp literal in a program share a single RegExp object. However RegExp objects are mutable, which makes this sharing error-prone, and provides implicit communication channels that should not be present in an object-capability language. Omitting regexp literals also simplifies lexing. A JavaScript lexer must be implemented using feedback from the parser, because whether a '/' character (not in a comment) is lexed as part of a or a depends on whether a is syntactically allowed at that position. LEX_NO_SEMICOLON_INSERTION: A Jacaranda source text MUST NOT trigger "automatic semicolon insertion" as defined in ECMA-262 section 7.9 when interpreted as an ES3F program. The following restrictions are needed to ensure that this does not occur: * The syntactic grammar is interpreted strictly, without applying the rules of ECMA-262 section 7.9.1; * A postfix ++ or -- operator MUST be on the same line as the last token of its operand; * The next token (i.e. identifier, expression, or semicolon) after a 'continue', 'break', 'return' or 'throw' keyword MUST be on the same line as that keyword. Rationale: First read . [# FIXME: not stable reference #] It is obvious that we do not want "syntactic semicolon insertion" -- it is too difficult to specify and too difficult for programmers to understand its consequences. But is the ES3.1 wiki page correct in asserting that "grammatical semicolon insertion" is "relatively harmless"? First note that the page is not correct in stating that grammatical semicolon insertion can be implemented by removing terminating semicolons from the grammar. That is true in some cases: specifically, it is true for the semicolons that terminate 'do', 'continue' and 'break' statements. In these cases, if the semicolon were removed from the grammar but present in a given source text, it would be interpreted as an , which has no semantic effect. So, after 'do', 'continue', and 'break' statements, semicolon insertion would indeed be relatively harmless: at most, a human reader or a parser would need one token lookahead to understand/parse the resulting language unambiguously. For the 'return', 'var', 'throw' and expression statements, however, this approach fails -- in a way that demonstrates the basic problem with implicit semicolon insertion of any kind. Consider the example: { var a = x + y; } If the semicolon terminating a were simply omitted from the grammar, then the grammar would be ambiguous because this example could be parsed as either { var a = x; + y; } or { var a = x + y; } both of which are syntactically valid. In ES3 it will be parsed as the latter -- but if a programmer mistypes + as ++, for example: { var a = x ++ y; } then that will be parsed as { var a = x; ++ y; } In other words, if the programmer makes any syntax error in a variable initialiser, return, throw, or expression statement, then under ES3's semicolon insertion rules it might silently be interpreted as something quite different. To determine the resulting parse, we would need to consider every possible location at which a semicolon could be inserted in the program beyond that point, with _unbounded_ lookahead (it is not even sufficient to consider until the next explicit semicolon, in general). This is clearly unacceptable. By this argument, if Jacaranda were to support grammatical semicolon insertion, it could only safely do so after 'do', 'continue' and 'break' statements, and not after 'return', 'var', 'throw' or expression statements. That would create an ugly inconsistency in the language. It is better to simply disallow semicolon insertion entirely, at the cost of accepting fewer JavaScript programs. [A somewhat related Mozilla bug report is . This report is interesting because the developers indicate that they know ES3 explicitly excludes the case in question from the semicolon insertion rules (because there is no LineTerminator), but they change the SpiderMonkey implementation to perform it anyway, in order to work around an error in *one* web page. This attitude, that perceived compatibility problems should trump standards compliance, seems not to be uncommon among JavaScript implementors, and is quite worrying from a security point of view. (The specific case, and similar cases, will be correctly rejected by a Jacaranda verifier.)] LEX_CLARIFICATIONS: This group merely restates two requirements of the ES3 lexical grammar that are not reflected in the actual grammar productions: ECMA-262 section 7.3 specifies that "A line terminator cannot occur within any token, not even a string." Therefore, the "line continuation" syntax supported by some ECMAScript implementations, in which a line terminator occurs immediately after a «\» code unit, MUST NOT be used in Jacaranda. As specified in ECMA-262 section 7.8.3, the source character immediately following a MUST NOT be an or . Module Definitions ================== MODULE_DEFINITION: A module definition MUST be parseable according to the Jacaranda lexical grammar; then, the resulting token stream MUST be parseable as the production below. Each of the 'imports' and 'optionalImports' properties of a module, if present, MUST be a literal array of string literals, none of which are a . The *import set* of a module is the union of the set of strings that occur in the module's 'import' property (or the empty set if not present) and its 'optionalImports' property (or the empty set if not present), and the implicit imports defined below. For each freely-used identifier of a module (as defined by the SCOPE_OF_IDENTIFIERS group), a string corresponding to the identifier name MUST occur in the module's import set. A module MUST NOT have any freely-assigned identifiers. : ( function ( $_ ) { with ( $_ ) { useNewSemantics ; return ; } } ) ; { DependOnNewSemantics = t.text == '$newmodule'; errors U= mod.fatal } : $module $newmodule : { errors U= { error(mlit, id, 'freely-used identifier not declared as an import') | id <- mlit.freelyUsed \ (mlit.imports U mlit.optionalImports U IMPLICIT_IMPORTS) } U { error(mlit, id, 'freely-assigned identifier at module level') | id <- mlit.freelyAssigned } IMPLICIT_IMPORTS = ES3F_BUILTINS U : // sef stands for "side-effect free" { } : { names = {nv.name} } , { names = list.names union {nv.name}, errors U= check(nv.name notin list.names, nv.name, 'duplicate property name in module object literal') } : : { name = n.name, if (name == 'imports') { imports = literalStringsToSet(v.value); errors U= check(nv.imports != undefined, nv, 'value of imports property is not a list of string literals') U { check(id isnot , nv, 'imported identifier is protected') | id <- nv.imports }; } if (name == 'optionalImports') { optionalImports = literalStringsToSet(v.value); errors U= check(nv.optionalImports != undefined, nv, 'value of optionalImports property is not a list of string literals') U { check(id isnot , nv, 'optionally imported identifier is protected') | id <- nv.optionalImports }; } } : null { value = undefined } true { value = undefined } false { value = undefined } { value = sl.SV } { value = undefined } - { value = undefined } { value = undefined } { value = undefined; errors U= check('this' notin f.freelyUsed, f, '"this" used freely by a function referenced in a module property') } : [ ] { value = [] } [ ] { value = list.value } : { value = undefined } , { value = undefined } { value = exp.value } , { value = (exp.value == undefined) ? undefined : list.value ++ [x.value] } : see ECMA-262 section 11.1.4 : see group EXPRESSIONS literalStringsToSet([]) = {} literalStringsToSet(x:xs) = undefined, if x.SV == undefined = {x.SV} U literalStringsToSet(xs), otherwise Rationale: An object literal is used as the argument to enable extensibility. Evaluating the object literal itself must have no side effects (because it will be evaluated before the call to $module or $newmodule). This is not a significant constraint on the main code of the module, which can be defined in a function expression. A deeply immutable copy of the object created from the object literal will be provided to caplets created from this module. This allows caplets created from the same module to share an arbitrary amount of immutable data (without needing one copy of it per caplet). Note that only anonymous function expressions are allowed, since a named function expression might have the side effect of defining the name in the surrounding scope. (This restriction applies to Jacaranda in general, not just the top level of a module definition -- see group FUNCTION_EXPRESSIONS.) The module definition could technically have been passed in as a string literal and then 'eval'ed, but that option was quickly rejected because it would require excessive use of quoting in the module code. does not attempt to be the largest possible side-effect-free subset of ES3F expressions, just one that is sufficiently expressive for any meta-information that we might want to use to describe a module. In fact it is essentially JSON, except that: - anonymous functions are included; - comments are included; - string literals use the syntax specified by ES3F, not JSON; - property names in object literals can be identifiers, not only string literals. [Thanks to Kris Zyp for the idea that "JSON plus anonymous functions" is a potentially useful sublanguage of JavaScript, although we're not using it exactly as he suggested.] Trailing commas are not allowed in , because Jacaranda also does not allow them in (see group ARRAY_LITERALS). One way to submit a module to a Jacaranda interpreter is to simply execute the module definition as ES3F code -- for example in a