The rest of this page is out of date

FullPostfixExpression

PrimaryExpression

| FullPostfixExpression MemberOperator

| FullPostfixExpression Arguments

| PostfixExpression [no line break] ++

| PostfixExpression [no line break] --

FullNewExpression new FullNewSubexpression Arguments

ShortNewExpression new ShortNewSubexpression

FullNewSubexpression

PrimaryExpression

| FullNewSubexpression MemberOperator

ShortNewSubexpression

FullNewSubexpression

MemberOperator

[ ArgumentList ]

| . QualifiedIdentifier

| . ParenthesizedExpression

| @ QualifiedIdentifier

| @ ParenthesizedExpression

The @ operator performs a type cast. The second operand specifies the type. Both the . and the @ operators accept either a QualifiedIdentifier or a ParenthesizedExpression as the second operand. If it is a ParenthesizedExpression, the second operand of . must evaluate to a string. a.(x) is a synonym for a[x] except that the latter can be overridden via operator overloading.

The [] operator can take multiple (or even named) arguments. This allows users to define data structures such as multidimensional arrays via operator overloading.

Arguments ( ArgumentList )

ArgumentList

«empty»

| ArgumentListPrefix

| NamedArgumentListPrefix

ArgumentListPrefix

AssignmentExpression^allowIn

| ArgumentListPrefix , AssignmentExpression^allowIn

NamedArgumentListPrefix

LiteralField

| ArgumentListPrefix , LiteralField

| NamedArgumentListPrefix , LiteralField

An ArgumentList can contain both positional and named arguments. Named arguments use the same syntax as object literals.

Prefix Unary Operators

UnaryExpression

PostfixExpression

| delete PostfixExpression

| typeof UnaryExpression

| eval UnaryExpression

| ++ PostfixExpression

| -- PostfixExpression

| + UnaryExpression

| - UnaryExpression

| ~ UnaryExpression

| ! UnaryExpression

Multiplicative Operators

MultiplicativeExpression

UnaryExpression

| MultiplicativeExpression * UnaryExpression

| MultiplicativeExpression / UnaryExpression

| MultiplicativeExpression % UnaryExpression

Additive Operators

AdditiveExpression

MultiplicativeExpression

| AdditiveExpression + MultiplicativeExpression

| AdditiveExpression - MultiplicativeExpression

Bitwise Shift Operators

ShiftExpression

AdditiveExpression

| ShiftExpression << AdditiveExpression

| ShiftExpression >> AdditiveExpression

| ShiftExpression >>> AdditiveExpression

Relational Operators

RelationalExpression^allowIn

ShiftExpression

| RelationalExpression^allowIn < ShiftExpression

| RelationalExpression^allowIn > ShiftExpression

| RelationalExpression^allowIn <= ShiftExpression

| RelationalExpression^allowIn >= ShiftExpression

| RelationalExpression^allowIn instanceof ShiftExpression

| RelationalExpression^allowIn in ShiftExpression

RelationalExpression^noIn

ShiftExpression

| RelationalExpression^noIn < ShiftExpression

| RelationalExpression^noIn > ShiftExpression

| RelationalExpression^noIn <= ShiftExpression

| RelationalExpression^noIn >= ShiftExpression

| RelationalExpression^noIn instanceof ShiftExpression

Equality Operators

EqualityExpression^b

RelationalExpression^b

| EqualityExpression^b == RelationalExpression^b

| EqualityExpression^b != RelationalExpression^b

| EqualityExpression^b === RelationalExpression^b

| EqualityExpression^b !== RelationalExpression^b

Binary Bitwise Operators

BitwiseAndExpression^b

EqualityExpression^b

| BitwiseAndExpression^b & EqualityExpression^b

BitwiseXorExpression^b

BitwiseAndExpression^b

| BitwiseXorExpression^b ^ BitwiseAndExpression^b

| BitwiseXorExpression^b ^ *

| BitwiseXorExpression^b ^ ?

BitwiseOrExpression^b

BitwiseXorExpression^b

| BitwiseOrExpression^b | BitwiseXorExpression^b

| BitwiseOrExpression^b | *

| BitwiseOrExpression^b | ?

Binary Logical Operators

LogicalAndExpression^b

BitwiseOrExpression^b

| LogicalAndExpression^b && BitwiseOrExpression^b

LogicalXorExpression^b

LogicalAndExpression^b

| LogicalXorExpression^b ^^ LogicalAndExpression^b

The ^^ operator is a logical exclusive-or operator. It evaluates both operands. If they both convert to true or both convert to false, then ^^ returns false; otherwise ^^ returns the unconverted value of whichever argument converted to true.

LogicalOrExpression^b

LogicalXorExpression^b

| LogicalOrExpression^b || LogicalXorExpression^b

Conditional Operator

ConditionalExpression^b

LogicalOrExpression^b

| LogicalOrExpression^b ? AssignmentExpression^b : AssignmentExpression^b

NonAssignmentExpression^b

LogicalOrExpression^b

| LogicalOrExpression^b ? NonAssignmentExpression^b : NonAssignmentExpression^b

Assignment Operators

AssignmentExpression^b

ConditionalExpression^b

| PostfixExpression = AssignmentExpression^b

| PostfixExpression CompoundAssignment AssignmentExpression^b

CompoundAssignment

*=

| /=

| %=

| +=

| -=

| <<=

| >>=

| >>>=

| &=

| ^=

| |=

| &&=

| ^^=

| ||=

Expressions

Expression^b

AssignmentExpression^b

| Expression^b , AssignmentExpression^b

OptionalExpression

Expression^allowIn

| «empty»

Type Expressions

TypeExpression^b NonAssignmentExpression^b

JavaScript 2.0

Core Language

Statements

Thursday, November 11, 1999

Most of the behavior of statements is the same as in JavaScript 1.5. Differences are highlighted below.

w {abbrev, abbrevNonEmpty, abbrevNoShortIf, full}

TopStatement^w

Statement^w

| LanguageDeclaration^w

Statement^w

AnnotatedDefinition^w

| EmptyStatement^w

| ExpressionStatement Semicolon^w

| AnnotatedBlock

| LabeledStatement^w

| IfStatement^w

| SwitchStatement

| DoStatement Semicolon^w

| WhileStatement^w

| ForStatement^w

| WithStatement^w

| ContinueStatement Semicolon^w

| BreakStatement Semicolon^w

| ReturnStatement Semicolon^w

| ThrowStatement Semicolon^w

| TryStatement

Semicolon^abbrev

;

| VirtualSemicolon

| «empty»

Semicolon^{abbrevNonEmpty}

;

| VirtualSemicolon

| «empty»

Semicolon^{abbrevNoShortIf}

;

| VirtualSemicolon

| «empty»

Semicolon^full

;

| VirtualSemicolon

Empty Statement

EmptyStatement^abbrev

;

| «empty»

EmptyStatement^{abbrevNonEmpty} ;

EmptyStatement^{abbrevNoShortIf} ;

EmptyStatement^full ;

Expression Statement

ExpressionStatement [lookahead{function, {}] Expression^allowIn

Block

AnnotatedBlock

Block

| Visibility [no line break] Block

Block { TopStatements }

TopStatements

TopStatement^abbrev

| TopStatementsPrefix TopStatement^{abbrevNonEmpty}

TopStatementsPrefix

TopStatement^full

| TopStatementsPrefix TopStatement^full

Boxes

A box has the syntax:

box { Statement ... Statement }

A box behaves like a regular block except that it forms its own scope. Variable and function definitions without a Visibility prefix inside the box belong to that box instead of the global scope or the enclosing function, class, or box.

Visibility-Specifying Blocks

A block can be annotated with a Visibility prefix as follows:

Visibility { Statement ... Statement }

Such a block behaves like a regular block except that every declaration inside that block (but not inside any enclosed function, class, box, or nested visibility-specifying block) that does not have an explicit Visibility prefix uses the Visibility prefix given by the block.

Visibility-specifying blocks are useful to define several items without having to repeat a Visibility prefix for each one. For example,

class foo {
  field z:integer;
  public field a;
  private field b;
  public method f() {}
  public method g(x:integer) {}
}

is equivalent to:

class foo {
  field z:integer;
  public {
    field a;
    private field b;
    method f() {}
    method g(x:integer) {}
  }
}

Labeled Statements

LabeledStatement^w Identifier : Statement^w

If Statement

IfStatement^abbrev

if ParenthesizedExpression Statement^abbrev

| if ParenthesizedExpression Statement^{abbrevNoShortIf} else Statement^abbrev

IfStatement^{abbrevNonEmpty}

if ParenthesizedExpression Statement^{abbrevNonEmpty}

| if ParenthesizedExpression Statement^{abbrevNoShortIf} else Statement^{abbrevNonEmpty}

IfStatement^full

if ParenthesizedExpression Statement^full

| if ParenthesizedExpression Statement^{abbrevNoShortIf} else Statement^full

IfStatement^{abbrevNoShortIf} if ParenthesizedExpression Statement^{abbrevNoShortIf} else Statement^{abbrevNoShortIf}

The semicolon is optional before the else.

Switch Statement

SwitchStatement

switch ParenthesizedExpression { }

| switch ParenthesizedExpression { CaseGroups LastCaseGroup }

CaseGroups

«empty»

| CaseGroups CaseGroup

CaseGroup CaseGuards CaseStatementsPrefix

LastCaseGroup CaseGuards CaseStatements

CaseGuards

CaseGuard

| CaseGuards CaseGuard

CaseGuard

case Expression^allowIn :

| default :

CaseStatements

Statement^abbrev

| CaseStatementsPrefix Statement^{abbrevNonEmpty}

CaseStatementsPrefix

Statement^full

| CaseStatementsPrefix Statement^full

Do-While Statement

DoStatement do Statement^{abbrevNonEmpty} while ParenthesizedExpression

The semicolon is optional before the closing while.

While Statement

WhileStatement^w while ParenthesizedExpression Statement^w

For Statements

ForStatement^w

for ( ForInitializer ; OptionalExpression ; OptionalExpression ) Statement^w

| for ( ForInBinding in Expression^allowIn ) Statement^w

ForInitializer

«empty»

| Expression^noIn

| VariableDefinitionKind VariableBindingList^noIn

ForInBinding

PostfixExpression

| VariableDefinitionKind VariableBinding^noIn

With Statement

WithStatement^w with ParenthesizedExpression Statement^w

Continue and Break Statements

ContinueStatement continue [no line break] OptionalLabel

BreakStatement break [no line break] OptionalLabel

OptionalLabel

«empty»

| Identifier

Return Statement

ReturnStatement return [no line break] OptionalExpression

Throw Statement

ThrowStatement throw [no line break] Expression^allowIn

Try Statement

TryStatement

try AnnotatedBlock CatchClauses

| try AnnotatedBlock FinallyClause

| try AnnotatedBlock CatchClauses FinallyClause

CatchClauses

CatchClause

| CatchClauses CatchClause

CatchClause catch ( TypedIdentifier^allowIn ) AnnotatedBlock

FinallyClause finally AnnotatedBlock

Programs

Program TopStatements

JavaScript 2.0

Core Language

Definitions

Thursday, November 11, 1999

Definitions

AnnotatedDefinition^w

Visibility [no line break] Definition^w

| Definition^w

Definition^w

VariableDefinition Semicolon^w

| FunctionDefinition

| MemberDefinition^w

| ClassDefinition

Definition Visibility

Visibility

ParenthesizedExpression

| local

| box

| private

| package

| public

| Identifier

Any definition can have a Visibility prefix. That prefix specifies the following:

The scope to which the definition applies
The definition's visibility, if any, outside that scope
Static type and version requirements for referencing the entity being declared

A Visibility prefix can be one of the prefixes in the table below, or it can be user-defined. User-defined Visibility prefixes allow the author of a package P to control definition visibility based on the version by which a client package imports P. User-defined Visibility prefixes also allow definition access to be controlled by the manner in which a client attempts to reference the definition.

The following are the predefined Visibility prefixes. The access privileges they provide are described in more detail in the next section. Unless overridden, the default Visibility is box.

`Visibility`	Access allowed from
`local`	only within current block
`box`	only within current package (when applied to a class member), function, or box
`private`	only within current class
`package`	only within current package
`public`	within any package that imports this package

Terminology

To understand the scope to which a definition applies we need to define a few terms. In the definitions below D represents a variable, function, member, or class definition.

Containing block: The containing block of D is the innermost Block (including function and class Blocks) lexically enclosing D. If there is no such block because D is at the top level, then the containing block of D is the package scope.
Containing box: The containing box of D is the innermost function Block, class Block, or box Block lexically enclosing D. If there is no such block, then the containing box of D is the package scope.
Containing class: The containing class of D is the innermost class Block lexically enclosing D. If there is no such block, then the containing class of D is the package scope.
Containing visibility specifier: The containing visibility specifier of D is the innermost function Block, class Block, box Block, or visibility-specifying block lexically enclosing D. If there is no such block, then the containing visibility specifier of D is the package scope.
Visibility default: If the containing visibility specifier of D is a visibility-specifying block E, then the visibility default of D is E's Visibility prefix. Otherwise, the visibility default of D is box.

Scope Rules

To determine the scope S to which a definition D applies, we look up the definition's Visibility prefix in the table below. A definition without a Visibility prefix uses its visibility default prefix.

`Visibility`	Scope where entity is declared
`local`	`D`'s containing block
`box`	`D`'s containing box
`private`	`D`'s containing class
`package`	`D`'s containing class
`public`	`D`'s containing class
User-defined	`D`'s containing class

The scope S is not the scope in which the definition is accessible; rather, it is the scope into which the declared entity is inserted.

If S is a class and Visibility is not local, then the declared entity will appear as a member of class S. If S is a class and Visibility is local, then the declared entity will only be created inside class S's block without becoming a member of class S; it is an error if this case arises for a method or field definition.

Once the scope S is known, the accessibility of definition D is determined by the table below. P is the lexically enclosing package.

`Visibility` prefix	Scope `S` is ...
`Visibility` prefix	a package `P`	a class `C`	a function `F`	a box `B`	a block `B`
`local`	Package `P`	Class `C`	Function `F`	Box `B`	Block `B`
`box`	Package `P`	Package `P`	Function `F`	Box `B`
`private`	Package `P`	Class `C`
`package`	Package `P`	Package `P`
`public`	Any package	Any package
User-defined	User-defined	User-defined

The rest of this page is out of date

All of these definitions share several common scoping rules:

A definition that applies to a scope can be referenced lexically from anywhere within that scope unless shadowed by a more local definition.
A definition that applies to a scope lasts until that scope is exited. No other definition may be executed for the same identifier applying to the same scope (with the exceptions that both a getter and a setter may be defined with the same name and that versions have a namespace separate from other definitions).
If code executing inside a scope s has already made an attempt to resolve identifier i and that resolution either bound i to a definition of i in a scope enclosing s or failed because i wasn't defined, then no definition of i applying to scope s may be executed.

Rules 3 and 4 state that once an identifier is resolved to a variable or function in a scope, that resolution cannot be changed. This permits efficient compilation and avoids confusion with programs such as:

const b:integer = 7;

function f():integer {
  function g():integer {return b}

  var a = g();
  const b:integer = 8;
  return g() - a;
}

Scopes

Definitions at the top level of a Program or at the top level of a ClassDefinition's Block may omit Visibility, in which case they are treated as if they had package visibility. When used outside a ClassDefinition's Block, private is equivalent to package.

does not apply to the current Block. Instead, it declares either an entity at the top level of the current package (if outside a ClassDefinition's Block) or a member of the current class (if inside a ClassDefinition's Block). In addition to lifting the definition out of the current scope in this way, Visibility also specifies the definition's visibility from other packages or classes. Visibility can take one of the following forms:

Most lexical scopes are established by Block productions in the grammar. Lexical scopes nest, and a definition in an inner scope can shadow definitions in outer ones.

In the example below the comments indicate the scope and visibility of each definition:

var a0;                 // Package-visible global variable
private var a1 = true;  // Package-visible global variable
package var a2;         // Package-visible global variable
public var a3;          // Public global variable

if (a1) {
  var b0;               // Local to this block
  private var b1;       // Package-visible global variable
  package var b2;       // Package-visible global variable
  public var b3;        // Public global variable
}

public function F() {   // Public global function
  var c0;               // Local to this function
  private var c1;       // Package-visible global variable
  package var c2;       // Package-visible global variable
  public var c3;        // Public global variable
}

function G() {          // Package-visible global function
  var d0;               // Never defined because G isn't called
  private var d1;       // Never defined because G isn't called
  package var d2;       // Never defined because G isn't called
  public var d3;        // Never defined because G isn't called
}

class C {               // Package-visible global class
  var e0;               // Package-visible class variable
  private var e1;       // Class-visible class variable
  package var e2;       // Package-visible class variable
  public var e3;        // Public class variable
  field e4;             // Package-visible instance variable
  private field e5;     // Class-visible instance variable
  package field e6;     // Package-visible instance variable
  public field e7;      // Public instance variable

  function H() {        // Package-visible class function
    var f0;             // Local to this function
    private var f1;     // Class-visible class variable
    package var f2;     // Package-visible class variable
    public var f3;      // Public class variable
    private field f4;   // Class-visible instance variable
    package field f5;   // Package-visible instance variable
    public field f6;    // Public instance variable
  }
  public method I() {}  // Public class method

  H();
}

F();

Versioning Public Identifiers

A public definition's identifier is exported to other packages. To help avoid accidental collisions between identifiers declared in different packages, identifiers can be selectively exported depending on the version requested by an importing package. An identifier definition with a version number newer than that requested by the importer will not be seen by that importer. The versioning facilities also include additional facilities that allow robust removal and renaming of identifiers.

VersionsAndRenames describes the set of versions in which an identifier is exported, together with a possible alias for the identifier:

VersionsAndRenames

[VersionRange [: Identifier] , ... , VersionRange [: Identifier]]

VersionRange

Version

| [Version] .. [Version]

Suppose a client package C imports version V of package P that exports identifier N with some VersionsAndRenames. If the VersionsAndRenames's VersionRange includes version V, then package C can use the corresponding Identifier alias to access package P's N. If the Identifier alias is omitted, then package C can use N to access package P's N. Multiple VersionRanges operate independently.

In most cases VersionsAndRenames is just a Version name (a string):

public "1.2" const z = 3;

If VersionsAndRenames is omitted, the default version "" is assumed.

Discussion

Scopes

Do we want to collapse all block scopes into one inside functions? On one hand this complicates the language conceptually and surprises Java and C++ programmers. On the other hand, this would match JavaScript 1.5 better and simplify closure creation when a closure is created nested inside several blocks in a function.

Visibilities

Should we make private illegal outside a class rather than making it equivalent to package?

Should we introduce a local Visibility prefix that explicitly means that the definition is visible locally? This wouldn't provide any additional functionality but it would provide a convenient name for talking about the four kinds of visibility prefixes.

What should the default visibilities be? The current defaults are loosely modeled after Java:

Definition Location	Default visibility
Package top level	`package` (equivalent to `local` in this case)
Inside a statement outside a function or class	`local`
Function or method code's top level	`local`
Inside a statement inside a function or method	`local`
Class definition block's top level	`package`
Inside a statement inside a class definition block	`local`

Should we have a protected Visibility? It has been omitted for now to keep the language simple, but there does not appear to be any fundamental reason why it could not be supported. If we do support it, should we choose the C++ protected concept (visible only in class and subclasses) or the Java protected concept (visible in class, subclasses, and the original class's package)?

JavaScript 2.0

Core Language

Variables

Thursday, November 11, 1999

Variable Definition

VariableDefinition VariableDefinitionKind VariableBindingList^allowIn

VariableDefinitionKind

var

| const

VariableBindingList^b

VariableBinding^b

| VariableBindingList^b , VariableBinding^b

VariableBinding^b TypedIdentifier^b VariableInitializer^b

TypedIdentifier^b

Identifier

| Identifier : TypeExpression^b

VariableInitializer^b

«empty»

| = AssignmentExpression^b

The general syntax for defining variables is:

VariableDefinition

[Visibility] var Identifier [: TypeExpression] [= AssignmentExpression] , ... , Identifier [: TypeExpression] [= AssignmentExpression] ;

| [Visibility] const Identifier [: TypeExpression] = AssignmentExpression , ... , Identifier [: TypeExpression] = AssignmentExpression ;

A variable defined with var can be modified, while one defined with const cannot. Identifier is the name of the variable and TypeExpression is its type. Identifier can be any non-reserved identifier. TypeExpression is evaluated at the time the variable definition is evaluated and should evaluate to a type t.

If provided, AssignmentExpression gives the variable's initial value v. If not, undefined is assumed; an error occurs if undefined cannot be coerced to type t. AssignmentExpression is evaluated just after the TypeExpression is evaluated. The value v is then coerced to the variable's type t and stored in the variable. If the variable is defined using var, any values subsequently assigned to the variable are also coerced to type t at the time of each such assignment.

Multiple variables separated by commas can be defined in the same VariableDefinition. The values of earlier variables are available in the TypeExpressions and AssignmentExpressions of later variables.

If omitted, TypeExpression defaults to type any. Thus, the definition

var a, b=3, c:integer=7, d, e:type=boolean, f:number, g:e, h:int;

is equivalent to:

var a:Any=undefined;
var b:Any=3;
var c:integer=7;
var d:integer=undefined;  // coerced to +0
var e:type=boolean;
var f:number=undefined;   // coerced to +0
var g:boolean=undefined;  // coerced to false
var h:int=undefined;      // coerced to int(0)

`const` Definitions

const means that Identifier cannot be written after it is defined. It does not mean that Identifier will have the same value the next time it is bound. For example, the following is legal; a new j binding is created each time through the loop:

var k = 0;
for (var i = 0; i < 10; i++) {
  local const j = i;
  k += j;
}

JavaScript 2.0

Core Language

Functions

Thursday, November 11, 1999

Function Definition

FunctionDefinition

NamedFunction

| AccessorFunction

AnonymousFunction function FunctionSignature Block

NamedFunction function Identifier FunctionSignature Block

AccessorFunction

function get [no line break] Identifier FunctionSignature Block

| function set [no line break] Identifier FunctionSignature Block

FunctionSignature ParameterSignature ResultSignature

ParameterSignature ( Parameters )

Parameters

«empty»

| RestParameter

| RequiredParameters

| OptionalParameters

| RequiredParameters , RestParameter

| OptionalParameters , RestParameter

RequiredParameters

RequiredParameter

| RequiredParameters , RequiredParameter

OptionalParameters

OptionalParameter

| RequiredParameters , OptionalParameter

| OptionalParameters , OptionalParameter

RequiredParameter TypedIdentifier^allowIn

OptionalParameter TypedIdentifier^allowIn = AssignmentExpression^allowIn

RestParameter

...

| ... TypedIdentifier^allowIn

| ... TypedIdentifier^allowIn = AssignmentExpression^allowIn

ResultSignature

«empty»

| : TypeExpression^allowIn

The rest of this page is slightly out of date

Function Definitions

To define a function we use the following syntax:

FunctionDefinition

[Visibility] function [get | set] Identifier ( Parameters ) [: TypeExpression] Block

If Visibility is absent, the above declaration defines a local function within the current Block scope. If Visibility is present, the above declaration declares either a global function (if outside a ClassDefinition's Block) or a class function (if inside a ClassDefinition's Block) according to the declaration scope rules.

The function's result type is TypeExpression, which defaults to type Any if not given. If the function does not return a value, it's good practice to set TypeExpression to void to document this fact.

Block contains the function body and is evaluated only when the function is called.

Parameters

Parameters has one of the following forms:

Parameters

RequiredParameter , ... , RequiredParameter [, OptionalParameter ... , OptionalParameter] [, ... [Identifier]]

| ... [Identifier]

If the ... is present, the function accepts more arguments than just the listed parameters. If an Identifier is given after the ..., then that Identifier is bound to an array of arguments given after the listed parameters. That Identifier is declared locally as though by the declaration const array Identifier.

Individual parameters have the forms:

RequiredParameter

Identifier [: TypeExpression]

OptionalParameter

Identifier [: TypeExpression] = AssignmentExpression

TypeExpression gives the parameter's type and defaults to type Any. If the parameter name Identifier is followed by a =, then that parameter is optional. If the n^th parameter is optional and a call to this function provides fewer than n arguments, then the n^th parameter is set to the value of its AssignmentExpression, coerced to the n^th parameter's type if necessary. The n^th parameter's AssignmentExpression is evaluated only if fewer than n arguments are given in a call.

A RequiredParameter may not follow an OptionalParameter. If a function has n RequiredParameters and m OptionalParameters and no ... in its parameter list, then any call of that function must supply at least n arguments and at most n+m arguments. If this function has a ... in its parameter list, then any call of that function must supply at least n arguments. These restrictions do not apply to traditional functions.

The parameters' Identifiers are local variables with types given by the corresponding TypeExpressions inside the function's Block. Code in the Block may read and write these variables. Arguments are passed by value, so writes to these variables do not affect the passed arguments' values in the caller.

In addition to local variables generated by the parameters' Identifiers, each function also has a predefined arguments local variable which holds an array (of type const array) of all arguments passed to this function.

Evaluation Order

When a function is called, the following list indicates the order of evaluation of the various expressions in a FunctionDefinition. These steps are taken only after all of the arguments have been evaluated.

Evaluate the first parameter's TypeExpression to obtain a type t.
If the first parameter is optional and no argument has been supplied, evaluate the first parameter's AssignmentExpression and let it be the first parameter's value.
Coerce the argument (or default) value to type t and bind the parameter's Identifier to the result.
Repeat steps 1-3 for each additional parameter.
If the FunctionDefinition's Parameters ends with a ... followed by an Identifier, bind that Identifier to an array comprised of the zero or more leftover arguments not already bound to a parameter.
Evaluate the FunctionDefinition's result TypeExpression to obtain a result type r.
Evaluate the body.
Coerce the result to type r and return it.

Note that later TypeExpressions and AssignmentExpressions can refer to previously bound arguments. Thus, the following is legal:

function choice(boolean a, type b, b c, b d=) b {
  return a ? c : d;
}

The call choice(true,integer,8,4) would return 8, while choice(false,integer,6) would return 0 (undefined coerced to type integer).

Relationship to Methods and Classes

Unless the function is a traditional function, the function definition using the above syntax does not define a class; the function's name cannot be used in a new expression, and the function does not have a this parameter. Any attempt to use this inside the function's body is an error. To define a method that can access this, use the method keyword.

If a FunctionDefinition is located at a class scope (either because it is located the top level of a ClassDefinition's Block or it has a Visibility prefix and is located inside a ClassDefinition's Block), then the function is a static method of the class. Unlike C++ or Java, JavaScript 2.0 does not use the static keyword to indicate such functions; instead, instance methods (i.e. non-static methods) are defined using the method keyword.

Getters and Setters

If a FunctionDefinition contains the keyword get or set, then the defined function is a getter or a setter.

A getter must not take any parameters and cannot have a ... in its Parameters list. Unlike an ordinary function, a getter is invoked by merely mentioning its name without an Arguments list in any expression except as the destination of an assignment. For example, the following code returns the string <2,3,1>:

var x:integer = 0;
function get serialNumber():integer {return ++x}

var y = serialNumber;
return "<" + serialNumber + "," + serialNumber + "," + y + ">";

A setter must take exactly one required parameter and cannot have a ... in its Parameters list. Unlike an ordinary function, a setter is invoked by merely mentioning its name (without an Arguments list) on the left side of an assignment or as the target of a mutator such as ++ or --. The result of the setter becomes the result of the assignment. For example, the following code returns the string <1,2,43>:

var x:integer = 0;
function get serialNumber():integer {return ++x}
function set serialNumber(n:integer):integer {return x=n}

var s = "<" + serialNumber + "," + serialNumber;
serialNumber = 42;
return s + "," + serialNumber + ">";

A setter can have the same name as a getter in the same lexical scope. A getter or setter cannot be extracted from its variable, so the notion of the type of a getter or setter is vacuous; a getter or setter can only be called.

Contrast the following:

var x:integer = 0;
function f():integer {return ++x}
function g():Function {return f}
function get h():Function {return f}

f;     // Evaluates to function f
g;     // Evaluates to function g
h;     // Evaluates to function f (not h)
f();   // Evaluates to 1
g();   // Evaluates to function f
h();   // Evaluates to 2
g()(); // Evaluates to 3

We can use a getter and a setter to create an alias to another variable, as in:

function get myAlias() {return Pkg::var}
function set myAlias(x) {return Pkg::var = x}

myAlias = myAlias+4;

Traditional Functions

Traditional function definitions are provided for compatibility with JavaScript 1.5. The syntax is as follows:

TraditionalFunctionDefinition

[Visibility] traditional function Identifier ( Identifier , ... , Identifier ) Block

A function declared with the traditional keyword cannot have any argument or result type declarations, optional arguments, or getter or setter keyword. Such a function is treated as though every argument were optional and more arguments than just the listed ones were allowed. Thus, the definition

traditional function Identifier ( Identifier , ... , Identifier ) Block

behaves like the following function definition:

function Identifier ( Identifier = , ... , Identifier = , ... ) Block

Furthermore, a traditional function defines its own class and treats this in the same manner as JavaScript 1.5.

Functions in Expressions

Every function (except a getter or a setter) is also a value and has type Function. Like other values, it can be stored in a variable, passed as an argument, and returned as a result. The identifiers in a function are all lexically scoped.

Function Expressions

We can use a variant of a function definition to define a function inside an expression. The syntax is:

FunctionExpression

function [Identifier] ( Parameters ) [: TypeExpression] Block

This expression defines a function and returns it as a value of type Function. The function can be named by providing the Identifier, but this name is only accessible from inside the function's Block.

To avoid confusion between a FunctionDefinition and a FunctionExpression, a Statement (and a few other grammar nonterminals) may not begin with a FunctionExpression. To place a FunctionExpression at the beginning of a Statement, enclose it in parentheses.

A FunctionDefinition is merely convenient syntax for a const variable definition and a FunctionExpression:

[Visibility] function Identifier ( Parameters ) [: TypeExpression] Block

is equivalent to:

[Visibility] const Identifier : Function = function Identifier ( Parameters ) [: TypeExpression] Block ;

Function Calls

Unless a function is a getter or a setter, we call that function by listing its arguments in parentheses after the function expression, just as in JavaScript 1.5:

FullPostfixExpression

FullPostfixExpression ( AssignmentExpression , ... , AssignmentExpression )

| other postfix expressions

Discussion

Getters and Setters

By consensus in the ECMA TC39 modularity subcommittee, we decided to use the above syntax for getters and setters instead of:

FunctionDefinition

[Visibility] [getter | setter] function Identifier ( Parameters ) [: TypeExpression] Block

| TraditionalFunctionDefinition

The decision was based on aesthetics; neither syntax is more difficult to implement than the other.

Optional Parameters

Do we want to have a named rest parameter (as in the proposal above), or only support the arguments special local variable as in JavaScript 1.5? The main difference is in the handling of fixed arguments -- they must be added to the arguments array but can be omitted from the rest array.

Traditional Functions

The traditional keyword is ugly, so let's take a look at some alternatives. Unless we want to continue to make each function into a class (as JavaScript 1.5 does), we need some way to indicate which functions are also classes and which ones are not. Also, we'd like to be able to indicate which functions can be called with more or fewer than the desired number of arguments and which cannot.

One possibility would be to state that any function that uses a type annotation in its signature (either the parameter list or the result type) is a new-style function and does not define a class; other functions would declare classes. Furthermore, new-style functions would have to be called with the exact number of arguments unless some parameters are optional or a ... is present in the parameter list. These are analogous to the rules that ANSI C used to distinguish new-style functions from traditional C functions. As with ANSI C, we have somewhat of a difficulty with functions that take no parameters; such functions would need to specify a return type to be considered new-style.

C++ did away with the ANSI C treatment of traditional C functions. We could do the same by having a pragma (analogous to Perl's use pragmas) that could indicate that all functions are to be considered new-style unless prefixed by the traditional keyword. If we do this, we should decide whether the default setting of this pragma would be on or off.

JavaScript 2.0

Core Language

Classes

Thursday, November 11, 1999

Class Member Definitions

MemberDefinition^w

FieldDefinition Semicolon^w

| MethodDefinition^w

| ConstructorDefinition

FieldDefinition field [no line break] VariableBindingList^allowIn

MethodDefinition^w

ConcreteMethodDefinition

| AbstractMethodDefinition^w

ConcreteMethodDefinition MethodPrefix [no line break] MethodName FunctionSignature Block

AbstractMethodDefinition^w MethodPrefix [no line break] MethodName FunctionSignature Semicolon^w

MethodPrefix

method

| override [no line break] method

| final [no line break] method

| final [no line break] override [no line break] method

MethodName

Identifier

| get [no line break] Identifier

| set [no line break] Identifier

ConstructorDefinition constructor [no line break] ConstructorName ParameterSignature Block

ConstructorName

new

| Identifier

Class Definition

ClassDefinition

class Identifier Superclasses Block

| class extends TypeExpression^allowIn Block

Superclasses

«empty»

| extends TypeExpression^allowIn

This page is out of date

Class Definitions

In JavaScript 2.0 we define classes using the class keyword. Limited classes can also be defined via JavaScript 1.5-style functions, but doing so is discouraged for new code.

ClassDefinition

[Visibility] class Identifier [extends TypeExpression] Block

| [Visibility] class extends TypeExpression Block

The first format declares a class with the name Identifier, binding Identifier to this class in the scope specified by the Visibility prefix (which usually includes the ClassDefinition's Block). Identifier is a constant variable with type type and can be used anywhere a type expression is allowed.

When the first ClassDefinition format is evaluated, the following steps take place:

A new type t is created.
If extends TypeExpression is given, TypeExpression is evaluated to obtain a type s, which must be another class. If extends TypeExpression is absent, type s defaults to the class Object.
Type t is made a subtype of type s.
Identifier is lexically bound in the scope given by Visibility; however, at this time Identifier does not have a legal type yet and any attempt to read or write it results in an error.
Block is evaluated.
If Block is evaluated successfully (without throwing out an exception), all const, var, function, constructor, and class declarations evaluated at its top level (or placed at its top level by the scope rules) become class members of type t. All field and method declarations evaluated at the Block's top level (or placed at its top level by the scope rules) become instance members of type t.
The value of Identifier becomes type t. From now on Identifier is a constant and its value cannot be altered.

A ClassDefinition's Block is evaluated just like any other Block, so it can contain expressions, statements, loops, etc. Such statements that do not contain declarations do not contribute members to the class being declared, but they are evaluated when the class is declared.

Class Extensions

If a ClassDefinition omits the class name Identifier, it extends the original class rather than creating a subclass. A class extension may define new methods and class constants and variables, but it does not have special privileges in accessing the original class definition's private members (or package members if in a separate package). A class extension may not override methods, and it may not define constructors or instance variables.

Each instance of the original class is automatically also an instance of the extended class. Several extensions can apply to the same class.

An extension is useful to add methods to system classes, as in the following code in some user package P:

class extends string {
  public method scramble() string {...}
  public method unscramble() string {...}
}

var x = "abc".scramble();

Once the class extension is evaluated, methods scramble and unscramble become available on all strings. There is no possibility of name clashes with extensions of class string in other, unrelated packages because the names scramble and unscramble belong to package P and not the system package that defines string. Any packages that import package P will also be able to call scramble and unscramble on strings, but other packages will not.

Members

A class has an associated set of class members and another set of instance members. Class members are properties of the class itself, while instance members are properties of each instance object of this class and have independent values for different instance objects.

Class members are one of the following:

Constants declared with the const keyword.
Class variables declared with the var keyword.
Class functions declared with the function keyword.
Constructors declared with the constructor keyword.
Nested classes declared with the class keyword.

Instance members are one of the following:

Fields declared with the field keyword.
Methods declared with the method keyword.

Members can only be defined within the intersection of the lexical and dynamic extent of a ClassDefinition's Block. A few examples illustrate this rule.

The code

var bool extended = false;

function callIt(x) {return x()}

class C {
  extended = true;
  public function square(integer x) integer {return x*x}
  if (extended) {
    public function cube(integer x) integer {return x*x*x}
  } else {
    public function reciprocal(number x) number {return 1/x}
  }

  field string firstName, lastName;
  method name() string {return firstName + lastName}

  public function genMethod(boolean b) {
    if (b) {
      public field time = 0;
    } else {
      public field date = 0;
    }
  }

  genMethod(true);
}

defines class C with members square (a class function), cube (a class function), firstName (an instance variable), lastName (an instance variable), name (an instance method), and genMethod (a class function).

On the other hand, executing the following code after the above example would be illegal due to three different errors:

genMethod(false);   // Field date declared outside of C's block's dynamic extent

public field color; // Field declared outside a class's block

function genField() {
  public field style;
}

class D {
  genField();       // Field style declared outside D's block's lexical extent
}

Visibility

While a ClassDefinition's Block is being evaluated, the already defined class members (other than constructors) are visible and usable by the code in that Block. Afterwards members can be accessed in one of several ways:

Code inside the ClassDefinition's Block can access class members merely by mentioning their names.
Code anywhere within the current class, anywhere within the current package (if a member's Visibility is package or omitted), or anywhere within the current package or any package that imports the appropriate version of the current package (if a member's Visibility is public) can access class members by using the . operator on the class.
Code anywhere within the current class, anywhere within the current package (if a member's Visibility is package or omitted), or anywhere within the current package or any package that imports the appropriate version of the current package (if a member's Visibility is public) can access instance members by using the . operator on any of the class's instances.

Inheritance

A subclass inherits all members except constructors from its superclass. Class variables have only one global value, not one value per subclass. A subclass may override visible methods, but it may not override or shadow any other visible members. On the other hand, imports and versioning can hide members' names from some or all users in importing packages, including subclasses in importing packages.

Member Definitions

We have already seen the definition syntax for variables and constants, functions, and classes. Any of these defined at a ClassDefinition's Block's top level (or placed at its top level by the scope rules) become class members of the class.

Fields, methods, and constructor definitions have their own syntax described below. These definitions must be lexically enclosed by a ClassDefinition's Block.

MemberDefinition

FieldDefinition

| MethodDefinition

| ConstructorDefinition

Field Definitions

FieldDefinition

[Visibility] field Identifier [: TypeExpression] [= AssignmentExpression] , ... , Identifier [: TypeExpression] [= AssignmentExpression] ;

A FieldDefinition is similar to a VariableDefinition except that it defines an instance variable of the lexically enclosing class. Each new instance of the class contains a new, independent set of instance variables initialized to the values given by the AssignmentExpressions in the FieldDefinition.

Identifier is the name of the instance variable and TypeExpression is its type. Identifier can be any non-reserved identifier. TypeExpression is evaluated at the time the variable definition is evaluated and should evaluate to a type t. The TypeExpressions and AssignmentExpressions are evaluated once, at the time the FieldDefinition is evaluated, rather than every time an instance of the class is constructed; their values are saved for use in constructors.

If omitted, TypeExpression defaults to type any.

If provided, AssignmentExpression gives the instance variable's initial value v. If not, undefined is assumed; an error occurs if undefined cannot be coerced to type t. AssignmentExpression is evaluated just after the TypeExpression is evaluated. The value v is then coerced to the variable's type t and stored in the instance variable. Any values subsequently assigned to the instance variable are also coerced to type t at the time of each such assignment.

Multiple instance variables separated by commas can be defined in the same FieldDefinition.

A field cannot be overridden in a subclass.

Method Definitions

MethodDefinition

[Visibility] [final] [override] method [get | set] Identifier ( Parameters ) [: TypeExpression] Block

| [Visibility] [final] [override] method [get | set] Identifier ( Parameters ) [: TypeExpression] ;

A MethodDefinition is similar to a FunctionDefinition except that it defines an instance method of the lexically enclosing class. Parameters, the result TypeExpression, and the body Block behave just like for function definitions, with the following differences:

Every method has a predefined parameter this that refers to the instance object of the method's class on which the method was called.
A method is not in itself a value and has no type. There is no way to extract an undispatched method from a class. The . operator produces a function (more specifically, a closure) that is already dispatched and has this bound to the left operand of the . operator.
There is no analogue to functions' traditional syntax for methods. Optional parameters must be specified explicitly.

We call a regular method by combining the . operator with a function call. For example:

class C { field x:integer = 3; method m() {return x} method n(x) {return x+4} } var c = new C; c.m(); // returns 3 c.n(7); // returns 11 var f:Function = c.m; // f is a zero-argument function with this bound to c f(); // returns 3 c.x = 8; f(); // returns 8

Method Overriding

A class c may override a method m defined in its superclass s. To do this, c should define a method m' with the same name as m and use the override keyword in the definition of m'. Overriding a method without using the override keyword or using the override keyword when not overriding a method results in a warning intended to catch misspelled method names. The warning is not an error to allow subclass c to either define a method if it is not present in s or override it if it is present in s -- this situation can arise when s is imported from a different package and provides several versions.

The overriding method m' does not have to have the same number or type of parameters as the overridden method m. In fact, since parameter types can be arbitrary expressions and are evaluated only during a call, checking for parameter type compatibility when the overriding method m is declared would require solving the halting problem. Moreover, defining overriding methods that are more general than overridden methods is useful.

A method defined with the final keyword cannot be overridden (or further overridden) in subclasses.

Getter and Setter Methods

If a MethodDefinition contains the keyword get or set, then the defined method is a getter or a setter. These are analogous to getter and setter functions in that they are invoked without listing the parentheses after the method name.

A getter or setter method cannot be overridden. We could relax this restriction, but then we'd also have to allow overriding of fields by getters, setters, or other fields, and, as a corollary, allow fields to be declared final.

Constructor Definitions

ConstructorDefinition

[Visibility] constructor Identifier ( Parameters ) Block

A constructor is a class function that creates a new instance of the lexically enclosing class c. A constructor's body Block is required to call one of c's superclass's constructors (when and how?). Afterwards it may access the instance object under construction via the this local variable. A constructor should not return a value with a return statement; the newly created object is returned automatically.

A constructor can have any non-reserved name, in which case we would invoke it as though it were a class function. In addition, a constructor's Identifier can have the special name new, in which case we invoke it using the new prefix operator syntax as in JavaScript 1.5.

JavaScript 2.0

Core Language

Packages

Thursday, November 11, 1999

This page is out of date

Overview

Packages are an abstraction mechanism for grouping and distributing related code. Packages are designed to be linked at run time to allow a program to take advantage of packages written elsewhere or provided by the embedding environment. JavaScript 2.0 offers a number of facilities to make packages robust for dynamic linking:

Selected package contents can be protected from outside reference
Classes can maintain invariants that cannot be violated by code outside the class and/or package
Function arguments and data structure references can be type-checked to limit the kinds of unexpected inputs the package's code can experience
Packages can export multiple versions, allowing graceful upgrades to packages without changing the code that uses them

A package is a file (or analogous container) of JavaScript 2.0 code. There is no specific JavaScript statement that introduces or names a package -- every file is presumed to be a package. A package itself has no name, but it has a specific URI by which other packages can import it.

A package P typically starts with import statements that import other packages used by package P. A package that is meant to be used by other packages typically has one or more version declarations that declare versions available for export.

Package Loading

A package's body is described by the Program grammar nonterminal. A package is loaded (its body is evaluated) when the package is first imported or invoked directly (if, for example, the package is on an HTML web page). Some standard packages may also be loaded when the JavaScript engine first starts up.

Two attempts to load the same package in the same environment result in sharing of that package. What constitutes an environment is necessarily application-dependent. However, if package P1 loads packages P2 and P3, both of which load package P4, then P4 is loaded only once and thereafter its code and data is shared by P2 and P3.

When a package is loaded, all of its statements are evaluated in order, which may cause other packages to be loaded along the way when import statements are encountered. A package's symbols are available for export to other packages only after the package's body has been successfully evaluated. Unlike in Java, circularities are not allowed in the graph of package imports.

To create packages A and B that access each others' symbols, we need to instead define a hidden package C that consists of all of the code that would have gone into A and B. Package C should define versions verA and verB and tag the symbols it exports with either verA or verB to indicate whether these symbols belong in package A or B. Package A should then be empty except for a directive (or several directives if there are multiple versions of A and verA) that reexports C's symbols tagged with verA. Similarly, package B should reexport C's symbols tagged with verB. To make this work we need a reexport directive. Is this really necessary? Also, do we want a mechanism for hiding package C from general view so that users can only use it through A or B?

Exports

We can export a symbol in a package by giving it public Visibility.

Imports

To import symbols from a package we use the import statement:

ImportStatement

import ImportList ;

| import ImportList Block

| import ImportList Block else CodeStatement

ImportList

ImportItem , ... , ImportItem

ImportItem

[[protected] Identifier =] NonAssignmentExpression [: Version]

The first form of the import statement (without a Block) imports symbols into the current lexical scope. The second and third forms import symbols into the lexical scope of the Block. If the imports are unsuccessful, the first two forms of the import statement throw an exception, while the last form executes the CodeStatement after the else keyword.

An import statement can import one or more packages separated by commas. Each ImportItem specifies one package to be imported. The NonAssignmentExpression should evaluate to a string that contains a URI where the package may be found. If present, Version indicates the version of the package's exports to be imported; if not present, Version defaults to version 1.

An ImportItem can introduce a name for the imported package if the NonAssignmentExpression is preceded by Identifier =. Identifier becomes bound (either in the current lexical scope or in the Block's scope) to the imported package as a whole. Individual symbols can be extracted from the package by using Identifier with the :: operator. For example, if package at URI P has public symbols a and b, then after the statement

import x=P;

P's symbols can be referenced as either a, b, x::a, or x::b.

If an ImportItem contains the keyword protected, then the imported symbols can only be accessed using the :: operator. If we were to import package P using

import protected x=P;

then we'd have to access P's symbols using either x::a or x::b.

If two imports in the same scope import packages with clashing symbols, then neither symbol is accessible unless qualified using the :: operator. If an imported symbol clashes with a symbol declared in the same scope, then the declared symbol shadows the imported symbol. Scope rules 3 and 4 apply here as well, so the following code is illegal because a is referenced and then redefined:

import x=P; var y=a; // References P's a const a=17; // Redefines a in same scope

Version names cannot be imported.

Discussion

Package Names

Do we want to use URIs to locate packages, or do we want to invent our own, separate mechanism to do this?

Visibilities

Should we make private illegal outside a class rather than making it equivalent to package?

Should we introduce a local Visibility prefix that explicitly means that the declaration is visible locally? This wouldn't provide any additional functionality but it would provide a convenient name for talking about the four kinds of visibility prefixes.

What should the default visibilities be? The current defaults are loosely modeled after Java:

Definition Location	Default visibility
Package top level	`package` (equivalent to `local` in this case)
Inside a statement outside a function or class	`local`
Function or method code's top level	`local`
Inside a statement inside a function or method	`local`
Class declaration block's top level	`package`
Inside a statement inside a class declaration block	`local`

JavaScript 2.0

Core Language

Language Declarations

Thursday, November 11, 1999

Language declarations allow a script writer to select the language to use for a script or a particular section of a script. A language denotes either a major language such as JavaScript 2.0 or a variation such as strict mode.

Developers often find it desirable to be able to write a single script that takes advantage of the latest features in a host environment such as a browser while at the same time working in older host environments that do not support these features. JavaScript 2.0's language declarations enable one to easily write such scripts. One may still need to use techniques such as the LANGUAGE HTML attribute to support pre-JavaScript 2.0 environments, but at least the number of such environments that will need to be special-cased will not increase in the future.

Language declarations are a dual of versioning: language declarations let a script run under a variety of historical hosts, while versioning lets a host run a variety of historical scripts.

Syntax

LanguageDeclaration^w language LanguageId LanguageIdList LanguageAlternatives LanguageSemicolon^w

LanguageSemicolon^abbrev

;

| «empty»

LanguageSemicolon^{abbrevNonEmpty}

;

| «empty»

LanguageSemicolon^full ;

LanguageAlternatives

«empty»

| LanguageAlternatives | LanguageIdList

LanguageIdList

«empty»

| LanguageIdList LanguageId

LanguageId

Identifier

| Number

A language declaration uses the syntax above. The keyword language is followed by one or more language alternatives separated by vertical bars. Each language alternative consists of zero or more LanguageIds, which are either identifiers or numbers. The first language alternative must contain at least one LanguageId. The semicolon at the end of the LanguageDeclaration cannot be inserted by line-break semicolon insertion.

When a JavaScript environment is lexing and parsing a JavaScript program and it encounters a language declaration, it checks whether any of the language alternatives can be satisfied. If at least one can, the environment picks the first language alternative that can be satisfied and processes the rest of the containing block (until the closing } or until the end of the program if at the top level) using that language. A subsequent language declaration in the same block can further change the language.

If no language alternatives can be satisfied, then the JavaScript environment skips to the end of the containing block (until the closing matching } or until the end of the program if at the top level). Further language declarations in the same block are ignored. No error occurs unless the failing language declaration is executed as a statement, in which case it throws a syntax error. [See rationale for a discussion of some of the issues here.]

The following LanguageIds are currently defined:

`LanguageId`	Language
`1.0`	JavaScript 1.0
`1.1`	JavaScript 1.1
`1.2`	JavaScript 1.2
`1.3`	JavaScript 1.3
`1.4`	JavaScript 1.4
`1.5`	JavaScript 1.5 (ECMAScript Edition 3)
`2.0`	JavaScript 2.0
`strict`	Strict mode
`traditional`	Traditional mode (default)

It is meaningless to combine two or more numeric LanguageIds in the same alternative:

language 1.0 2.0;

will always fail. On the other hand, it is meaningful and useful to separate them with vertical bars. For example, one can indicate that one prefers JavaScript 2.1 but is willing to accept JavaScript 2.0 if 2.1 is not available:

language 2.1 | 2.0;

An empty alternative will always succeed. One can use it to indicate a preference for strict mode but willingness to work without it:

language strict |;

Language declarations are always lexically scoped and never extend past the end of the enclosing block.

This document specifies the 2.0 language and its strict and traditional modes. The consequences of mixing in other languages are implementation-defined, but implementations are encouraged to do something reasonable.

Strict Mode

Many parts of JavaScript 2.0 are relaxed or unduly convoluted due to compatibility requirements with JavaScript 1.5. Strict mode sacrifices some of this compatibility for simplicity and additional error checking. Strict mode is intended to be used in newly written JavaScript 2.0 programs, although existing JavaScript 1.5 programs may be retrofitted.

The opposite of strict mode is traditional mode, which is the default. A program can readily mix strict and traditional portions.

Strict mode has the following effects:

Line-break semicolon insertion is turned off. (Grammatical semicolon insertion remains turned on.)
[no line break] restrictions in grammar productions are ignored. Line breaks can be placed anywhere between input tokens.
FunctionDefinitions define constants rather than variables.
Calls to functions defined under strict mode are checked for the correct number of arguments except in traditional functions and in functions that explicitly allow a variable number of arguments. (The mode of the call site does not matter.)
Implementations may choose to disable other compatibility extensions such as support for octal literals. These are not officially part of JavaScript 2.0 but most implementations support these in traditional mode for compatibility with older programs.

Predefined Types

The following types are predefined in JavaScript 2.0:

Type	Set of Values
`void`	`undefined`
`Null`	`null`
`boolean`	`true` and `false`
`integer`	Double-precision IEEE floating-point numbers that are mathematical integers, including positive and negative zeroes but excluding infinities and NaN
`number`	Double-precision IEEE floating-point numbers, including positive and negative zeroes and infinities and NaN
`character`	Single 16-bit unicode characters
`string`	Immutable strings of unicode characters
`Function`	All functions and `null`
`array`	All arrays
`Array`	All arrays and `null`
`type`	All types
`Type`	All types and `null`
`object`	All values except `undefined` and `null`
`Object`	All values except `undefined`
`Any`	All values

By convention, predefined types whose names start with an upper-case letter include the value null, while predefined types whose names start with a lower-case letter do not include null. User-defined type names do not have to follow this convention.

Unlike in JavaScript 1.5, there is no distinction between objects and primitive values. All values can have methods. Some values can be sealed, which disallows addition of ad-hoc properties. User-defined classes can be made to behave like primitives.

The above type names are not reserved words. They are considered to be defined in a scope that encloses a package's global scope, so a package could use these type names as identifiers. However, defining these identifiers for other uses might be confusing because it would shadow the corresponding type names (the types themselves would continue to exist, but they could not be accessed by name).

The names Boolean, Number, and String have been deliberately left unused to enable implementations to use them to emulate the behavior of the JavaScript 1.5 Boolean, Number, and String wrapper objects. These are not part of JavaScript 2.0, but an implementation may support them for compatibility.

The name function could not be used to mean "all functions" because it is a reserved word. Use Function^* instead.

Literals

A literal number that has an integral value has type integer; otherwise it has type number. integer is a subtype of number, so every integer value is also a number value. A literal string that has exactly one 16-bit unicode character has type character; otherwise it has type string. character is a subtype of string, so every character value is also a string value.

User-Defined Types

Any class defined using the class declaration is also a type that denotes the set of all of its and its descendants' instances. These include the predefined classes, so Object, Date, etc. are all types. null is an instance of a user-defined class c if it is an instance of any of c's superclasses.

Compound Types

We can use the following operators to construct more complex types. t and u are type expressions in the expressions below.

Type	Values
`t` `\|` `*`	`null` and all values of type `t`
`t` `^` `*`	All values of type `t` except `null`
`t` `\|` `?`	`undefined` and all values of type `t`
`t` `^` `?`	All values of type `t` except `undefined`
`t` `\|` `u`	All values belonging to either type `t` or type `u` or both
`t` `&` `u`	All values simultaneously belonging to both type `t` and type `u`

The language does not syntactically distinguish type expressions from value expressions, so a type expression can also use any other value operators such as !, +, and . (member access). Except for parentheses, most of them are not very useful, though.

Subtyping

We write a b to denote that a is a subtype of b. Subtyping is transitive, so if a b and b c then a c is also true. Subtyping is also reflexive: a a.

The following subtype and type equivalence relations hold. t, u, and v represent arbitrary types.

`t` `t` `\|` `u`	`t` `&` `u` `t`
`t` `\|` `t` = `t`	`t` `&` `t` = `t`
`t` `\|` `u` = `u` `\|` `t`	`t` `&` `u` = `u` `&` `t`
(`t` `\|` `u`) `\|` `v` = `t` `\|` (`u` `\|` `v`)	(`t` `&` `u`) `&` `v` = `t` `&` (`u` `&` `v`)
`t` `\|` `*` = `t` `\|` `Null`	`t` `\|` `?` = `t` `\|` `void`
`integer` `number` `object`	`character` `string` `object`
`boolean` `object`	`array` `object`
`type` `object`
`Array` = `array` `\|` `Null`	`Type` = `type` `\|` `Null`
`Object` = `object` `\|` `Null`
`t` `Any`

We write v t to indicate that v is a value that is a member of type t. The following subtyping rule holds: if v t and t s, then v s holds as well. Any particular value v is simultaneously a member of many types.

Meaning of Types

Types are generally used to restrict the set of objects that can be held in a variable or passed as a function argument. For example, the declaration

var integer x;

restricts the values that can be held in variable x to be integers.

A type declaration never affects the semantics of reading the variable or accessing one of its members. Thus, as long as expression new MyType() returns a value of type MyType, the following two code snippets are equivalent:

var MyType x = new MyType();
x.foo();

var x = new MyType();
x.foo();

This equivalence always holds, even if these snippets are inside the declaration of class MyType and foo is a private field of that class. As a corollary, adding true type annotations does not change the meaning of a program.

Type Expressions

A type is also a value (whose type is type) and can be used in expressions, assigned to variables, passed to functions, etc. For example, the code

const type Z = integer;
function abs_val(Z i) Z {
  return i<0 ? -i : i;
}

is equivalent to:

function abs_val(integer i) integer {
  return i<0 ? -i : i;
}

As another example, the following method takes a type and returns an instance of that type:

method QueryInterface(type t) t { ... }

Coercions

Coercions can take place in the following situations:

Assigning a value v to a variable or field of type t
Passing an argument v to a function whose corresponding parameter has type t
Returning a result v from a function declared to return a value of type t
Using the v@t operator.

In any of these cases, if v t, then v is passed unchanged. If v t, then an error occurs unless v is undefined, in which case the following coercions are tried, in order:

If Null t, then null is used instead of undefined.
If boolean t, then false is used instead of undefined.
If integer t, then +0.0 is used instead of undefined.
If string t, then "" is used instead of undefined.

If none of the coercions succeeds, an error occurs.

Some types such as machine integers define additional coercions. These are listed along with descriptions of these types.

`@` Operator

One can explicitly request a coercion in an expression by using the @ operator. This operator has the same precedence as . and coerces its left operand to the right operand, which must be a type. ... v@t ... can be used in an expression and has the same effect as:

function coerce_to_t(t a) t {return a} // Declared at the top level

... coerce_to_t(v) ...

assuming that coerce_to_t is an identifier not used anywhere else. The @ operator is useful as a type assertion as in w@Window. It's a postfix operator to simplify cascading expressions:

w@Window.child@Window.pos

is equivalent to:

(((w@Window).child)@Window).pos

Type Casts

A type cast performs more aggressive transformations than a type coercion. To cast a value to a given type, we use the type as a function, passing it the value as an argument:

type(value)

For example, integer(258.1) returns the integer 258, and string(2+2==4) returns the string "true".

Need to specify the semantics of type casts. They are intended to mimic the current ToNumber, ToString, etc. methods.

Discussion

Colon Syntax

Would we rather have the colon syntax for declaring types? Two sample declarations would be:

var x:integer = 7;
function f(a:integer, b:Object):number {...}

A few considerations:

On the Pascal/Modula/Ada vs. C/C++/Java syntax debate, JavaScript tends to use syntax more similar to Java.
We already use the colon syntax for statement labels and object literal elements (for example {a:17, b:33}). The latter would present a conundrum if we ever wanted to declare field types in an object literal. Some users have been using these as a convenient facility for passing named arguments to functions.
Using the colon syntax would allow us to drop the requirement that field be a reserved word.

Type Expressions

Do we want to make type expressions have a distinct syntax from value expressions? I have not heard any "pro" arguments. Here are the "con" arguments:

Creating two different syntaxes for two kinds of expressions adds to the complexity of the language.
JavaScript is a dynamic language and it is useful to manipulate types as though they were first-class values.
It's difficult to unambiguously distinguish type expressions from value expressions. In the expression (expr1)(expr2), is expr1 a type or a value expression? If the two have the same syntax, it doesn't matter.

JavaScript 2.0

Libraries

Versions

Thursday, November 11, 1999

Motivation

As a package evolves over time it often becomes necessary to change its exported interface. Most of these changes involve adding symbols (global and class members), although occasionally a symbol may be deleted or renamed. In a monolithic environment where all JavaScript source code comes preassembled from the same source, this is not a problem. On the other hand, if packages are dynamically linked from several sources then versioning problems are likely to arise.

One of the most common avoidable problems is collision of symbols. Unless we solve this problem, an author of a library will not be able to add even one symbol in a future version of his library because that symbol could already be in use by some client or some other library that a client also links with. This problem occurs both in the global namespace and in the namespaces within classes from which clients are allowed to inherit.

Example

Here's an example of how such a collision can arise. Suppose that a library provider creates a library called BitTracker that exports a class Data. This library becomes so successful that it is bundled with all web browsers produced by the BrowsersRUs company:

package BitTracker;

public class Data {
  public field author;
  public field contents;
  function save() {...}
};

function store(d) {
  ...
  storeOnFastDisk(d);
}

Now someone else writes a web page W that takes advantage of BitTracker. The class Picture derives from Data and adds, among other things, a method called size that returns the dimensions of the picture:

import BitTracker;

class Picture extends Data {
  public method size() {...}
  field palette;
};

function orientation(d) {
  if (d.size().h >= d.size().v)
    return "Landscape";
  else
    return "Portrait";
}

The author of the BitTracker library, who hasn't seen W, decides in response to customer requests to add a method called size that returns the number of bytes of data in a Data object. He then releases the new and improved BitTracker library. BrowsersRUs includes this library with its latest NavigatorForInternetComputing 17.0 browser:

package BitTracker;

public class Data {
  public field author;
  public field contents;
  public method size() {...}
  function save() {...}
};

function store(d) {
  ...
  if (d.size() > limit)
    storeOnSlowDisk(d);
  else
    storeOnFastDisk(d);
}

An unsuspecting user U upgrades his old BrowsersRUs browser to the latest NavigatorForInternetComputing 17.0 browser and a week later is dismayed to find that page W doesn't work anymore. U's granddaughter Alyssa P. Hacker tries to explain to U that he's experiencing a name conflict on the size methods, but U has no idea what she is talking about. U attempts to contact the author of W, but she has moved on to other pursuits and is on a self-discovery mission to sub-Saharan Africa. Now U is steaming at BrowsersRUs, which in turn is pointing its finger at the author of BitTracker.

Solutions

How could the author of BitTracker have avoided this problem? Simply choosing a name other than size wouldn't work, because there could be some other page W2 that conflicts with the new name. There are several possible approaches:

Naming conventions. We could require each symbol to be prefixed by the full name of the party from which this symbol originates. Unfortunately, this would get tedious and unnecessarily impact casual uses of the language. Furthermore, this approach is impractical for the names of methods because it is often desirable to share the same method name across several classes to attain polymorphism; this would not be possible if Netscape's objects all used the com_netscape_length method while MIT's objects used the edu_mit_length method.
Explicit imports. We could require each client package to import every external symbol it references. This works reasonably well for global symbols but becomes tedious for the names of class members, which would have to be imported separately for each class. Alternatives exist for bulk importing members of a class, but they are somewhat complicated and do not work for interfaces or multiple inheritance.
Versions. We could require package authors to mark the symbols they export with explicit versions. A package's developer could introduce a new version of the package with additional symbols as long as those symbols were made invisible to prior versions.

The last approach appears to be the most desirable because it places the smallest burden on casual users of the language, who merely have to import the packages they use and supply the current version numbers in the import statements. A package author has to be careful not to disturb the set of visible prior-version symbols when releasing an updated package, but authors of dynamically linkable packages are assumed to be more sophisticated users of the language and could be supplied with tools to automatically check updated packages' consistency.

Overview

The versioning system in JavaScript 2.0 only affects exports of symbols. The concept of a version does not apply to a package's internal code; it is up to package developers to ensure that newer releases of their packages continue to behave compatibly with older ones.

Terminology

A version describes the API of a package. A release refers to the entirety of a package, including its code. One release can export many versions of its API. A package developer should make sure that multiple releases of a package that export version V export exactly the same set of symbols in version V.

Example

As an example, suppose that a developer wrote a sorting package P with functions sort and merge that called bubble sort in version "1.0". In the next release the developer adds a function called stablesort and includes it in version "2.0". In a subsequent release the developer changes the sort algorithm to a quicksort that calls stablesort as a subroutine. That last release of the package might look like:

const V1_0 = new Version("1.0","");       // The "" makes version "1.0" be the default
const V2_0 = new Version("2.0","1.0");

public var serialNumber;

public function sort(compare: Function, array: any[]):any[] {...}
public function merge(compare: Function, array1: any[], array2: any[]):any[] {...}
V2_0 function stablesort(compare: Function, array: any[]):any[] {...}

Suppose, further, that client package C1 imports version "1.0" of P, client package C2 simultaneously imports version "2.0" of P, and a search for P yields the latest release described above. There would be only one instance of P running -- the latest release. Both clients would get the same sort and merge functions, and both would see the same serialNumber variable (in particular, if client C1 wrote to serialNumber, then client C2 would see the updated value), but only client package C2 would see the stablesort function. Both clients would get the quicksort release of sort. If client package C1 defined its own stablesort function, then that function would not conflict with P's stablesort; furthermore, P's sort would still refer to P's stablesort in its internal subroutine call.

Had only the first release of P been available, client package C2 would obtain an error because version 2 of P's API would not be available. Client C1 could run normally, although the sort function it calls would use bubble sort instead of the quicksort.

Note that the last release of P did not change the API so it did not need a new version. Of course, it could define a new version if for some reason it wanted clients to be able to demand the last release of P even though its API is the same as the second release.

The remainder of this page is out of date. Versions are now created using ordinary object calls on a versioning library.

Version Declarations

Version Names

A version name Version is a quoted string literal such as "1.2" or "Private Interface 2.0". Two version names are equal if their strings are equal. A special version whose name is the empty string "" is called the default version.

Declaration Syntax

A package must declare every version it uses except "", which is declared by default if not explicitly declared. A version must be declared before its first use. A given version name may be declared only once per package. A package declares a version name Version using the version declaration:

VersionDefinition

[Visibility] version Version [> VersionList] ;

| [Visibility] version Version [= Version] ;

VersionList

Version , ... , Version

A version declaration cannot be nested inside a ClassDefinition's Block.

If Visibility is present, it must be either private, package, or public (without VersionsAndRenames). Unlike in other declarations, the default is public, which makes Version accessible by other packages. A private or package Visibility hides its Version from other packages; such a Version can be used only by being included in the VersionList of another Version. Also unlike other declarations, all Version declarations are global.

Version Ordering

If the Version being declared is followed by a > and a VersionList, then the Version is said to be greater than all of the Versions in the VersionList. We write v1 :> v2 to indicate that v1 is greater than v2 and v1 : v2 to indicate that either v1 and v2 are the same version or v1 :> v2. Order is transitive, which means that if v1 :> v2 and v2 :> v3, then v1 :> v3. This order induces a partial order on the set of all versions. It is possible for two versions to be unordered with respect to each other, in which case they are not equal and neither is greater than the other.

If the Version v1 being declared is followed by a = and another Version v2, then v1 becomes an alias for v2, and they may be used interchangeably.

Version Ranges

A VersionRange specifies a subset of all versions. This subset contains all versions that are both greater than or equal to a given Version1 and less than or equal to a given Version2. A VersionRange can have either of the following forms:

VersionRange

Version

| [Version1] .. [Version2]

The first form specifies the one-element set {Version}. The second form specifies the set of all Versions v such that v : Version1 and Version2 : v. If Version1 is omitted, the condition v : Version1 is dropped. If Version2 is omitted, the condition Version2 : v is dropped.

Discussion

Version Numbers 1

The original version of this specification allowed both strings and numbers as Version names. Two version names were equal if their toString representations were identical, so version names 2.0 and "2" were identical but 2.0 and "2.0" were not. In addition, numbered versions had an implicit order: For any two versions v1 and v2 whose names could be represented as numbers, v1 :> v2 if and only if v1 was numerically greater than v2. Additionally, every version except 0 was greater than version 0. It was an error to define explicit version containment relations that would violate this default order, directly or indirectly.

Numbered Version names were dropped for simplicity and to avoid confusion with versions such as 1.2.3 (which would be a syntax error unless quoted).

Version Numbers 2

Another, simpler, approach is to require all Version names to be nonnegative integers (without quotes). Versions would not need to be declared, and all versions would be totally ordered in numerical order. A disadvantage of this approach is that the total order keeps versions from being branched.

Dynamic Version Definitions

Currently version definitions are fixed. These could be turned into function calls that define versions and list their relationships. If we can get a variable or constant to hold a set of version names, then we could use these variables rather than specific version names in the VersionsAndRenames lists after public keywords. This would provide another level of abstraction and flexibility.

Separate Version Definitions

Yet another approach is to consolidate all of the information in VersionsAndRenames into a set of export statements, say, at the top of the file rather than being interspersed throughout a package along with public declarations. This would make it easier to see all of the identifiers exported by a particular version of the package, but it would also likely lead to inconsistencies when someone forgets to update an export statement after inserting another variable, function, field, or method definition. Such errors would likely be caught after a package has been released.

JavaScript 2.0

Libraries

Machine Types

Thursday, November 11, 1999

Purpose

The machine types library is an optional library that provides additional low-level types for use in JavaScript 2.0 programs. On implementations that support this library, these types provide faster, Java-style integer operations that are useful for communicating between JavaScript 2.0 and other programming languages and for performance-critical code. These types are not intended to replace number and integer for general-purpose scripting.

When the machine types library is imported via an import of "machine-types" version 1, the following types become available:

Type	Values
`byte`	Machine integers between -128 and 127 inclusive
`ubyte`	Machine integers between 0 and 255 inclusive
`short`	Machine integers between -32768 and 32767 inclusive
`ushort`	Machine integers between 0 and 65535 inclusive
`int`	Machine integers between -2147483648 and 2147483647 inclusive
`uint`	Machine integers between 0 and 4294967295 inclusive
`long`	Machine integers between -9223372036854775808 and 9223372036854775807 inclusive
`ulong`	Machine integers between 0 and 18446744073709551615 inclusive

Values belonging to the eight machine integer types above are distinct from each other and from values of type integer. Thus, byte(7) is distinct from int(7), which in turn is distinct from the plain integer 7. However, the coercions listed below usually hide these distinctions.

No subtype relations hold between the machine types.

The above type names are not reserved words.

Coercions

The following coercions take place:

An ordinary integer value v can be coerced to one of the machine integer types M if v is within range of the target type M. Both +0 and -0 coerce to the machine integer 0. Note that non-integer numbers are not coerced to any of the machine types.
A machine integer value m can be coerced to any of the machine integer types M. If m is not within range of the target type M, it is treated modulo |M|, where |byte| = |ubyte| = 256, |short| = |ushort| = 65536, |int| = |uint| = 2³², and |long| = |ulong| = 2⁶⁴.
A machine integer value m can be coerced to type integer or number as long as m can be represented exactly using the IEEE double-precision floating-point format. 0 always becomes +0.

Operations

Machine integers support the arithmetic operators +, -, *, /, %, comparisons ==, !=, <, >, <=, =>, and bitwise logical operations ~, &, |, ^, <<, >>. If supplied two operands of different machine integer types M1 and M2, all of these binary operators except << and >> first coerce both operands to the same type M. If M1 appears before M2 in the list byte, ubyte, short, ushort, int, uint, long, ulong, then M is M2; otherwise M is M1. Then these operators perform the operation and finally return the result as a value of type M. If the result is not within range of the target type M, it is treated modulo |M|.

If one of the operands is a machine integer of type M and the other is an integer value v, then v is first coerced to type M.

The result type of a shift expression (<< or >>) is the same as the type of its first operand. The second operand's type does not affect the type of the result. Right shifts are signed if the first operand has type byte, short, int, or long, and unsigned if it has type ubyte, ushort, uint, or ulong.

Discussion

These rules are designed to permit machine integer operations to be implemented as single instructions on most processor architectures yet give predictable results. Overflows wrap around instead of signaling errors because such behavior is useful for many bit-manipulation algorithms and permits much better optimization of performance-critical code. Code that is concerned about overflows should be using regular integer instead of the machine integer types.

Disjointness of Machine Types

Why are values of the eight machine integer types distinct? This was done because of a desire to allow arithmetic operators to only support 32 bits when operating on int values. Let's take a look at the alternative:

Suppose we unify the values of all eight machine types so that int(2000000000) is indistinguishable from long(2000000000). To what precision should an operator like + calculate its results? Clearly, if we're adding two long values and the result is within the range of long values, then we'd expect to get the right result. In particular, long(2000000000) + long(2000000000) should yield long(4000000000). However, long(2000000000) is indistinguishable from int(2000000000), so int(2000000000) + int(2000000000) should also yield long(4000000000), which is not representable as an int value. Thus, even if both operands are known to be int values, the + operator has to use 64-bit arithmetic.

If a has type int and we compute a+1, then we have to use 64-bit arithmetic because the result could be 2147483648. However, if we compute var int r = a+1 instead, then a smart compiler could make do with 32-bit arithmetic because the result is treated modulo 2³². However, this trick would not work with an expression such as var boolean b = a+1 > 0.

The alternative is viable but it leads to more demand for 64-bit arithmetic. It does have the advantage that one does not need to worry about intermediate overflows as long as the values don't approach 2⁶⁴.

Single-Precision Floating-Point Type

Do we want to support a float type for holding single-precision IEEE floating-point numbers? This type may be useful for:

communicating with other languages;
applications that need fast numeric performance; and
code using floats originally written in another language such as C++ or Java that one would want to replicate exactly in JavaScript; without support for the float type the JavaScript version would give different answers from the original.

One difficulty with supporting float is deciding what the coercion rules should be. If we invoke + with one number operand and one float operand, should the result be a float or a number? One might expect number, but this makes adding constants to floats using single-precision arithmetic awkward since every constant is a number. If s is a float, the expression s+1 would yield a number instead of a float because 1 is a number. One would have to write s+float(1) instead.

JavaScript 2.0

Libraries

Operator Overloading

Thursday, November 11, 1999

Overview

Operator overloading is useful to implement Spice-style units without having to add units to the core of the JavaScript 2.0 language. Operator overloading is done via an optional library that, when imported, exposes several additional methods of the Object class. This library is analogous to the internationalization library in that it does not have to be present on all implementations of JavaScript 2.0; implementations without this library do not support operator overloading.

JavaScript 2.0

Formal Description

Thursday, November 11, 1999

Semantic Notation
Stages
Lexer Grammar Summary (also available as Word 98 rtf)
Lexer Grammar and Semantics (Word 98 rtf)
Regular Expression Grammar Summary (Word 98 rtf)
Regular Expression Grammar and Semantics (Word 98 rtf)
Parser Grammar Summary (Word 98 rtf)

This chapter presents the formal syntax and semantics of JavaScript 2.0. The syntax notation and semantic notation sections explain the notation used for this description. A simple metalanguage based on a typed lambda calculus is used to specify the semantics.

The syntax and semantic sections are available in both HTML 4.0 and Microsoft Word 98 RTF formats. In the HTML versions each use of a grammar nonterminal or metalanguage value, type, or field is hyperlinked to its definition, making the HTML version preferred for browsing. On the other hand, the RTF version looks much better when printed. The fonts, colors, and other formatting of the various grammar and semantic elements are all encoded as CSS (in HTML) or Word (in RTF) styles and can be altered if desired.

The syntax and semantics sections are machine-generated from code supplied to a small engine that can type-check and execute the semantics directly. This engine is in the CVS tree at mozilla/js/semantics; the input files are at mozilla/js/semantics/JS20.

JavaScript 2.0

Formal Description

Semantic Notation

Thursday, November 11, 1999

Introduction

To precisely specify the semantics of JavaScript 2.0, we use the notation described below to define the behavior of all JavaScript 2.0 constructs and their interactions.

Semantic Values

The semantics describe the meaning of a JavaScript 2.0 program in terms of operations on simpler objects borrowed from mathematics collectively called semantic values. Semantic values can be held in semantic variables and passed to semantic functions. The kinds of semantic values used in this specification are summarized in the table below and explained in the next few sections:

Semantic Value Examples	Description
	The result of a nonterminating computation
syntaxError	The result of a computation that returns by throwing a semantic exception
	The result of a semantic function that does not return a useful value
true, false	Booleans
-3, 0, 1, 2, 93	Mathematical integers
1/2, -12/7	Mathematical rational numbers
1.0, 3.5, 2.0e-10, -0.0, -, NaN	Double-precision IEEE floating-point numbers
`A`, `b`, `«LF»`, `«uFFFF»`	Characters (Unicode 16-bit code points)
[`value0`, ... , `valuen-1`]	Vectors — indexed lists of semantic values
, `abc` , `1«TAB»`5	Strings
{`value1`, `value2`, ... , `valuen`}	Mathematical sets of semantic values
`name1` `value1`, `name2` `value2`, ... , `namen` `valuen`	Tuples with named member semantic values
`name` or `name` `value`	Tagged semantic values
function(n: Integer) n*n	Semantic functions

There is a special semantic value (pronounced as "bottom") that represents the result of an inconsistent or nonterminating computation. Unless specified otherwise, applying any semantic operator (such as +, *, etc.) to or calling a semantic function with as any argument also yields without evaluating any remaining operands or arguments (in technical terms, semantic functions and operators are strict in all of their arguments unless specified otherwise).

If interpreting a JavaScript program according to the semantics here gives a result, an actual implementation executing that JavaScript program will either fail to terminate or throw an exception because it runs out of memory or stack space.

Semantic values of the form value represents the result of a computation that throws a semantic exception. value is the exception's value (which must be a member of the SemanticException semantic type). Unless specified otherwise, applying any semantic operator (such as +, *, etc.) to value or calling a semantic function with value as any argument also yields value (with the same value) without evaluating any remaining operands or arguments.

The throw statement takes a value v and returns v. The catch statement converts v back to v.

Semantic functions that do not return a useful value return the semantic value . There are no operations defined on .

Booleans

The semantic values true and false are booleans. The not, and, or, and xor operators operate on booleans. Like most other operators, and, or, and xor evaluate both operands before returning a result; these operators do not short-circuit.

Integers

Unless specified otherwise, numbers in the semantics written without a slash or decimal point are mathematical integers: ..., -3, -2, -1, 0, 1, 2, 3, .... The usual mathematical operators +, -, *, and unary - can be used on integers. Integers can be compared using =, , <, , >, and .

Rationals

Numbers in the semantics written with a slash are mathematical rational numbers. Every integer is also a rational. Rational numbers include, for example, 0, 1, 2, -1, 1/2, -12/7, and -24/14; the last two are different ways of writing the same rational number. The usual mathematical operators +, -, *, /, and unary - can be used on rationals. Rationals can be compared using =, , <, , >, and .

Doubles

Numbers in the semantics written with a decimal point are double-precision IEEE floating-point numbers (often abbreviated as doubles), including distinct +0.0, -0.0, +, -, and NaN. Doubles are distinct from integers and rationals; when writing doubles in the semantics, we always include a decimal point to distinguish them from integers and rationals.

Doubles other than +, -, and NaN are called finite. We define the significand of a finite double d as follows:

The significand of +0.0 or -0.0 is 0.
If d is normalized, we can uniquely represent it as s m 2^e, where m and e are integers, s {-1, 1}, 2⁵² m < 2⁵³, and -1074 e 971. m is the significand.
If d is denormalized, we can uniquely represent it as s m 2^-1074, where m is an integer, s {-1, 1}, and 0 < m < 2⁵². m is the significand.

Characters

Characters are single Unicode 16-bit code points. We write them enclosed in single quotes and . There are exactly 65536 characters: «u0000», «u0001», ...,A, B, C, ..., «uFFFF» (see also notation for non-ASCII characters). Unicode surrogates are considered to be pairs of characters for the purpose of this specification.

The characterToCode and codeToCharacter semantic functions convert between characters and their integer Unicode values.

Vectors

A semantic vector contains zero or more elements indexed by integers starting from zero. We write a vector value by enclosing a comma-separated list of values inside bold brackets:

[element0, element1, ... , elementn-1]

For example, the following semantic value is a vector whose elements are four strings:

[parsley, sage, rosemary, thyme]

The empty vector is written as [].

Let u = [e0, e1, ... , en-1] and v = [f0, f1, ... , fm-1] be vectors, i and j be integers, and x be a value. The following notations describe common operations on vectors:

Notation	Result Value
`u` `v`	The concatenated vector [`e0`, `e1`, ... , `en-1`, `f0`, `f1`, ... , `fm-1`]
\|`u`\|	The length `n` of the vector
`u`[`i`]	The `i`^th element `ei`, or if `i`<0 or `in`
`u`[`i` ... `j`]	The vector slice [`ei`, `ei+1`, ... , `ej`] consisting of all elements of `u` between the `i`^th and the `j`^th, inclusive, or if `i`<0, `jn`, or `j`<`i`-1. The result is the empty vector [] if `j`=`i`-1.
`u`[`i` ...]	The vector slice [`ei`, `ei+1`, ... , `en-1`] consisting of all elements of `u` between the `i`^th and the end, or if `i`<0 or `i`>`n`. The result is the empty vector [] if `i`=`n`.
`u`[`i` `x`]	The vector [`e0`, ... , `ei-1`, `x`, `ei+1`, ... , `en-1`] with the `i`^th element replaced by the value `x` and the other elements unchanged, or if `i`<0 or `in`

Semantic vectors are functional; there is no notation for modifying a semantic vector in place.

Strings

A semantic string is merely a vector of characters. For notational convenience we can write a string literal as zero or more characters enclosed in double quotes. Thus,

Wonder«LF»

is equivalent to:

[W, o, n, d, e, r, «LF»]

In addition to all of the other vector operations, we can use =, , <, , >, and to compare two strings.

Sets

A semantic set is an unordered collection of values. Each value may occur at most once in a set. There must be a well-defined = semantic operator defined on all pairs of values in the set, and that operator must induce an equivalence relation.

A semantic set is denoted by enclosing a comma-separated list of values inside braces:

{element1, element2, ... , elementn}

The empty set is written as {}.

For example, the following set contains seven integers:

{3, 0, 10, 11, 12, 13, -5}

When using elements such as integers and characters that have an obvious total order, we can also write sets by using the ... range operator. For example, we can rewrite the above set as:

{0, -5, 3 ... 3, 10 ... 13}

If the beginning of the range is equal to the end of the range, then the range consists of only one element: {7 ... 7} is the same as {7}. If the end of the range is one "less" than the beginning, then the range contains no elements: {7 ... 6} is the same as {}. If the end of the range is more than one "less" than the beginning, then the set is .

Let A and B be sets and x be a value. The following notations describe common operations on sets:

Notation	Result Value
\|`A`\|	The number of elements in the set `A`; if `A` has infinitely many elements
min `A`	If there exists a value `m` that satisfies both `m` `A` and for all elements `x` `A`, `x` `m`, then return `m`; otherwise return (this could happen either if `A` is empty or if `A` has an infinite descending sequence of elements with no lower bound in `A`)
max `A`	If there exists a value `m` that satisfies both `m` `A` and for all elements `x` `A`, `x` `m`, then return `m`; otherwise return (this could happen either if `A` is empty or if `A` has an infinite ascending sequence of elements with no upper bound in `A`)
`A` `B`	The intersection of sets `A` and `B` (the set of all values that are present both in `A` and in `B`)
`A` `B`	The union of sets `A` and `B` (the set of all values that are present in at least one of `A` or `B`)
`A` - `B`	The difference of sets `A` and `B` (the set of all values that are present in `A` but not `B`)
`x` `A`	Return true if `x` is an element of set `A` and false if not
`A` = `B`	Return true if the two sets `A` and `B` are equal and false otherwise. Sets `A` and `B` are equal if every element of `A` is also in `B` and every element of `B` is also in `A`.

min and max are only defined for sets whose elements can be compared with <.

Tuples

A semantic tuple is an aggregate of several named semantic values. Tuples are sometimes called records or structures in other languages. A tuple is denoted by a comma-separated list of names and values between bold triangular brackets:

name1 value1, name2 value2, ... , namen valuen

Each namei valuei pair is called a field. The order of fields in a tuple is irrelevant, so x 3, y 4 is the same as y 4, x 3. A tuple's names must all be distinct.

Let w be an expression that evaluates to a tuple name1 value1, name2 value2, ... , namen valuen. We can extract the value of the field named namei from w by using the notation w.namei. w is required to have this field. For example, x 3, y 4.x is 3.

In the HTML versions of the semantics, each use of namei is linked back to its tuple type's definition.

Oneofs

A semantic oneof is a pair consisting of a name (called the tag) and a value. Oneofs are sometimes called variants or tagged unions in other languages. A oneof is denoted by writing the tag followed by the value:

name value

For brevity, when value is , we can omit it altogether, so red is the same as red .

Let o be an expression that evaluates to some oneof n v. We can perform the following operations on o:

Notation	Result Value
`o`.`name`	The value `v` if `n` is `name`; otherwise
`o` is `name`	true if `n` is `name`; false otherwise

For example, (red 5) is blue evaluates to false, while (red 5) is red evaluates to true. (red 5).red evaluates to 5.

In addition to the operators above, the case statement evaluates one of several expressions based on a oneof tag.

In the HTML versions of the semantics, each use of name is linked back to its oneof type's definition.

Functions

A semantic function receives zero or more arguments, performs computations, and returns a result. We write a semantic function as follows:

function(param1: type1, ... , paramn: typen) body

Here param1 through paramn are the function's parameters, type1 through typen are the parameters' respective semantic types, and body is an expression that computes the function's result. When the function is called with argument values v1 through vn, the function's body is evaluated and the resulting value returned to the caller. body can refer to the parameters param1 through paramn; each reference to a parameter parami evaluates to the corresponding argument value vi. Arguments are passed by value (which in this language is equivalent to passing them by reference because there is no way to write to a parameter).

Function parameters are statically scoped. When functions are nested and an inner function f defines a parameter with the same name as a parameter of an outer function g, then f's parameter shadows g's parameter inside f.

The only operation allowed on a semantic function f is calling it, which we do using the f(arg1, ..., argn) syntax. In the presence of side effects, f is evaluated first, followed by the argument expressions arg1 through argn, in left-to-right order. If the result of evaluating f or any of the argument expressions is , then the call immediately returns without evaluating the following argument expressions, if any. If the result of evaluating f or any of the argument expressions is v for some value v, then the call immediately returns that v without evaluating the following argument expressions, if any. Otherwise, f's body is evaluated and the resulting value returned to the caller.

Semantic Types

A semantic type is a possibly infinite set of semantic values. Names of semantic types are shown in Capitalized Red Small Caps, and compound semantic type expressions are in red.

We use semantic types to make the semantics more readable by declaring the semantic type of each semantic variable (including function argument variables). Each such declaration states that the only values that will be stored in a semantic variable will be members of that variable's semantic type. These declarations can be proven statically. The JavaScript semantics have been machine type-checked to ensure that every type declaration holds, so, for example, if the semantics state that variable x has type Integer then there does not exist any place that could assign the value true to x.

Semantic type annotations allow us to restrict the description of each semantic operator and function to only describe its behavior on arguments that are members of the arguments' semantic types. Thus, for example, we need not describe the behavior of the + semantic operator when passed the semantic values true and as operands because we can prove that this case cannot arise.

Every semantic type includes the values and v for all values v whose semantic type is SemanticException. For brevity we do not list and v in the tables below.

Basic Semantic Types

The following are the basic semantic types:

Type	Set of Values
Void	{}
Boolean	{true, false}
Integer	{..., -2, -1, 0, 1, 2, ...} (All mathematical integers)
Rational	All mathematical rational numbers
Double	All double-precision IEEE floating-point numbers, including , -, and NaN
Character	All 65536 characters
String	Shorthand for Character[] (see vector types below)
SemanticException	Set of all values that can be thrown as semantic exceptions. This type is defined separately inside each grammar that throws such exceptions.

The type Rational includes Integer as a subtype because every integer is also a rational number. Except for and v, the types Rational and Double are disjoint.

Compound Semantic Types

We can construct compound semantic types using the notation below. Here t, t1, t2, ..., tn represent some existing semantic types.

Type	Set of Values
`t`[]	All vectors [`v0`, ... , `vn-1`] all of whose elements `v0`, ... , `vn-1` have type `t`. Note that the empty vector [] is a member of every vector type `t`[].
{`t`}	All sets {`v1`, `v2`, ... , `vn`} all of whose elements `v1`, ... , `vn` have type `t`. Note that the empty set {} is a member of every set type {`t`}.
tuple {`name1`: `t1`; ... ; `namen`: `tn`}	All tuples `name1` `v1`, ... , `namen` `vn` for which each `vi` has type `ti` for 1 `i` `n`. The `namei`'s must be distinct; the order in which the `namei`: `ti` fields are listed does not matter.
oneof {`name1`: `t1`; ... ; `namen`: `tn`}	All oneofs of the form `namei` `v`, where 1 `i` `n` and `v` has type `ti`. If `tk` is Void, then `namek`: `tk` can be abbreviated as simply `namek` in the oneof semantic type syntax. The `namei`'s must be distinct; the order in which the `namei`: `ti` alternatives are listed does not matter.
`t1` `t2` ... `tn` `t`	Some* functions that take `n` arguments of types `t1` through `tn` respectively and produce a result of type `t`. If `n` is zero (the function takes no arguments), we write this type as () `t`. * Technically speaking, this semantic type includes only functions that are continuous in the domain-theoretical sense; this avoids set-theoretical paradoxes.
() `t`

The type constructors earlier in the table bind tighter than ones later in the table, so, for example, Integer[] Rational[] is equivalent to (Integer[]) (Rational[]) (a function that takes a vector of Integers and returns a vector of Rationals) rather than ((Integer[]) Rational)[] (a vector of functions, each of which takes a vector of Integers and returns a Rational). In the rare cases where this is needed, parentheses are used to override precedence.

Semantic Operators

The table below lists the semantic operators in order from the highest precedence (tightest-binding) to the lowest precedence (loosest-binding). Operators under the same heading of the table have the same precedence and associate left-to-right, so, for example, 7-3+2-1 is interpreted as ((7-3)+2)-1 instead of 7-(3+(2-1)) or (7-(3+2))-1. When needed, parentheses can be used to group expressions.

The type signatures of the operators are also listed. Some operators are polymorphic; t, t1, t2, ..., and tn can represent any semantic types. The types of some operators are underdetermined; for example, [] can have type t[] for any type t. In these cases the particular choice of type is inferred from the context.

Each operator in the table below is strict: it evaluates all of its operands left-to-right, and if any operand evaluates to , then the operator immediately returns without evaluating the following operands, if any. If any operand evaluates to v for some value v, then the operator immediately returns that v without evaluating the following operands, if any.

Operator	Signatures	Description
Nonassociative Operators
(`x`)	`t` `t`	Return `x`. Parentheses are used to override operator precedence.
\|`u`\|	`t`[] Integer	`u` is a vector [`e0`, `e1`, ... , `en-1`]. Return the length `n` of that vector.
\|`u`\|	{`t`} Integer	The number of elements in the set `u`; if `u` has infinitely many elements
[`x0`, `x1`, ... , `xn-1`]	`t` ... `t` `t`[]	Return a vector with the elements `x0`, `x1`, ... , `xn-1`.
{`x1`, `x2`, ... , `xn`}	`t` ... `t` {`t`}	Return a set with the elements `x1`, `x2`, ... , `xn`. Any duplicate elements are included only once in the set. When `t` is Integer or Character, we can also replace any of the `xi`'s by a range `xi` ... `yi` that contains all integers or characters greater than or equal to `xi` and less than or equal to `yi`. `yi` must not be less than `xi` "minus" one.
`name1` `x1`, ... , `namen` `xn`	`t1` ... `tn` tuple {`name1`: `t1`; ... ; `namen`: `tn`}	Return a tuple with the fields `name1` `x1`, ... , `namen` `xn`.
`name`	oneof {`name`; `name2`: `t2`; ... ; `namen`: `tn`}	Return a oneof value with tag `name` and value .
Action[`nonterminal_i`]	Determined by Action's declaration	This notation can only be used inside an action definition for a grammar production that has nonterminal `nonterminal` on the production's right side. Return the value of action Action invoked on the `i`^th instance of nonterminal `nonterminal` on the right side of . The subscript `i` can be omitted if there is only one instance of nonterminal `nonterminal` in .
`nonterminal_i`	Character	This notation can only be used inside an action definition for a grammar production that has nonterminal `nonterminal` on the production's left or right side. Furthermore, every complete expansion of grammar nonterminal `nonterminal` must expand it into a single character. Return the character to which the `i`^th instance of nonterminal `nonterminal` on the right side of expands. The subscript `i` can be omitted if there is only one instance of nonterminal `nonterminal` in . If the subscript is omitted and nonterminal `nonterminal` appears on the left side of , then this expression returns the single character to which this whole production expands.
Suffix Operators
`u`[`i`]	`t`[] Integer `t`	`u` is a vector [`e0`, `e1`, ... , `en-1`]. Return the `i`^th element `ei`, or if `i`<0 or `in`.
`u`[`i` ... `j`]	`t`[] Integer Integer `t`[]	`u` is a vector [`e0`, `e1`, ... , `en-1`]. Return the vector slice [`ei`, `ei+1`, ... , `ej`] consisting of all elements of `u` between the `i`^th and the `j`^th, inclusive, or if `i`<0, `jn`, or `j`<`i`-1. The result is the empty vector [] if `j`=`i`-1.
`u`[`i` ...]	`t`[] Integer `t`[]	`u` is a vector [`e0`, `e1`, ... , `en-1`]. Return the vector slice [`ei`, `ei+1`, ... , `en-1`] consisting of all elements of `u` between the `i`^th and the end, or if `i`<0 or `i`>`n`. The result is the empty vector [] if `i`=`n`.
`u`[`i` `x`]	`t`[] Integer `t` `t`[]	`u` is a vector [`e0`, `e1`, ... , `en-1`]. Return the vector [`e0`, ... , `ei-1`, `x`, `ei+1`, ... , `en-1`] with the `i`^th element replaced by the value `x` and the other elements unchanged, or if `i`<0 or `in`.
`w`.`namei`	tuple {`name1`: `t1`; ... ; `namen`: `tn`} `ti`	`w` is a tuple `name1` `v1`, ... , `namen` `vn`. Return the value `vi` of `w`'s field named `namei`.
`w`.`namei`	oneof {`name1`: `t1`; ... ; `namen`: `tn`} `ti`	`w` is a oneof `namek` `v` for some `k` between 1 and `n` inclusive. Return the value `v` if `namei` is `namek`, or if not.
`f`(`x1`, ..., `xn`)	(`t1` ... `tn` `t`) `t1` ... `tn` `t`	Call the function `f` with the arguments `x1` through `xn` and return the result.
Prefix Operators
-`x`	Integer Integer or Rational Rational	The mathematical negation of `x`
min `A`	{`t`} `t`	Return the minimal element of set `A`. Specifically, if there exists a value `m` that satisfies both `m` `A` and for all elements `x` `A`, `x` `m`, then return `m`; otherwise return (this could happen either if `A` is empty or if `A` has an infinite descending sequence of elements with no lower bound in `A`). The type `t` must have = and < operations that define a total order.
max `A`	{`t`} `t`	Return the maximal element of set `A`. Specifically, if there exists a value `m` that satisfies both `m` `A` and for all elements `x` `A`, `x` `m`, then return `m`; otherwise return (this could happen either if `A` is empty or if `A` has an infinite ascending sequence of elements with no upper bound in `A`). The type `t` must have = and < operations that define a total order.
`name` `x`	`t` oneof {`name`: `t`; `name2`: `t2`; ... ; `namen`: `tn`}	Return a oneof value with tag `name` and value `x`.
Multiplicative Operators
`x` * `y`	Integer Integer Integer or Rational Rational Rational	The mathematical product of `x` and `y`
`x` / `y`	Rational Rational Rational	The mathematical quotient of `x` and `y`; if `y`=0
`A` `B`	{`t`} {`t`} {`t`}	The intersection of sets `A` and `B` (the set of all values that are present both in `A` and in `B`)
Additive Operators
`x` + `y`	Integer Integer Integer or Rational Rational Rational	The mathematical sum of `x` and `y`
`x` - `y`	Integer Integer Integer or Rational Rational Rational	The mathematical difference of `x` and `y`
`u` `v`	`t`[] `t`[] `t`[]	`u` is a vector [`e0`, `e1`, ... , `en-1`] and `v` is a vector [`f0`, `f1`, ... , `fm-1`]. Return the concatenated vector [`e0`, `e1`, ... , `en-1`, `f0`, `f1`, ... , `fm-1`].
`A` `B`	{`t`} {`t`} {`t`}	The union of sets `A` and `B` (the set of all values that are present in at least one of `A` or `B`)
`A` - `B`	{`t`} {`t`} {`t`}	The difference of sets `A` and `B` (the set of all values that are present in `A` but not `B`)
Comparison Operators
`x` = `y`	Rational Rational Boolean or Character Character Boolean or String String Boolean or {`t`} {`t`} Boolean	Comparisons return true if the relation holds or false if not. Rationals are compared mathematically. Characters are compared according to their code points. Two strings are equal when they have the same lengths and contain exactly the same sequences of characters. A string `x` is less than string `y` when either `x` is the empty string and `y` is not empty, the first character of `x` is less than the first character of `y`, or the first character of `x` is equal to the first character of `y` and the rest of string `x` is less than the rest of string `y`. Two sets `x` and `y` are equal if every element of `x` is also in `y` and every element of `y` is also in `x`. Only = and can be used to compare sets.
`x` `y`
`x` < `y`
`x` `y`
`x` > `y`
`x` `y`
`x` `A`	`t` {`t`} Boolean	Return true if `x` is an element of set `A` and false if not
`o` is `namei`	oneof {`name1`: `t1`; ... ; `namen`: `tn`} Boolean	`o` is a oneof `namek` `v` for some `k` between 1 and `n` inclusive. Return true if `namei` is `namek`, or false otherwise.
Logical Negation
not `a`	Boolean Boolean	true if `a` is false; false if `a` is true
Logical Conjunction
`a` and `b`	Boolean Boolean Boolean	true if both `a` and `b` are true; false if at least one of `a` and `b` is false
Logical Disjunction
`a` or `b`	Boolean Boolean Boolean	true if at least one of `a` and `b` is true; false if both `a` and `b` are false
`a` xor `b`	Boolean Boolean Boolean	true if `a` is true and `b` is false or `a` is false and `b` is true; false if both `a` and `b` are true or both `a` and `b` are false

Semantic Statements

Semantic statements are similar to the semantic operators above in that they are also used to construct expressions, take zero or more operands, and return a value. Unlike other semantic operators, semantic statements are usually non-strict: they do not always evaluate all of their operands. Semantic statements have lower precedence than any of the semantic operators above.

Some semantic statements are syntactic sugars, which means that they are defined as macros that expand into other, simpler statements and operators.

Function

function(param1: type1, ... , paramn: typen) body

See the description of function values.

Let

let var1: type1 = expr1; ... ; varn: typen = exprn in body

Evaluate expr1 through exprn in order and save the results. If any expri evaluates to , then immediately return without evaluating the following expr's. If any expri evaluates to v for some value v, then immediately return that v without evaluating the following expr's. Otherwise evaluate body with new local variable bindings of var1 through varn bound to the saved results of evaluating expr1 through exprn, respectively. Return the result of evaluating body.

type1 through typen are the local variables' respective semantic types. The type of the entire let expression is the type of its body.

The let expression above is syntactic sugar for:

(function(var1: type1, ... , varn: typen) body)(expr1, ... , exprn)

If

if expr then bodytrue else bodyfalse

Evaluate expr. If it evaluates to , then immediately return . If expr evaluates to v for some value v, then immediately return that v. Otherwise expr must evaluate to either true or false. If it evaluated to true, then evaluate bodytrue and return its result. If expr evaluated to false, then evaluate bodyfalse and return its result.

expr must have type Boolean. The entire if expression has any type t such that both bodytrue has type t and bodyfalse has type t.

Case

case expr of
    name1(var1: type1): body1;
    ...
    namen(varn: typen): bodyn;
    end

Evaluate expr. If it evaluates to , then immediately return . If expr evaluates to v for some value v, then immediately return that v. Otherwise expr must evaluate to a oneof name v where name matches namei for some i between 1 and n inclusive. Evaluate the corresponding bodyi with a new local variable vari bound to v. Return bodyi's result.

If we are not interested in using the oneof's value for a particular bodyi, we can shorten that bodyi's clause from:

namei(vari: typei): bodyi

to:

namei: bodyi

In this case no local variable is bound while evaluating bodyi.

expr must have type oneof {name1: type1; ... ; namen: typen}. The entire case expression has any type t such that all of its bodyi's have type t. The namei's must be distinct. The order in which the case clauses are listed does not matter.

Throw

throw expr

Evaluate expr. If it evaluates to , then immediately return . If expr evaluates to v for some value v, then immediately return that v. Otherwise expr must evaluate to some value v, in which case return v.

expr must have type SemanticException. The entire throw expression has any type whatsoever (because every semantic type includes v).

Try-Catch

try
bodytry
catch (var: SemanticException)
bodyhandler

Evaluate bodytry to obtain a value w. If w does not have the form v for some value v, then return w. Otherwise w is v for some value v. In this case evaluate bodyhandler with a new local variable var bound to v and return bodyhandler's result.

The type of var is always SemanticException. The entire try-catch expression has any type t such that both bodytry has type t and bodyhandler has type t.

Semantic Functions

The sections below list the predefined semantic functions, their type signatures, and short descriptions. All functions are strict and evaluate their arguments left-to-right.

Integer Manipulation

These functions perform bitwise operations on integers. The integers are treated as though they were written in binary notation, with each 1 bit representing true and 0 bit representing false. The integers must be nonnegative.

Function	Signature	Description
bitwiseAnd(`x`, `y`)	Integer Integer Integer	The bitwise AND of `x` and `y`
bitwiseOr(`x`, `y`)		The bitwise OR of `x` and `y`
bitwiseXor(`x`, `y`)		The bitwise XOR of `x` and `y`
bitwiseShift(`x`, `count`)	Integer Integer Integer	Shift `x` to the left by `count` bits. If `count` is negative, shift `x` to the right by -`count` bits. Bits shifted out are lost; bit shifted in are zero. This function is equivalent to multiplying `x` by 2^count and truncating the result (toward negative infinity) to an integer. `x` can be negative.

Double Manipulation

Function	Signature	Description
rationalToDouble(`r`)	Rational Double	The rational number `r` rounded to the nearest IEEE double-precision floating-point value as follows: Consider the set of all doubles, with -0.0, +, -, and NaN removed and with two additional values added to it that are not representable as doubles, namely 2¹⁰²⁴ and -2¹⁰²⁴. Choose the member of this set that is closest in value to `r`. If two values of the set are equally close, choose the one with an even significand; for this purpose, the two extra values 2¹⁰²⁴ and -2¹⁰²⁴ are considered to have even significands. Finally, if 2¹⁰²⁴ was chosen, replace it with +; if -2¹⁰²⁴ was chosen, replace it with -; if +0.0 was chosen, replace it with -0.0 if and only if `r` < 0; any other chosen value is used unchanged. The result is the value of rationalToDouble(`r`). This procedure corresponds exactly to the behavior of the IEEE 754 "round to nearest" mode.

Character Conversions

Function	Signature	Description
characterToCode(`c`)	Character Integer	The number of the Unicode code point `c`
codeToCharacter(`i`)	Integer Character	The Unicode code point number `i`, or if `i`<0 or `i`>65535

Character Utilities

The function digitValue is defined as follows:

digitValue(c: Character) : Integer
  = if c  {0 ... 9}
     then characterToCode(c) - characterToCode(0)
     else if c  {A ... Z}
     then characterToCode(c) - characterToCode(A) + 10
     else if c  {a ... z}
     then characterToCode(c) - characterToCode(a) + 10
     else

Character Class Queries

Function	Signature	Description
isOrdinaryInitialIdentifierCharacter(`c`)	Character Boolean	Return true if the nonterminal `OrdinaryInitialIdentifierCharacter` can expand into `c` and false otherwise
isOrdinaryContinuingIdentifierCharacter(`c`)	Character Boolean	Return true if the nonterminal `OrdinaryContinuingIdentifierCharacter` can expand into `c` and false otherwise

Semantic Definitions

Value Definitions

We can define a global semantic constant named var as follows:

var : type = expr

expr should evaluate to a value of type type. expr should not have side effects, and it should not evaluate to .

In the HTML versions of the semantics, each reference to the global semantic constant var is linked to var's definition.

Function Definitions

We can define a global semantic function named f as follows:

f(param1: type1, ... , paramn: typen) : type = body

param1 through paramn are the function's parameters, type1 through typen are the parameters' respective semantic types, type is the function result's semantic type, and body is an expression that computes the function's result.

The above definition is syntactic sugar for the global constant definition:

f : type1 type2 ... typen type = function(param1: type1, ... , paramn: typen) body

In the HTML versions of the semantics, each reference to the global semantic function f is linked to f's definition.

For example, the function definition

square(x: Integer) : Integer = x*x

defines a function named square that takes an Integer parameter x and returns an Integer that is the square of x. This is equivalent to the following global definition:

square : Integer Integer = function(x: Integer) x*x

Type Definitions

We can give a new name to a semantic type t by using the type definition, which has the form:

type name = t

For example, the following notation defines RegExp as a shorthand for tuple {reBody: String; reFlags: String}:

type RegExp = tuple {reBody: String; reFlags: String}

In the HTML versions of the semantics, each reference to the semantic type name name is linked to name's definition.

Semantic Actions

Semantic actions tie together the grammar and the semantics. A semantic action ascribes semantic meaning to a grammar production.

To illustrate the use of semantic actions, we shall look at an example, followed by a detailed description of the notation for specifying semantic actions.

Example

Consider the following grammar, with the start nonterminal Numeral:

Digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Digits

Digit

| Digits Digit

Numeral

Digits

| Digits # Digits

This grammar defines the syntax of an acceptable input: 37, 33#4 and 30#2 are acceptable syntactically, while 1a is not. However, the grammar does not indicate what these various inputs mean. That is the job of the semantics, which are defined in terms of actions on the parse tree of grammar rule expansions. Consider the following sample set of actions defined on this grammar, with a starting Numeral action called (in this example) Value:

type SemanticException = oneof {syntaxError}

action Value[Digit] : Integer = digitValue(Digit)

action DecimalValue[Digits] : Integer

DecimalValue[Digits Digit] = Value[Digit]

DecimalValue[Digits Digits₁ Digit] = 10*DecimalValue[Digits₁] + Value[Digit]

action BaseValue[Digits] : Integer Integer

BaseValue[Digits  Digit](base: Integer)
  = let d: Integer = Value[Digit]
     in if d < base
         then d
         else throw syntaxError

BaseValue[Digits  Digits₁ Digit](base: Integer)
  = let d: Integer = Value[Digit]
     in if d < base
         then base*BaseValue[Digits₁](base) + d
         else throw syntaxError

action Value[Numeral] : Integer

Value[Numeral Digits] = DecimalValue[Digits]

Value[Numeral  Digits₁ # Digits₂]
  = let base: Integer = DecimalValue[Digits₂]
     in if base  2 and base  10
         then BaseValue[Digits₁](base)
         else throw syntaxError

Action names are written in violet cursive type. The last action definition states in the example above that the action Value can be applied to any expansion of the nonterminal Numeral, and the result is an Integer. This action maps all acceptable inputs to integers or syntaxError. If the result is syntaxError, then the input satisfies the grammar but contains an error detected by the semantics; this is the case for the input 30#2. A result of would indicate a nonterminating computation; this cannot happen in this example.

There are two definitions of the Value action on Numeral, one for each grammar production that expands Numeral. Each definition of an action is allowed to call actions on the terminals and nonterminals on the right side of the expansion. For example, Value applied to the first Numeral production (the one that expands Numeral into Digits) simply applies the DecimalValue action to the expansion of the nonterminal Digits and returns the result. On the other hand, Value applied to the second Numeral production (the one that expands Numeral into Digits # Digits) performs a computation using the results of the DecimalValue and BaseValue applied to the two expansions of the Digits nonterminals. In this case there are two identical nonterminals Digits on the right side of the expansion, so we use subscripts to indicate on which one we're calling the actions DecimalValue and BaseValue.

The BaseValue action illustrates a syntactic sugar for defining an action that is a function; this syntactic sugar is analogous to that for defining global functions.

The Value action on Digit illustrates the direct use of a nonterminal in a semantic expression: digitValue(Digit). Here the Digit semantic expression evaluates to the character into which the Digit grammar rule expands.

We can fully evaluate the semantics on our sample inputs to get the following results:

Input	Semantic Result
`37`	37
`33#4`	15
`30#2`	syntaxError

Action Declarations

action Action[nonterminal] : type

This declaration states that action Action is defined on nonterminal nonterminal. Any reference to action Action[nonterminal] in a semantic expression returns a value of type type. The values of action Action must be defined using action definitions for each grammar production that has nonterminal on the left side.

Action Definitions

Action[nonterminal expansion] = expr

This notation defines the value of action Action on nonterminal nonterminal in the case where nonterminal nonterminal expands to the given expansion. expansion can contain zero or more terminals and nonterminals (as well as other notations allowed on the right side of a grammar production). Furthermore, the terminals and nonterminals of expansion can be subscripted to allow them to be unambiguously referenced by action references or nonterminal references inside expr.

The type of action Action on nonterminal nonterminal must be declared using an action declaration. expr must have the type given by that action declaration.

nonterminal expansion must be one of the productions in the grammar.

Action Function Definitions

Action[nonterminal expansion](param1: type1, ... , paramn: typen) = body

This notation is a syntactic sugar for defining an action whose value is a function. This notation is equivalent to:

Action[nonterminal expansion] =
function(param1: type1, ... , paramn: typen) body

Combined Action Declarations and Definitions

action Action[nonterminal] : type = expr

This declaration is sometimes used when all expansions of nonterminal nonterminal share the same action semantics. This declaration states both the type type of action Action on nonterminal nonterminal as well as that action's value expr. Note that the expansions are not given between the square brackets, and expr can refer only to the nonterminal nonterminal on the left side of grammar productions. No additional action definitions are needed for nonterminal nonterminal.

See the Value action on Digit in the example above for an example of this declaration.

JavaScript 2.0

Formal Description

Stages

Thursday, November 11, 1999

This page is out of date

The source code is processed in the following stages:

If necessary, convert the source code into the Unicode UTF-16 format, normalized form C.
Split the source code into tokens using the lexer grammar and lexer semantics.
Parse the resulting sequence of tokens using the parser grammar and evaluate it using the parser semantics [To be provided].

Lexing

Processing stage 2 is done as follows:

Let tokens be an empty array of Token metalanguage records. (As defined in the lexer semantics, a Token can be either an identifier, a keyword, a punctuation symbol, a number, a number with a unit, a string, or the end token.)
Let input be the input sequence of Unicode characters. Append a special placeholder End to the end of input.
Let regExpMayFollow be a Boolean variable. Initialize it to true.
Apply the lexer grammar to parse the longest possible prefix of input. If regExpMayFollow is true, use the start symbol NextToken^re. If regExpMayFollow is false, use the start symbol NextToken^div. The result of the parse should be a parse tree T. If the parse failed, return a syntax error.
Compute the action Token on T to obtain a Token t. If t is the end token, return the tokens array and go to the parse stage.
Append t to the end of the tokens array.
Compute the action RegExpMayFollow on T to obtain a Boolean value and assign that value to the regExpMayFollow variable.
Remove the characters matched by T from input, leaving only the yet-unparsed suffix of input.
Go to step 4.

If an implementation encounters an error while lexing, it is permitted to either report the error immediately or defer it until the affected token would actually be used by the parser. This flexibility allows an implementation to do lexing at the same time it parses the source program.

Provide language prohibiting an identifier from immediately following a number. This will fall out of the revised definition of QuantityLiteral.

Show mapping from Token structures to parser grammar terminals (obvious, but needs to be written).

Parsing

To be provided

JavaScript 2.0

Formal Description

Lexer Grammar

Thursday, November 11, 1999

This LALR(1) grammar describes the lexer syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.

This document is also available as a Word 98 rtf file.

The start symbols are NextToken^re and NextToken^div depending on whether a / should be interpreted as a regular expression or division.

Unicode Character Classes

UnicodeCharacter Any Unicode character

UnicodeInitialAlphabetic Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)

UnicodeAlphanumeric Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)

WhiteSpaceCharacter

«TAB» | «VT» | «FF» | «SP» | «u00A0»

| «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»

| «u2008» | «u2009» | «u200A» | «u200B»

| «u3000»

LineTerminator «LF» | «CR» | «u2028» | «u2029»

ASCIIDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Comments

LineComment / / LineCommentCharacters

LineCommentCharacters

«empty»

| LineCommentCharacters NonTerminator

NonTerminator UnicodeCharacter except LineTerminator

BlockComment / * BlockCommentCharacters * /

BlockCommentCharacters

«empty»

| BlockCommentCharacters NonSlash

| PreSlashCharacters /

PreSlashCharacters

«empty»

| BlockCommentCharacters NonAsteriskOrSlash

| PreSlashCharacters /

NonSlash UnicodeCharacter except /

NonAsteriskOrSlash UnicodeCharacter except * | /

White space

WhiteSpace

«empty»

| WhiteSpace WhiteSpaceCharacter

| WhiteSpace LineTerminator

| WhiteSpace LineComment LineTerminator

| WhiteSpace BlockComment

Tokens

t {re, div}

NextToken^t WhiteSpace Token^t

Token^re

| Punctuator

| NumericLiteral

| QuantityLiteral

| StringLiteral

| RegExpLiteral

| EndOfInput

Token^div

| Punctuator

| DivisionPunctuator

| NumericLiteral

| QuantityLiteral

| StringLiteral

| EndOfInput

EndOfInput

End

| LineComment End

Keywords and identifiers

IdentifierName

InitialIdentifierCharacter

| IdentifierName ContinuingIdentifierCharacter

InitialIdentifierCharacter

OrdinaryInitialIdentifierCharacter

| \ HexEscape

OrdinaryInitialIdentifierCharacter UnicodeInitialAlphabetic | $ | _

ContinuingIdentifierCharacter

OrdinaryContinuingIdentifierCharacter

| \ HexEscape

OrdinaryContinuingIdentifierCharacter UnicodeAlphanumeric | $ | _

IdentifierOrReservedWord IdentifierName

Punctuators

Punctuator

PunctuatorRE

| PunctuatorDiv

PunctuatorRE

!

| ! =

| ! = =

| #

| %

| % =

| &

| & &

| & & =

| & =

| (

| *

| * =

| +

| + =

| ,

| -

| - =

| - >

| .

| . .

| . . .

| :

| : :

| ;

| <

| < <

| < < =

| < =

| =

| = =

| = = =

| >

| > =

| > >

| > > =

| > > >

| > > > =

| ?

| @

| [

| ^

| ^ =

| ^ ^

| ^ ^ =

| {

| |

| | =

| | |

| | | =

| ~

PunctuatorDiv

)

| + +

| - -

| ]

| }

DivisionPunctuator

/

| / =

Numeric literals

NumericLiteral

DecimalLiteral

| HexIntegerLiteral [lookahead{HexDigit}]

DecimalLiteral

Mantissa

| Mantissa LetterE SignedInteger

LetterE E | e

Mantissa

DecimalIntegerLiteral

| DecimalIntegerLiteral .

| DecimalIntegerLiteral . Fraction

| . Fraction

DecimalIntegerLiteral

0

NonZeroDecimalDigits

NonZeroDigit

| NonZeroDecimalDigits ASCIIDigit

NonZeroDigit 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Fraction DecimalDigits

SignedInteger

DecimalDigits

| + DecimalDigits

| - DecimalDigits

DecimalDigits

ASCIIDigit

| DecimalDigits ASCIIDigit

HexIntegerLiteral

0 LetterX HexDigit

| HexIntegerLiteral HexDigit

LetterX X | x

HexDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Quantity literals

QuantityLiteral NumericLiteral QuantityName

QuantityName [lookahead{LetterE, LetterX}] IdentifierName

String literals

q {single, double}

StringLiteral

' StringChars^single '

| " StringChars^double "

StringChars^q

«empty»

| StringChars^q StringChar^q

StringChar^q

LiteralStringChar^q

| \ StringEscape

LiteralStringChar^single UnicodeCharacter except ' | \ | LineTerminator

LiteralStringChar^double UnicodeCharacter except " | \ | LineTerminator

StringEscape

ControlEscape

| ZeroEscape

| HexEscape

| IdentityEscape

IdentityEscape NonTerminator except UnicodeAlphanumeric

ControlEscape

b

| f

| n

| r

| t

| v

ZeroEscape 0 [lookahead{ASCIIDigit}]

HexEscape

x HexDigit HexDigit

| u HexDigit HexDigit HexDigit HexDigit

Regular expression literals

RegExpLiteral RegExpBody RegExpFlags

RegExpFlags

«empty»

| RegExpFlags ContinuingIdentifierCharacter

RegExpBody / RegExpFirstChar RegExpChars /

RegExpFirstChar

OrdinaryRegExpFirstChar

| \ NonTerminator

OrdinaryRegExpFirstChar NonTerminator except \ | / | *

RegExpChars

«empty»

| RegExpChars RegExpChar

RegExpChar

OrdinaryRegExpChar

| \ NonTerminator

OrdinaryRegExpChar NonTerminator except \ | /

JavaScript 2.0

Formal Description

Lexer Semantics

Thursday, November 11, 1999

The lexer semantics describe the actions the lexer takes in order to transform an input stream of Unicode characters into a stream of tokens. For convenience, the lexer grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word 98 rtf file.

The start symbols are NextToken^re and NextToken^div depending on whether a / should be interpreted as a regular expression or division.

Semantics

type SemanticException = oneof {syntaxError}

Unicode Character Classes

Syntax

UnicodeCharacter Any Unicode character

UnicodeInitialAlphabetic Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)

UnicodeAlphanumeric Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)

WhiteSpaceCharacter

«TAB» | «VT» | «FF» | «SP» | «u00A0»

| «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»

| «u2008» | «u2009» | «u200A» | «u200B»

| «u3000»

LineTerminator «LF» | «CR» | «u2028» | «u2029»

ASCIIDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

action DecimalValue[ASCIIDigit] : Integer = digitValue(ASCIIDigit)

Comments

Syntax

LineComment / / LineCommentCharacters

LineCommentCharacters

«empty»

| LineCommentCharacters NonTerminator

NonTerminator UnicodeCharacter except LineTerminator

BlockComment / * BlockCommentCharacters * /

BlockCommentCharacters

«empty»

| BlockCommentCharacters NonSlash

| PreSlashCharacters /

PreSlashCharacters

«empty»

| BlockCommentCharacters NonAsteriskOrSlash

| PreSlashCharacters /

NonSlash UnicodeCharacter except /

NonAsteriskOrSlash UnicodeCharacter except * | /

White space

Syntax

WhiteSpace

«empty»

| WhiteSpace WhiteSpaceCharacter

| WhiteSpace LineTerminator

| WhiteSpace LineComment LineTerminator

| WhiteSpace BlockComment

Tokens

Syntax

t {re, div}

NextToken^t WhiteSpace Token^t

Token^re

| Punctuator

| NumericLiteral

| QuantityLiteral

| StringLiteral

| RegExpLiteral

| EndOfInput

Token^div

| Punctuator

| DivisionPunctuator

| NumericLiteral

| QuantityLiteral

| StringLiteral

| EndOfInput

EndOfInput

End

| LineComment End

Semantics

type RegExp = tuple {reBody: String; reFlags: String}

type Quantity = tuple {amount: Double; unit: String}

type Token
  = oneof {
           identifier: String;
           keyword: String;
           punctuator: String;
           number: Double;
           quantity: Quantity;
           string: String;
           regularExpression: RegExp;
           end}

action Token[NextToken^t] : Token

Token[NextToken^t WhiteSpace Token^t] = Token[Token^t]

action RegExpMayFollow[NextToken^t] : Boolean

RegExpMayFollow[NextToken^t WhiteSpace Token^t] = RegExpMayFollow[Token^t]

action Token[Token^t] : Token

Token[Token^t IdentifierOrReservedWord] = Token[IdentifierOrReservedWord]

Token[Token^t Punctuator] = Token[Punctuator]

Token[Token^div DivisionPunctuator] = punctuator Punctuator[DivisionPunctuator]

Token[Token^t NumericLiteral] = number DoubleValue[NumericLiteral]

Token[Token^t QuantityLiteral] = quantity QuantityValue[QuantityLiteral]

Token[Token^t StringLiteral] = string StringValue[StringLiteral]

Token[Token^re RegExpLiteral] = regularExpression REValue[RegExpLiteral]

Token[Token^t EndOfInput] = end

action RegExpMayFollow[Token^t] : Boolean

RegExpMayFollow[Token^t IdentifierOrReservedWord]
= RegExpMayFollow[IdentifierOrReservedWord]

RegExpMayFollow[Token^t Punctuator] = RegExpMayFollow[Punctuator]

RegExpMayFollow[Token^div DivisionPunctuator] = true

RegExpMayFollow[Token^t NumericLiteral] = false

RegExpMayFollow[Token^t QuantityLiteral] = false

RegExpMayFollow[Token^t StringLiteral] = false

RegExpMayFollow[Token^re RegExpLiteral] = false

RegExpMayFollow[Token^t EndOfInput] = true

Keywords and identifiers

Syntax

IdentifierName

InitialIdentifierCharacter

| IdentifierName ContinuingIdentifierCharacter

InitialIdentifierCharacter

OrdinaryInitialIdentifierCharacter

| \ HexEscape

OrdinaryInitialIdentifierCharacter UnicodeInitialAlphabetic | $ | _

ContinuingIdentifierCharacter

OrdinaryContinuingIdentifierCharacter

| \ HexEscape

OrdinaryContinuingIdentifierCharacter UnicodeAlphanumeric | $ | _

Semantics

action Name[IdentifierName] : String

Name[IdentifierName InitialIdentifierCharacter]
= [CharacterValue[InitialIdentifierCharacter]]

Name[IdentifierName IdentifierName₁ ContinuingIdentifierCharacter]
= Name[IdentifierName₁] [CharacterValue[ContinuingIdentifierCharacter]]

action ContainsEscapes[IdentifierName] : Boolean

ContainsEscapes[IdentifierName InitialIdentifierCharacter]
= ContainsEscapes[InitialIdentifierCharacter]

ContainsEscapes[IdentifierName IdentifierName₁ ContinuingIdentifierCharacter]
= ContainsEscapes[IdentifierName₁] or ContainsEscapes[ContinuingIdentifierCharacter]

action CharacterValue[InitialIdentifierCharacter] : Character

CharacterValue[InitialIdentifierCharacter OrdinaryInitialIdentifierCharacter]
= OrdinaryInitialIdentifierCharacter

CharacterValue[InitialIdentifierCharacter  \ HexEscape]
  = if isOrdinaryInitialIdentifierCharacter(CharacterValue[HexEscape])
     then CharacterValue[HexEscape]
     else throw syntaxError

action ContainsEscapes[InitialIdentifierCharacter] : Boolean

ContainsEscapes[InitialIdentifierCharacter OrdinaryInitialIdentifierCharacter] = false

ContainsEscapes[InitialIdentifierCharacter \ HexEscape] = true

action CharacterValue[ContinuingIdentifierCharacter] : Character

CharacterValue[ContinuingIdentifierCharacter OrdinaryContinuingIdentifierCharacter]
= OrdinaryContinuingIdentifierCharacter

CharacterValue[ContinuingIdentifierCharacter  \ HexEscape]
  = if isOrdinaryContinuingIdentifierCharacter(CharacterValue[HexEscape])
     then CharacterValue[HexEscape]
     else throw syntaxError

action ContainsEscapes[ContinuingIdentifierCharacter] : Boolean

ContainsEscapes[ContinuingIdentifierCharacter OrdinaryContinuingIdentifierCharacter]
= false

ContainsEscapes[ContinuingIdentifierCharacter \ HexEscape] = true

reservedWordsRE : String[]
  = [“abstract”,
      “break”,
      “case”,
      “catch”,
      “class”,
      “const”,
      “continue”,
      “debugger”,
      “default”,
      “delete”,
      “do”,
      “else”,
      “enum”,
      “eval”,
      “export”,
      “extends”,
      “final”,
      “finally”,
      “for”,
      “function”,
      “goto”,
      “if”,
      “implements”,
      “import”,
      “in”,
      “instanceof”,
      “native”,
      “new”,
      “package”,
      “private”,
      “protected”,
      “public”,
      “return”,
      “static”,
      “switch”,
      “synchronized”,
      “throw”,
      “throws”,
      “transient”,
      “try”,
      “typeof”,
      “var”,
      “volatile”,
      “while”,
      “with”]

reservedWordsDiv : String[] = [“false”, “null”, “super”, “this”, “true”]

nonReservedWords : String[]
  = [“box”,
      “constructor”,
      “field”,
      “get”,
      “language”,
      “local”,
      “method”,
      “override”,
      “set”,
      “version”]

keywords : String[] = reservedWordsRE reservedWordsDiv nonReservedWords

member(id: String, list: String[]) : Boolean
  = if |list| = 0
     then false
     else if id = list[0]
     then true
     else member(id, list[1 ...])

Syntax

IdentifierOrReservedWord IdentifierName

Semantics

action Token[IdentifierOrReservedWord] : Token

Token[IdentifierOrReservedWord  IdentifierName]
  = let id: String = Name[IdentifierName]
     in if member(id, keywords) and not ContainsEscapes[IdentifierName]
         then keyword id
         else identifier id

action RegExpMayFollow[IdentifierOrReservedWord] : Boolean

RegExpMayFollow[IdentifierOrReservedWord  IdentifierName]
  = let id: String = Name[IdentifierName]
     in member(id, reservedWordsRE) and not ContainsEscapes[IdentifierName]

Punctuators

Syntax

Punctuator

PunctuatorRE

| PunctuatorDiv

PunctuatorRE

!

| ! =

| ! = =

| #

| %

| % =

| &

| & &

| & & =

| & =

| (

| *

| * =

| +

| + =

| ,

| -

| - =

| - >

| .

| . .

| . . .

| :

| : :

| ;

| <

| < <

| < < =

| < =

| =

| = =

| = = =

| >

| > =

| > >

| > > =

| > > >

| > > > =

| ?

| @

| [

| ^

| ^ =

| ^ ^

| ^ ^ =

| {

| |

| | =

| | |

| | | =

| ~

PunctuatorDiv

)

| + +

| - -

| ]

| }

DivisionPunctuator

/

| / =

Semantics

action Token[Punctuator] : Token

Token[Punctuator PunctuatorRE] = punctuator Punctuator[PunctuatorRE]

Token[Punctuator PunctuatorDiv] = punctuator Punctuator[PunctuatorDiv]

action RegExpMayFollow[Punctuator] : Boolean

RegExpMayFollow[Punctuator PunctuatorRE] = true

RegExpMayFollow[Punctuator PunctuatorDiv] = false

action Punctuator[PunctuatorRE] : String

Punctuator[PunctuatorRE !] = “!”

Punctuator[PunctuatorRE ! =] = “!=”

Punctuator[PunctuatorRE ! = =] = “!==”

Punctuator[PunctuatorRE #] = “#”

Punctuator[PunctuatorRE %] = “%”

Punctuator[PunctuatorRE % =] = “%=”

Punctuator[PunctuatorRE &] = “&”

Punctuator[PunctuatorRE & &] = “&&”

Punctuator[PunctuatorRE & & =] = “&&=”

Punctuator[PunctuatorRE & =] = “&=”

Punctuator[PunctuatorRE (] = “(”

Punctuator[PunctuatorRE *] = “*”

Punctuator[PunctuatorRE * =] = “*=”

Punctuator[PunctuatorRE +] = “+”

Punctuator[PunctuatorRE + =] = “+=”

Punctuator[PunctuatorRE ,] = “,”

Punctuator[PunctuatorRE -] = “-”

Punctuator[PunctuatorRE - =] = “-=”

Punctuator[PunctuatorRE - >] = “->”

Punctuator[PunctuatorRE .] = “.”

Punctuator[PunctuatorRE . .] = “..”

Punctuator[PunctuatorRE . . .] = “...”

Punctuator[PunctuatorRE :] = “:”

Punctuator[PunctuatorRE : :] = “::”

Punctuator[PunctuatorRE ;] = “;”

Punctuator[PunctuatorRE <] = “<”

Punctuator[PunctuatorRE < <] = “<<”

Punctuator[PunctuatorRE < < =] = “<<=”

Punctuator[PunctuatorRE < =] = “<=”

Punctuator[PunctuatorRE =] = “=”

Punctuator[PunctuatorRE = =] = “==”

Punctuator[PunctuatorRE = = =] = “===”

Punctuator[PunctuatorRE >] = “>”

Punctuator[PunctuatorRE > =] = “>=”

Punctuator[PunctuatorRE > >] = “>>”

Punctuator[PunctuatorRE > > =] = “>>=”

Punctuator[PunctuatorRE > > >] = “>>>”

Punctuator[PunctuatorRE > > > =] = “>>>=”

Punctuator[PunctuatorRE ?] = “?”

Punctuator[PunctuatorRE @] = “@”

Punctuator[PunctuatorRE [] = “[”

Punctuator[PunctuatorRE ^] = “^”

Punctuator[PunctuatorRE ^ =] = “^=”

Punctuator[PunctuatorRE ^ ^] = “^^”

Punctuator[PunctuatorRE ^ ^ =] = “^^=”

Punctuator[PunctuatorRE {] = “{”

Punctuator[PunctuatorRE |] = “|”

Punctuator[PunctuatorRE | =] = “|=”

Punctuator[PunctuatorRE | |] = “||”

Punctuator[PunctuatorRE | | =] = “||=”

Punctuator[PunctuatorRE ~] = “~”

action Punctuator[PunctuatorDiv] : String

Punctuator[PunctuatorDiv )] = “)”

Punctuator[PunctuatorDiv + +] = “++”

Punctuator[PunctuatorDiv - -] = “--”

Punctuator[PunctuatorDiv ]] = “]”

Punctuator[PunctuatorDiv }] = “}”

action Punctuator[DivisionPunctuator] : String

Punctuator[DivisionPunctuator /] = “/”

Punctuator[DivisionPunctuator / =] = “/=”

Numeric literals

Syntax

NumericLiteral

DecimalLiteral

| HexIntegerLiteral [lookahead{HexDigit}]

Semantics

action DoubleValue[NumericLiteral] : Double

DoubleValue[NumericLiteral DecimalLiteral]
= rationalToDouble(RationalValue[DecimalLiteral])

DoubleValue[NumericLiteral HexIntegerLiteral [lookahead{HexDigit}]]
= rationalToDouble(IntegerValue[HexIntegerLiteral])

expt(base: Rational, exponent: Integer) : Rational
  = if exponent = 0
     then 1
     else if exponent < 0
     then 1/expt(base, -exponent)
     else base*expt(base, exponent - 1)

Syntax

DecimalLiteral

Mantissa

| Mantissa LetterE SignedInteger

LetterE E | e

Mantissa

DecimalIntegerLiteral

| DecimalIntegerLiteral .

| DecimalIntegerLiteral . Fraction

| . Fraction

DecimalIntegerLiteral

0

NonZeroDecimalDigits

NonZeroDigit

| NonZeroDecimalDigits ASCIIDigit

NonZeroDigit 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Fraction DecimalDigits

Semantics

action RationalValue[DecimalLiteral] : Rational

RationalValue[DecimalLiteral Mantissa] = RationalValue[Mantissa]

RationalValue[DecimalLiteral Mantissa LetterE SignedInteger]
= RationalValue[Mantissa]*expt(10, IntegerValue[SignedInteger])

action RationalValue[Mantissa] : Rational

RationalValue[Mantissa DecimalIntegerLiteral] = IntegerValue[DecimalIntegerLiteral]

RationalValue[Mantissa DecimalIntegerLiteral .] = IntegerValue[DecimalIntegerLiteral]

RationalValue[Mantissa DecimalIntegerLiteral . Fraction]
= IntegerValue[DecimalIntegerLiteral] + RationalValue[Fraction]

RationalValue[Mantissa . Fraction] = RationalValue[Fraction]

action IntegerValue[DecimalIntegerLiteral] : Integer

IntegerValue[DecimalIntegerLiteral 0] = 0

IntegerValue[DecimalIntegerLiteral NonZeroDecimalDigits]
= IntegerValue[NonZeroDecimalDigits]

action IntegerValue[NonZeroDecimalDigits] : Integer

IntegerValue[NonZeroDecimalDigits NonZeroDigit] = DecimalValue[NonZeroDigit]

IntegerValue[NonZeroDecimalDigits NonZeroDecimalDigits₁ ASCIIDigit]
= 10*IntegerValue[NonZeroDecimalDigits₁] + DecimalValue[ASCIIDigit]

action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)

action RationalValue[Fraction] : Rational

RationalValue[Fraction DecimalDigits]
= IntegerValue[DecimalDigits]/expt(10, NDigits[DecimalDigits])

Syntax

SignedInteger

DecimalDigits

| + DecimalDigits

| - DecimalDigits

Semantics

action IntegerValue[SignedInteger] : Integer

IntegerValue[SignedInteger DecimalDigits] = IntegerValue[DecimalDigits]

IntegerValue[SignedInteger + DecimalDigits] = IntegerValue[DecimalDigits]

IntegerValue[SignedInteger - DecimalDigits] = -IntegerValue[DecimalDigits]

Syntax

DecimalDigits

ASCIIDigit

| DecimalDigits ASCIIDigit

Semantics

action IntegerValue[DecimalDigits] : Integer

IntegerValue[DecimalDigits ASCIIDigit] = DecimalValue[ASCIIDigit]

IntegerValue[DecimalDigits DecimalDigits₁ ASCIIDigit]
= 10*IntegerValue[DecimalDigits₁] + DecimalValue[ASCIIDigit]

action NDigits[DecimalDigits] : Integer

NDigits[DecimalDigits ASCIIDigit] = 1

NDigits[DecimalDigits DecimalDigits₁ ASCIIDigit] = NDigits[DecimalDigits₁] + 1

Syntax

HexIntegerLiteral

0 LetterX HexDigit

| HexIntegerLiteral HexDigit

LetterX X | x

HexDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

action IntegerValue[HexIntegerLiteral] : Integer

IntegerValue[HexIntegerLiteral 0 LetterX HexDigit] = HexValue[HexDigit]

IntegerValue[HexIntegerLiteral HexIntegerLiteral₁ HexDigit]
= 16*IntegerValue[HexIntegerLiteral₁] + HexValue[HexDigit]

action HexValue[HexDigit] : Integer = digitValue(HexDigit)

Quantity literals

Syntax

QuantityLiteral NumericLiteral QuantityName

QuantityName [lookahead{LetterE, LetterX}] IdentifierName

Semantics

action QuantityValue[QuantityLiteral] : Quantity

QuantityValue[QuantityLiteral NumericLiteral QuantityName]
= amount DoubleValue[NumericLiteral], unit Name[QuantityName]

action Name[QuantityName] : String

Name[QuantityName [lookahead{LetterE, LetterX}] IdentifierName]
= Name[IdentifierName]

String literals

Syntax

q {single, double}

StringLiteral

' StringChars^single '

| " StringChars^double "

Semantics

action StringValue[StringLiteral] : String

StringValue[StringLiteral ' StringChars^single '] = StringValue[StringChars^single]

StringValue[StringLiteral " StringChars^double "] = StringValue[StringChars^double]

Syntax

StringChars^q

«empty»

| StringChars^q StringChar^q

StringChar^q

LiteralStringChar^q

| \ StringEscape

LiteralStringChar^single UnicodeCharacter except ' | \ | LineTerminator

LiteralStringChar^double UnicodeCharacter except " | \ | LineTerminator

Semantics

action StringValue[StringChars^q] : String

StringValue[StringChars^q «empty»] = “”

StringValue[StringChars^q StringChars^q₁ StringChar^q]
= StringValue[StringChars^q₁] [CharacterValue[StringChar^q]]

action CharacterValue[StringChar^q] : Character

CharacterValue[StringChar^q LiteralStringChar^q] = LiteralStringChar^q

CharacterValue[StringChar^q \ StringEscape] = CharacterValue[StringEscape]

Syntax

StringEscape

ControlEscape

| ZeroEscape

| HexEscape

| IdentityEscape

IdentityEscape NonTerminator except UnicodeAlphanumeric

Semantics

action CharacterValue[StringEscape] : Character

CharacterValue[StringEscape ControlEscape] = CharacterValue[ControlEscape]

CharacterValue[StringEscape ZeroEscape] = CharacterValue[ZeroEscape]

CharacterValue[StringEscape HexEscape] = CharacterValue[HexEscape]

CharacterValue[StringEscape IdentityEscape] = IdentityEscape

Syntax

ControlEscape

b

| f

| n

| r

| t

| v

Semantics

action CharacterValue[ControlEscape] : Character

CharacterValue[ControlEscape b] = ‘«BS»’

CharacterValue[ControlEscape f] = ‘«FF»’

CharacterValue[ControlEscape n] = ‘«LF»’

CharacterValue[ControlEscape r] = ‘«CR»’

CharacterValue[ControlEscape t] = ‘«TAB»’

CharacterValue[ControlEscape v] = ‘«VT»’

Syntax

ZeroEscape 0 [lookahead{ASCIIDigit}]

Semantics

action CharacterValue[ZeroEscape] : Character

CharacterValue[ZeroEscape 0 [lookahead{ASCIIDigit}]] = ‘«NUL»’

Syntax

HexEscape

x HexDigit HexDigit

| u HexDigit HexDigit HexDigit HexDigit

Semantics

action CharacterValue[HexEscape] : Character

CharacterValue[HexEscape x HexDigit₁ HexDigit₂]
= codeToCharacter(16*HexValue[HexDigit₁] + HexValue[HexDigit₂])

CharacterValue[HexEscape  u HexDigit₁ HexDigit₂ HexDigit₃ HexDigit₄]
  = codeToCharacter(
         4096*HexValue[HexDigit₁] + 256*HexValue[HexDigit₂] + 16*HexValue[HexDigit₃] +
         HexValue[HexDigit₄])

Regular expression literals

Syntax

RegExpLiteral RegExpBody RegExpFlags

RegExpFlags

«empty»

| RegExpFlags ContinuingIdentifierCharacter

RegExpBody / RegExpFirstChar RegExpChars /

RegExpFirstChar

OrdinaryRegExpFirstChar

| \ NonTerminator

OrdinaryRegExpFirstChar NonTerminator except \ | / | *

RegExpChars

«empty»

| RegExpChars RegExpChar

RegExpChar

OrdinaryRegExpChar

| \ NonTerminator

OrdinaryRegExpChar NonTerminator except \ | /

Semantics

action REValue[RegExpLiteral] : RegExp

REValue[RegExpLiteral RegExpBody RegExpFlags]
= reBody REBody[RegExpBody], reFlags REFlags[RegExpFlags]

action REFlags[RegExpFlags] : String

REFlags[RegExpFlags «empty»] = “”

REFlags[RegExpFlags RegExpFlags₁ ContinuingIdentifierCharacter]
= REFlags[RegExpFlags₁] [CharacterValue[ContinuingIdentifierCharacter]]

action REBody[RegExpBody] : String

REBody[RegExpBody / RegExpFirstChar RegExpChars /]
= REBody[RegExpFirstChar] REBody[RegExpChars]

action REBody[RegExpFirstChar] : String

REBody[RegExpFirstChar OrdinaryRegExpFirstChar] = [OrdinaryRegExpFirstChar]

REBody[RegExpFirstChar \ NonTerminator] = [‘\’, NonTerminator]

action REBody[RegExpChars] : String

REBody[RegExpChars «empty»] = “”

REBody[RegExpChars RegExpChars₁ RegExpChar]
= REBody[RegExpChars₁] REBody[RegExpChar]

action REBody[RegExpChar] : String

REBody[RegExpChar OrdinaryRegExpChar] = [OrdinaryRegExpChar]

REBody[RegExpChar \ NonTerminator] = [‘\’, NonTerminator]

JavaScript 2.0

Formal Description

Regular Expression Grammar

Thursday, November 11, 1999

This LR(1) grammar describes the regular expression syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.

This document is also available as a Word 98 rtf file.

Unicode Character Classes

UnicodeCharacter Any Unicode character

UnicodeAlphanumeric Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)

LineTerminator «LF» | «CR» | «u2028» | «u2029»

Regular Expression Definitions

Regular Expression Patterns

RegularExpressionPattern Disjunction

Disjunctions

Disjunction

Alternative

| Alternative | Disjunction

Alternatives

Alternative

«empty»

| Alternative Term

Terms

Term

Assertion

| Atom

| Atom Quantifier

Quantifier

QuantifierPrefix

| QuantifierPrefix ?

QuantifierPrefix

*

| +

| ?

| { DecimalDigits }

| { DecimalDigits , }

| { DecimalDigits , DecimalDigits }

DecimalDigits

DecimalDigit

| DecimalDigits DecimalDigit

DecimalDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Assertions

Assertion

^

| $

| \ b

| \ B

Atoms

Atom

PatternCharacter

| .

| \ AtomEscape

| CharacterClass

| ( Disjunction )

| ( ? : Disjunction )

| ( ? = Disjunction )

| ( ? ! Disjunction )

PatternCharacter UnicodeCharacter except ^ | $ | \ | . | * | + | ? | ( | ) | [ | ] | { | } | |

Escapes

AtomEscape

DecimalEscape

| CharacterEscape

CharacterEscape

ControlEscape

| c ControlLetter

| HexEscape

| IdentityEscape

ControlLetter

A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

| a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z

IdentityEscape UnicodeCharacter except UnicodeAlphanumeric

ControlEscape

f

| n

| r

| t

| v

Decimal Escapes

DecimalEscape DecimalIntegerLiteral [lookahead{DecimalDigit}]

DecimalIntegerLiteral

0

NonZeroDecimalDigits

NonZeroDigit

| NonZeroDecimalDigits DecimalDigit

NonZeroDigit 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Hexadecimal Escapes

HexEscape

x HexDigit HexDigit

| u HexDigit HexDigit HexDigit HexDigit

HexDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Character Class Escapes

CharacterClassEscape

s

| S

| d

| D

| w

| W

User-Specified Character Classes

CharacterClass

[ [lookahead{^}] ClassRanges ]

| [ ^ ClassRanges ]

ClassRanges

«empty»

| NonemptyClassRanges^dash

d {dash, noDash}

NonemptyClassRanges^d

ClassAtom^dash

| ClassAtom^d NonemptyClassRanges^noDash

| ClassAtom^d - ClassAtom^dash ClassRanges

Character Class Range Atoms

ClassAtom^d

ClassCharacter^d

| \ ClassEscape

ClassCharacter^dash UnicodeCharacter except \ | ]

ClassCharacter^noDash ClassCharacter^dash except -

ClassEscape

DecimalEscape

| b

| CharacterEscape

JavaScript 2.0

Formal Description

Regular Expression Semantics

Thursday, November 11, 1999

The regular expression semantics describe the actions the regular expression engine takes in order to transform a regular expression pattern into a function for matching against input strings. For convenience, the regular expression grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word 98 rtf file.

The regular expression semantics below are working (except for case-insensitive matches) and have been tried on sample cases, but they could be formatted better.

Semantics

type SemanticException = oneof {syntaxError}

Unicode Character Classes

Syntax

UnicodeCharacter Any Unicode character

UnicodeAlphanumeric Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)

LineTerminator «LF» | «CR» | «u2028» | «u2029»

Semantics

lineTerminators : {Character} = {‘«LF»’, ‘«CR»’, ‘«u2028»’, ‘«u2029»’}

reWhitespaces : {Character} = {‘«FF»’, ‘«LF»’, ‘«CR»’, ‘«TAB»’, ‘«VT»’, ‘ ’}

reDigits : {Character} = {‘0’ ... ‘9’}

reWordCharacters : {Character} = {‘0’ ... ‘9’, ‘A’ ... ‘Z’, ‘a’ ... ‘z’, ‘_’}

Regular Expression Definitions

Semantics

type REInput = tuple {str: String; ignoreCase: Boolean; multiline: Boolean}

Field str is the input string. ignoreCase and multiline are the corresponding regular expression flags.

type REResult = oneof {success: REMatch; failure}

type REMatch = tuple {endIndex: Integer; captures: Capture[]}

A REMatch holds an intermediate state during the pattern-matching process. endIndex is the index of the next input character to be matched by the next component in a regular expression pattern. If we are at the end of the pattern, endIndex is one plus the index of the last matched input character. captures is a zero-based array of the strings captured so far by capturing parentheses.

type Capture = oneof {present: String; absent}

type Continuation = REMatch REResult

A Continuation is a function that attempts to match the remaining portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. If a match is possible, it returns a success result that contains the final REMatch state; if no match is possible, it returns a failure result.

type Matcher = REInput REMatch Continuation REResult

A Matcher is a function that attempts to match a middle portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. Since the remainder of the pattern heavily influences whether (and how) a middle portion will match, we must pass in a Continuation function that checks whether the rest of the pattern matched. If the continuation returns failure, the matcher function may call it repeatedly, trying various alternatives at pattern choice points.

The REInput parameter contains the input string and is merely passed down to subroutines.

type MatcherGenerator = Integer Matcher

A MatcherGenerator is a function executed at the time the regular expression is compiled that returns a Matcher for a part of the pattern. The Integer parameter contains the number of capturing left parentheses seen so far in the pattern and is used to assign static, consecutive numbers to capturing parentheses.

characterSetMatcher(acceptanceSet: {Character}, invert: Boolean) : Matcher
  = function(t: REInput, x: REMatch, c: Continuation)
         let i: Integer = x.endIndex;
             s: String = t.str
         in if i = |s|
             then failure
             else if s[i]  acceptanceSet xor invert
             then c(endIndex (i + 1), captures x.captures)
             else failure

characterSetMatcher returns a Matcher that matches a single input string character. If invert is false, the match succeeds if the character is a member of the acceptanceSet set of characters (possibly ignoring case). If invert is true, the match succeeds if the character is not a member of the acceptanceSet set of characters (possibly ignoring case).

characterMatcher(ch: Character) : Matcher = characterSetMatcher({ch}, false)

characterMatcher returns a Matcher that matches a single input string character. The match succeeds if the character is the same as ch (possibly ignoring case).

Regular Expression Patterns

Syntax

RegularExpressionPattern Disjunction

Semantics

action Exec[RegularExpressionPattern] : REInput Integer REResult

Exec[RegularExpressionPattern  Disjunction]
  = let match: Matcher = GenMatcher[Disjunction](0)
     in function(t: REInput, index: Integer)
             match(
                 t,
                 endIndex index, captures fillCapture(CountParens[Disjunction]),
                 successContinuation)

successContinuation(x: REMatch) : REResult = success x

fillCapture(i: Integer) : Capture[]
  = if i = 0
     then []_Capture
     else fillCapture(i - 1)  [absent]

Disjunctions

Syntax

Disjunction

Alternative

| Alternative | Disjunction

Semantics

action GenMatcher[Disjunction] : MatcherGenerator

GenMatcher[Disjunction Alternative] = GenMatcher[Alternative]

GenMatcher[Disjunction  Alternative | Disjunction₁](parenIndex: Integer)
  = let match1: Matcher = GenMatcher[Alternative](parenIndex);
         match2: Matcher = GenMatcher[Disjunction₁](parenIndex + CountParens[Alternative])
     in function(t: REInput, x: REMatch, c: Continuation)
             case match1(t, x, c) of
                success(y: REMatch): success y;
                failure: match2(t, x, c)
                end

action CountParens[Disjunction] : Integer

CountParens[Disjunction Alternative] = CountParens[Alternative]

CountParens[Disjunction Alternative | Disjunction₁]
= CountParens[Alternative] + CountParens[Disjunction₁]

Alternatives

Syntax

Alternative

«empty»

| Alternative Term

Semantics

action GenMatcher[Alternative] : MatcherGenerator

GenMatcher[Alternative  «empty»](parenIndex: Integer)
  = function(t: REInput, x: REMatch, c: Continuation)
         c(x)

GenMatcher[Alternative  Alternative₁ Term](parenIndex: Integer)
  = let match1: Matcher = GenMatcher[Alternative₁](parenIndex);
         match2: Matcher = GenMatcher[Term](parenIndex + CountParens[Alternative₁])
     in function(t: REInput, x: REMatch, c: Continuation)
             let d: Continuation
                     = function(y: REMatch)
                            match2(t, y, c)
             in match1(t, x, d)

action CountParens[Alternative] : Integer

CountParens[Alternative «empty»] = 0

CountParens[Alternative Alternative₁ Term]
= CountParens[Alternative₁] + CountParens[Term]

Terms

Syntax

Term

Assertion

| Atom

| Atom Quantifier

Semantics

action GenMatcher[Term] : MatcherGenerator

GenMatcher[Term  Assertion](parenIndex: Integer)
  = function(t: REInput, x: REMatch, c: Continuation)
         if TestAssertion[Assertion](t, x)
         then c(x)
         else failure

GenMatcher[Term Atom] = GenMatcher[Atom]

GenMatcher[Term  Atom Quantifier](parenIndex: Integer)
  = let match: Matcher = GenMatcher[Atom](parenIndex);
         min: Integer = Minimum[Quantifier];
         max: Limit = Maximum[Quantifier];
         greedy: Boolean = Greedy[Quantifier]
     in if
             (case max of
                finite(m: Integer): m < min;
                infinite: false
                end)
         then throw syntaxError
         else repeatMatcher(match, min, max, greedy, parenIndex, CountParens[Atom])

action CountParens[Term] : Integer

CountParens[Term Assertion] = 0

CountParens[Term Atom] = CountParens[Atom]

CountParens[Term Atom Quantifier] = CountParens[Atom]

Syntax

Quantifier

QuantifierPrefix

| QuantifierPrefix ?

QuantifierPrefix

*

| +

| ?

| { DecimalDigits }

| { DecimalDigits , }

| { DecimalDigits , DecimalDigits }

DecimalDigits

DecimalDigit

| DecimalDigits DecimalDigit

DecimalDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

type Limit = oneof {finite: Integer; infinite}

resetParens(x: REMatch, p: Integer, nParens: Integer) : REMatch
  = if nParens = 0
     then x
     else let y: REMatch = endIndex x.endIndex, captures x.captures[p  absent]
           in resetParens(y, p + 1, nParens - 1)

repeatMatcher(body: Matcher, min: Integer, max: Limit, greedy: Boolean, parenIndex: Integer, nBodyParens: Integer)
  : Matcher
  = function(t: REInput, x: REMatch, c: Continuation)
         if
             (case max of
                finite(m: Integer): m = 0;
                infinite: false
                end)
         then c(x)
         else let d: Continuation
                       = function(y: REMatch)
                              if min = 0 and y.endIndex = x.endIndex
                              then failure
                              else let newMin: Integer
                                            = if min = 0
                                               then 0
                                               else min - 1;
                                        newMax: Limit
                                            = case max of
                                                  finite(m: Integer): finite (m - 1);
                                                  infinite: infinite
                                                  end
                                    in repeatMatcher(
                                            body,
                                            newMin,
                                            newMax,
                                            greedy,
                                            parenIndex,
                                            nBodyParens)(t, y, c);
                   xr: REMatch = resetParens(x, parenIndex, nBodyParens)
               in if min  0
                   then body(t, xr, d)
                   else if greedy
                   then case body(t, xr, d) of
                             success(z: REMatch): success z;
                             failure: c(x)
                             end
                   else case c(x) of
                            success(z: REMatch): success z;
                            failure: body(t, xr, d)
                            end

action Minimum[Quantifier] : Integer

Minimum[Quantifier QuantifierPrefix] = Minimum[QuantifierPrefix]

Minimum[Quantifier QuantifierPrefix ?] = Minimum[QuantifierPrefix]

action Maximum[Quantifier] : Limit

Maximum[Quantifier QuantifierPrefix] = Maximum[QuantifierPrefix]

Maximum[Quantifier QuantifierPrefix ?] = Maximum[QuantifierPrefix]

action Greedy[Quantifier] : Boolean

Greedy[Quantifier QuantifierPrefix] = true

Greedy[Quantifier QuantifierPrefix ?] = false

action Minimum[QuantifierPrefix] : Integer

Minimum[QuantifierPrefix *] = 0

Minimum[QuantifierPrefix +] = 1

Minimum[QuantifierPrefix ?] = 0

Minimum[QuantifierPrefix { DecimalDigits }] = IntegerValue[DecimalDigits]

Minimum[QuantifierPrefix { DecimalDigits , }] = IntegerValue[DecimalDigits]

Minimum[QuantifierPrefix { DecimalDigits₁ , DecimalDigits₂ }]
= IntegerValue[DecimalDigits₁]

action Maximum[QuantifierPrefix] : Limit

Maximum[QuantifierPrefix *] = infinite

Maximum[QuantifierPrefix +] = infinite

Maximum[QuantifierPrefix ?] = finite 1

Maximum[QuantifierPrefix { DecimalDigits }] = finite IntegerValue[DecimalDigits]

Maximum[QuantifierPrefix { DecimalDigits , }] = infinite

Maximum[QuantifierPrefix { DecimalDigits₁ , DecimalDigits₂ }]
= finite IntegerValue[DecimalDigits₂]

action IntegerValue[DecimalDigits] : Integer

IntegerValue[DecimalDigits DecimalDigit] = DecimalValue[DecimalDigit]

IntegerValue[DecimalDigits DecimalDigits₁ DecimalDigit]
= 10*IntegerValue[DecimalDigits₁] + DecimalValue[DecimalDigit]

action DecimalValue[DecimalDigit] : Integer = digitValue(DecimalDigit)

Assertions

Syntax

Assertion

^

| $

| \ b

| \ B

Semantics

action TestAssertion[Assertion] : REInput REMatch Boolean

TestAssertion[Assertion  ^](t: REInput, x: REMatch)
  = if x.endIndex = 0
     then true
     else t.multiline and t.str[x.endIndex - 1]  lineTerminators

TestAssertion[Assertion  $](t: REInput, x: REMatch)
  = if x.endIndex = |t.str|
     then true
     else t.multiline and t.str[x.endIndex]  lineTerminators

TestAssertion[Assertion \ b](t: REInput, x: REMatch)
= atWordBoundary(x.endIndex, t.str)

TestAssertion[Assertion \ B](t: REInput, x: REMatch)
= not atWordBoundary(x.endIndex, t.str)

atWordBoundary(i: Integer, s: String) : Boolean = inWord(i - 1, s) xor inWord(i, s)

inWord(i: Integer, s: String) : Boolean
  = if i = -1 or i = |s|
     then false
     else s[i]  reWordCharacters

Atoms

Syntax

Atom

PatternCharacter

| .

| \ AtomEscape

| CharacterClass

| ( Disjunction )

| ( ? : Disjunction )

| ( ? = Disjunction )

| ( ? ! Disjunction )

PatternCharacter UnicodeCharacter except ^ | $ | \ | . | * | + | ? | ( | ) | [ | ] | { | } | |

Semantics

action GenMatcher[Atom] : MatcherGenerator

GenMatcher[Atom PatternCharacter](parenIndex: Integer)
= characterMatcher(PatternCharacter)

GenMatcher[Atom .](parenIndex: Integer) = characterSetMatcher(lineTerminators, true)

GenMatcher[Atom \ AtomEscape] = GenMatcher[AtomEscape]

GenMatcher[Atom  CharacterClass](parenIndex: Integer)
  = let a: {Character} = AcceptanceSet[CharacterClass]
     in characterSetMatcher(a, Invert[CharacterClass])

GenMatcher[Atom  ( Disjunction )](parenIndex: Integer)
  = let match: Matcher = GenMatcher[Disjunction](parenIndex + 1)
     in function(t: REInput, x: REMatch, c: Continuation)
             let d: Continuation
                     = function(y: REMatch)
                            let updatedCaptures: Capture[]
                                    = y.captures[parenIndex
                                           present t.str[x.endIndex ... y.endIndex - 1]]
                            in c(endIndex y.endIndex, captures updatedCaptures)
             in match(t, x, d)

GenMatcher[Atom ( ? : Disjunction )] = GenMatcher[Disjunction]

GenMatcher[Atom  ( ? = Disjunction )](parenIndex: Integer)
  = let match: Matcher = GenMatcher[Disjunction](parenIndex)
     in function(t: REInput, x: REMatch, c: Continuation)
             case match(t, x, successContinuation) of
                success(y: REMatch): c(endIndex x.endIndex, captures y.captures);
                failure: failure
                end

GenMatcher[Atom  ( ? ! Disjunction )](parenIndex: Integer)
  = let match: Matcher = GenMatcher[Disjunction](parenIndex)
     in function(t: REInput, x: REMatch, c: Continuation)
             case match(t, x, successContinuation) of
                success(y: REMatch): failure;
                failure: c(x)
                end

action CountParens[Atom] : Integer

CountParens[Atom PatternCharacter] = 0

CountParens[Atom .] = 0

CountParens[Atom \ AtomEscape] = 0

CountParens[Atom CharacterClass] = 0

CountParens[Atom ( Disjunction )] = CountParens[Disjunction] + 1

CountParens[Atom ( ? : Disjunction )] = CountParens[Disjunction]

CountParens[Atom ( ? = Disjunction )] = CountParens[Disjunction]

CountParens[Atom ( ? ! Disjunction )] = CountParens[Disjunction]

Escapes

Syntax

AtomEscape

DecimalEscape

| CharacterEscape

Semantics

action GenMatcher[AtomEscape] : MatcherGenerator

GenMatcher[AtomEscape  DecimalEscape](parenIndex: Integer)
  = let n: Integer = EscapeValue[DecimalEscape]
     in if n = 0
         then characterMatcher(‘«NUL»’)
         else if n > parenIndex
         then throw syntaxError
         else backreferenceMatcher(n)

GenMatcher[AtomEscape CharacterEscape](parenIndex: Integer)
= characterMatcher(CharacterValue[CharacterEscape])

GenMatcher[AtomEscape CharacterClassEscape](parenIndex: Integer)
= characterSetMatcher(AcceptanceSet[CharacterClassEscape], false)

backreferenceMatcher(n: Integer) : Matcher
  = function(t: REInput, x: REMatch, c: Continuation)
         case nthBackreference(x, n) of
            present(ref: String):
                  let i: Integer = x.endIndex;
                      s: String = t.str
                  in let j: Integer = i + |ref|
                  in if j > |s|
                      then failure
                      else if s[i ... j - 1] = ref
                      then c(endIndex j, captures x.captures)
                      else failure;
            absent: c(x)
            end

nthBackreference(x: REMatch, n: Integer) : Capture = x.captures[n - 1]

Syntax

CharacterEscape

ControlEscape

| c ControlLetter

| HexEscape

| IdentityEscape

ControlLetter

A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

| a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z

IdentityEscape UnicodeCharacter except UnicodeAlphanumeric

ControlEscape

f

| n

| r

| t

| v

Semantics

action CharacterValue[CharacterEscape] : Character

CharacterValue[CharacterEscape ControlEscape] = CharacterValue[ControlEscape]

CharacterValue[CharacterEscape c ControlLetter]
= codeToCharacter(bitwiseAnd(characterToCode(ControlLetter), 31))

CharacterValue[CharacterEscape HexEscape] = CharacterValue[HexEscape]

CharacterValue[CharacterEscape IdentityEscape] = IdentityEscape

action CharacterValue[ControlEscape] : Character

CharacterValue[ControlEscape f] = ‘«FF»’

CharacterValue[ControlEscape n] = ‘«LF»’

CharacterValue[ControlEscape r] = ‘«CR»’

CharacterValue[ControlEscape t] = ‘«TAB»’

CharacterValue[ControlEscape v] = ‘«VT»’

Decimal Escapes

Syntax

DecimalEscape DecimalIntegerLiteral [lookahead{DecimalDigit}]

DecimalIntegerLiteral

0

NonZeroDecimalDigits

NonZeroDigit

| NonZeroDecimalDigits DecimalDigit

NonZeroDigit 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

action EscapeValue[DecimalEscape] : Integer

EscapeValue[DecimalEscape DecimalIntegerLiteral [lookahead{DecimalDigit}]]
= IntegerValue[DecimalIntegerLiteral]

action IntegerValue[DecimalIntegerLiteral] : Integer

IntegerValue[DecimalIntegerLiteral 0] = 0

IntegerValue[DecimalIntegerLiteral NonZeroDecimalDigits]
= IntegerValue[NonZeroDecimalDigits]

action IntegerValue[NonZeroDecimalDigits] : Integer

IntegerValue[NonZeroDecimalDigits NonZeroDigit] = DecimalValue[NonZeroDigit]

IntegerValue[NonZeroDecimalDigits NonZeroDecimalDigits₁ DecimalDigit]
= 10*IntegerValue[NonZeroDecimalDigits₁] + DecimalValue[DecimalDigit]

action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)

Hexadecimal Escapes

Syntax

HexEscape

x HexDigit HexDigit

| u HexDigit HexDigit HexDigit HexDigit

HexDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

action CharacterValue[HexEscape] : Character

CharacterValue[HexEscape x HexDigit₁ HexDigit₂]
= codeToCharacter(16*HexValue[HexDigit₁] + HexValue[HexDigit₂])

action HexValue[HexDigit] : Integer = digitValue(HexDigit)

Character Class Escapes

Syntax

CharacterClassEscape

s

| S

| d

| D

| w

| W

Semantics

action AcceptanceSet[CharacterClassEscape] : {Character}

AcceptanceSet[CharacterClassEscape s] = reWhitespaces

AcceptanceSet[CharacterClassEscape S] = {‘«NUL»’ ... ‘«uFFFF»’} - reWhitespaces

AcceptanceSet[CharacterClassEscape d] = reDigits

AcceptanceSet[CharacterClassEscape D] = {‘«NUL»’ ... ‘«uFFFF»’} - reDigits

AcceptanceSet[CharacterClassEscape w] = reWordCharacters

AcceptanceSet[CharacterClassEscape W] = {‘«NUL»’ ... ‘«uFFFF»’} - reWordCharacters

User-Specified Character Classes

Syntax

CharacterClass

[ [lookahead{^}] ClassRanges ]

| [ ^ ClassRanges ]

ClassRanges

«empty»

| NonemptyClassRanges^dash

d {dash, noDash}

NonemptyClassRanges^d

ClassAtom^dash

| ClassAtom^d NonemptyClassRanges^noDash

| ClassAtom^d - ClassAtom^dash ClassRanges

Semantics

action AcceptanceSet[CharacterClass] : {Character}

AcceptanceSet[CharacterClass [ [lookahead{^}] ClassRanges ]]
= AcceptanceSet[ClassRanges]

AcceptanceSet[CharacterClass [ ^ ClassRanges ]] = AcceptanceSet[ClassRanges]

action Invert[CharacterClass] : Boolean

Invert[CharacterClass [ [lookahead{^}] ClassRanges ]] = false

Invert[CharacterClass [ ^ ClassRanges ]] = true

action AcceptanceSet[ClassRanges] : {Character}

AcceptanceSet[ClassRanges «empty»] = {}_Character

AcceptanceSet[ClassRanges NonemptyClassRanges^dash]
= AcceptanceSet[NonemptyClassRanges^dash]

action AcceptanceSet[NonemptyClassRanges^d] : {Character}

AcceptanceSet[NonemptyClassRanges^d ClassAtom^dash] = AcceptanceSet[ClassAtom^dash]

AcceptanceSet[NonemptyClassRanges^d ClassAtom^d NonemptyClassRanges^noDash₁]
= AcceptanceSet[ClassAtom^d] AcceptanceSet[NonemptyClassRanges^noDash₁]

AcceptanceSet[NonemptyClassRanges^d  ClassAtom^d₁ - ClassAtom^dash₂ ClassRanges]
  = let range: {Character}
             = characterRange(AcceptanceSet[ClassAtom^d₁], AcceptanceSet[ClassAtom^dash₂])
     in range  AcceptanceSet[ClassRanges]

characterRange(low: {Character}, high: {Character}) : {Character}
  = if |low|  1 or |high|  1
     then throw syntaxError
     else let l: Character = min low;
               h: Character = min high
           in if l  h
               then {l ... h}
               else throw syntaxError

Character Class Range Atoms

Syntax

ClassAtom^d

ClassCharacter^d

| \ ClassEscape

ClassCharacter^dash UnicodeCharacter except \ | ]

ClassCharacter^noDash ClassCharacter^dash except -

ClassEscape

DecimalEscape

| b

| CharacterEscape

Semantics

action AcceptanceSet[ClassAtom^d] : {Character}

AcceptanceSet[ClassAtom^d ClassCharacter^d] = {ClassCharacter^d}

AcceptanceSet[ClassAtom^d \ ClassEscape] = AcceptanceSet[ClassEscape]

action AcceptanceSet[ClassEscape] : {Character}

AcceptanceSet[ClassEscape  DecimalEscape]
  = if EscapeValue[DecimalEscape] = 0
     then {‘«NUL»’}
     else throw syntaxError

AcceptanceSet[ClassEscape b] = {‘«BS»’}

AcceptanceSet[ClassEscape CharacterEscape] = {CharacterValue[CharacterEscape]}

AcceptanceSet[ClassEscape CharacterClassEscape] = AcceptanceSet[CharacterClassEscape]

JavaScript 2.0

Formal Description

Parser Grammar

Thursday, November 11, 1999

This LALR(1) grammar describes the syntax of the JavaScript 2.0 proposal. The starting nonterminal is Program. See also the description of the grammar notation.

This document is also available as a Word 98 rtf file.

Terminals

General tokens: Identifier Number RegularExpression String VirtualSemicolon

Punctuation tokens: ! != !== % %= & && &&= &= ( ) * *= + ++ += , - -- -= . ... / /= : :: ; < << <<= <= = == === > >= >> >>= >>> >>>= ? @ [ ] ^ ^= ^^ ^^= { | |= || ||= } ~

Future punctuation tokens: # ->

Reserved words: break case catch class const continue default delete do else eval extends false final finally for function if in instanceof new null package private public return super switch this throw true try typeof var while with

Future reserved words: abstract debugger enum export goto implements import interface native protected static synchronized throws transient volatile

Non-reserved words: box constructor field get language local method override set version

Expressions

b {allowIn, noIn}

Identifiers

Identifier

Identifier

| box

| constructor

| field

| get

| language

| local

| method

| set

| override

| version

QualifiedIdentifier

Identifier

| QualifiedIdentifier :: Identifier

| ParenthesizedExpression :: Identifier

Primary Expressions

PrimaryExpression

null

| true

| false

| Number

| Number [no line break] String

| String

| this

| super

| QualifiedIdentifier

| ? Identifier

| RegularExpression

| ParenthesizedExpression

| ParenthesizedExpression [no line break] String

| ArrayLiteral

| ObjectLiteral

| FunctionExpression

ParenthesizedExpression ( Expression^allowIn )

Function Expressions

FunctionExpression

AnonymousFunction

| NamedFunction

Object Literals

ObjectLiteral

{ }

| { FieldList }

FieldList

LiteralField

| FieldList , LiteralField

LiteralField FieldName : AssignmentExpression^allowIn

FieldName

QualifiedIdentifier

| String

| Number

Array Literals

ArrayLiteral [ ElementList ]

ElementList

LiteralElement

| ElementList , LiteralElement

LiteralElement

«empty»

| AssignmentExpression^allowIn

Postfix Unary Operators

PostfixExpression

FullPostfixExpression

FullPostfixExpression

PrimaryExpression

| FullPostfixExpression MemberOperator

| FullPostfixExpression Arguments

| PostfixExpression [no line break] ++

| PostfixExpression [no line break] --

FullNewExpression new FullNewSubexpression Arguments

ShortNewExpression new ShortNewSubexpression

FullNewSubexpression

PrimaryExpression

| FullNewSubexpression MemberOperator

ShortNewSubexpression

FullNewSubexpression