JavaScript 2.0

Friday, November 12, 1999

A multi-page version of this document is also available.


JavaScript 2.0 is an experimental proposal maintained by waldemar for future changes in the JavaScript language. The eventual language may differ significantly from this proposal, but the goal is to move in the directions indicated here and do so via a coordinated plan rather than adding miscellaneous features ad hoc on a release-by-release basis.

JavaScript is Netscape's implementation of the ECMAScript standard. The development of JavaScript 2.0 is heavily coordinated with the ECMA TC39 modularity subgroup. The intent is to make JavaScript 2.0 and ECMAScript Edition 4 be the same language, and this document will evolve as necessary to accomplish this.


Changes

The following are recent major changes in this document:

Date   Revisions
Nov 11, 1999   Continuing major reorganization of this document....
Nov 5, 1999   Reorganized the document's structure into chapters. Structured the core language chapter more in the bottom-up style of the ECMAScript standard than in the previous issue-oriented style. Combined and moved rationales and issues into an appendix. Added introduction page. Removed or reworded many obsolete paragraphs throughout the document.
Nov 2, 1999   Modified the parser grammar: added [no line break] constraints, removed version lists after public keywords, added box and user-defined visibility keywords, and added named function arguments.
Oct 29, 1999   Revised the execution model based on recent ECMA modularity group discussions. JavaScript 2.0 now has a hybrid execution model instead of a pure dynamic one, which allows for better compatibility with JavaScript 1.5.
Oct 20, 1999   Added throw and try-catch semantic operators to semantic notation and used them to signal syntax errors detected by the semantics that would be impossible or too messy to detect in the grammars. Updated formal description pages to match recent ECMA TC39 subcommittee decisions: eliminated octal numbers and escapes (both in strings and in regular expressions) to match ECMAScript Edition 3, switched to using the Identifier : TypeExpression syntax for type declarations, and added local blocks and the local visibility specifier. Also simplified the parser grammar for definitions and removed the « and » syntax for regular expression literals.
Jul 26, 1999   Wrote description of semantic notation. Updated grammar notation page to describe lookahead constraints. Updated regular expression semantics to match ECMA working group decisions for ECMAScript Edition 3; one of these included changing the behavior of (?= to not backtrack.
Jun 7, 1999   Revised all grammars and semantics to simplify the grammars. Fixed several errors and omissions in the regular expression grammar and semantics. Added support for (?= and (?!.
May 16, 1999   Added regular expression grammar and semantics.
May 12, 1999   Added preliminary Formal Description chapter.
Mar 25, 1999   Added Member Lookup page. Released second draft.
Mar 24, 1999   Added many clarifications, discussion sections, and small changes throughout the pages.
Mar 23, 1999   Rewrote Execution Model page and split it off from the Definitions page. Added discussion of float to Machine Types.
Mar 22, 1999   Removed numbered versions from the Versions page; added motivation, discussion, and version aliasing using =. Removed angle brackets < and > from VersionsAndRenames.
Mar 16, 1999   Rewrote Types page. Split off byte, ubyte, short, ushort, int, uint, long, ulong into an optional Machine Types library.
Feb 18, 1999   Released first draft.

Drafts

Older drafts are also available:


Introduction

Thursday, November 11, 1999


JavaScript 2.0 is the next major step in the evolution of the JavaScript language. JavaScript 2.0 incorporates the following features in addition to those already found in JavaScript 1.5:

These facilities reinforce each other while remaining fairly small and simple. Unlike in Java, the philosophy behind them is to provide the minimal necessary facilities that other parties can use to write packages that specialize the language for particular domains rather than define these packages as part of the language core.

The versioning and access control mechanisms make the language suitable for programming-in-the-large.

The language remains firmly in the dynamic camp. Classes can be declared statically or dynamically. JavaScript 2.0 provides introspection facilities. In some ways JavaScript 2.0 is more dynamic than JavaScript 1.5. For example, it is much easier to conditionally declare functions in JavaScript 2.0 than in 1.5: one simply defines a function inside a conditional.

The overridable basic operators can be used to implement numbers with attached units similar to the Spice proposals. Rather than implement the full unit model in the language core, JavaScript 2.0 provides the syntactic and semantic hooks to allow one to implement a unit library with whatever sophistication one's application requires.


Motivation

Thursday, November 11, 1999

Goals

The main goals of JavaScript 2.0 are:

The following are specifically not goals of JavaScript 2.0:

JavaScript is not currently an all-purpose programming language. Its strengths are its quick execution from source (thus enabling it to be distributed in web pages in source form), its dynamism, and its interfaces to Java and other environments. JavaScript 2.0 is intended to improve upon these strengths, while adding others such as the abilities to reliably compose JavaScript programs out of components and libraries and to write object-oriented programs. On the other hand, it is not our intent to have JavaScript 2.0 supplant languages such as C++ and Java, which will still be more suitable for writing many kinds of applications, including very large, performance-critical, and low-level ones.

Rationale

The proposed features are derived from the goals above. Consider, for example, the goals of writing modular and robust applications.

To achieve modularity we would like some kind of a library mechanism. The proposed package mechanism serves this purpose, but by itself it would not be enough. Unlike existing JavaScript programs which tend to be monolithic, packages and their clients are often written by different people at different times. Once we introduce packages, we encounter the problems of the author of a package not having access to all of its clients, or the author of a client not having access to all versions of the library it needs. If we add packages to the language without solving these problems, we will never be able to achieve robustness, so we must address these problems by creating facilities for defining abstractions between packages and clients.

To create these abstractions we make the language more disciplined by adding optional types and type-checking. We also introduce a coherent and disciplined syntax for defining classes and hierarchies and versioning of classes. Unlike in JavaScript 1.5, the author of a class can guarantee invariants concerning its instances and can control access to its instances, making the package author's job tractable. The class syntax is also much more self-documenting than in JavaScript 1.5, making it easier to understand and use JavaScript 2.0 code. Defining subclasses is easy in JavaScript 2.0, while doing it robustly in JavaScript 1.5 is quite difficult.

To make packages work we need to make the language more robust in other areas as well. It would not be good if one package redefined Object.toString or added methods to the Array prototype and thereby corrupted another package. We can simplify the language by eliminating many idioms like these (except when running legacy programs, which would not use packages) and provide better alternatives instead. This has the added advantage of speeding up the language's implementation by eliminating thread synchronization points. Making the standard packages robust can also significantly reduce the memory requirements and improve speed on servers by allowing packages to be shared among many different requests rather than having to start with a clean set of packages for each request because some other request might have modified some property.

JavaScript 2.0 should interface with other languages even better than JavaScript 1.5 does. If the goal of integration is achieved, the user of an abstraction should not have to care much about whether the abstraction is written in JavaScript, Java, or another language. It should also be possible to make JavaScript abstractions that appear native to Java or other language users.

In order to achieve seamless interfacing with other languages, JavaScript should provide equivalents for the fundamental data types of those languages. Details such as syntax do not have to be the same, but the concepts should be there. JavaScript 1.5 lacks support for integers, making it hard to interface with a Java method that expects a long.

JavaScript is appearing in a number of different application domains, many of which are evolving. Rather than support all of these domains in the core JavaScript, JavaScript 2.0 should provide flexible facilities that allow these application domains to define their own, evolving standards that are convenient to use without requiring continuous changes to the core of JavaScript. JavaScript 2.0 addresses this goal by letting user programs define facilities such as getters, setters, and alternative definitions of operators, all of which could only be provided by the core of the language in JavaScript 1.5.


Notation

Thursday, November 11, 1999

Character Notation

This proposal uses the following conventions to denote literal characters:

Printable ASCII literal characters (values 20 through 7E hexadecimal) are in a blue monospaced font. Other characters are denoted by enclosing their four-digit hexadecimal Unicode value between «u and ». For example, the non-breakable space character would be denoted in this document as «u00A0». A few of the common control characters are represented by name:

Abbreviation   Unicode Value
«NUL» «u0000»
«BS» «u0008»
«TAB» «u0009»
«LF» «u000A»
«VT» «u000B»
«FF» «u000C»
«CR» «u000D»
«SP» «u0020»

A space character is denoted in this document either by a blank space where it's obvious from the context or by «SP» where the space might be confused with some other notation.
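
The character conventions above can be sketched as a small helper. This is only an illustration in present-day JavaScript, not part of the proposal; the function name denote and the decision to always write a space as «SP» are assumptions made for the example.

```javascript
// Control-character abbreviations taken from the table above.
const CONTROL_NAMES = new Map([
  [0x0000, "NUL"], [0x0008, "BS"], [0x0009, "TAB"], [0x000A, "LF"],
  [0x000B, "VT"], [0x000C, "FF"], [0x000D, "CR"], [0x0020, "SP"],
]);

// denote is a hypothetical helper: it renders one UTF-16 code unit in the
// notation used by this document.
function denote(ch) {
  const code = ch.charCodeAt(0);
  if (CONTROL_NAMES.has(code)) {
    return `«${CONTROL_NAMES.get(code)}»`;
  }
  if (code >= 0x21 && code <= 0x7E) {
    return ch; // printable ASCII stands for itself (space was handled above)
  }
  const hex = code.toString(16).toUpperCase().padStart(4, "0");
  return `«u${hex}»`;
}
```

With this sketch, denote("\u00A0") yields «u00A0» and denote("\t") yields «TAB».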

Grammar Notation

Each LR(1) parser grammar and lexer grammar rule consists of a nonterminal, a ⇒, and one or more expansions of the nonterminal separated by vertical bars (|). The expansions are usually listed on separate lines but may be listed on the same line if they are short. An empty expansion is denoted as «empty».

Consider the sample rule:

SampleList ⇒
   «empty»
|  ... Identifier
|  SampleListPrefix
|  SampleListPrefix , ... Identifier

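This rule states that the nonterminal SampleList can represent one of four kinds of sequences of input tokens: nothing at all; the token ... followed by an expansion of the nonterminal Identifier; an expansion of the nonterminal SampleListPrefix; or an expansion of SampleListPrefix followed by the tokens , and ... and an expansion of Identifier.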

Input tokens are characters (and the special End placeholder) in the lexer grammar and lexer tokens in the parser grammar. Spaces separate input tokens and nonterminals from each other. An input token that consists of a space character is denoted as «SP». Other non-ASCII or non-printable characters are denoted by also using « and », as described in the character notation section.

Lookahead Constraints

If the phrase "[lookahead∉ set]" appears in the expansion of a production, it indicates that the production may not be used if the immediately following input terminal is a member of the given set. That set can be written as a list of terminals enclosed in curly braces. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all terminals to which that nonterminal could expand.

For example, given the rules

DecimalDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
DecimalDigits ⇒
   DecimalDigit
|  DecimalDigits DecimalDigit

the rule

LookaheadExample ⇒
   n [lookahead∉ {1, 3, 5, 7, 9}] DecimalDigits
|  DecimalDigit [lookahead∉ {DecimalDigit}]

matches either the letter n followed by one or more decimal digits the first of which is even, or a decimal digit not followed by another decimal digit.

These lookahead constraints do not make the grammars more theoretically powerful than LR(1), but they do allow these grammars to be written more simply. The semantic engine compiles grammars with lookahead constraints into parse tables that have the same format as those produced from ordinary LR(1) or LALR(1) grammars.
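
As a rough illustration of what LookaheadExample accepts, the following hand-written matcher in present-day JavaScript mimics the two expansions. It is a sketch, not how the semantic engine works: matchLookaheadExample and isDigit are hypothetical names, and DecimalDigits is assumed to consume the longest run of digits.

```javascript
function isDigit(ch) {
  return ch !== undefined && ch >= "0" && ch <= "9";
}

// Returns the number of characters matched starting at pos, or -1 if
// neither expansion of LookaheadExample applies there.
function matchLookaheadExample(input, pos = 0) {
  if (input[pos] === "n") {
    const next = input[pos + 1];
    // The lookahead constraint forbids 1, 3, 5, 7, 9; DecimalDigits then
    // requires at least one digit, so next must be an even digit.
    if (isDigit(next) && !"13579".includes(next)) {
      let end = pos + 1;
      while (isDigit(input[end])) end++;
      return end - pos;
    }
    return -1;
  }
  // A DecimalDigit that is not followed by another DecimalDigit.
  if (isDigit(input[pos]) && !isDigit(input[pos + 1])) {
    return 1;
  }
  return -1;
}
```

Under these assumptions "n24" matches three characters and "7x" matches one, while "n35" and "78" do not match at position 0.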

Parametrized Rules

Many rules in the grammars occur in groups of analogous rules. Rather than list them individually, these groups have been summarized using the shorthand illustrated by the example below:

Metadefinitions such as

a ∈ {normal, initial}
b ∈ {allowIn, noIn}

introduce grammar arguments a and b. If these arguments later parametrize the nonterminal on the left side of a rule (written here in square brackets directly after the nonterminal's name), that rule is implicitly replicated into a set of rules in each of which a grammar argument is consistently substituted by one of its variants. For example, the sample rule

AssignmentExpression[a,b] ⇒
   ConditionalExpression[a,b]
|  LeftSideExpression[a] = AssignmentExpression[normal,b]
|  LeftSideExpression[a] CompoundAssignment AssignmentExpression[normal,b]

expands into the following four rules:

AssignmentExpression[normal,allowIn] ⇒
   ConditionalExpression[normal,allowIn]
|  LeftSideExpression[normal] = AssignmentExpression[normal,allowIn]
|  LeftSideExpression[normal] CompoundAssignment AssignmentExpression[normal,allowIn]
AssignmentExpression[normal,noIn] ⇒
   ConditionalExpression[normal,noIn]
|  LeftSideExpression[normal] = AssignmentExpression[normal,noIn]
|  LeftSideExpression[normal] CompoundAssignment AssignmentExpression[normal,noIn]
AssignmentExpression[initial,allowIn] ⇒
   ConditionalExpression[initial,allowIn]
|  LeftSideExpression[initial] = AssignmentExpression[normal,allowIn]
|  LeftSideExpression[initial] CompoundAssignment AssignmentExpression[normal,allowIn]
AssignmentExpression[initial,noIn] ⇒
   ConditionalExpression[initial,noIn]
|  LeftSideExpression[initial] = AssignmentExpression[normal,noIn]
|  LeftSideExpression[initial] CompoundAssignment AssignmentExpression[normal,noIn]

AssignmentExpression[normal,allowIn] is now an unparametrized nonterminal and processed normally by the grammar.

Some of the expanded rules (such as the fourth one in the example above) may be unreachable from the grammar's starting nonterminal; these are ignored.
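
The replication step can be sketched in present-day JavaScript as a Cartesian product over the argument variants. The helper names expandArguments and instantiate, and the bracketed way of printing a parametrized nonterminal, are assumptions of this example rather than anything defined by the proposal.

```javascript
// Enumerate every combination of variants for the grammar arguments.
function expandArguments(params) {
  let bindings = [{}];
  for (const [name, variants] of Object.entries(params)) {
    bindings = bindings.flatMap(
      (binding) => variants.map((variant) => ({ ...binding, [name]: variant }))
    );
  }
  return bindings;
}

// Substitute bound arguments; anything that is not a grammar argument
// (such as the fixed variant "normal") is kept unchanged.
function instantiate(nonterminal, args, binding) {
  const resolved = args.map((arg) =>
    Object.prototype.hasOwnProperty.call(binding, arg) ? binding[arg] : arg
  );
  return `${nonterminal}[${resolved.join(",")}]`;
}

const metadefinitions = { a: ["normal", "initial"], b: ["allowIn", "noIn"] };
const heads = expandArguments(metadefinitions).map(
  (binding) => instantiate("AssignmentExpression", ["a", "b"], binding)
);
```

Here heads lists the four left sides in the same order as the expanded rules, and instantiate("AssignmentExpression", ["normal", "b"], binding) produces the right-side nonterminal for a given binding.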

Special Lexer Rules

A few lexer rules have too many expansions to be practically listed. These are specified by descriptive text instead of a list of expansions after the ⇒.

Some lexer rules contain the metaword except. These rules match any expansion that is listed before the except but that does not match any expansion after the except. All of these rules ultimately expand into single characters. For example, the rule below matches any single UnicodeCharacter except the * and / characters:

NonAsteriskOrSlash ⇒ UnicodeCharacter except * | /
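
One way to read an except rule is as a predicate combinator, sketched below in present-day JavaScript. The names except, literal, and isUnicodeCharacter are hypothetical, and treating a UnicodeCharacter as a single code point is an assumption of this example.

```javascript
// except(base, ...excluded) accepts what base accepts, minus anything
// accepted by one of the excluded matchers.
const except = (base, ...excluded) => (ch) =>
  base(ch) && !excluded.some((matcher) => matcher(ch));

const literal = (expected) => (ch) => ch === expected;

// Assumption: a UnicodeCharacter is a string holding exactly one code point.
const isUnicodeCharacter = (ch) =>
  typeof ch === "string" && [...ch].length === 1;

const isNonAsteriskOrSlash =
  except(isUnicodeCharacter, literal("*"), literal("/"));
```

With these definitions isNonAsteriskOrSlash("a") is true, while "*" and "/" are rejected.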

Informal Grammar Syntax

A few parts of the main body of this proposal still use an informal syntax to describe language constructs, although this syntax is being phased out. An example is the following:

VersionsAndRenames ⇒
   [< VersionRange [: Identifier] , ... , VersionRange [: Identifier] >]
VersionRange ⇒
   Version
|  [Version] .. [Version]

VersionsAndRenames and VersionRange are the names of the grammar rules. The black square brackets represent optional items, and the black ... together with its neighbors represents optional repetition of zero or more items, so a VersionsAndRenames can have zero or more sets of VersionRange [: Identifier] separated by commas. A black | indicates that either its left or right alternative may be present, but not both; |'s have the lowest metasymbol precedence. Syntactic tokens to be typed literally are in a bold blue monospaced font. Grammar nonterminals are in green italic and correspond to the nonterminals in the parser grammar or lexer grammar.


Core Language

Thursday, November 11, 1999


This chapter presents an informal description of the core language. The exact syntax and semantics are specified in the formal description. Libraries are also specified in a separate library chapter.


Concepts

Thursday, November 11, 1999

Types

The words type and class are used interchangeably in this specification. A type represents a possibly infinite set of values. A value can be a member of multiple such sets, so a value can have more than one type. A value may not have an intrinsic most specific type: one can ask whether the value v is a member of a given type t, but this does not prevent the value v from also being a member of some unrelated type s. For example, null is a member of type Array as well as type Function, but neither Array nor Function is a subtype of the other.

On the other hand, a variable does have a particular type. If one declares a variable x of type Array, then whatever value is held in x is guaranteed to have type Array, and one can assign any value of type Array to x.
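
The view of types as sets can be modelled in present-day JavaScript with membership predicates. This is only a sketch of the idea: the types table, isMember, and makeTypedVariable are hypothetical names, and only the two types mentioned above are modelled.

```javascript
// Each type is represented by a predicate that tests set membership.
// As stated above, null belongs to both Array and Function.
const types = {
  Array: (v) => v === null || Array.isArray(v),
  Function: (v) => v === null || typeof v === "function",
};

function isMember(value, typeName) {
  return types[typeName](value);
}

// A variable has one declared type and rejects values outside that set.
function makeTypedVariable(typeName, initial) {
  let current;
  const variable = {
    get: () => current,
    set(value) {
      if (!isMember(value, typeName)) {
        throw new TypeError(`value is not a member of type ${typeName}`);
      }
      current = value;
    },
  };
  variable.set(initial);
  return variable;
}

const x = makeTypedVariable("Array", [1, 2, 3]);
x.set(null); // allowed: null is a member of Array
```

Assigning a function to x throws a TypeError in this model, even though null, which x currently holds, is also a member of Function.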


Lexer

Thursday, November 11, 1999

This section presents an informal overview of the JavaScript 2.0 lexer. See the stages and lexer semantics sections in the formal description chapter for the details.

Changes since JavaScript 1.5

The JavaScript 2.0 lexer behaves in the same way as the JavaScript 1.5 lexer except for the following:

Source Code

JavaScript 2.0 source text consists of a sequence of UTF-16 Unicode version 2.1 or later characters normalized to Unicode Normalization Form C (canonical composition), as described in Unicode Technical Report #15.
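
For readers unfamiliar with Normalization Form C, the short example below uses the normalize method of present-day JavaScript strings to compose a base letter and a combining accent into one character. The helper name toSourceForm is hypothetical, and the example says nothing about how a JavaScript 2.0 implementation would perform or check normalization.

```javascript
// Hypothetical helper: bring text into Normalization Form C.
function toSourceForm(text) {
  return text.normalize("NFC");
}

const decomposed = "e\u0301";          // "e" followed by a combining acute accent
const composed = toSourceForm(decomposed);
const isSingleCharacter = composed === "\u00E9" && composed.length === 1;
```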

Comments and White Space

Comments and white space behave just like in JavaScript 1.5.

Punctuators

The following JavaScript 1.5 punctuation tokens are recognized in JavaScript 2.0:

!   !=   !==   %   %=   &   &&   &=   (   )   *   *=   +   ++   +=   ,   -   --   -=   .   /   /=   :   ::   ;   <   <<   <<=   <=   =   ==   ===   >   >=   >>   >>=   >>>   >>>=   ?   [   ]   ^   ^=   {   |   |=   ||   }   ~

The following punctuation tokens are new in JavaScript 2.0:

#   &&=   ->   ..   ...   @   ^^   ^^=   ||=

Keywords

The following reserved words are used in JavaScript 2.0:

break   case   catch   class   const   continue   default   delete   do   else   eval   extends   false   final   finally   for   function   if   in   instanceof   new   null   package   private   public   return   super   switch   this   throw   true   try   typeof   var   while   with

Out of these, the only word that was not reserved in JavaScript 1.5 is eval.

The following reserved words are reserved for future expansion:

abstract   debugger   enum   export   goto   implements   import   interface   native   protected   static   synchronized   throws   transient   volatile

The following words have special meaning in some contexts in JavaScript 2.0 but are not reserved and may be used as identifiers:

box   constructor   field   get   language   local   method   override   set   version

The following words name predefined types but are not reserved and may be used as identifiers (although this is not recommended):

Any   Array   array   boolean   character   Function   integer   Null   number   Object   object   string   Type   type   void

Semicolon Insertion

The JavaScript 2.0 grammar explicitly makes semicolons optional in the following situations:

Semicolons are optional in these situations even if they would construct empty statements. Strict mode has no effect on semicolon insertion in the above cases.

In addition, sometimes line breaks in the input stream are turned into VirtualSemicolon tokens. Specifically, if the first through the nth tokens of a JavaScript program are grammatically valid but the first through the (n+1)st tokens are not, and there is a line break (or a comment including a line break) between the nth token and the (n+1)st token, then the parser tries to parse the program again after inserting a VirtualSemicolon token between the nth and the (n+1)st tokens. This kind of VirtualSemicolon insertion does not occur in strict mode.
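
The line-break rule can be sketched in present-day JavaScript as follows. Everything here is an assumption made for illustration: tokens are plain objects with a lineBreakBefore flag, the real parser is replaced by a caller-supplied isValidPrefix predicate, and toyIsValidPrefix accepts only a toy language of name = name statements.

```javascript
const VIRTUAL_SEMICOLON = { text: "VirtualSemicolon", lineBreakBefore: false };

// Walk the tokens; when the next token makes the prefix invalid and a line
// break precedes it, retry with a VirtualSemicolon inserted before it.
function insertVirtualSemicolons(tokens, isValidPrefix, strict = false) {
  const accepted = [];
  for (const token of tokens) {
    if (!isValidPrefix([...accepted, token])) {
      if (strict || !token.lineBreakBefore) {
        throw new SyntaxError(`unexpected token ${token.text}`);
      }
      accepted.push(VIRTUAL_SEMICOLON);
      if (!isValidPrefix([...accepted, token])) {
        throw new SyntaxError(`unexpected token ${token.text}`);
      }
    }
    accepted.push(token);
  }
  return accepted;
}

// Toy stand-in for the parser: statements of the form  name = name
// separated by ; or VirtualSemicolon.
function toyIsValidPrefix(tokens) {
  let state = 0;
  for (const { text } of tokens) {
    const isName = /^[a-z]+$/.test(text);
    const isTerminator = text === ";" || text === "VirtualSemicolon";
    if (state === 0 && isName) state = 1;
    else if (state === 1 && text === "=") state = 2;
    else if (state === 2 && isName) state = 3;
    else if (state === 3 && isTerminator) state = 0;
    else return false;
  }
  return true;
}

const tok = (text, lineBreakBefore = false) => ({ text, lineBreakBefore });
```

Given the tokens of "a = b" followed on the next line by "c = d", the sketch inserts one VirtualSemicolon before c; with strict set to true, or without the line break, it reports a syntax error instead.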

See also the semicolon insertion syntax rationale.

Regular Expression Literals

Regular expression literals begin with a slash (/) character not immediately followed by another slash (two slashes start a line comment). As in JavaScript 1.5, regular expression literals are ambiguous with the division (/) or division-assignment (/=) tokens. The lexer treats a / or /= as a division or division-assignment token if either of these tokens would be allowed by the syntactic grammar as the next token; otherwise, the lexer treats a / or /= as starting a regular expression.

This unfortunate dependence of lexical parsing on grammatical parsing is inherited from JavaScript 1.5. See the regular expression syntax rationale for a discussion of the issues.
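
The decision described in this section can be sketched in present-day JavaScript as a lexer step that receives one bit of information from the parser. The function lexSlash and its divisionAllowed parameter are hypothetical; the scanning of the regular expression body (backslash escapes, character classes, trailing flag letters) follows familiar ECMAScript conventions and is an assumption, not the proposal's lexer grammar. The caller is assumed to have already ruled out a // comment.

```javascript
// source[pos] must be "/". divisionAllowed says whether the syntactic
// grammar would accept / or /= as the next token.
function lexSlash(source, pos, divisionAllowed) {
  if (source[pos] !== "/") {
    throw new Error("expected a slash");
  }
  if (divisionAllowed) {
    const text = source[pos + 1] === "=" ? "/=" : "/";
    return { kind: "punctuator", text };
  }
  let i = pos + 1;
  let inClass = false;
  while (i < source.length && source[i] !== "\n") {
    const ch = source[i];
    if (ch === "\\") {
      i += 2; // skip the escaped character
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) {
      let end = i + 1;
      while (end < source.length && /[a-z]/i.test(source[end])) end++;
      return { kind: "regexp", text: source.slice(pos, end) };
    }
    i++;
  }
  throw new SyntaxError("unterminated regular expression literal");
}
```

For instance, the same characters /= are lexed as a division-assignment token in "a /= 2" but as the start of the literal /=+/g in "x = /=+/g;", depending only on the flag supplied by the parser.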

Units

When a numeric literal is immediately followed by an optional underscore and an identifier, the lexer drops the underscore if it is present and converts the identifier to a string literal. The parser then treats the number and string as a unit expression. There are no reserved word restrictions on the identifier in this case; any identifier that begins with a letter will work, even if it is a reserved word.

For example, 3in and 3_in are both converted to 3 "in". 5xena is converted to 5 "xena". On the other hand, 0xena is converted to 0xe "na". It is unwise to define unit names that begin with the letters e or E either alone or followed by a decimal digit, or x or X followed by a hexadecimal digit because of potential ambiguities with exponential or hexadecimal notation.
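
The examples above can be reproduced with a small sketch in present-day JavaScript. The regular expression is a simplified stand-in for the numeric literal grammar (hexadecimal, or decimal with optional fraction and exponent), and lexUnitLiteral is a hypothetical name; the real lexer rules are given in the formal description.

```javascript
// Group 1: the numeric literal (hexadecimal is tried first).
// Group 2: the unit identifier after an optional underscore.
const NUMBER_WITH_UNIT =
  /^(0[xX][0-9a-fA-F]+|(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:_?([A-Za-z][A-Za-z0-9_$]*))?$/;

// Returns the number text and the unit string, or null if text is not a
// numeric literal optionally followed by a unit.
function lexUnitLiteral(text) {
  const match = NUMBER_WITH_UNIT.exec(text);
  if (match === null) {
    return null;
  }
  return { number: match[1], unit: match[2] === undefined ? null : match[2] };
}
```

In this sketch "3in" and "3_in" both give the number 3 with unit "in", "5xena" gives 5 with unit "xena", and "0xena" gives 0xe with unit "na", which is the ambiguity the paragraph above warns about.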


Expressions

Thursday, November 11, 1999

Most of the behavior of expressions is the same as in JavaScript 1.5. Differences are highlighted below. One general difference is that most expression operators can be overridden via operator overloading.

b ∈ {allowIn, noIn}

Identifiers

Identifier ⇒
   Identifier
|  box
|  constructor
|  field
|  get
|  language
|  local
|  method
|  set
|  override
|  version

The above keywords are not reserved and may be used as identifiers.

QualifiedIdentifier ⇒
   Identifier
|  QualifiedIdentifier :: Identifier
|  ParenthesizedExpression :: Identifier

Just like in ECMAScript Edition 3, an identifier evaluates to an internal data structure called a reference. However, JavaScript 2.0 references have several additional attributes, one of which is a namespace. The namespace is set to the value of the ParenthesizedExpression. If the ParenthesizedExpression is a simple Identifier or QualifiedIdentifier then the parentheses may be omitted.

Primary Expressions

PrimaryExpression ⇒
   null
|  true
|  false
|  Number
|  Number [no line break] String
|  String
|  this
|  super
|  QualifiedIdentifier
|  ? Identifier
|  RegularExpression
|  ParenthesizedExpression
|  ParenthesizedExpression [no line break] String
|  ArrayLiteral
|  ObjectLiteral
|  FunctionExpression
ParenthesizedExpression ⇒ ( Expression[allowIn] )

A Number literal or ParenthesizedExpression followed by a String literal is a unit expression. The unit object specified by the String is looked up; the result is called as a function and passed two arguments: the numeric value of the Number literal or ParenthesizedExpression, and either null (if a ParenthesizedExpression was provided) or the original Number literal expressed as a string.

The string representation allows user-defined unit classes to define extended syntaxes for numbers. For instance, a long-integer package might define a unit called "L" that treats the Number literal as a full 64-bit number without rounding it to a double first.
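
The calling convention for unit expressions can be sketched in present-day JavaScript. The units table, evalUnitExpression, and the use of BigInt as a stand-in for a long-integer package are all assumptions of this example; only the two-argument protocol (numeric value, then literal text or null) comes from the description above.

```javascript
// Hypothetical unit table. Each unit is a function of (value, literal).
const units = new Map([
  // A long-integer unit reads the literal text so that no precision is
  // lost by first rounding to a double.
  ["L", (value, literal) => BigInt(literal === null ? value : literal)],
  // A simple measurement unit that only needs the numeric value.
  ["in", (value) => ({ value, unit: "in" })],
]);

// literal is the original Number literal as a string, or null when the
// operand was a ParenthesizedExpression.
function evalUnitExpression(value, literal, unitName) {
  const unit = units.get(unitName);
  if (typeof unit !== "function") {
    throw new ReferenceError(`unknown unit ${unitName}`);
  }
  return unit(value, literal);
}

// Models 9007199254740993 "L": the double value is already rounded, but
// the literal text still carries the exact digits.
const big = evalUnitExpression(
  Number("9007199254740993"), "9007199254740993", "L"
);
```

Here big keeps the final digit 3 that the double has lost, and evalUnitExpression(2 + 3, null, "L") models a parenthesized operand such as (2+3) "L".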

A ? Identifier expression is used to access scope information.

Function Expressions

FunctionExpression ⇒
   AnonymousFunction
|  NamedFunction

Object Literals

ObjectLiteral ⇒
   { }
|  { FieldList }
FieldList ⇒
   LiteralField
|  FieldList , LiteralField
LiteralField ⇒ FieldName : AssignmentExpression[allowIn]
FieldName ⇒
   QualifiedIdentifier
|  String
|  Number

Array Literals

ArrayLiteral ⇒ [ ElementList ]
ElementList ⇒
   LiteralElement
|  ElementList , LiteralElement
LiteralElement ⇒
   «empty»
|  AssignmentExpression[allowIn]

Postfix Unary Operators

PostfixExpression ⇒
   FullPostfixExpression
|  ShortNewExpression
FullPostfixExpression ⇒
   PrimaryExpression
|  FullNewExpression
|  FullPostfixExpression MemberOperator
|  FullPostfixExpression Arguments
|  PostfixExpression [no line break] ++
|  PostfixExpression [no line break] --
FullNewExpression ⇒ new FullNewSubexpression Arguments
ShortNewExpression ⇒ new ShortNewSubexpression
FullNewSubexpression ⇒
   PrimaryExpression
|  FullNewSubexpression MemberOperator
|  FullNewExpression
ShortNewSubexpression ⇒
   FullNewSubexpression
|  ShortNewExpression
MemberOperator ⇒
   [ ArgumentList ]
|  . QualifiedIdentifier
|  . ParenthesizedExpression
|  @ QualifiedIdentifier
|  @ ParenthesizedExpression

The @ operator performs a type cast. The second operand specifies the type. Both the . and the @ operators accept either a QualifiedIdentifier or a ParenthesizedExpression as the second operand. If it is a ParenthesizedExpression, the second operand of . must evaluate to a string. a.(x) is a synonym for a[x] except that the latter can be overridden via operator overloading.

The [] operator can take multiple (or even named) arguments. This allows users to define data structures such as multidimensional arrays via operator overloading.

Arguments ⇒ ( ArgumentList )
ArgumentList ⇒
   «empty»
|  ArgumentListPrefix
|  NamedArgumentListPrefix
ArgumentListPrefix ⇒
   AssignmentExpression[allowIn]
|  ArgumentListPrefix , AssignmentExpression[allowIn]
NamedArgumentListPrefix ⇒
   LiteralField
|  ArgumentListPrefix , LiteralField
|  NamedArgumentListPrefix , LiteralField

An ArgumentList can contain both positional and named arguments. Named arguments use the same syntax as object literals.

Prefix Unary Operators

UnaryExpression ⇒
   PostfixExpression
|  delete PostfixExpression
|  typeof UnaryExpression
|  eval UnaryExpression
|  ++ PostfixExpression
|  -- PostfixExpression
|  + UnaryExpression
|  - UnaryExpression
|  ~ UnaryExpression
|  ! UnaryExpression

Multiplicative Operators

MultiplicativeExpression ⇒
   UnaryExpression
|  MultiplicativeExpression * UnaryExpression
|  MultiplicativeExpression / UnaryExpression
|  MultiplicativeExpression % UnaryExpression

Additive Operators

AdditiveExpression ⇒
   MultiplicativeExpression
|  AdditiveExpression + MultiplicativeExpression
|  AdditiveExpression - MultiplicativeExpression

Bitwise Shift Operators

ShiftExpression ⇒
   AdditiveExpression
|  ShiftExpression << AdditiveExpression
|  ShiftExpression >> AdditiveExpression
|  ShiftExpression >>> AdditiveExpression

Relational Operators

RelationalExpression[allowIn] ⇒
   ShiftExpression
|  RelationalExpression[allowIn] < ShiftExpression
|  RelationalExpression[allowIn] > ShiftExpression
|  RelationalExpression[allowIn] <= ShiftExpression
|  RelationalExpression[allowIn] >= ShiftExpression
|  RelationalExpression[allowIn] instanceof ShiftExpression
|  RelationalExpression[allowIn] in ShiftExpression
RelationalExpression[noIn] ⇒
   ShiftExpression
|  RelationalExpression[noIn] < ShiftExpression
|  RelationalExpression[noIn] > ShiftExpression
|  RelationalExpression[noIn] <= ShiftExpression
|  RelationalExpression[noIn] >= ShiftExpression
|  RelationalExpression[noIn] instanceof ShiftExpression

Equality Operators

EqualityExpression[b] ⇒
   RelationalExpression[b]
|  EqualityExpression[b] == RelationalExpression[b]
|  EqualityExpression[b] != RelationalExpression[b]
|  EqualityExpression[b] === RelationalExpression[b]
|  EqualityExpression[b] !== RelationalExpression[b]

Binary Bitwise Operators

BitwiseAndExpression[b] ⇒
   EqualityExpression[b]
|  BitwiseAndExpression[b] & EqualityExpression[b]
BitwiseXorExpression[b] ⇒
   BitwiseAndExpression[b]
|  BitwiseXorExpression[b] ^ BitwiseAndExpression[b]
|  BitwiseXorExpression[b] ^ *
|  BitwiseXorExpression[b] ^ ?
BitwiseOrExpression[b] ⇒
   BitwiseXorExpression[b]
|  BitwiseOrExpression[b] | BitwiseXorExpression[b]
|  BitwiseOrExpression[b] | *
|  BitwiseOrExpression[b] | ?

Binary Logical Operators

LogicalAndExpression[b] ⇒
   BitwiseOrExpression[b]
|  LogicalAndExpression[b] && BitwiseOrExpression[b]
LogicalXorExpression[b] ⇒
   LogicalAndExpression[b]
|  LogicalXorExpression[b] ^^ LogicalAndExpression[b]

The ^^ operator is a logical exclusive-or operator. It evaluates both operands. If they both convert to true or both convert to false, then ^^ returns false; otherwise ^^ returns the unconverted value of whichever argument converted to true.
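
The behavior of ^^ described above can be modelled by an ordinary function in present-day JavaScript. The name logicalXor is hypothetical, and Boolean() is assumed to perform the same conversion that JavaScript 2.0 would apply to the operands.

```javascript
// Both operands have already been evaluated by the time the function is
// called, which mirrors the fact that ^^ never short-circuits.
function logicalXor(left, right) {
  const leftTrue = Boolean(left);
  const rightTrue = Boolean(right);
  if (leftTrue === rightTrue) {
    return false;
  }
  // Return the unconverted operand that converted to true.
  return leftTrue ? left : right;
}
```

For example, logicalXor(0, "x") returns "x", while logicalXor(1, 2) and logicalXor(0, null) both return false.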

LogicalOrExpression[b] ⇒
   LogicalXorExpression[b]
|  LogicalOrExpression[b] || LogicalXorExpression[b]

Conditional Operator

ConditionalExpression[b] ⇒
   LogicalOrExpression[b]
|  LogicalOrExpression[b] ? AssignmentExpression[b] : AssignmentExpression[b]
NonAssignmentExpression[b] ⇒
   LogicalOrExpression[b]
|  LogicalOrExpression[b] ? NonAssignmentExpression[b] : NonAssignmentExpression[b]

Assignment Operators

AssignmentExpression[b] ⇒
   ConditionalExpression[b]
|  PostfixExpression = AssignmentExpression[b]
|  PostfixExpression CompoundAssignment AssignmentExpression[b]
CompoundAssignment ⇒
   *=
|  /=
|  %=
|  +=
|  -=
|  <<=
|  >>=
|  >>>=
|  &=
|  ^=
|  |=
|  &&=
|  ^^=
|  ||=

Expressions

Expression[b] ⇒
   AssignmentExpression[b]
|  Expression[b] , AssignmentExpression[b]
OptionalExpression ⇒
   Expression[allowIn]
|  «empty»

Type Expressions

TypeExpression[b] ⇒ NonAssignmentExpression[b]

Statements

Thursday, November 11, 1999

Most of the behavior of statements is the same as in JavaScript 1.5. Differences are highlighted below.

w ∈ {abbrev, abbrevNonEmpty, abbrevNoShortIf, full}
TopStatement[w] ⇒
   Statement[w]
|  LanguageDeclaration[w]
Statement[w] ⇒
   AnnotatedDefinition[w]
|  EmptyStatement[w]
|  ExpressionStatement Semicolon[w]
|  AnnotatedBlock
|  LabeledStatement[w]
|  IfStatement[w]
|  SwitchStatement
|  DoStatement Semicolon[w]
|  WhileStatement[w]
|  ForStatement[w]
|  WithStatement[w]
|  ContinueStatement Semicolon[w]
|  BreakStatement Semicolon[w]
|  ReturnStatement Semicolon[w]
|  ThrowStatement Semicolon[w]
|  TryStatement
Semicolon[abbrev] ⇒
   ;
|  VirtualSemicolon
|  «empty»
Semicolon[abbrevNonEmpty] ⇒
   ;
|  VirtualSemicolon
|  «empty»
Semicolon[abbrevNoShortIf] ⇒
   ;
|  VirtualSemicolon
|  «empty»
Semicolon[full] ⇒
   ;
|  VirtualSemicolon

Empty Statement

EmptyStatement[abbrev] ⇒
   ;
|  «empty»
EmptyStatement[abbrevNonEmpty] ⇒ ;
EmptyStatement[abbrevNoShortIf] ⇒ ;
EmptyStatement[full] ⇒ ;

Expression Statement

ExpressionStatement ⇒ [lookahead∉ {function, {}] Expression[allowIn]

Block

AnnotatedBlock ⇒
   Block
|  Visibility [no line break] Block
Block ⇒ { TopStatements }
TopStatements ⇒
   TopStatement[abbrev]
|  TopStatementsPrefix TopStatement[abbrevNonEmpty]
TopStatementsPrefix ⇒
   TopStatement[full]
|  TopStatementsPrefix TopStatement[full]

Boxes

A box has the syntax:

   box { Statement ... Statement }

A box behaves like a regular block except that it forms its own scope. Variable and function definitions without a Visibility prefix inside the box belong to that box instead of the global scope or the enclosing function, class, or box.

Visibility-Specifying Blocks

A block can be annotated with a Visibility prefix as follows:

   Visibility { Statement ... Statement }

Such a block behaves like a regular block except that every declaration inside that block (but not inside any enclosed function, class, box, or nested visibility-specifying block) that does not have an explicit Visibility prefix uses the Visibility prefix given by the block.

Visibility-specifying blocks are useful to define several items without having to repeat a Visibility prefix for each one. For example,

class foo {
  field z:integer;
  public field a;
  private field b;
  public method f() {}
  public method g(x:integer) {}
}

is equivalent to:

class foo {
  field z:integer;
  public {
    field a;
    private field b;
    method f() {}
    method g(x:integer) {}
  }
}

Labeled Statements

LabeledStatementw  Identifier : Statementw

If Statement

IfStatementabbrev 
   if ParenthesizedExpression Statementabbrev
|  if ParenthesizedExpression StatementabbrevNoShortIf else Statementabbrev
IfStatementabbrevNonEmpty 
   if ParenthesizedExpression StatementabbrevNonEmpty
|  if ParenthesizedExpression StatementabbrevNoShortIf else StatementabbrevNonEmpty
IfStatementfull 
   if ParenthesizedExpression Statementfull
|  if ParenthesizedExpression StatementabbrevNoShortIf else Statementfull
IfStatementabbrevNoShortIf  if ParenthesizedExpression StatementabbrevNoShortIf else StatementabbrevNoShortIf

The semicolon is optional before the else.

Switch Statement

SwitchStatement 
   switch ParenthesizedExpression { }
|  switch ParenthesizedExpression { CaseGroups LastCaseGroup }
CaseGroups 
   «empty»
|  CaseGroups CaseGroup
CaseGroup  CaseGuards CaseStatementsPrefix
LastCaseGroup  CaseGuards CaseStatements
CaseGuards 
   CaseGuard
|  CaseGuards CaseGuard
CaseGuard 
   case ExpressionallowIn :
|  default :
CaseStatements 
   Statementabbrev
|  CaseStatementsPrefix StatementabbrevNonEmpty
CaseStatementsPrefix 
   Statementfull
|  CaseStatementsPrefix Statementfull

Do-While Statement

DoStatement  do StatementabbrevNonEmpty while ParenthesizedExpression

The semicolon is optional before the closing while.

While Statement

WhileStatementw  while ParenthesizedExpression Statementw

For Statements

ForStatementw 
   for ( ForInitializer ; OptionalExpression ; OptionalExpression ) Statementw
|  for ( ForInBinding in ExpressionallowIn ) Statementw
ForInitializer 
   «empty»
|  ExpressionnoIn
|  VariableDefinitionKind VariableBindingListnoIn
ForInBinding 
   PostfixExpression
|  VariableDefinitionKind VariableBindingnoIn

With Statement

WithStatementw  with ParenthesizedExpression Statementw

Continue and Break Statements

ContinueStatement  continue [no line break] OptionalLabel
BreakStatement  break [no line break] OptionalLabel
OptionalLabel 
   «empty»
|  Identifier

Return Statement

ReturnStatement  return [no line break] OptionalExpression

Throw Statement

ThrowStatement  throw [no line break] ExpressionallowIn

Try Statement

TryStatement 
   try AnnotatedBlock CatchClauses
|  try AnnotatedBlock FinallyClause
|  try AnnotatedBlock CatchClauses FinallyClause
CatchClauses 
   CatchClause
|  CatchClauses CatchClause
CatchClause  catch ( TypedIdentifierallowIn ) AnnotatedBlock
FinallyClause  finally AnnotatedBlock

Programs

Program  TopStatements

JavaScript 2.0
Core Language
Definitions

Thursday, November 11, 1999

Definitions

AnnotatedDefinitionw 
   Visibility [no line break] Definitionw
|  Definitionw
Definitionw 
   VariableDefinition Semicolonw
|  FunctionDefinition
|  MemberDefinitionw
|  ClassDefinition

Definition Visibility

Visibility 
   ParenthesizedExpression
|  local
|  box
|  private
|  package
|  public
|  Identifier

Any definition can have a Visibility prefix. That prefix specifies both the scope into which the declared entity is inserted and the places from which the entity can be accessed.

A Visibility prefix can be one of the prefixes in the table below, or it can be user-defined. User-defined Visibility prefixes allow the author of a package P to control definition visibility based on the version by which a client package imports P. User-defined Visibility prefixes also allow definition access to be controlled by the manner in which a client attempts to reference the definition.

The following are the predefined Visibility prefixes. The access privileges they provide are described in more detail in the next section. Unless overridden, the default Visibility is box.

Visibility   Access allowed from
local only within current block
box only within current package (when applied to a class member), function, or box
private only within current class
package   only within current package
public within any package that imports this package

Terminology

To understand the scope to which a definition applies we need to define a few terms. In the definitions below D represents a variable, function, member, or class definition.

Containing block
The containing block of D is the innermost Block (including function and class Blocks) lexically enclosing D. If there is no such block because D is at the top level, then the containing block of D is the package scope.
Containing box
The containing box of D is the innermost function Block, class Block, or box Block lexically enclosing D. If there is no such block, then the containing box of D is the package scope.
Containing class
The containing class of D is the innermost class Block lexically enclosing D. If there is no such block, then the containing class of D is the package scope.
Containing visibility specifier
The containing visibility specifier of D is the innermost function Block, class Block, box Block, or visibility-specifying block lexically enclosing D. If there is no such block, then the containing visibility specifier of D is the package scope.
Visibility default
If the containing visibility specifier of D is a visibility-specifying block E, then the visibility default of D is E's Visibility prefix. Otherwise, the visibility default of D is box.

Scope Rules

To determine the scope S to which a definition D applies, we look up the definition's Visibility prefix in the table below. A definition without a Visibility prefix uses its visibility default prefix.

Visibility   Scope where entity is declared
local D's containing block
box D's containing box
private D's containing class
package   D's containing class
public D's containing class
User-defined   D's containing class

The scope S is not the scope in which the definition is accessible; rather, it is the scope into which the declared entity is inserted.
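
The terminology and the scope table above can be sketched as a small lookup. This is an illustration in modern JavaScript, not part of the proposal: the enclosing list, the kind and prefix fields, and the function names are all hypothetical, and every enclosing entry is assumed to count as a Block for the purposes of local.

```javascript
// Sketch only. `enclosing` lists the blocks lexically enclosing a definition D,
// innermost first. Each entry has a kind: "block", "function", "class", "box",
// or "visibility" (a visibility-specifying block, which also has a `prefix`).
const PACKAGE = { kind: "package" };

// Innermost enclosing entry of one of the given kinds, else the package scope.
function findInnermost(enclosing, kinds) {
  return enclosing.find((b) => kinds.includes(b.kind)) || PACKAGE;
}

function scopeOf(visibility, enclosing) {
  // Visibility default: the prefix of the containing visibility specifier
  // if that is a visibility-specifying block, otherwise box.
  const specifier = findInnermost(enclosing, ["function", "class", "box", "visibility"]);
  const visibilityDefault = specifier.kind === "visibility" ? specifier.prefix : "box";
  const effective = visibility === undefined ? visibilityDefault : visibility;

  if (effective === "local") return enclosing[0] || PACKAGE;          // containing block
  if (effective === "box") return findInnermost(enclosing, ["function", "class", "box"]);
  return findInnermost(enclosing, ["class"]);  // private, package, public, user-defined
}

// A definition inside an if block, inside function H, inside class C.
const cls = { kind: "class", name: "C" };
const fn = { kind: "function", name: "H" };
const ifBlock = { kind: "block" };
const chain = [ifBlock, fn, cls];

const unprefixed = scopeOf(undefined, chain);   // fn (default box)
const asLocal = scopeOf("local", chain);        // ifBlock
const asPublic = scopeOf("public", chain);      // cls
```

A definition with no prefix inside a public { ... } block directly within class C would resolve to cls, because its visibility default is the block's prefix.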

If S is a class and Visibility is not local, then the declared entity will appear as a member of class S. If S is a class and Visibility is local, then the declared entity will only be created inside class S's block without becoming a member of class S; it is an error if this case arises for a method or field definition.

Once the scope S is known, the accessibility of definition D is determined by the table below. P is the lexically enclosing package.

Visibility     Scope S is ...
prefix         a package P    a class C      a function F   a box B   a block B
local          Package P      Class C        Function F     Box B     Block B
box            Package P      Package P      Function F     Box B
private        Package P      Class C
package        Package P      Package P
public         Any package    Any package
User-defined   User-defined   User-defined

The rest of this page is out of date

All of these definitions share several common scoping rules:

  1. A definition that applies to a scope can be referenced lexically from anywhere within that scope unless shadowed by a more local definition.
  2. A definition that applies to a scope lasts until that scope is exited. No other definition may be executed for the same identifier applying to the same scope (with the exceptions that both a getter and a setter may be defined with the same name and that versions have a namespace separate from other definitions).
  3. If code executing inside a scope s has already made an attempt to resolve identifier i and that resolution either bound i to a definition of i in a scope enclosing s or failed because i wasn't defined, then no definition of i applying to scope s may be executed.

Rules 2 and 3 state that once an identifier is resolved to a variable or function in a scope, that resolution cannot be changed. This permits efficient compilation and avoids confusion with programs such as:

const b:integer = 7;

function f():integer {
  function g():integer {return b}

  var a = g();
  const b:integer = 8;
  return g() - a;
}
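
For comparison only, modern ECMAScript (ES2015 and later) reaches a similar goal by a different route. There, the inner const b is in scope for the whole body of f from the start, so g always refers to it, and the first call to g fails because that b is not yet initialized. The sketch below is modern JavaScript, not JavaScript 2.0, and the outcome variable is a hypothetical addition used to record the result.

```javascript
// Modern JavaScript shown for comparison; this is not JavaScript 2.0.
const b = 7;

function f() {
  function g() { return b; }   // refers to f's own b, declared below
  const a = g();               // throws: f's b is not yet initialized
  const b = 8;
  return g() - a;
}

let outcome;
try {
  outcome = f();
} catch (e) {
  outcome = e instanceof ReferenceError ? "ReferenceError" : "other error";
}
// outcome is "ReferenceError"
```

Either way, the meaning of b inside g cannot silently change between the two calls.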

Scopes

Definitions at the top level of a Program or at the top level of a ClassDefinition's Block may omit Visibility, in which case they are treated as if they had package visibility. When used outside a ClassDefinition's Block, private is equivalent to package.

A definition with a Visibility prefix does not apply to the current Block. Instead, it declares either an entity at the top level of the current package (if outside a ClassDefinition's Block) or a member of the current class (if inside a ClassDefinition's Block). In addition to lifting the definition out of the current scope in this way, Visibility also specifies the definition's visibility from other packages or classes. Visibility can take one of the following forms:

Most lexical scopes are established by Block productions in the grammar. Lexical scopes nest, and a definition in an inner scope can shadow definitions in outer ones.

In the example below the comments indicate the scope and visibility of each definition:

var a0;                 // Package-visible global variable
private var a1 = true;  // Package-visible global variable
package var a2;         // Package-visible global variable
public var a3;          // Public global variable

if (a1) {
  var b0;               // Local to this block
  private var b1;       // Package-visible global variable
  package var b2;       // Package-visible global variable
  public var b3;        // Public global variable
}

public function F() {   // Public global function
  var c0;               // Local to this function
  private var c1;       // Package-visible global variable
  package var c2;       // Package-visible global variable
  public var c3;        // Public global variable
}

function G() {          // Package-visible global function
  var d0;               // Never defined because G isn't called
  private var d1;       // Never defined because G isn't called
  package var d2;       // Never defined because G isn't called
  public var d3;        // Never defined because G isn't called
}

class C {               // Package-visible global class
  var e0;               // Package-visible class variable
  private var e1;       // Class-visible class variable
  package var e2;       // Package-visible class variable
  public var e3;        // Public class variable
  field e4;             // Package-visible instance variable
  private field e5;     // Class-visible instance variable
  package field e6;     // Package-visible instance variable
  public field e7;      // Public instance variable

  function H() {        // Package-visible class function
    var f0;             // Local to this function
    private var f1;     // Class-visible class variable
    package var f2;     // Package-visible class variable
    public var f3;      // Public class variable
    private field f4;   // Class-visible instance variable
    package field f5;   // Package-visible instance variable
    public field f6;    // Public instance variable
  }
  public method I() {}  // Public class method

  H();
}

F();

Versioning Public Identifiers

A public definition's identifier is exported to other packages. To help avoid accidental collisions between identifiers declared in different packages, identifiers can be selectively exported depending on the version requested by an importing package. An identifier definition with a version number newer than that requested by the importer will not be seen by that importer. The versioning facilities also include additional facilities that allow robust removal and renaming of identifiers.

VersionsAndRenames describes the set of versions in which an identifier is exported, together with a possible alias for the identifier:

VersionsAndRenames 
   [VersionRange [: Identifier] , ... , VersionRange [: Identifier]]
VersionRange 
   Version
|  [Version] .. [Version]

Suppose a client package C imports version V of package P that exports identifier N with some VersionsAndRenames. If the VersionsAndRenames's VersionRange includes version V, then package C can use the corresponding Identifier alias to access package P's N. If the Identifier alias is omitted, then package C can use N to access package P's N. Multiple VersionRanges operate independently.

In most cases VersionsAndRenames is just a Version name (a string):

public "1.2" const z = 3;

If VersionsAndRenames is omitted, the default version "" is assumed.

Discussion

Scopes

Do we want to collapse all block scopes into one inside functions? On one hand this complicates the language conceptually and surprises Java and C++ programmers. On the other hand, this would match JavaScript 1.5 better and simplify closure creation when a closure is created nested inside several blocks in a function.

Visibilities

Should we make private illegal outside a class rather than making it equivalent to package?

Should we introduce a local Visibility prefix that explicitly means that the definition is visible locally? This wouldn't provide any additional functionality but it would provide a convenient name for talking about the four kinds of visibility prefixes.

What should the default visibilities be? The current defaults are loosely modeled after Java:

Definition Location                                  Default visibility
Package top level                                    package (equivalent to local in this case)
Inside a statement outside a function or class       local
Function or method code's top level                  local
Inside a statement inside a function or method       local
Class definition block's top level                   package
Inside a statement inside a class definition block   local

Should we have a protected Visibility? It has been omitted for now to keep the language simple, but there does not appear to be any fundamental reason why it could not be supported. If we do support it, should we choose the C++ protected concept (visible only in class and subclasses) or the Java protected concept (visible in class, subclasses, and the original class's package)?


JavaScript 2.0
Core Language
Variables

Thursday, November 11, 1999

Variable Definition

VariableDefinition  VariableDefinitionKind VariableBindingListallowIn
VariableDefinitionKind 
   var
|  const
VariableBindingListb 
   VariableBindingb
|  VariableBindingListb , VariableBindingb
VariableBindingb  TypedIdentifierb VariableInitializerb
TypedIdentifierb 
   Identifier
|  Identifier : TypeExpressionb
VariableInitializerb 
   «empty»
|  = AssignmentExpressionb

The general syntax for defining variables is:

VariableDefinition 
   [Visibility] var Identifier [: TypeExpression] [= AssignmentExpression] , ... , Identifier [: TypeExpression] [= AssignmentExpression] ;
|  [Visibility] const Identifier [: TypeExpression] = AssignmentExpression , ... , Identifier [: TypeExpression] = AssignmentExpression ;

A variable defined with var can be modified, while one defined with const cannot. Identifier is the name of the variable and TypeExpression is its type. Identifier can be any non-reserved identifier. TypeExpression is evaluated at the time the variable definition is evaluated and should evaluate to a type t.

If provided, AssignmentExpression gives the variable's initial value v. If not, undefined is assumed; an error occurs if undefined cannot be coerced to type t. AssignmentExpression is evaluated just after the TypeExpression is evaluated. The value v is then coerced to the variable's type t and stored in the variable. If the variable is defined using var, any values subsequently assigned to the variable are also coerced to type t at the time of each such assignment.

Multiple variables separated by commas can be defined in the same VariableDefinition. The values of earlier variables are available in the TypeExpressions and AssignmentExpressions of later variables.
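
The left-to-right availability of earlier variables can be seen in modern JavaScript as well. The sketch below is only a comparison: modern JavaScript has no TypeExpressions, so it shows initializers alone, and the names width, height, and area are hypothetical.

```javascript
// Modern JavaScript comparison: later initializers may use earlier variables.
var width = 4, height = width * 2, area = width * height;
// width is 4, height is 8, area is 32
```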

If omitted, TypeExpression defaults to type Any. Thus, the definition

var a, b=3, c:integer=7, d, e:type=boolean, f:number, g:e, h:int;

is equivalent to:

var a:Any=undefined;
var b:Any=3;
var c:integer=7;
var d:Any=undefined;
var e:type=boolean;
var f:number=undefined;   // coerced to +0
var g:boolean=undefined;  // coerced to false
var h:int=undefined;      // coerced to int(0)

const Definitions

const means that Identifier cannot be written after it is defined. It does not mean that Identifier will have the same value the next time it is bound. For example, the following is legal; a new j binding is created each time through the loop:

var k = 0;
for (var i = 0; i < 10; i++) {
  local const j = i;
  k += j;
}
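
The same pattern can be run in modern JavaScript, where a block-level const likewise creates a new binding on each iteration. This is a comparison sketch, not JavaScript 2.0 syntax (modern JavaScript has no local prefix), and the seen array is a hypothetical addition that makes the separate bindings observable.

```javascript
// Modern JavaScript comparison: each iteration gets its own binding of j.
let k = 0;
const seen = [];
for (let i = 0; i < 10; i++) {
  const j = i;          // never reassigned, but rebound on every iteration
  seen.push(() => j);   // each closure captures that iteration's j
  k += j;
}
// k is 45; seen[3]() is 3 and seen[9]() is 9
```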

JavaScript 2.0
Core Language
Functions

Thursday, November 11, 1999

Function Definition

FunctionDefinition 
   NamedFunction
|  AccessorFunction
AnonymousFunction  function FunctionSignature Block
NamedFunction  function Identifier FunctionSignature Block
AccessorFunction 
   function get [no line break] Identifier FunctionSignature Block
|  function set [no line break] Identifier FunctionSignature Block
FunctionSignature  ParameterSignature ResultSignature
ParameterSignature  ( Parameters )
Parameters 
   «empty»
|  RestParameter
|  RequiredParameters
|  OptionalParameters
|  RequiredParameters , RestParameter
|  OptionalParameters , RestParameter
RequiredParameters 
   RequiredParameter
|  RequiredParameters , RequiredParameter
OptionalParameters 
   OptionalParameter
|  RequiredParameters , OptionalParameter
|  OptionalParameters , OptionalParameter
RequiredParameter  TypedIdentifierallowIn
OptionalParameter  TypedIdentifierallowIn = AssignmentExpressionallowIn
RestParameter 
   ...
|  ... TypedIdentifierallowIn
|  ... TypedIdentifierallowIn = AssignmentExpressionallowIn
ResultSignature 
   «empty»
|  : TypeExpressionallowIn

The rest of this page is slightly out of date

Function Definitions

To define a function we use the following syntax:

FunctionDefinition 
   [Visibility] function [get | set] Identifier ( Parameters ) [: TypeExpression] Block

If Visibility is absent, the above declaration defines a local function within the current Block scope. If Visibility is present, the above declaration declares either a global function (if outside a ClassDefinition's Block) or a class function (if inside a ClassDefinition's Block) according to the declaration scope rules.

The function's result type is TypeExpression, which defaults to type Any if not given. If the function does not return a value, it's good practice to set TypeExpression to void to document this fact.

Block contains the function body and is evaluated only when the function is called.

Parameters

Parameters has one of the following forms:

Parameters 
   RequiredParameter , ... , RequiredParameter [, OptionalParameter , ... , OptionalParameter] [, ... [Identifier]]
|  ... [Identifier]

If the ... is present, the function accepts more arguments than just the listed parameters. If an Identifier is given after the ..., then that Identifier is bound to an array of arguments given after the listed parameters. That Identifier is declared locally as though by the declaration const array Identifier.
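
Modern ECMAScript later adopted a similar named rest parameter, which gives a convenient way to see the behavior. The sketch below is modern JavaScript, not the proposal's syntax, and the function describe is hypothetical. One difference worth noting: a modern rest parameter is an ordinary reassignable binding, whereas the proposal declares it as a constant.

```javascript
// Modern JavaScript comparison: a named rest parameter collects the
// arguments given after the listed parameters into an array.
function describe(first, ...rest) {
  return { first, restLength: rest.length, isArray: Array.isArray(rest) };
}

const none = describe(1);         // rest is an empty array
const some = describe(1, 2, 3);   // rest holds 2 and 3
```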

Individual parameters have the forms:

RequiredParameter 
   Identifier [: TypeExpression]
OptionalParameter 
   Identifier [: TypeExpression] = AssignmentExpression

TypeExpression gives the parameter's type and defaults to type Any. If the parameter name Identifier is followed by a =, then that parameter is optional. If the nth parameter is optional and a call to this function provides fewer than n arguments, then the nth parameter is set to the value of its AssignmentExpression, coerced to the nth parameter's type if necessary. The nth parameter's AssignmentExpression is evaluated only if fewer than n arguments are given in a call.

A RequiredParameter may not follow an OptionalParameter. If a function has n RequiredParameters and m OptionalParameters and no ... in its parameter list, then any call of that function must supply at least n arguments and at most n+m arguments. If this function has a ... in its parameter list, then any call of that function must supply at least n arguments. These restrictions do not apply to traditional functions.
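
Modern JavaScript does not enforce argument counts, so the rule above can only be imitated there. The following is a minimal sketch of that rule, assuming a hypothetical helper checkedArity that is told n, m, and whether a ... is present; nothing like it exists in the proposal.

```javascript
// Hypothetical helper: wraps fn so that calls obey the argument-count rule.
// required = n, optional = m, hasRest = whether the parameter list has "...".
function checkedArity(required, optional, hasRest, fn) {
  return function (...args) {
    if (args.length < required) {
      throw new TypeError(`expected at least ${required} arguments, got ${args.length}`);
    }
    if (!hasRest && args.length > required + optional) {
      throw new TypeError(`expected at most ${required + optional} arguments, got ${args.length}`);
    }
    return fn.apply(this, args);
  };
}

// One required parameter (a), one optional parameter (b), no rest parameter.
const add = checkedArity(1, 1, false, (a, b = 10) => a + b);

const withDefault = add(1);      // 11
const withBoth = add(1, 2);      // 3
```

Calling add() or add(1, 2, 3) throws a TypeError, mirroring the lower bound of n and the upper bound of n+m.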

The parameters' Identifiers are local variables with types given by the corresponding TypeExpressions inside the function's Block. Code in the Block may read and write these variables. Arguments are passed by value, so writes to these variables do not affect the passed arguments' values in the caller.

In addition to local variables generated by the parameters' Identifiers, each function also has a predefined arguments local variable which holds an array (of type const array) of all arguments passed to this function.

Evaluation Order

When a function is called, the following list indicates the order of evaluation of the various expressions in a FunctionDefinition. These steps are taken only after all of the arguments have been evaluated.

  1. Evaluate the first parameter's TypeExpression to obtain a type t.
  2. If the first parameter is optional and no argument has been supplied, evaluate the first parameter's AssignmentExpression and let it be the first parameter's value.
  3. Coerce the argument (or default) value to type t and bind the parameter's Identifier to the result.
  4. Repeat steps 1-3 for each additional parameter.
  5. If the FunctionDefinition's Parameters ends with a ... followed by an Identifier, bind that Identifier to an array comprised of the zero or more leftover arguments not already bound to a parameter.
  6. Evaluate the FunctionDefinition's result TypeExpression to obtain a result type r.
  7. Evaluate the body.
  8. Coerce the result to type r and return it.

Note that later TypeExpressions and AssignmentExpressions can refer to previously bound arguments. Thus, the following is legal:

function choice(a:boolean, b:type, c:b, d:b=undefined):b {
  return a ? c : d;
}

The call choice(true,integer,8,4) would return 8, while choice(false,integer,6) would return 0 (undefined coerced to type integer).
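
The behavior of choice can be approximated in modern JavaScript by passing a coercion function where the proposal passes a type. This is a rough sketch under stated assumptions: integer is a hypothetical stand-in that maps undefined to 0 as the text above describes, defaultForD is a hypothetical helper that counts how often the default is evaluated, and the exact interleaving of the evaluation steps is not reproduced.

```javascript
// Hypothetical stand-in for the proposal's integer type: a coercion
// function that maps undefined to 0.
const integer = (v) => (v === undefined ? 0 : Math.trunc(Number(v)));

let defaultEvaluations = 0;
function defaultForD() {
  defaultEvaluations++;   // counts how often the default expression runs
  return undefined;
}

// b plays the role of the type parameter; c, d, and the result use it.
function choice(a, b, c, d = defaultForD()) {
  const coercedC = b(c);
  const coercedD = b(d);
  return b(a ? coercedC : coercedD);
}

const first = choice(true, integer, 8, 4);   // 8, default not evaluated
const second = choice(false, integer, 6);    // 0, default evaluated once
```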

Relationship to Methods and Classes

Unless the function is a traditional function, the function definition using the above syntax does not define a class; the function's name cannot be used in a new expression, and the function does not have a this parameter. Any attempt to use this inside the function's body is an error. To define a method that can access this, use the method keyword.

If a FunctionDefinition is located at a class scope (either because it is located at the top level of a ClassDefinition's Block or it has a Visibility prefix and is located inside a ClassDefinition's Block), then the function is a static method of the class. Unlike C++ or Java, JavaScript 2.0 does not use the static keyword to indicate such functions; instead, instance methods (i.e. non-static methods) are defined using the method keyword.

Getters and Setters

If a FunctionDefinition contains the keyword get or set, then the defined function is a getter or a setter.

A getter must not take any parameters and cannot have a ... in its Parameters list. Unlike an ordinary function, a getter is invoked by merely mentioning its name without an Arguments list in any expression except as the destination of an assignment. For example, the following code returns the string <2,3,1>:

var x:integer = 0;
function get serialNumber():integer {return ++x}

var y = serialNumber;
return "<" + serialNumber + "," + serialNumber + "," + y + ">";

A setter must take exactly one required parameter and cannot have a ... in its Parameters list. Unlike an ordinary function, a setter is invoked by merely mentioning its name (without an Arguments list) on the left side of an assignment or as the target of a mutator such as ++ or --. The result of the setter becomes the result of the assignment. For example, the following code returns the string <1,2,43>:

var x:integer = 0;
function get serialNumber():integer {return ++x}
function set serialNumber(n:integer):integer {return x=n}

var s = "<" + serialNumber + "," + serialNumber;
serialNumber = 42;
return s + "," + serialNumber + ">";
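
A runnable approximation of this example is possible in modern JavaScript, where accessors are defined on objects rather than in a lexical scope. The sketch below is a comparison, not the proposal's syntax, and the counter object is a hypothetical container for x and serialNumber. Note that in modern JavaScript the value of an assignment expression is the assigned value, not the setter's result.

```javascript
// Modern JavaScript comparison: accessors on an object stand in for the
// proposal's lexically scoped getter and setter.
const counter = {
  x: 0,
  get serialNumber() { return ++this.x; },
  set serialNumber(n) { this.x = n; },
};

let s = "<" + counter.serialNumber + "," + counter.serialNumber;
counter.serialNumber = 42;                  // invokes the setter
s = s + "," + counter.serialNumber + ">";   // invokes the getter again
// s is "<1,2,43>"
```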

A setter can have the same name as a getter in the same lexical scope. A getter or setter cannot be extracted from its variable, so the notion of the type of a getter or setter is vacuous; a getter or setter can only be called.

Contrast the following:

var x:integer = 0;
function f():integer {return ++x}
function g():Function {return f}
function get h():Function {return f}

f;     // Evaluates to function f
g;     // Evaluates to function g
h;     // Evaluates to function f (not h)
f();   // Evaluates to 1
g();   // Evaluates to function f
h();   // Evaluates to 2
g()(); // Evaluates to 3

We can use a getter and a setter to create an alias to another variable, as in:

function get myAlias() {return Pkg::var}
function set myAlias(x) {return Pkg::var = x}

myAlias = myAlias+4;
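
The aliasing idea can be tried in modern JavaScript with Object.defineProperty. This is a comparison sketch only: Pkg.value is a hypothetical stand-in for Pkg::var, and the scope object is a hypothetical holder for the alias because modern accessors live on objects.

```javascript
// Modern JavaScript comparison: an accessor pair that forwards reads and
// writes to another variable.
const Pkg = { value: 10 };   // hypothetical stand-in for Pkg::var
const scope = {};
Object.defineProperty(scope, "myAlias", {
  get() { return Pkg.value; },
  set(x) { Pkg.value = x; },
});

scope.myAlias = scope.myAlias + 4;
// Pkg.value is now 14, and scope.myAlias reads back 14
```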

Traditional Functions

Traditional function definitions are provided for compatibility with JavaScript 1.5. The syntax is as follows:

TraditionalFunctionDefinition 
   [Visibility] traditional function Identifier ( Identifier , ... , Identifier ) Block

A function declared with the traditional keyword cannot have any argument or result type declarations, optional arguments, or getter or setter keyword. Such a function is treated as though every argument were optional and more arguments than just the listed ones were allowed. Thus, the definition

traditional function Identifier ( Identifier , ... , Identifier ) Block

behaves like the following function definition:

function Identifier ( Identifier = , ... , Identifier = , ... ) Block

Furthermore, a traditional function defines its own class and treats this in the same manner as JavaScript 1.5.
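
Ordinary function declarations in today's JavaScript still behave in the way described for traditional functions, which makes the behavior easy to demonstrate. The sketch below uses a hypothetical Point function; it is plain JavaScript and does not use the traditional keyword.

```javascript
// Plain JavaScript: every parameter is optional, extra arguments are
// accepted, and the function can be used with new.
function Point(x, y) {
  this.x = x;
  this.y = y;
  this.extra = arguments.length > 2 ? arguments[2] : "none";
}

const p = new Point(1);         // y was not supplied, so it is undefined
const q = new Point(1, 2, 3);   // the third argument is accepted
```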

Functions in Expressions

Every function (except a getter or a setter) is also a value and has type Function. Like other values, it can be stored in a variable, passed as an argument, and returned as a result. The identifiers in a function are all lexically scoped.

Function Expressions

We can use a variant of a function definition to define a function inside an expression. The syntax is:

FunctionExpression 
   function [Identifier] ( Parameters ) [: TypeExpression] Block

This expression defines a function and returns it as a value of type Function. The function can be named by providing the Identifier, but this name is only accessible from inside the function's Block.

To avoid confusion between a FunctionDefinition and a FunctionExpression, a Statement (and a few other grammar nonterminals) may not begin with a FunctionExpression. To place a FunctionExpression at the beginning of a Statement, enclose it in parentheses.
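
Both points carry over to modern JavaScript and can be checked there. The sketch below is a comparison without type annotations, and the names factorial, inner, and squared are hypothetical.

```javascript
// The name of a named function expression is visible only inside its body.
const factorial = function inner(n) {
  return n <= 1 ? 1 : n * inner(n - 1);
};
const innerVisibleOutside = typeof inner !== "undefined";   // false

// A function expression at the start of a statement must be parenthesized.
const squared = (function (n) { return n * n; })(7);        // 49
```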

A FunctionDefinition is merely convenient syntax for a const variable definition and a FunctionExpression:

[Visibility] function Identifier ( Parameters ) [: TypeExpression] Block

is equivalent to:

[Visibility] const Identifier : Function = function Identifier ( Parameters ) [: TypeExpression] Block ;

Function Calls

Unless a function is a getter or a setter, we call that function by listing its arguments in parentheses after the function expression, just as in JavaScript 1.5:

FullPostfixExpression 
   FullPostfixExpression ( AssignmentExpression , ... , AssignmentExpression )
|  other postfix expressions

Discussion

Getters and Setters

By consensus in the ECMA TC39 modularity subcommittee, we decided to use the above syntax for getters and setters instead of:

FunctionDefinition 
   [Visibility] [getter | setter] function Identifier ( Parameters ) [: TypeExpression] Block
|  TraditionalFunctionDefinition

The decision was based on aesthetics; neither syntax is more difficult to implement than the other.

Optional Parameters

Do we want to have a named rest parameter (as in the proposal above), or only support the arguments special local variable as in JavaScript 1.5? The main difference is in the handling of fixed arguments -- they must be added to the arguments array but can be omitted from the rest array.

Traditional Functions

The traditional keyword is ugly, so let's take a look at some alternatives. Unless we want to continue to make each function into a class (as JavaScript 1.5 does), we need some way to indicate which functions are also classes and which ones are not. Also, we'd like to be able to indicate which functions can be called with more or fewer than the desired number of arguments and which cannot.

One possibility would be to state that any function that uses a type annotation in its signature (either the parameter list or the result type) is a new-style function and does not define a class; other functions would declare classes. Furthermore, new-style functions would have to be called with the exact number of arguments unless some parameters are optional or a ... is present in the parameter list. These are analogous to the rules that ANSI C used to distinguish new-style functions from traditional C functions. As with ANSI C, we have somewhat of a difficulty with functions that take no parameters; such functions would need to specify a return type to be considered new-style.

C++ did away with the ANSI C treatment of traditional C functions. We could do the same by having a pragma (analogous to Perl's use pragmas) that could indicate that all functions are to be considered new-style unless prefixed by the traditional keyword. If we do this, we should decide whether the default setting of this pragma would be on or off.


JavaScript 2.0
Core Language
Classes

Thursday, November 11, 1999

Class Member Definitions

MemberDefinitionw 
   FieldDefinition Semicolonw
|  MethodDefinitionw
|  ConstructorDefinition
FieldDefinition  field [no line break] VariableBindingListallowIn
MethodDefinitionw 
   ConcreteMethodDefinition
|  AbstractMethodDefinitionw
ConcreteMethodDefinition  MethodPrefix [no line break] MethodName FunctionSignature Block
AbstractMethodDefinitionw  MethodPrefix [no line break] MethodName FunctionSignature Semicolonw
MethodPrefix 
   method
|  override [no line break] method
|  final [no line break] method
|  final [no line break] override [no line break] method
MethodName 
   Identifier
|  get [no line break] Identifier
|  set [no line break] Identifier
ConstructorDefinition  constructor [no line break] ConstructorName ParameterSignature Block
ConstructorName 
   new
|  Identifier

Class Definition

ClassDefinition 
   class Identifier Superclasses Block
|  class extends TypeExpressionallowIn Block
Superclasses 
   «empty»
|  extends TypeExpressionallowIn

This page is out of date

Class Definitions

In JavaScript 2.0 we define classes using the class keyword. Limited classes can also be defined via JavaScript 1.5-style functions, but doing so is discouraged for new code.

ClassDefinition 
   [Visibility] class Identifier [extends TypeExpression] Block
|  [Visibility] class extends TypeExpression Block

The first format declares a class with the name Identifier, binding Identifier to this class in the scope specified by the Visibility prefix (which usually includes the ClassDefinition's Block). Identifier is a constant variable with type type and can be used anywhere a type expression is allowed.

When the first ClassDefinition format is evaluated, the following steps take place:

  1. A new type t is created.
  2. If extends TypeExpression is given, TypeExpression is evaluated to obtain a type s, which must be another class. If extends TypeExpression is absent, type s defaults to the class Object.
  3. Type t is made a subtype of type s.
  4. Identifier is lexically bound in the scope given by Visibility; however, at this time Identifier does not have a legal type yet and any attempt to read or write it results in an error.
  5. Block is evaluated.
  6. If Block is evaluated successfully (without throwing an exception), all const, var, function, constructor, and class declarations evaluated at its top level (or placed at its top level by the scope rules) become class members of type t. All field and method declarations evaluated at the Block's top level (or placed at its top level by the scope rules) become instance members of type t.
  7. The value of Identifier becomes type t. From now on Identifier is a constant and its value cannot be altered.

A ClassDefinition's Block is evaluated just like any other Block, so it can contain expressions, statements, loops, etc. Such statements that do not contain declarations do not contribute members to the class being declared, but they are evaluated when the class is declared.

Class Extensions

If a ClassDefinition omits the class name Identifier, it extends the original class rather than creating a subclass. A class extension may define new methods and class constants and variables, but it does not have special privileges in accessing the original class definition's private members (or package members if in a separate package). A class extension may not override methods, and it may not define constructors or instance variables.

Each instance of the original class is automatically also an instance of the extended class. Several extensions can apply to the same class.

An extension is useful to add methods to system classes, as in the following code in some user package P:

class extends string {
  public method scramble() string {...}
  public method unscramble() string {...}
}

var x = "abc".scramble();

Once the class extension is evaluated, methods scramble and unscramble become available on all strings. There is no possibility of name clashes with extensions of class string in other, unrelated packages because the names scramble and unscramble belong to package P and not the system package that defines string. Any packages that import package P will also be able to call scramble and unscramble on strings, but other packages will not.
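For readers who know present-day JavaScript, the name-clash argument can be illustrated by analogy. The sketch below is modern ECMAScript rather than JavaScript 2.0 syntax, and it only approximates the proposal: a Symbol stands in for a package-qualified method name, the objects P and Q are hypothetical stand-ins for packages, and the scrambling logic (string reversal) is a placeholder assumed for illustration.

```javascript
// Hypothetical stand-in for "package P": its method names are unique Symbols,
// so they cannot collide with a method spelled "scramble" defined elsewhere.
const P = {
  scramble: Symbol("P::scramble"),
  unscramble: Symbol("P::unscramble"),
};

// Placeholder behavior (an assumption): reverse the string's code units.
function reverseString(s) {
  return s.split("").reverse().join("");
}

String.prototype[P.scramble] = function () {
  return reverseString(String(this));
};
String.prototype[P.unscramble] = function () {
  return reverseString(String(this));
};

// Code that "imports P" holds the Symbols and can call the methods.
const x = "abc"[P.scramble]();
const y = x[P.unscramble]();

// An unrelated hypothetical "package Q" can reuse the spelling without a clash.
const Q = { scramble: Symbol("Q::scramble") };
String.prototype[Q.scramble] = function () {
  return String(this).toUpperCase();
};
```

Code that never receives P's Symbols cannot reach these methods by name, which loosely mirrors the statement that packages not importing P cannot call scramble and unscramble.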

Members

A class has an associated set of class members and another set of instance members. Class members are properties of the class itself, while instance members are properties of each instance object of this class and have independent values for different instance objects.

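Class members are one of the following: constants, variables, functions, constructors, and classes defined at the top level of the ClassDefinition's Block.

Instance members are one of the following: fields (instance variables) and methods.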

Members can only be defined within the intersection of the lexical and dynamic extent of a ClassDefinition's Block. A few examples illustrate this rule.

The code

var boolean extended = false;

function callIt(x) {return x()}

class C {
  extended = true;
  public function square(integer x) integer {return x*x}
  if (extended) {
    public function cube(integer x) integer {return x*x*x}
  } else {
    public function reciprocal(number x) number {return 1/x}
  }

  field string firstName, lastName;
  method name() string {return firstName + lastName}

  public function genMethod(boolean b) {
    if (b) {
      public field time = 0;
    } else {
      public field date = 0;
    }
  }

  genMethod(true);
}

defines class C with members square (a class function), cube (a class function), firstName (an instance variable), lastName (an instance variable), name (an instance method), genMethod (a class function), and time (an instance variable, defined by the call genMethod(true) inside C's block).

On the other hand, executing the following code after the above example would be illegal due to three different errors:

genMethod(false);   // Field date declared outside of C's block's dynamic extent

public field color; // Field declared outside a class's block

function genField() {
  public field style;
}

class D {
  genField();       // Field style declared outside D's block's lexical extent
}

Visibility

While a ClassDefinition's Block is being evaluated, the already defined class members (other than constructors) are visible and usable by the code in that Block. Afterwards members can be accessed in one of several ways:

Inheritance

A subclass inherits all members except constructors from its superclass. Class variables have only one global value, not one value per subclass. A subclass may override visible methods, but it may not override or shadow any other visible members. On the other hand, imports and versioning can hide members' names from some or all users in importing packages, including subclasses in importing packages.

Member Definitions

We have already seen the definition syntax for variables and constants, functions, and classes. Any of these defined at a ClassDefinition's Block's top level (or placed at its top level by the scope rules) become class members of the class.

Fields, methods, and constructor definitions have their own syntax described below. These definitions must be lexically enclosed by a ClassDefinition's Block.

MemberDefinition 
   FieldDefinition
|  MethodDefinition
|  ConstructorDefinition

Field Definitions

FieldDefinition 
   [Visibility] field Identifier [: TypeExpression] [= AssignmentExpression] , ... , Identifier [: TypeExpression] [= AssignmentExpression] ;

A FieldDefinition is similar to a VariableDefinition except that it defines an instance variable of the lexically enclosing class. Each new instance of the class contains a new, independent set of instance variables initialized to the values given by the AssignmentExpressions in the FieldDefinition.

Identifier is the name of the instance variable and TypeExpression is its type. Identifier can be any non-reserved identifier. TypeExpression is evaluated at the time the variable definition is evaluated and should evaluate to a type t. The TypeExpressions and AssignmentExpressions are evaluated once, at the time the FieldDefinition is evaluated, rather than every time an instance of the class is constructed; their values are saved for use in constructors.

If omitted, TypeExpression defaults to type any.

If provided, AssignmentExpression gives the instance variable's initial value v. If not, undefined is assumed; an error occurs if undefined cannot be coerced to type t. AssignmentExpression is evaluated just after the TypeExpression is evaluated. The value v is then coerced to the variable's type t and stored in the instance variable. Any values subsequently assigned to the instance variable are also coerced to type t at the time of each such assignment.

Multiple instance variables separated by commas can be defined in the same FieldDefinition.

A field cannot be overridden in a subclass.
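The rule that a field's initializer is evaluated once, when the FieldDefinition is evaluated, differs from present-day JavaScript, where class field initializers run each time an instance is constructed. The following sketch emulates the rule described above in modern ECMAScript. The names makeInitialValue, defineC, and savedX are hypothetical and exist only for this illustration.

```javascript
let evaluations = 0;

// Hypothetical initializer with a visible side effect.
function makeInitialValue() {
  evaluations += 1;
  return 10 * evaluations;
}

// Emulates `field x = makeInitialValue();` under the rule described above:
// the initializer runs once, when the class is defined, and the saved value
// is copied into every new instance.
function defineC() {
  const savedX = makeInitialValue();
  return class C {
    constructor() {
      this.x = savedX;
    }
  };
}

const C = defineC();
const a = new C();
const b = new C();
a.x = 99; // instances hold independent variables after construction
```

Both instances start from the same saved value, yet assigning to a.x leaves b.x untouched.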

Method Definitions

MethodDefinition 
   [Visibility] [final] [override] method [get | set] Identifier ( Parameters ) [: TypeExpression] Block
|  [Visibility] [final] [override] method [get | set] Identifier ( Parameters ) [: TypeExpression] ;

A MethodDefinition is similar to a FunctionDefinition except that it defines an instance method of the lexically enclosing class. Parameters, the result TypeExpression, and the body Block behave just like for function definitions, with the following differences:

We call a regular method by combining the . operator with a function call. For example:

class C {
  field x:integer = 3;
  method m() {return x}
  method n(x) {return x+4}
}

var c = new C;
c.m();                 // returns 3
c.n(7);                // returns 11
var f:Function = c.m;  // f is a zero-argument function with this bound to c
f();                   // returns 3
c.x = 8;
f();                   // returns 8
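For comparison, here is a rough equivalent in present-day JavaScript. It is a sketch, not JavaScript 2.0 code: a modern class has no field and method keywords, and reading c.m yields an unbound function, so an explicit bind is needed to get the bound behavior described for the example above.

```javascript
class C {
  constructor() {
    this.x = 3;
  }
  m() {
    return this.x;
  }
  n(x) {
    return x + 4; // the parameter x shadows the instance variable
  }
}

const c = new C();
const viaM = c.m();
const viaN = c.n(7);

// bind fixes `this` to c, approximating the extraction of a method as a
// zero-argument function with this bound to c.
const f = c.m.bind(c);
const beforeUpdate = f();
c.x = 8;
const afterUpdate = f(); // f reads c's current x, not a snapshot
```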

Method Overriding

A class c may override a method m defined in its superclass s. To do this, c should define a method m' with the same name as m and use the override keyword in the definition of m'. Overriding a method without using the override keyword, or using the override keyword when not overriding a method, results in a warning intended to catch misspelled method names. This condition is a warning rather than an error so that subclass c can either define a method if it is not present in s or override it if it is present in s -- this situation can arise when s is imported from a different package and provides several versions.

The overriding method m' does not have to have the same number or type of parameters as the overridden method m. In fact, since parameter types can be arbitrary expressions and are evaluated only during a call, checking for parameter type compatibility when the overriding method m' is declared would require solving the halting problem. Moreover, defining overriding methods that are more general than overridden methods is useful.

A method defined with the final keyword cannot be overridden (or further overridden) in subclasses.

Getter and Setter Methods

If a MethodDefinition contains the keyword get or set, then the defined method is a getter or a setter. These are analogous to getter and setter functions in that they are invoked without listing the parentheses after the method name.

A getter or setter method cannot be overridden. We could relax this restriction, but then we'd also have to allow overriding of fields by getters, setters, or other fields, and, as a corollary, allow fields to be declared final.

Constructor Definitions

ConstructorDefinition 
   [Visibility] constructor Identifier ( Parameters ) Block

A constructor is a class function that creates a new instance of the lexically enclosing class c. A constructor's body Block is required to call one of c's superclass's constructors (when and how?). Afterwards it may access the instance object under construction via the this local variable. A constructor should not return a value with a return statement; the newly created object is returned automatically.

A constructor can have any non-reserved name, in which case we would invoke it as though it were a class function. In addition, a constructor's Identifier can have the special name new, in which case we invoke it using the new prefix operator syntax as in JavaScript 1.5.


JavaScript 2.0
Core Language
Packages

Thursday, November 11, 1999

This page is out of date

Overview

Packages are an abstraction mechanism for grouping and distributing related code. Packages are designed to be linked at run time to allow a program to take advantage of packages written elsewhere or provided by the embedding environment. JavaScript 2.0 offers a number of facilities to make packages robust for dynamic linking:

A package is a file (or analogous container) of JavaScript 2.0 code. There is no specific JavaScript statement that introduces or names a package -- every file is presumed to be a package. A package itself has no name, but it has a specific URI by which other packages can import it.

A package P typically starts with import statements that import other packages used by package P. A package that is meant to be used by other packages typically has one or more version declarations that declare versions available for export.

Package Loading

A package's body is described by the Program grammar nonterminal. A package is loaded (its body is evaluated) when the package is first imported or invoked directly (if, for example, the package is on an HTML web page). Some standard packages may also be loaded when the JavaScript engine first starts up.

Two attempts to load the same package in the same environment result in sharing of that package. What constitutes an environment is necessarily application-dependent. However, if package P1 loads packages P2 and P3, both of which load package P4, then P4 is loaded only once and thereafter its code and data are shared by P2 and P3.

When a package is loaded, all of its statements are evaluated in order, which may cause other packages to be loaded along the way when import statements are encountered. A package's symbols are available for export to other packages only after the package's body has been successfully evaluated. Unlike in Java, circularities are not allowed in the graph of package imports.
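The loading rules above (load once, share afterwards, export only after successful evaluation, no circular imports) can be sketched as a small loader in modern ECMAScript. This is an illustrative model under stated assumptions, not the proposal's mechanism: createLoader, the package table, and the log array are hypothetical, and for simplicity each package lists its imports up front instead of importing as statements are encountered.

```javascript
// Hypothetical package table: URI -> { imports: [URI, ...], body: function }.
function createLoader(packages) {
  const loaded = new Map();  // URI -> exports of successfully loaded packages
  const loading = new Set(); // URIs whose bodies are still being evaluated
  const log = [];            // order in which package bodies finished

  function load(uri) {
    if (loaded.has(uri)) return loaded.get(uri); // shared, not re-evaluated
    if (loading.has(uri)) {
      throw new Error("circular import involving " + uri);
    }
    const pkg = packages[uri];
    if (!pkg) throw new Error("unknown package " + uri);
    loading.add(uri);
    try {
      const imported = pkg.imports.map(load);
      const exported = pkg.body(imported);
      log.push(uri);
      loaded.set(uri, exported); // exports become available only on success
      return exported;
    } finally {
      loading.delete(uri);
    }
  }
  return { load, log };
}

// P1 loads P2 and P3, both of which load P4.
const loader = createLoader({
  P1: { imports: ["P2", "P3"], body: () => ({}) },
  P2: { imports: ["P4"], body: ([p4]) => ({ p4 }) },
  P3: { imports: ["P4"], body: ([p4]) => ({ p4 }) },
  P4: { imports: [], body: () => ({ counter: 0 }) },
});
loader.load("P1");
const p2 = loader.load("P2");
const p3 = loader.load("P3");
```

In this model p2.p4 and p3.p4 are the same object, so a change made through one is visible through the other.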

To create packages A and B that access each other's symbols, we need to instead define a hidden package C that consists of all of the code that would have gone into A and B. Package C should define versions verA and verB and tag the symbols it exports with either verA or verB to indicate whether these symbols belong in package A or B. Package A should then be empty except for a directive (or several directives if there are multiple versions of A and verA) that reexports C's symbols tagged with verA. Similarly, package B should reexport C's symbols tagged with verB. To make this work we need a reexport directive. Is this really necessary? Also, do we want a mechanism for hiding package C from general view so that users can only use it through A or B?

Exports

We can export a symbol in a package by giving it public Visibility.

Imports

To import symbols from a package we use the import statement:

ImportStatement 
   import ImportList ;
|  import ImportList Block
|  import ImportList Block else CodeStatement
ImportList 
   ImportItem , ... , ImportItem
ImportItem 
   [[protected] Identifier =] NonAssignmentExpression [: Version]

The first form of the import statement (without a Block) imports symbols into the current lexical scope. The second and third forms import symbols into the lexical scope of the Block. If the imports are unsuccessful, the first two forms of the import statement throw an exception, while the last form executes the CodeStatement after the else keyword.

An import statement can import one or more packages separated by commas. Each ImportItem specifies one package to be imported. The NonAssignmentExpression should evaluate to a string that contains a URI where the package may be found. If present, Version indicates the version of the package's exports to be imported; if not present, Version defaults to version 1.

An ImportItem can introduce a name for the imported package if the NonAssignmentExpression is preceded by Identifier =. Identifier becomes bound (either in the current lexical scope or in the Block's scope) to the imported package as a whole. Individual symbols can be extracted from the package by using Identifier with the :: operator. For example, if package at URI P has public symbols a and b, then after the statement

import x=P;

P's symbols can be referenced as either a, b, x::a, or x::b.

If an ImportItem contains the keyword protected, then the imported symbols can only be accessed using the :: operator. If we were to import package P using

import protected x=P;

then we'd have to access P's symbols using either x::a or x::b.

If two imports in the same scope import packages with clashing symbols, then neither symbol is accessible unless qualified using the :: operator. If an imported symbol clashes with a symbol declared in the same scope, then the declared symbol shadows the imported symbol. Scope rules 3 and 4 apply here as well, so the following code is illegal because a is referenced and then redefined:

import x=P;
var y=a;     // References P's a
const a=17;  // Redefines a in same scope

Version names cannot be imported.
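The lookup rules for imported names can be modeled in a few lines of modern ECMAScript. The sketch below is an assumption-laden illustration: resolveUnqualified, resolveQualified, and the record shape { alias, isProtected, symbols } are hypothetical, and the rule that makes reference-then-redefinition illegal is not modeled.

```javascript
// Each import record: { alias, isProtected, symbols }, where symbols maps
// exported names to values.
function resolveUnqualified(name, declared, imports) {
  if (declared.has(name)) return declared.get(name); // declaration shadows imports
  const sources = imports.filter(
    (imp) => !imp.isProtected && imp.symbols.has(name)
  );
  if (sources.length === 1) return sources[0].symbols.get(name);
  if (sources.length === 0) throw new ReferenceError(name + " is not defined");
  throw new ReferenceError(name + " is ambiguous; qualify it with ::");
}

// Models alias::name.
function resolveQualified(alias, name, imports) {
  const imp = imports.find((i) => i.alias === alias);
  if (!imp || !imp.symbols.has(name)) {
    throw new ReferenceError(alias + "::" + name + " is not defined");
  }
  return imp.symbols.get(name);
}

// Two hypothetical packages that both export a symbol named a.
const P = new Map([["a", 1], ["b", 2]]);
const Q = new Map([["a", 10]]);
const imports = [
  { alias: "x", isProtected: false, symbols: P },
  { alias: "y", isProtected: false, symbols: Q },
];
const declared = new Map();
```

With these tables, b resolves unqualified, a must be written as x::a or y::a, and a protected import would make even b reachable only as x::b.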

Discussion

Package Names

Do we want to use URIs to locate packages, or do we want to invent our own, separate mechanism to do this?

Visibilities

Should we make private illegal outside a class rather than making it equivalent to package?

Should we introduce a local Visibility prefix that explicitly means that the declaration is visible locally? This wouldn't provide any additional functionality but it would provide a convenient name for talking about the four kinds of visibility prefixes.

What should the default visibilities be? The current defaults are loosely modeled after Java:

Definition Location                                    Default visibility
Package top level                                      package (equivalent to local in this case)
Inside a statement outside a function or class         local
Function or method code's top level                    local
Inside a statement inside a function or method         local
Class declaration block's top level                    package
Inside a statement inside a class declaration block    local

JavaScript 2.0
Core Language
Language Declarations

Thursday, November 11, 1999

Language declarations allow a script writer to select the language to use for a script or a particular section of a script. A language denotes either a major language such as JavaScript 2.0 or a variation such as strict mode.

Developers often find it desirable to be able to write a single script that takes advantage of the latest features in a host environment such as a browser while at the same time working in older host environments that do not support these features. JavaScript 2.0's language declarations enable one to easily write such scripts. One may still need to use techniques such as the LANGUAGE HTML attribute to support pre-JavaScript 2.0 environments, but at least the number of such environments that will need to be special-cased will not increase in the future.

Language declarations are a dual of versioning: language declarations let a script run under a variety of historical hosts, while versioning lets a host run a variety of historical scripts.

Syntax

LanguageDeclaration^ω 
   language LanguageId LanguageIdList LanguageAlternatives LanguageSemicolon^ω
LanguageSemicolon^abbrev 
   ;
|  «empty»
LanguageSemicolon^abbrevNonEmpty 
   ;
|  «empty»
LanguageSemicolon^full 
   ;
LanguageAlternatives 
   «empty»
|  LanguageAlternatives | LanguageIdList
LanguageIdList 
   «empty»
|  LanguageIdList LanguageId
LanguageId 
   Identifier
|  Number

A language declaration uses the syntax above. The keyword language is followed by one or more language alternatives separated by vertical bars. Each language alternative consists of zero or more LanguageIds, which are either identifiers or numbers. The first language alternative must contain at least one LanguageId. The semicolon at the end of the LanguageDeclaration cannot be inserted by line-break semicolon insertion.

When a JavaScript environment is lexing and parsing a JavaScript program and it encounters a language declaration, it checks whether any of the language alternatives can be satisfied. If at least one can, the environment picks the first language alternative that can be satisfied and processes the rest of the containing block (until the closing } or until the end of the program if at the top level) using that language. A subsequent language declaration in the same block can further change the language.

If no language alternatives can be satisfied, then the JavaScript environment skips to the end of the containing block (until the closing matching } or until the end of the program if at the top level). Further language declarations in the same block are ignored. No error occurs unless the failing language declaration is executed as a statement, in which case it throws a syntax error. [See rationale for a discussion of some of the issues here.]

The following LanguageIds are currently defined:

LanguageId    Language
1.0           JavaScript 1.0
1.1           JavaScript 1.1
1.2           JavaScript 1.2
1.3           JavaScript 1.3
1.4           JavaScript 1.4
1.5           JavaScript 1.5 (ECMAScript Edition 3)
2.0           JavaScript 2.0
strict        Strict mode
traditional   Traditional mode (default)

It is meaningless to combine two or more numeric LanguageIds in the same alternative:

language 1.0 2.0;

will always fail. On the other hand, it is meaningful and useful to separate them with vertical bars. For example, one can indicate that one prefers JavaScript 2.1 but is willing to accept JavaScript 2.0 if 2.1 is not available:

language 2.1 | 2.0;

An empty alternative will always succeed. One can use it to indicate a preference for strict mode but willingness to work without it:

language strict |;

Language declarations are always lexically scoped and never extend past the end of the enclosing block.
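The selection rule for language alternatives can be sketched as follows in modern ECMAScript. The host description, canSatisfy, and selectLanguage are hypothetical names introduced for this illustration. The sketch also assumes that an alternative naming two different numeric LanguageIds can never be satisfied, which is how we read the language 1.0 2.0 example.

```javascript
// Hypothetical description of what one host environment supports.
const host = {
  versions: new Set(["1.0", "1.5", "2.0"]),
  modes: new Set(["strict", "traditional"]),
};

function isNumericId(id) {
  return /^\d/.test(id);
}

// An alternative is an array of LanguageIds written as strings.
function canSatisfy(alternative, host) {
  const numericIds = new Set(alternative.filter(isNumericId));
  if (numericIds.size > 1) return false; // e.g. "1.0 2.0" always fails
  return alternative.every((id) =>
    isNumericId(id) ? host.versions.has(id) : host.modes.has(id)
  );
}

// Returns the first alternative that can be satisfied, or null if none can.
// An empty alternative is satisfied by every host.
function selectLanguage(alternatives, host) {
  for (const alternative of alternatives) {
    if (canSatisfy(alternative, host)) return alternative;
  }
  return null;
}

// language 2.1 | 2.0;
const chosen = selectLanguage([["2.1"], ["2.0"]], host);
```

A null result corresponds to the case where the environment skips to the end of the containing block.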

This document specifies the 2.0 language and its strict and traditional modes. The consequences of mixing in other languages are implementation-defined, but implementations are encouraged to do something reasonable.

Strict Mode

Many parts of JavaScript 2.0 are relaxed or unduly convoluted due to compatibility requirements with JavaScript 1.5. Strict mode sacrifices some of this compatibility for simplicity and additional error checking. Strict mode is intended to be used in newly written JavaScript 2.0 programs, although existing JavaScript 1.5 programs may be retrofitted.

The opposite of strict mode is traditional mode, which is the default. A program can readily mix strict and traditional portions.

Strict mode has the following effects:

See also rationale.


JavaScript 2.0
Libraries

Thursday, November 11, 1999


This chapter presents the libraries that accompany the core language.

For the time being, only the libraries new to JavaScript 2.0 are described. The basic libraries such as String, Array, etc. carry over from JavaScript 1.5.


JavaScript 2.0
Libraries
Types

Thursday, November 11, 1999

Predefined Types

The following types are predefined in JavaScript 2.0:

Type        Set of Values
void        undefined
Null        null
boolean     true and false
integer     Double-precision IEEE floating-point numbers that are mathematical integers, including positive and negative zeroes but excluding infinities and NaN
number      Double-precision IEEE floating-point numbers, including positive and negative zeroes and infinities and NaN
character   Single 16-bit unicode characters
string      Immutable strings of unicode characters
Function    All functions and null
array       All arrays
Array       All arrays and null
type        All types
Type        All types and null
object      All values except undefined and null
Object      All values except undefined
Any         All values

By convention, predefined types whose names start with an upper-case letter include the value null, while predefined types whose names start with a lower-case letter do not include null. User-defined type names do not have to follow this convention.

Unlike in JavaScript 1.5, there is no distinction between objects and primitive values. All values can have methods. Some values can be sealed, which disallows addition of ad-hoc properties. User-defined classes can be made to behave like primitives.

The above type names are not reserved words. They are considered to be defined in a scope that encloses a package's global scope, so a package could use these type names as identifiers. However, defining these identifiers for other uses might be confusing because it would shadow the corresponding type names (the types themselves would continue to exist, but they could not be accessed by name).

The names Boolean, Number, and String have been deliberately left unused to enable implementations to use them to emulate the behavior of the JavaScript 1.5 Boolean, Number, and String wrapper objects. These are not part of JavaScript 2.0, but an implementation may support them for compatibility.

The name function could not be used to mean "all functions" because it is a reserved word. Use Function^* instead.

Literals

A literal number that has an integral value has type integer; otherwise it has type number. integer is a subtype of number, so every integer value is also a number value. A literal string that has exactly one 16-bit unicode character has type character; otherwise it has type string. character is a subtype of string, so every character value is also a string value.

User-Defined Types

Any class defined using the class declaration is also a type that denotes the set of all of its and its descendants' instances. These include the predefined classes, so Object, Date, etc. are all types. null is an instance of a user-defined class c if it is an instance of any of c's superclasses.

Compound Types

We can use the following operators to construct more complex types. t and u are type expressions in the expressions below.

Type     Values
t | *    null and all values of type t
t ^ *    All values of type t except null
t | ?    undefined and all values of type t
t ^ ?    All values of type t except undefined
t | u    All values belonging to either type t or type u or both
t & u    All values simultaneously belonging to both type t and type u

The language does not syntactically distinguish type expressions from value expressions, so a type expression can also use any other value operators such as !, +, and . (member access). Except for parentheses, most of them are not very useful, though.
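Since each type denotes a set of values, the compound type operators can be modeled as operations on membership predicates. The sketch below is modern ECMAScript written for illustration only. The predicate encoding and the helper names (union, withNull, and so on) are assumptions and are not part of JavaScript 2.0.

```javascript
// A type is modeled as a predicate that answers "is v a member?".
const Null = (v) => v === null;
const Void = (v) => v === undefined;
const integer = (v) => typeof v === "number" && Number.isInteger(v);
const string = (v) => typeof v === "string";

const union = (t, u) => (v) => t(v) || u(v);                  // t | u
const intersection = (t, u) => (v) => t(v) && u(v);           // t & u
const withNull = (t) => union(t, Null);                       // t | *
const withoutNull = (t) => (v) => t(v) && v !== null;         // t ^ *
const withUndefined = (t) => union(t, Void);                  // t | ?
const withoutUndefined = (t) => (v) => t(v) && v !== undefined; // t ^ ?

const nullableInteger = withNull(integer);
const integerOrString = union(integer, string);
```

Here nullableInteger accepts 3 and null but rejects 3.5, and intersection(integer, string) accepts no value at all because no value is both an integer and a string.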

Subtyping

We write a ⊆ b to denote that a is a subtype of b. Subtyping is transitive, so if a ⊆ b and b ⊆ c then a ⊆ c is also true. Subtyping is also reflexive: a ⊆ a.

The following subtype and type equivalence relations hold. t, u, and v represent arbitrary types.

t ⊆ t | u                        t & u ⊆ t
t | t = t                        t & t = t
t | u = u | t                    t & u = u & t
(t | u) | v = t | (u | v)        (t & u) & v = t & (u & v)
t | * = t | Null                 t | ? = t | void
integer ⊆ number ⊆ object        character ⊆ string ⊆ object
boolean ⊆ object                 array ⊆ object
type ⊆ object
Array = array | Null             Type = type | Null
Object = object | Null
t ⊆ Any

We write v ∈ t to indicate that v is a value that is a member of type t. The following subtyping rule holds: if v ∈ t and t ⊆ s, then v ∈ s holds as well. Any particular value v is simultaneously a member of many types.

Meaning of Types

Types are generally used to restrict the set of objects that can be held in a variable or passed as a function argument. For example, the declaration

var integer x;

restricts the values that can be held in variable x to be integers.

A type declaration never affects the semantics of reading the variable or accessing one of its members. Thus, as long as expression new MyType() returns a value of type MyType, the following two code snippets are equivalent:

var MyType x = new MyType();
x.foo();

var x = new MyType();
x.foo();

This equivalence always holds, even if these snippets are inside the declaration of class MyType and foo is a private field of that class. As a corollary, adding true type annotations does not change the meaning of a program.

Type Expressions

A type is also a value (whose type is type) and can be used in expressions, assigned to variables, passed to functions, etc. For example, the code

const type Z = integer;
function abs_val(Z i) Z {
  return i<0 ? -i : i;
}

is equivalent to:

function abs_val(integer i) integer {
  return i<0 ? -i : i;
}

As another example, the following method takes a type and returns an instance of that type:

method QueryInterface(type t) t { ... }

Coercions

Coercions can take place in the following situations:

In any of these cases, if v ∈ t, then v is passed unchanged. If v ∉ t, then an error occurs unless v is undefined, in which case the following coercions are tried, in order:

  1. If Null ⊆ t, then null is used instead of undefined.
  2. If boolean ⊆ t, then false is used instead of undefined.
  3. If integer ⊆ t, then +0.0 is used instead of undefined.
  4. If string ⊆ t, then "" is used instead of undefined.

If none of the coercions succeeds, an error occurs.
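The coercion procedure above can be sketched in modern ECMAScript by again modeling a type as a membership predicate. This is a simplified illustration, and the names coerce and fallbacks are hypothetical. It approximates each subtype test in the list by checking whether the replacement value itself is a member of the target type, which is exact for Null but only an approximation for boolean, integer, and string.

```javascript
const Null = (v) => v === null;
const boolean = (v) => typeof v === "boolean";
const integer = (v) => typeof v === "number" && Number.isInteger(v);
const string = (v) => typeof v === "string";

// Replacement values tried, in order, when undefined is not a member of t.
const fallbacks = [null, false, 0, ""];

function coerce(v, t) {
  if (t(v)) return v; // v is a member of t: passed unchanged
  if (v !== undefined) {
    throw new TypeError("value is not a member of the target type");
  }
  for (const candidate of fallbacks) {
    if (t(candidate)) return candidate;
  }
  throw new TypeError("undefined cannot be coerced to the target type");
}

const nullableInteger = (v) => integer(v) || Null(v);
const fromInteger = coerce(undefined, integer);         // +0
const fromString = coerce(undefined, string);           // ""
const fromNullable = coerce(undefined, nullableInteger); // null is tried first
```

Because the candidates are tried in order, a type that contains both null and the integers receives null instead of +0.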

Some types such as machine integers define additional coercions. These are listed along with descriptions of these types.

@ Operator

One can explicitly request a coercion in an expression by using the @ operator. This operator has the same precedence as . and coerces its left operand to the right operand, which must be a type. ... v@t ... can be used in an expression and has the same effect as:

function coerce_to_t(t a) t {return a}   // Declared at the top level

... coerce_to_t(v) ...

assuming that coerce_to_t is an identifier not used anywhere else. The @ operator is useful as a type assertion as in w@Window. It's a postfix operator to simplify cascading expressions:

w@Window.child@Window.pos

is equivalent to:

(((w@Window).child)@Window).pos

Type Casts

A type cast performs more aggressive transformations than a type coercion. To cast a value to a given type, we use the type as a function, passing it the value as an argument:

type(value)

For example, integer(258.1) returns the integer 258, and string(2+2==4) returns the string "true".

Need to specify the semantics of type casts. They are intended to mimic the current ToNumber, ToString, etc. methods.

Discussion

Colon Syntax

Would we rather have the colon syntax for declaring types? Two sample declarations would be:

var x:integer = 7;
function f(a:integer, b:Object):number {...}

A few considerations:

Type Expressions

Do we want to make type expressions have a distinct syntax from value expressions? I have not heard any "pro" arguments. Here are the "con" arguments:


JavaScript 2.0
Libraries
Versions

Thursday, November 11, 1999

Motivation

As a package evolves over time it often becomes necessary to change its exported interface. Most of these changes involve adding symbols (global and class members), although occasionally a symbol may be deleted or renamed. In a monolithic environment where all JavaScript source code comes preassembled from the same source, this is not a problem. On the other hand, if packages are dynamically linked from several sources then versioning problems are likely to arise.

One of the most common avoidable problems is collision of symbols. Unless we solve this problem, an author of a library will not be able to add even one symbol in a future version of his library because that symbol could already be in use by some client or some other library that a client also links with. This problem occurs both in the global namespace and in the namespaces within classes from which clients are allowed to inherit.

Example

Here's an example of how such a collision can arise. Suppose that a library provider creates a library called BitTracker that exports a class Data. This library becomes so successful that it is bundled with all web browsers produced by the BrowsersRUs company:

package BitTracker;

public class Data {
  public field author;
  public field contents;
  function save() {...}
};

function store(d) {
  ...
  storeOnFastDisk(d);
}

Now someone else writes a web page W that takes advantage of BitTracker. The class Picture derives from Data and adds, among other things, a method called size that returns the dimensions of the picture:

import BitTracker;

class Picture extends Data {
  public method size() {...}
  field palette;
};

function orientation(d) {
  if (d.size().h >= d.size().v)
    return "Landscape";
  else
    return "Portrait";
}

The author of the BitTracker library, who hasn't seen W, decides in response to customer requests to add a method called size that returns the number of bytes of data in a Data object. He then releases the new and improved BitTracker library. BrowsersRUs includes this library with its latest NavigatorForInternetComputing 17.0 browser:

package BitTracker;

public class Data {
  public field author;
  public field contents;
  public method size() {...}
  function save() {...}
};

function store(d) {
  ...
  if (d.size() > limit)
    storeOnSlowDisk(d);
  else
    storeOnFastDisk(d);
}

An unsuspecting user U upgrades his old BrowsersRUs browser to the latest NavigatorForInternetComputing 17.0 browser and a week later is dismayed to find that page W doesn't work anymore. U's granddaughter Alyssa P. Hacker tries to explain to U that he's experiencing a name conflict on the size methods, but U has no idea what she is talking about. U attempts to contact the author of W, but she has moved on to other pursuits and is on a self-discovery mission to sub-Saharan Africa. Now U is steaming at BrowsersRUs, which in turn is pointing its finger at the author of BitTracker.

Solutions

How could the author of BitTracker have avoided this problem? Simply choosing a name other than size wouldn't work, because there could be some other page W2 that conflicts with the new name. There are several possible approaches:

The last approach appears to be the most desirable because it places the smallest burden on casual users of the language, who merely have to import the packages they use and supply the current version numbers in the import statements. A package author has to be careful not to disturb the set of visible prior-version symbols when releasing an updated package, but authors of dynamically linkable packages are assumed to be more sophisticated users of the language and could be supplied with tools to automatically check updated packages' consistency.

Overview

The versioning system in JavaScript 2.0 only affects exports of symbols. The concept of a version does not apply to a package's internal code; it is up to package developers to ensure that newer releases of their packages continue to behave compatibly with older ones.

Terminology

A version describes the API of a package. A release refers to the entirety of a package, including its code. One release can export many versions of its API. A package developer should make sure that multiple releases of a package that export version V export exactly the same set of symbols in version V.

Example

As an example, suppose that a developer wrote a sorting package P with functions sort and merge that called bubble sort in version "1.0". In the next release the developer adds a function called stablesort and includes it in version "2.0". In a subsequent release the developer changes the sort algorithm to a quicksort that calls stablesort as a subroutine. That last release of the package might look like:

const V1_0 = new Version("1.0","");       // The "" makes version "1.0" be the default
const V2_0 = new Version("2.0","1.0");

public var serialNumber;

public function sort(compare: Function, array: any[]):any[] {...}
public function merge(compare: Function, array1: any[], array2: any[]):any[] {...}
V2_0 function stablesort(compare: Function, array: any[]):any[] {...}

Suppose, further, that client package C1 imports version "1.0" of P, client package C2 simultaneously imports version "2.0" of P, and a search for P yields the latest release described above. There would be only one instance of P running -- the latest release. Both clients would get the same sort and merge functions, and both would see the same serialNumber variable (in particular, if client C1 wrote to serialNumber, then client C2 would see the updated value), but only client package C2 would see the stablesort function. Both clients would get the quicksort release of sort. If client package C1 defined its own stablesort function, then that function would not conflict with P's stablesort; furthermore, P's sort would still refer to P's stablesort in its internal subroutine call.

Had only the first release of P been available, client package C2 would obtain an error because version 2 of P's API would not be available. Client C1 could run normally, although the sort function it calls would use bubble sort instead of the quicksort.

Note that the last release of P did not change the API so it did not need a new version. Of course, it could define a new version if for some reason it wanted clients to be able to demand the last release of P even though its API is the same as the second release.

The remainder of this page is out of date. Versions are now created using ordinary object calls on a versioning library.

Version Declarations

Version Names

A version name Version is a quoted string literal such as "1.2" or "Private Interface 2.0". Two version names are equal if their strings are equal. A special version whose name is the empty string "" is called the default version.

Declaration Syntax

A package must declare every version it uses except "", which is declared by default if not explicitly declared. A version must be declared before its first use. A given version name may be declared only once per package. A package declares a version name Version using the version declaration:

VersionDefinition ⇒
   [Visibility] version Version [> VersionList] ;
|  [Visibility] version Version [= Version] ;
VersionList ⇒
   Version , ... , Version

A version declaration cannot be nested inside a ClassDefinition's Block.

If Visibility is present, it must be either private, package, or public (without VersionsAndRenames). Unlike in other declarations, the default is public, which makes Version accessible by other packages. A private or package Visibility hides its Version from other packages; such a Version can be used only by being included in the VersionList of another Version. Also unlike other declarations, all Version declarations are global.

Version Ordering

If the Version being declared is followed by a > and a VersionList, then the Version is said to be greater than all of the Versions in the VersionList. We write v1 :> v2 to indicate that v1 is greater than v2 and v1 :≥ v2 to indicate that either v1 and v2 are the same version or v1 :> v2. Order is transitive, which means that if v1 :> v2 and v2 :> v3, then v1 :> v3. This order induces a partial order on the set of all versions. It is possible for two versions to be unordered with respect to each other, in which case they are not equal and neither is greater than the other.

If the Version v1 being declared is followed by a = and another Version v2, then v1 becomes an alias for v2, and they may be used interchangeably.
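The ordering rules can be illustrated with a minimal sketch in present-day JavaScript (not JavaScript 2.0 syntax). The table `declared` and the helpers `greaterThan` and `greaterOrEqual` are hypothetical names introduced only for this illustration; the sketch assumes that the declarations contain no cycles.

```javascript
// Hypothetical model: each version name maps to the versions it was
// declared to be directly greater than.
const declared = {
  "1.0": [],
  "2.0": ["1.0"],
  "2.1": ["2.0"],
  "2.0-branch": ["1.0"],
};

// Models "v1 is greater than v2" by following declarations transitively.
function greaterThan(decls, v1, v2) {
  const seen = new Set();
  const stack = [...decls[v1]];
  while (stack.length > 0) {
    const v = stack.pop();
    if (v === v2) return true;
    if (!seen.has(v)) {
      seen.add(v);
      stack.push(...decls[v]);
    }
  }
  return false;
}

// Models "v1 is the same version as v2 or greater than v2".
function greaterOrEqual(decls, v1, v2) {
  return v1 === v2 || greaterThan(decls, v1, v2);
}
```

In this model "2.1" is greater than "1.0" by transitivity, while "2.1" and "2.0-branch" are unordered: neither is greater than the other.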

Version Ranges

A VersionRange specifies a subset of all versions. This subset contains all versions that are both greater than or equal to a given Version1 and less than or equal to a given Version2. A VersionRange can have either of the following forms:

VersionRange ⇒
   Version
|  [Version1] .. [Version2]

The first form specifies the one-element set {Version}. The second form specifies the set of all Versions v such that v :≥ Version1 and Version2 :≥ v. If Version1 is omitted, the condition v :≥ Version1 is dropped. If Version2 is omitted, the condition Version2 :≥ v is dropped.
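As a rough illustration of how a range selects versions under a partial order, the following sketch (present-day JavaScript, not JavaScript 2.0 syntax) enumerates the members of a range. The table `decls` and the functions `atLeast` and `versionsInRange` are hypothetical; an omitted bound is modeled as `undefined`, and the declarations are assumed to be acyclic.

```javascript
// Hypothetical declarations: version -> versions it is directly greater than.
const decls = {
  "1.0": [],
  "1.1": ["1.0"],
  "2.0": ["1.1"],
  "1.1-exp": ["1.0"],
};

// True if a is the same version as b or (transitively) greater than b.
function atLeast(a, b) {
  return a === b || decls[a].some((lower) => atLeast(lower, b));
}

// All declared versions v with v at least lo and hi at least v.
// Passing undefined for lo or hi drops the corresponding condition.
function versionsInRange(lo, hi) {
  return Object.keys(decls).filter(
    (v) =>
      (lo === undefined || atLeast(v, lo)) &&
      (hi === undefined || atLeast(hi, v))
  );
}
```

Here the range from "1.0" to "2.0" excludes "1.1-exp", because "2.0" and "1.1-exp" are unordered with respect to each other.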

Discussion

Version Numbers 1

The original version of this specification allowed both strings and numbers as Version names. Two version names were equal if their toString representations were identical, so version names 2.0 and "2" were identical but 2.0 and "2.0" were not. In addition, numbered versions had an implicit order: For any two versions v1 and v2 whose names could be represented as numbers, v1 :> v2 if and only if v1 was numerically greater than v2. Additionally, every version except 0 was greater than version 0. It was an error to define explicit version containment relations that would violate this default order, directly or indirectly.

Numbered Version names were dropped for simplicity and to avoid confusion with versions such as 1.2.3 (which would be a syntax error unless quoted).

Version Numbers 2

Another, simpler, approach is to require all Version names to be nonnegative integers (without quotes). Versions would not need to be declared, and all versions would be totally ordered in numerical order. A disadvantage of this approach is that the total order keeps versions from being branched.

Dynamic Version Definitions

Currently version definitions are fixed. These could be turned into function calls that define versions and list their relationships. If we can get a variable or constant to hold a set of version names, then we could use these variables rather than specific version names in the VersionsAndRenames lists after public keywords. This would provide another level of abstraction and flexibility.

Separate Version Definitions

Yet another approach is to consolidate all of the information in VersionsAndRenames into a set of export statements, say, at the top of the file rather than being interspersed throughout a package along with public declarations. This would make it easier to see all of the identifiers exported by a particular version of the package, but it would also likely lead to inconsistencies when someone forgets to update an export statement after inserting another variable, function, field, or method definition. Such errors would likely be caught after a package has been released.


Libraries
Machine Types

Thursday, November 11, 1999

Purpose

The machine types library is an optional library that provides additional low-level types for use in JavaScript 2.0 programs. On implementations that support this library, these types provide faster, Java-style integer operations that are useful for communicating between JavaScript 2.0 and other programming languages and for performance-critical code. These types are not intended to replace number and integer for general-purpose scripting.

Contents

When the machine types library is imported via an import of "machine-types" version 1, the following types become available:

Type

Values

byte Machine integers between -128 and 127 inclusive
ubyte Machine integers between 0 and 255 inclusive
short Machine integers between -32768 and 32767 inclusive
ushort Machine integers between 0 and 65535 inclusive
int Machine integers between -2147483648 and 2147483647 inclusive
uint Machine integers between 0 and 4294967295 inclusive
long Machine integers between -9223372036854775808 and 9223372036854775807 inclusive
ulong Machine integers between 0 and 18446744073709551615 inclusive

Values belonging to the eight machine integer types above are distinct from each other and from values of type integer. Thus, byte(7) is distinct from int(7), which in turn is distinct from the plain integer 7. However, the coercions listed below usually hide these distinctions.

No subtype relations hold between the machine types.

The above type names are not reserved words.

Coercions

The following coercions take place:

Operations

Machine integers support the arithmetic operators +, -, *, /, %, the comparisons ==, !=, <, >, <=, >=, and the bitwise logical operations ~, &, |, ^, <<, >>. If supplied two operands of different machine integer types M1 and M2, all of these binary operators except << and >> first coerce both operands to the same type M. If M1 appears before M2 in the list byte, ubyte, short, ushort, int, uint, long, ulong, then M is M2; otherwise M is M1. Then these operators perform the operation and finally return the result as a value of type M. If the result is not within range of the target type M, it is treated modulo |M|, where |M| is the number of values of type M.

If one of the operands is a machine integer of type M and the other is an integer value v, then v is first coerced to type M.
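The promotion and wraparound rules above can be sketched in present-day JavaScript using BigInt. This is only an illustration, not the library's API: `TYPES`, `rank`, `wrap`, `commonType`, and `add` are hypothetical names, and the sketch assumes that coercing an operand to the common type also wraps it into range.

```javascript
// Machine integer types in promotion order, with bit width and signedness.
const TYPES = [
  { name: "byte", bits: 8, signed: true },
  { name: "ubyte", bits: 8, signed: false },
  { name: "short", bits: 16, signed: true },
  { name: "ushort", bits: 16, signed: false },
  { name: "int", bits: 32, signed: true },
  { name: "uint", bits: 32, signed: false },
  { name: "long", bits: 64, signed: true },
  { name: "ulong", bits: 64, signed: false },
];

const rank = (name) => TYPES.findIndex((t) => t.name === name);

// Reduce a mathematical integer (a BigInt) into the range of the named type.
function wrap(typeName, value) {
  const t = TYPES[rank(typeName)];
  return t.signed ? BigInt.asIntN(t.bits, value) : BigInt.asUintN(t.bits, value);
}

// Result type of a non-shift binary operator: the operand type that
// appears later in the list.
function commonType(m1, m2) {
  return rank(m1) < rank(m2) ? m2 : m1;
}

// Add two machine integers given as { type, value } records.
function add(a, b) {
  const type = commonType(a.type, b.type);
  // Assumption: coercion to the common type wraps out-of-range values.
  const sum = wrap(type, a.value) + wrap(type, b.value);
  return { type, value: wrap(type, sum) };
}
```

For example, adding byte 127 and byte 1 in this model yields byte -128, and adding two int values of 2000000000 yields int -294967296.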

The result type of a shift expression (<< or >>) is the same as the type of its first operand. The second operand's type does not affect the type of the result. Right shifts are signed if the first operand has type byte, short, int, or long, and unsigned if it has type ubyte, ushort, uint, or ulong.
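The difference between signed and unsigned right shifts can be seen with the 32-bit shift operators that present-day JavaScript already provides. This is an analogy only: `>>` behaves like a right shift on an int first operand and `>>>` like a right shift on a uint first operand. The variable names are illustrative.

```javascript
const bits = -8;            // 32-bit pattern 0xFFFFFFF8
const asInt = bits >> 1;    // sign-propagating shift, as for a signed first operand
const asUint = bits >>> 1;  // zero-filling shift, as for an unsigned first operand
console.log(asInt, asUint); // prints -4 2147483644
```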

Discussion

These rules are designed to permit machine integer operations to be implemented as single instructions on most processor architectures yet give predictable results. Overflows wrap around instead of signaling errors because such behavior is useful for many bit-manipulation algorithms and permits much better optimization of performance-critical code. Code that is concerned about overflows should be using regular integer instead of the machine integer types.

Disjointness of Machine Types

Why are values of the eight machine integer types distinct? This was done because of a desire to allow arithmetic operators to only support 32 bits when operating on int values. Let's take a look at the alternative:

Suppose we unify the values of all eight machine types so that int(2000000000) is indistinguishable from long(2000000000). To what precision should an operator like + calculate its results? Clearly, if we're adding two long values and the result is within the range of long values, then we'd expect to get the right result. In particular, long(2000000000) + long(2000000000) should yield long(4000000000). However, long(2000000000) is indistinguishable from int(2000000000), so int(2000000000) + int(2000000000) should also yield long(4000000000), which is not representable as an int value. Thus, even if both operands are known to be int values, the + operator has to use 64-bit arithmetic.

If a has type int and we compute a+1, then we have to use 64-bit arithmetic because the result could be 2147483648. However, if we compute var int r = a+1 instead, then a smart compiler could make do with 32-bit arithmetic because the result is treated modulo 2^32. However, this trick would not work with an expression such as var boolean b = a+1 > 0.

The alternative is viable but it leads to more demand for 64-bit arithmetic. It does have the advantage that one does not need to worry about intermediate overflows as long as the values don't approach 2^64.
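The a+1 example can be reproduced in present-day JavaScript, where `| 0` reduces a number to the 32-bit signed range. This is a sketch of the arithmetic only, not of the proposed machine types; all variable names are illustrative.

```javascript
const a = 2147483647;                  // largest int value
const wide = a + 1;                    // exact result, outside the int range
const wrapped = (a + 1) | 0;           // result reduced modulo 2^32 into the int range
const wideTest = a + 1 > 0;            // comparison on the exact result
const wrappedTest = ((a + 1) | 0) > 0; // comparison on the wrapped result
console.log(wide, wrapped, wideTest, wrappedTest);
// prints 2147483648 -2147483648 true false
```

The two comparisons disagree, which is why an implementation cannot narrow a+1 > 0 to 32-bit arithmetic when int values are not distinct from wider values.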

Single-Precision Floating-Point Type

Do we want to support a float type for holding single-precision IEEE floating-point numbers? This type may be useful for:

One difficulty with supporting float is deciding what the coercion rules should be. If we invoke + with one number operand and one float operand, should the result be a float or a number? One might expect number, but this makes adding constants to floats using single-precision arithmetic awkward since every constant is a number. If s is a float, the expression s+1 would yield a number instead of a float because 1 is a number. One would have to write s+float(1) instead.
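The coercion question can be made concrete with `Math.fround`, which present-day JavaScript provides for rounding a number to the nearest single-precision value. The sketch below is an analogy rather than the proposed float type: it contrasts computing s+1 in double precision with rounding the sum back to single precision.

```javascript
const s = Math.fround(0.1);                      // a value representable as a float
const asNumber = s + 1;                          // sum kept in double precision
const asFloat = Math.fround(s + Math.fround(1)); // sum rounded to single precision
console.log(asNumber === asFloat);               // prints false
```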


Libraries
Operator Overloading

Thursday, November 11, 1999

Overview

Operator overloading is useful to implement Spice-style units without having to add units to the core of the JavaScript 2.0 language. Operator overloading is done via an optional library that, when imported, exposes several additional methods of the Object class. This library is analogous to the internationalization library in that it does not have to be present on all implementations of JavaScript 2.0; implementations without this library do not support operator overloading.


Formal Description

Thursday, November 11, 1999


This chapter presents the formal syntax and semantics of JavaScript 2.0. The syntax notation and semantic notation sections explain the notation used for this description. A simple metalanguage based on a typed lambda calculus is used to specify the semantics.

The syntax and semantic sections are available in both HTML 4.0 and Microsoft Word 98 RTF formats. In the HTML versions each use of a grammar nonterminal or metalanguage value, type, or field is hyperlinked to its definition, making the HTML version preferred for browsing. On the other hand, the RTF version looks much better when printed. The fonts, colors, and other formatting of the various grammar and semantic elements are all encoded as CSS (in HTML) or Word (in RTF) styles and can be altered if desired.

The syntax and semantics sections are machine-generated from code supplied to a small engine that can type-check and execute the semantics directly. This engine is in the CVS tree at mozilla/js/semantics; the input files are at mozilla/js/semantics/JS20.


Formal Description
Semantic Notation

Thursday, November 11, 1999

Introduction

To precisely specify the semantics of JavaScript 2.0, we use the notation described below to define the behavior of all JavaScript 2.0 constructs and their interactions.

Semantic Values

The semantics describe the meaning of a JavaScript 2.0 program in terms of operations on simpler objects borrowed from mathematics collectively called semantic values. Semantic values can be held in semantic variables and passed to semantic functions. The kinds of semantic values used in this specification are summarized in the table below and explained in the next few sections:

Semantic Value Examples Description
⊥   The result of a nonterminating computation
syntaxError The result of a computation that returns by throwing a semantic exception
The result of a semantic function that does not return a useful value
true, false Booleans
-3, 0, 1, 2, 93 Mathematical integers
1/2, -12/7 Mathematical rational numbers
1.0, 3.5, 2.0e-10, -0.0, -∞, NaN   Double-precision IEEE floating-point numbers
‘A’, ‘b’, ‘«LF»’, ‘«uFFFF»’   Characters (Unicode 16-bit code points)
[value0, ... , valuen-1]   Vectors: indexed lists of semantic values
“”, “abc”, “1«TAB»5”   Strings
{value1, value2, ... , valuen}   Mathematical sets of semantic values
〈name1 value1, name2 value2, ... , namen valuen〉   Tuples with named member semantic values
name or name value   Tagged semantic values
function(n: Integer) n*n   Semantic functions

There is a special semantic value ⊥ (pronounced as "bottom") that represents the result of an inconsistent or nonterminating computation. Unless specified otherwise, applying any semantic operator (such as +, *, etc.) to ⊥ or calling a semantic function with ⊥ as any argument also yields ⊥ without evaluating any remaining operands or arguments (in technical terms, semantic functions and operators are strict in all of their arguments unless specified otherwise).

If interpreting a JavaScript program according to the semantics here gives a ⊥ result, an actual implementation executing that JavaScript program will either fail to terminate or throw an exception because it runs out of memory or stack space.

Semantic values of the form value represents the result of a computation that throws a semantic exception. value is the exception's value (which must be a member of the SemanticException semantic type). Unless specified otherwise, applying any semantic operator (such as +, *, etc.) to value or calling a semantic function with value as any argument also yields value (with the same value) without evaluating any remaining operands or arguments.

The throw statement takes a value v and returns v. The catch statement converts v back to v.

Semantic functions that do not return a useful value return the semantic value . There are no operations defined on .

Booleans

The semantic values true and false are booleans. The not, and, or, and xor operators operate on booleans. Like most other operators, and, or, and xor evaluate both operands before returning a result; these operators do not short-circuit.

Integers

Unless specified otherwise, numbers in the semantics written without a slash or decimal point are mathematical integers: ..., -3, -2, -1, 0, 1, 2, 3, .... The usual mathematical operators +, -, *, and unary - can be used on integers. Integers can be compared using =, ≠, <, ≤, >, and ≥.

Rationals

Numbers in the semantics written with a slash are mathematical rational numbers. Every integer is also a rational. Rational numbers include, for example, 0, 1, 2, -1, 1/2, -12/7, and -24/14; the last two are different ways of writing the same rational number. The usual mathematical operators +, -, *, /, and unary - can be used on rationals. Rationals can be compared using =, ≠, <, ≤, >, and ≥.
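That -12/7 and -24/14 denote the same rational can be checked with a small sketch in present-day JavaScript that keeps rationals in lowest terms using BigInt. The helpers `gcd`, `rational`, `equalRationals`, and `addRationals` are hypothetical names for this illustration, not part of the semantic notation.

```javascript
// Greatest common divisor of two BigInts (result is nonnegative).
function gcd(a, b) {
  a = a < 0n ? -a : a;
  b = b < 0n ? -b : b;
  while (b !== 0n) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Build a rational in lowest terms with a positive denominator.
function rational(num, den) {
  if (den === 0n) throw new RangeError("zero denominator");
  const g = gcd(num, den);
  const sign = den < 0n ? -1n : 1n;
  return { num: (sign * num) / g, den: (sign * den) / g };
}

// Two normalized rationals are equal exactly when their parts are equal.
function equalRationals(p, q) {
  return p.num === q.num && p.den === q.den;
}

function addRationals(p, q) {
  return rational(p.num * q.den + q.num * p.den, p.den * q.den);
}
```

With this representation, `rational(-12n, 7n)` and `rational(-24n, 14n)` compare equal, and every integer n is the rational `rational(n, 1n)`.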

Doubles

Numbers in the semantics written with a decimal point are double-precision IEEE floating-point numbers (often abbreviated as doubles), including distinct +0.0, -0.0, +∞, -∞, and NaN. Doubles are distinct from integers and rationals; when writing doubles in the semantics, we always include a decimal point to distinguish them from integers and rationals.

Doubles other than +∞, -∞, and NaN are called finite. We define the significand of a finite double d as follows:

Characters

Characters are single Unicode 16-bit code points. We write them enclosed in single quotes ‘ and ’. There are exactly 65536 characters: ‘«u0000»’, ‘«u0001»’, ..., ‘A’, ‘B’, ‘C’, ..., ‘«uFFFF»’ (see also notation for non-ASCII characters). Unicode surrogates are considered to be pairs of characters for the purpose of this specification.

The characterToCode and codeToCharacter semantic functions convert between characters and their integer Unicode values.
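Present-day JavaScript strings use the same 16-bit code unit model, so the two conversions can be sketched with `charCodeAt` and `String.fromCharCode`. The function names mirror those in the text, but the bodies are only an analogy, not the semantic definitions.

```javascript
// Analogues of the two conversion functions for 16-bit characters.
const characterToCode = (ch) => ch.charCodeAt(0);
const codeToCharacter = (code) => String.fromCharCode(code);

// A code point above 0xFFFF occupies two characters (a surrogate pair).
const pair = "\u{1F600}";
console.log(characterToCode("A"));        // prints 65
console.log(codeToCharacter(0x42));       // prints B
console.log(pair.length);                 // prints 2
console.log(characterToCode(pair[0]).toString(16)); // prints d83d
```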

Vectors

A semantic vector contains zero or more elements indexed by integers starting from zero. We write a vector value by enclosing a comma-separated list of values inside bold brackets:

[element0, element1, ... , elementn-1]

For example, the following semantic value is a vector whose elements are four strings:

[“parsley”, “sage”, “rosemary”, “thyme”]

The empty vector is written as [].

Let u = [e0, e1, ... , en-1] and v = [f0, f1, ... , fm-1] be vectors, i and j be integers, and x be a value. The following notations describe common operations on vectors:

Notation   Result Value
u ⊕ v   The concatenated vector [e0, e1, ... , en-1, f0, f1, ... , fm-1]
|u|   The length n of the vector
u[i]   The ith element ei, or ⊥ if i<0 or i≥n
u[i ... j]   The vector slice [ei, ei+1, ... , ej] consisting of all elements of u between the ith and the jth, inclusive, or ⊥ if i<0, j≥n, or j<i-1. The result is the empty vector [] if j=i-1.
u[i ...]   The vector slice [ei, ei+1, ... , en-1] consisting of all elements of u between the ith and the end, or ⊥ if i<0 or i>n. The result is the empty vector [] if i=n.
u[i  x]   The vector [e0, ... , ei-1, x, ei+1, ... , en-1] with the ith element replaced by the value x and the other elements unchanged, or ⊥ if i<0 or i≥n

Semantic vectors are functional; there is no notation for modifying a semantic vector in place.
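The vector operations can be modeled in present-day JavaScript with arrays that are never mutated. In this sketch `BOTTOM` is a hypothetical stand-in for the out-of-range result, and the function names (`concat`, `at`, `slice`, `sliceFrom`, `update`) are illustrative; the bounds checks reflect one reading of the table above.

```javascript
// Stand-in for the result of an out-of-range vector operation.
const BOTTOM = Symbol("bottom");

const concat = (u, v) => [...u, ...v];
const length = (u) => u.length;

// Element access: out of range unless 0 <= i < n.
const at = (u, i) => (i < 0 || i >= u.length ? BOTTOM : u[i]);

// Inclusive slice from i to j; j = i - 1 gives the empty vector.
function slice(u, i, j) {
  if (i < 0 || j >= u.length || j < i - 1) return BOTTOM;
  return u.slice(i, j + 1);
}

// Slice from i to the end; i = n gives the empty vector.
function sliceFrom(u, i) {
  if (i < 0 || i > u.length) return BOTTOM;
  return u.slice(i);
}

// Functional update: returns a new vector and leaves u unchanged.
function update(u, i, x) {
  if (i < 0 || i >= u.length) return BOTTOM;
  return [...u.slice(0, i), x, ...u.slice(i + 1)];
}
```

For instance, with `herbs = ["parsley", "sage", "rosemary", "thyme"]`, `slice(herbs, 1, 2)` gives the two middle elements, `slice(herbs, 2, 1)` gives the empty vector, and `update(herbs, 0, "basil")` returns a new vector while `herbs` itself is unchanged.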

Strings

A semantic string is merely a vector of characters. For notational convenience we can write a string literal as zero or more characters enclosed in double quotes. Thus,

“Wonder«LF»”

is equivalent to:

[‘W’, ‘o’, ‘n’, ‘d’, ‘e’, ‘r’, ‘«LF»’]

In addition to all of the other vector operations, we can use =, ≠, <, ≤, >, and ≥ to compare two strings.

Sets

A semantic set is an unordered collection of values. Each value may occur at most once in a set. There must be a well-defined = semantic operator defined on all pairs of values in the set, and that operator must induce an equivalence relation.

A semantic set is denoted by enclosing a comma-separated list of values inside braces:

{element1, element2, ... , elementn}

The empty set is written as {}.

For example, the following set contains seven integers:

{3, 0, 10, 11, 12, 13, -5}

When using elements such as integers and characters that have an obvious total order, we can also write sets by using the ... range operator. For example, we can rewrite the above set as:

{0, -5, 3 ... 3, 10 ... 13}

If the beginning of the range is equal to the end of the range, then the range consists of only one element: {7 ... 7} is the same as {7}. If the end of the range is one "less" than the beginning, then the range contains no elements: {7 ... 6} is the same as {}. If the end of the range is more than one "less" than the beginning, then the set is ⊥.

Let A and B be sets and x be a value. The following notations describe common operations on sets:

Notation   Result Value
|A|   The number of elements in the set A; ⊥ if A has infinitely many elements
min A   If there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≥ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite descending sequence of elements with no lower bound in A)
max A   If there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≤ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite ascending sequence of elements with no upper bound in A)
A ∩ B   The intersection of sets A and B (the set of all values that are present both in A and in B)
A ∪ B   The union of sets A and B (the set of all values that are present in at least one of A or B)
A - B   The difference of sets A and B (the set of all values that are present in A but not B)
x ∈ A   Return true if x is an element of set A and false if not
A = B Return true if the two sets A and B are equal and false otherwise. Sets A and B are equal if every element of A is also in B and every element of B is also in A.

min and max are only defined for sets whose elements can be compared with <.
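A finite approximation of these set operations, including the range notation, can be sketched with the `Set` type of present-day JavaScript. `BOTTOM`, `makeSet`, and the other helper names are hypothetical; ranges are modeled as two-element arrays `[lo, hi]` of integers, and infinite sets are outside the scope of the sketch.

```javascript
// Stand-in for an ill-formed set or a missing minimum.
const BOTTOM = Symbol("bottom");

// Build a set from single values and [lo, hi] integer ranges.
function makeSet(...items) {
  const out = new Set();
  for (const item of items) {
    if (!Array.isArray(item)) {
      out.add(item);
      continue;
    }
    const [lo, hi] = item;
    if (hi < lo - 1) return BOTTOM; // end more than one "less" than beginning
    for (let k = lo; k <= hi; k++) out.add(k);
  }
  return out;
}

const union = (a, b) => new Set([...a, ...b]);
const intersection = (a, b) => new Set([...a].filter((x) => b.has(x)));
const difference = (a, b) => new Set([...a].filter((x) => !b.has(x)));
const equalSets = (a, b) => a.size === b.size && [...a].every((x) => b.has(x));

// Minimum of a finite set; an empty set has no minimum.
const min = (a) =>
  a.size === 0 ? BOTTOM : [...a].reduce((m, x) => (x < m ? x : m));
```

In this model `makeSet(3, 0, 10, 11, 12, 13, -5)` and `makeSet(0, -5, [3, 3], [10, 13])` are equal sets, `makeSet([7, 6])` is empty, and `makeSet([7, 5])` is `BOTTOM`.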

Tuples

A semantic tuple is an aggregate of several named semantic values. Tuples are sometimes called records or structures in other languages. A tuple is denoted by a comma-separated list of names and values between bold triangular brackets:

〈name1 value1, name2 value2, ... , namen valuen〉

Each namei valuei pair is called a field. The order of fields in a tuple is irrelevant, so 〈x 3, y 4〉 is the same as 〈y 4, x 3〉. A tuple's names must all be distinct.

Let w be an expression that evaluates to a tuple 〈name1 value1, name2 value2, ... , namen valuen〉. We can extract the value of the field named namei from w by using the notation w.namei. w is required to have this field. For example, 〈x 3, y 4〉.x is 3.

In the HTML versions of the semantics, each use of namei is linked back to its tuple type's definition.

Oneofs

A semantic oneof is a pair consisting of a name (called the tag) and a value. Oneofs are sometimes called variants or tagged unions in other languages. A oneof is denoted by writing the tag followed by the value:

name value

For brevity, when value is , we can omit it altogether, so red is the same as red .

Let o be an expression that evaluates to some oneof n v. We can perform the following operations on o:

Notation   Result Value
o.name   The value v if n is name; otherwise ⊥
o is name    true if n is name; false otherwise

For example, (red 5) is blue evaluates to false, while (red 5) is red evaluates to true. (red 5).red evaluates to 5.

In addition to the operators above, the case statement evaluates one of several expressions based on a oneof tag.
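A oneof and its two operations, together with a case-style dispatch on the tag, might be modeled as follows in present-day JavaScript. The names `oneof`, `isTag`, `select`, `caseOf`, and `BOTTOM` are hypothetical and chosen only for this sketch.

```javascript
// Stand-in for the result of selecting the wrong tag.
const BOTTOM = Symbol("bottom");

// A oneof is an immutable pair of a tag and a value.
const oneof = (tag, value) => Object.freeze({ tag, value });

// Analogue of "o is name".
const isTag = (o, name) => o.tag === name;

// Analogue of "o.name": the value if the tag matches.
const select = (o, name) => (o.tag === name ? o.value : BOTTOM);

// Case-style dispatch: run the branch whose key equals the tag.
const caseOf = (o, branches) => branches[o.tag](o.value);

const color = oneof("red", 5);
console.log(isTag(color, "blue"), isTag(color, "red"), select(color, "red"));
// prints false true 5
```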

In the HTML versions of the semantics, each use of name is linked back to its oneof type's definition.

Functions

A semantic function receives zero or more arguments, performs computations, and returns a result. We write a semantic function as follows:

function(param1: type1, ... , paramn: typen) body

Here param1 through paramn are the function's parameters, type1 through typen are the parameters' respective semantic types, and body is an expression that computes the function's result. When the function is called with argument values v1 through vn, the function's body is evaluated and the resulting value returned to the caller. body can refer to the parameters param1 through paramn; each reference to a parameter parami evaluates to the corresponding argument value vi. Arguments are passed by value (which in this language is equivalent to passing them by reference because there is no way to write to a parameter).

Function parameters are statically scoped. When functions are nested and an inner function f defines a parameter with the same name as a parameter of an outer function g, then f's parameter shadows g's parameter inside f.

The only operation allowed on a semantic function f is calling it, which we do using the f(arg1, ..., argn) syntax. In the presence of side effects, f is evaluated first, followed by the argument expressions arg1 through argn, in left-to-right order. If the result of evaluating f or any of the argument expressions is ⊥, then the call immediately returns ⊥ without evaluating the following argument expressions, if any. If the result of evaluating f or any of the argument expressions is a thrown semantic exception with some value v, then the call immediately returns that same exception (with the same value v) without evaluating the following argument expressions, if any. Otherwise, f's body is evaluated and the resulting value returned to the caller.
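The evaluation order of a call can be sketched in present-day JavaScript by passing the function and argument expressions as thunks. This is a loose model: `BOTTOM` is a hypothetical marker (real nontermination cannot be represented as a value), `thrown` is a hypothetical wrapper for an exception result, and `strictCall` is an illustrative name.

```javascript
// Hypothetical markers for the two abrupt results described above.
const BOTTOM = Symbol("bottom");
const thrown = (value) => ({ thrown: true, value });
const isAbrupt = (r) =>
  r === BOTTOM || (typeof r === "object" && r !== null && r.thrown === true);

// Evaluate the function expression first, then the arguments left to right,
// stopping at the first abrupt result.
function strictCall(fThunk, ...argThunks) {
  const f = fThunk();
  if (isAbrupt(f)) return f;
  const args = [];
  for (const thunk of argThunks) {
    const v = thunk();
    if (isAbrupt(v)) return v; // remaining thunks are never evaluated
    args.push(v);
  }
  return f(...args);
}

const order = [];
const result = strictCall(
  () => (order.push("f"), (a, b, c) => a + b + c),
  () => (order.push("arg1"), 1),
  () => (order.push("arg2"), thrown("syntaxError")),
  () => (order.push("arg3"), 3)
);
console.log(order.join(","), result.value); // prints f,arg1,arg2 syntaxError
```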

Semantic Types

A semantic type is a possibly infinite set of semantic values. Names of semantic types are shown in Capitalized Red Small Caps, and compound semantic type expressions are in red.

We use semantic types to make the semantics more readable by declaring the semantic type of each semantic variable (including function argument variables). Each such declaration states that the only values that will be stored in a semantic variable will be members of that variable's semantic type. These declarations can be proven statically. The JavaScript semantics have been machine type-checked to ensure that every type declaration holds, so, for example, if the semantics state that variable x has type Integer then there does not exist any place that could assign the value true to x.

Semantic type annotations allow us to restrict the description of each semantic operator and function to only describe its behavior on arguments that are members of the arguments' semantic types. Thus, for example, we need not describe the behavior of the + semantic operator when passed the semantic values true and as operands because we can prove that this case cannot arise.

Every semantic type includes the values and v for all values v whose semantic type is SemanticException. For brevity we do not list and v in the tables below.

Basic Semantic Types

The following are the basic semantic types:

Type Set of Values
Void {}
Boolean {true, false}
Integer {..., -2, -1, 0, 1, 2, ...} (All mathematical integers)
Rational   All mathematical rational numbers
Double All double-precision IEEE floating-point numbers, including +∞, -∞, and NaN
Character    All 65536 characters
String Shorthand for Character[] (see vector types below)
SemanticException   Set of all values that can be thrown as semantic exceptions. This type is defined separately inside each grammar that throws such exceptions.

The type Rational includes Integer as a subtype because every integer is also a rational number. Except for and v, the types Rational and Double are disjoint.

Compound Semantic Types

We can construct compound semantic types using the notation below. Here t, t1, t2, ..., tn represent some existing semantic types.

Type   Set of Values
t[]   All vectors [v0, ... , vn-1] all of whose elements v0, ... , vn-1 have type t. Note that the empty vector [] is a member of every vector type t[].
{t}   All sets {v1, v2, ... , vn} all of whose elements v1, ... , vn have type t. Note that the empty set {} is a member of every set type {t}.
tuple {name1: t1; ... ; namen: tn}   All tuples 〈name1 v1, ... , namen vn〉 for which each vi has type ti for 1 ≤ i ≤ n. The namei's must be distinct; the order in which the namei: ti fields are listed does not matter.
oneof {name1: t1; ... ; namen: tn}   All oneofs of the form namei v, where 1 ≤ i ≤ n and v has type ti. If tk is Void, then namek: tk can be abbreviated as simply namek in the oneof semantic type syntax. The namei's must be distinct; the order in which the namei: ti alternatives are listed does not matter.
t1 × t2 × ... × tn → t   Some* functions that take n arguments of types t1 through tn respectively and produce a result of type t. If n is zero (the function takes no arguments), we write this type as () → t.
* Technically speaking, this semantic type includes only functions that are continuous in the domain-theoretical sense; this avoids set-theoretical paradoxes.

The type constructors earlier in the table bind tighter than ones later in the table, so, for example, Integer[] → Rational[] is equivalent to (Integer[]) → (Rational[]) (a function that takes a vector of Integers and returns a vector of Rationals) rather than ((Integer[]) → Rational)[] (a vector of functions, each of which takes a vector of Integers and returns a Rational). In the rare cases where this is needed, parentheses are used to override precedence.

Semantic Operators

The table below lists the semantic operators in order from the highest precedence (tightest-binding) to the lowest precedence (loosest-binding). Operators under the same heading of the table have the same precedence and associate left-to-right, so, for example, 7-3+2-1 is interpreted as ((7-3)+2)-1 instead of 7-(3+(2-1)) or (7-(3+2))-1. When needed, parentheses can be used to group expressions.

The type signatures of the operators are also listed. Some operators are polymorphic; t, t1, t2, ..., and tn can represent any semantic types. The types of some operators are underdetermined; for example, [] can have type t[] for any type t. In these cases the particular choice of type is inferred from the context.

Each operator in the table below is strict: it evaluates all of its operands left-to-right, and if any operand evaluates to ⊥, then the operator immediately returns ⊥ without evaluating the following operands, if any. If any operand evaluates to a thrown semantic exception with some value v, then the operator immediately returns that same exception (with the same value v) without evaluating the following operands, if any.

Operator   Signatures   Description
Nonassociative Operators
(x)   t → t   Return x. Parentheses are used to override operator precedence.
|u|   t[] → Integer   u is a vector [e0, e1, ... , en-1]. Return the length n of that vector.
|u|   {t} → Integer   The number of elements in the set u; ⊥ if u has infinitely many elements
[x0, x1, ... , xn-1]   t × ... × t → t[]   Return a vector with the elements x0, x1, ... , xn-1.
{x1, x2, ... , xn}   t × ... × t → {t}   Return a set with the elements x1, x2, ... , xn. Any duplicate elements are included only once in the set. When t is Integer or Character, we can also replace any of the xi's by a range xi ... yi that contains all integers or characters greater than or equal to xi and less than or equal to yi. yi must not be less than xi "minus" one.
name1 x1, ... , namen xn   t1 × ... × tn → tuple {name1: t1; ... ; namen: tn}   Return a tuple with the fields name1 x1, ... , namen xn.
name   oneof {name; name2: t2; ... ; namen: tn}   Return a oneof value with tag name and value .
Action[nonterminali]   Determined by Action's declaration   This notation can only be used inside an action definition for a grammar production that has nonterminal nonterminal on the production's right side. Return the value of action Action invoked on the ith instance of nonterminal nonterminal on the right side of the production. The subscript i can be omitted if there is only one instance of nonterminal nonterminal in the production.
nonterminali   Character   This notation can only be used inside an action definition for a grammar production that has nonterminal nonterminal on the production's left or right side. Furthermore, every complete expansion of grammar nonterminal nonterminal must expand it into a single character. Return the character to which the ith instance of nonterminal nonterminal on the right side of the production expands. The subscript i can be omitted if there is only one instance of nonterminal nonterminal in the production. If the subscript is omitted and nonterminal nonterminal appears on the left side of the production, then this expression returns the single character to which this whole production expands.
Suffix Operators
u[i]   t[] × Integer → t   u is a vector [e0, e1, ... , en-1]. Return the ith element ei, or ⊥ if i<0 or i≥n.
u[i ... j]   t[] × Integer × Integer → t[]   u is a vector [e0, e1, ... , en-1]. Return the vector slice [ei, ei+1, ... , ej] consisting of all elements of u between the ith and the jth, inclusive, or ⊥ if i<0, j≥n, or j<i-1. The result is the empty vector [] if j=i-1.
u[i ...]   t[] × Integer → t[]   u is a vector [e0, e1, ... , en-1]. Return the vector slice [ei, ei+1, ... , en-1] consisting of all elements of u between the ith and the end, or ⊥ if i<0 or i>n. The result is the empty vector [] if i=n.
u[i  x]   t[] × Integer × t → t[]   u is a vector [e0, e1, ... , en-1]. Return the vector [e0, ... , ei-1, x, ei+1, ... , en-1] with the ith element replaced by the value x and the other elements unchanged, or ⊥ if i<0 or i≥n.
w.namei   tuple {name1: t1; ... ; namen: tn} → ti   w is a tuple name1 v1, ... , namen vn. Return the value vi of w's field named namei.
w.namei   oneof {name1: t1; ... ; namen: tn} → ti   w is a oneof namek v for some k between 1 and n inclusive. Return the value v if namei is namek, or ⊥ if not.
f(x1, ..., xn)   (t1 × ... × tn → t) × t1 × ... × tn → t   Call the function f with the arguments x1 through xn and return the result.
Prefix Operators
-x   Integer → Integer or Rational → Rational   The mathematical negation of x
min A   {t} → t   Return the minimal element of set A. Specifically, if there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≥ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite descending sequence of elements with no lower bound in A). The type t must have = and < operations that define a total order.
max A   {t} → t   Return the maximal element of set A. Specifically, if there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≤ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite ascending sequence of elements with no upper bound in A). The type t must have = and < operations that define a total order.
name x   t → oneof {name: t; name2: t2; ... ; namen: tn}   Return a oneof value with tag name and value x.
Multiplicative Operators
x * y   Integer × Integer → Integer or Rational × Rational → Rational   The mathematical product of x and y
x / y   Rational × Rational → Rational   The mathematical quotient of x and y; ⊥ if y=0
A ∩ B   {t} × {t} → {t}   The intersection of sets A and B (the set of all values that are present both in A and in B)
Additive Operators
x + y   Integer × Integer → Integer or Rational × Rational → Rational   The mathematical sum of x and y
x - y   Integer × Integer → Integer or Rational × Rational → Rational   The mathematical difference of x and y
u ⊕ v   t[] × t[] → t[]   u is a vector [e0, e1, ... , en-1] and v is a vector [f0, f1, ... , fm-1]. Return the concatenated vector [e0, e1, ... , en-1, f0, f1, ... , fm-1].
A ∪ B   {t} × {t} → {t}   The union of sets A and B (the set of all values that are present in at least one of A or B)
A - B   {t} × {t} → {t}   The difference of sets A and B (the set of all values that are present in A but not B)
Comparison Operators
x = y, x ≠ y, x < y, x ≤ y, x > y, x ≥ y   Rational × Rational → Boolean or Character × Character → Boolean or String × String → Boolean or {t} × {t} → Boolean   Comparisons return true if the relation holds or false if not. Rationals are compared mathematically. Characters are compared according to their code points. Two strings are equal when they have the same lengths and contain exactly the same sequences of characters. A string x is less than string y when either x is the empty string and y is not empty, the first character of x is less than the first character of y, or the first character of x is equal to the first character of y and the rest of string x is less than the rest of string y. Two sets x and y are equal if every element of x is also in y and every element of y is also in x. Only = and ≠ can be used to compare sets.
x ∈ A   t × {t} → Boolean   Return true if x is an element of set A and false if not
o is namei   oneof {name1: t1; ... ; namen: tn} → Boolean   o is a oneof namek v for some k between 1 and n inclusive. Return true if namei is namek, or false otherwise.
Logical Negation
not a   Boolean → Boolean   true if a is false; false if a is true
Logical Conjunction
a and b   Boolean × Boolean → Boolean   true if both a and b are true; false if at least one of a and b is false
Logical Disjunction
a or b   Boolean × Boolean → Boolean   true if at least one of a and b is true; false if both a and b are false
a xor b   Boolean × Boolean → Boolean   true if a is true and b is false or a is false and b is true; false if both a and b are true or both a and b are false
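
The strictness rule for operators can be pictured with a small Python sketch. It is only an illustration under stated assumptions: BOTTOM, Thrown, strict, and operand are hypothetical names, operands are passed as zero-argument functions so that evaluation order is visible, and a sentinel object stands in for a nonterminating result (which a real evaluator could never actually return).

```python
from dataclasses import dataclass
from typing import Any, Callable


class _Bottom:
    """Hypothetical stand-in for a nonterminating result."""

    def __repr__(self) -> str:
        return "BOTTOM"


BOTTOM = _Bottom()


@dataclass(frozen=True)
class Thrown:
    """Hypothetical stand-in for an exception result carrying a value."""
    value: Any


def strict(op: Callable[..., Any], *operands: Callable[[], Any]) -> Any:
    """Evaluate operand thunks left to right; stop at the first abnormal result."""
    values = []
    for thunk in operands:
        v = thunk()
        if v is BOTTOM or isinstance(v, Thrown):
            return v  # the remaining operands are never evaluated
        values.append(v)
    return op(*values)


# Record which operands actually get evaluated.
log = []


def operand(name: str, result: Any) -> Callable[[], Any]:
    def thunk() -> Any:
        log.append(name)
        return result
    return thunk


result = strict(lambda a, b, c: a + b + c,
                operand("a", 1),
                operand("b", Thrown("syntaxError")),
                operand("c", 3))
print(result, log)
```

In this sketch the third operand is never evaluated because the second one already produced an exception result.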

Semantic Statements

Semantic statements are similar to the semantic operators above in that they are also used to construct expressions, take zero or more operands, and return a value. Unlike other semantic operators, semantic statements are usually non-strict: they do not always evaluate all of their operands. Semantic statements have lower precedence than any of the semantic operators above.

Some semantic statements are syntactic sugars, which means that they are defined as macros that expand into other, simpler statements and operators.

Function

function(param1: type1, ... , paramn: typen) body

See the description of function values.

Let

let var1: type1 = expr1; ... ; varn: typen = exprn in body

Evaluate expr1 through exprn in order and save the results. If any expri evaluates to ⊥, then immediately return ⊥ without evaluating the following expr's. If any expri evaluates to an exception carrying some value v, then immediately return that exception without evaluating the following expr's. Otherwise evaluate body with new local variable bindings of var1 through varn bound to the saved results of evaluating expr1 through exprn, respectively. Return the result of evaluating body.

type1 through typen are the local variables' respective semantic types. The type of the entire let expression is the type of its body.

The let expression above is syntactic sugar for:

(function(var1: type1, ... , varn: typen) body)(expr1, ... , exprn)
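
As a rough analogy, the same desugaring can be written in Python, where a let becomes an immediately applied function. This is a sketch only: the helper name let is hypothetical, the expressions are passed as zero-argument functions to make the evaluation order explicit, and abnormal results are not modelled.

```python
from typing import Any, Callable, Sequence


def let(exprs: Sequence[Callable[[], Any]], body: Callable[..., Any]) -> Any:
    """Evaluate exprs in order, save the results, then apply body to them."""
    results = [expr() for expr in exprs]
    return body(*results)


# let x: Integer = 3; y: Integer = 4 in x*x + y*y
sugared = let([lambda: 3, lambda: 4], lambda x, y: x * x + y * y)

# (function(x: Integer, y: Integer) x*x + y*y)(3, 4)
desugared = (lambda x, y: x * x + y * y)(3, 4)

print(sugared, desugared)
```

Because the bindings are function parameters, none of the bound expressions can refer to another variable introduced by the same let.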

If

if expr then bodytrue else bodyfalse

Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If expr evaluates to an exception carrying some value v, then immediately return that exception. Otherwise expr must evaluate to either true or false. If it evaluated to true, then evaluate bodytrue and return its result. If expr evaluated to false, then evaluate bodyfalse and return its result.

expr must have type Boolean. The entire if expression has any type t such that both bodytrue has type t and bodyfalse has type t.

Case

case expr of
    name1(var1: type1): body1;
    ...
    namen(varn: typen): bodyn;
    end

Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If expr evaluates to an exception carrying some value v, then immediately return that exception. Otherwise expr must evaluate to a oneof name v where name matches namei for some i between 1 and n inclusive. Evaluate the corresponding bodyi with a new local variable vari bound to v. Return bodyi's result.

If we are not interested in using the oneof's value for a particular bodyi, we can shorten that bodyi's clause from:

    namei(vari: typei): bodyi

to:

    namei: bodyi

In this case no local variable is bound while evaluating bodyi.

expr must have type oneof {name1: type1; ... ; namen: typen}. The entire case expression has any type t such that all of its bodyi's have type t. The namei's must be distinct. The order in which the case clauses are listed does not matter.

Throw

throw expr

Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If evaluating expr itself produces an exception, then immediately return that exception. Otherwise expr must evaluate to some value v, in which case return an exception carrying v.

expr must have type SemanticException. The entire throw expression has any type whatsoever (because every semantic type includes exception results).

Try-Catch

try
    bodytry
catch (var: SemanticException)
    bodyhandler

Evaluate bodytry to obtain a result w. If w is not an exception, then return w. Otherwise w is an exception carrying some value v. In this case evaluate bodyhandler with a new local variable var bound to v and return bodyhandler's result.

The type of var is always SemanticException. The entire try-catch expression has any type t such that both bodytry has type t and bodyhandler has type t.
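
One way to picture throw and try-catch is to map them onto a host language's own exception mechanism. The sketch below does this in Python; it is an analogy rather than part of the notation, and the names SemanticException (as a Python class), throw, and try_catch are hypothetical.

```python
from typing import Any, Callable


class SemanticException(Exception):
    """Hypothetical carrier for a thrown semantic value such as syntaxError."""

    def __init__(self, value: Any) -> None:
        super().__init__(value)
        self.value = value


def throw(value: Any) -> Any:
    """Model 'throw expr': the result is an exception carrying value."""
    raise SemanticException(value)


def try_catch(body_try: Callable[[], Any],
              body_handler: Callable[[Any], Any]) -> Any:
    """Model 'try bodytry catch (var: SemanticException) bodyhandler'."""
    try:
        return body_try()  # a normal result is returned unchanged
    except SemanticException as exc:
        return body_handler(exc.value)  # var is bound to the thrown value


normal = try_catch(lambda: 42, lambda e: -1)
handled = try_catch(lambda: throw("syntaxError"), lambda e: "caught " + e)
print(normal, handled)
```

Strictness falls out of the host language here: once an operand raises, Python abandons the remaining operands just as the notation's operators do.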

Semantic Functions

The sections below list the predefined semantic functions, their type signatures, and short descriptions. All functions are strict and evaluate their arguments left-to-right.

Integer Manipulation

These functions perform bitwise operations on integers. The integers are treated as though they were written in binary notation, with each 1 bit representing true and 0 bit representing false. The integers must be nonnegative.

Function   Signature   Description
bitwiseAnd(x, y)   Integer × Integer → Integer   The bitwise AND of x and y
bitwiseOr(x, y)   Integer × Integer → Integer   The bitwise OR of x and y
bitwiseXor(x, y)   Integer × Integer → Integer   The bitwise XOR of x and y
bitwiseShift(x, count)   Integer × Integer → Integer   Shift x to the left by count bits. If count is negative, shift x to the right by -count bits. Bits shifted out are lost; bits shifted in are zero. This function is equivalent to multiplying x by 2^count and truncating the result (toward negative infinity) to an integer. Unlike the other functions in this table, x can be negative.
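
The two descriptions of bitwiseShift (shifting bits, and multiplying by a power of two followed by rounding toward negative infinity) can be checked against each other with a short Python sketch. The function names are hypothetical; the sketch relies on Python's arbitrary-precision integers, whose right shift rounds toward negative infinity.

```python
from fractions import Fraction
from math import floor


def bitwise_shift(x: int, count: int) -> int:
    """Shift left by count bits, or right by -count bits when count is negative."""
    return x << count if count >= 0 else x >> -count


def bitwise_shift_by_definition(x: int, count: int) -> int:
    """Multiply x by 2^count exactly, then round toward negative infinity."""
    return floor(x * Fraction(2) ** count)


print(bitwise_shift(5, 3), bitwise_shift(5, -1), bitwise_shift(-5, -1))
```

For example, shifting -5 right by one bit gives -3 rather than -2, because -2.5 is rounded toward negative infinity.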

Double Manipulation

Function   Signature   Description
rationalToDouble(r)   Rational → Double   The rational number r rounded to the nearest IEEE double-precision floating-point value as follows:
Consider the set of all doubles, with -0.0, +∞, -∞, and NaN removed and with two additional values added to it that are not representable as doubles, namely 2^1024 and -2^1024. Choose the member of this set that is closest in value to r. If two values of the set are equally close, choose the one with an even significand; for this purpose, the two extra values 2^1024 and -2^1024 are considered to have even significands. Finally, if 2^1024 was chosen, replace it with +∞; if -2^1024 was chosen, replace it with -∞; if +0.0 was chosen, replace it with -0.0 if and only if r < 0; any other chosen value is used unchanged. The result is the value of rationalToDouble(r).
This procedure corresponds exactly to the behavior of the IEEE 754 "round to nearest" mode.
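
A minimal Python sketch of the same rounding follows. It assumes CPython, whose true division of two integers yields the correctly rounded double (ties to even) and raises OverflowError when the rounded result is too large for a double; the function name rational_to_double is hypothetical.

```python
import math
from fractions import Fraction


def rational_to_double(r: Fraction) -> float:
    """Round an exact rational to the nearest double, ties to even."""
    try:
        # Assumption: int / int is correctly rounded in CPython and keeps
        # the sign when a tiny negative value underflows to zero.
        return r.numerator / r.denominator
    except OverflowError:
        # The magnitude rounds to 2^1024 or beyond: map it to an infinity.
        return math.inf if r > 0 else -math.inf


print(rational_to_double(Fraction(1, 3)))
print(rational_to_double(Fraction(10) ** 400))
print(rational_to_double(Fraction(-1, 10 ** 400)))
```

The last call illustrates the sign rule: a negative rational too small to represent rounds to -0.0, not +0.0.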

Character Conversions

Function   Signature   Description
characterToCode(c)   Character → Integer   The number of the Unicode code point c
codeToCharacter(i)   Integer → Character   The Unicode code point number i, or ⊥ if i<0 or i>65535

Character Utilities

The function digitValue is defined as follows:

digitValue(c: Character) : Integer
  = if c ∈ {0 ... 9}
     then characterToCode(c) - characterToCode(0)
     else if c ∈ {A ... Z}
     then characterToCode(c) - characterToCode(A) + 10
     else if c ∈ {a ... z}
     then characterToCode(c) - characterToCode(a) + 10
     else ⊥
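
For comparison, here is a direct Python transcription of digitValue. It is a sketch: the name digit_value is hypothetical, ord plays the role of characterToCode, and None stands in for the undefined result of the final else branch.

```python
from typing import Optional


def digit_value(c: str) -> Optional[int]:
    """Value of a single digit or letter character; None where undefined."""
    if len(c) != 1:
        raise ValueError("digit_value expects exactly one character")
    if "0" <= c <= "9":
        return ord(c) - ord("0")
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 10
    if "a" <= c <= "z":
        return ord(c) - ord("a") + 10
    return None  # stands in for the undefined result


print(digit_value("7"), digit_value("F"), digit_value("z"), digit_value("#"))
```

Upper-case and lower-case letters map to the same values, so the function covers digits in any base up to 36.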

Character Class Queries

Function   Signature   Description
isOrdinaryInitialIdentifierCharacter(c)   Character → Boolean   Return true if the nonterminal OrdinaryInitialIdentifierCharacter can expand into c and false otherwise
isOrdinaryContinuingIdentifierCharacter(c)   Character → Boolean   Return true if the nonterminal OrdinaryContinuingIdentifierCharacter can expand into c and false otherwise

Semantic Definitions

Value Definitions

We can define a global semantic constant named var as follows:

var : type = expr

expr should evaluate to a value of type type. expr should not have side effects, and it should not evaluate to ⊥.

In the HTML versions of the semantics, each reference to the global semantic constant var is linked to var's definition.

Function Definitions

We can define a global semantic function named f as follows:

f(param1: type1, ... , paramn: typen) : type = body

param1 through paramn are the function's parameters, type1 through typen are the parameters' respective semantic types, type is the function result's semantic type, and body is an expression that computes the function's result.

The above definition is syntactic sugar for the global constant definition:

f : type1 × type2 × ... × typen → type = function(param1: type1, ... , paramn: typen) body

In the HTML versions of the semantics, each reference to the global semantic function f is linked to f's definition.

For example, the function definition

square(x: Integer) : Integer = x*x

defines a function named square that takes an Integer parameter x and returns an Integer that is the square of x. This is equivalent to the following global definition:

square : Integer → Integer = function(x: Integer) x*x

Type Definitions

We can give a new name to a semantic type t by using the type definition, which has the form:

type name = t

For example, the following notation defines RegExp as a shorthand for tuple {reBody: String; reFlags: String}:

type RegExp = tuple {reBody: String; reFlags: String}

In the HTML versions of the semantics, each reference to the semantic type name name is linked to name's definition.

Semantic Actions

Semantic actions tie together the grammar and the semantics. A semantic action ascribes semantic meaning to a grammar production.

To illustrate the use of semantic actions, we shall look at an example, followed by a detailed description of the notation for specifying semantic actions.

Example

Consider the following grammar, with the start nonterminal Numeral:

Digit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Digits ⇒
   Digit
|  Digits Digit
Numeral ⇒
   Digits
|  Digits # Digits

This grammar defines the syntax of an acceptable input: 37, 33#4 and 30#2 are acceptable syntactically, while 1a is not. However, the grammar does not indicate what these various inputs mean. That is the job of the semantics, which are defined in terms of actions on the parse tree of grammar rule expansions. Consider the following sample set of actions defined on this grammar, with a starting Numeral action called (in this example) Value:

type SemanticException = oneof {syntaxError}

action Value[Digit] : Integer = digitValue(Digit)

action DecimalValue[Digits] : Integer

DecimalValue[Digits ⇒ Digit] = Value[Digit]

DecimalValue[Digits ⇒ Digits1 Digit] = 10*DecimalValue[Digits1] + Value[Digit]

action BaseValue[Digits] : Integer → Integer

BaseValue[Digits ⇒ Digit](base: Integer)
  = let d: Integer = Value[Digit]
     in if d < base
         then d
         else throw syntaxError

BaseValue[Digits ⇒ Digits1 Digit](base: Integer)
  = let d: Integer = Value[Digit]
     in if d < base
         then base*BaseValue[Digits1](base) + d
         else throw syntaxError

action Value[Numeral] : Integer

Value[Numeral ⇒ Digits] = DecimalValue[Digits]

Value[Numeral ⇒ Digits1 # Digits2]
  = let base: Integer = DecimalValue[Digits2]
     in if base ≥ 2 and base ≤ 10
         then BaseValue[Digits1](base)
         else throw syntaxError

Action names are written in violet cursive type. The last action declaration in the example above states that the action Value can be applied to any expansion of the nonterminal Numeral and that the result is an Integer. This action maps every acceptable input either to an integer or to the exception syntaxError. If the result is syntaxError, then the input satisfies the grammar but contains an error detected by the semantics; this is the case for the input 30#2. A result of ⊥ would indicate a nonterminating computation; this cannot happen in this example.

There are two definitions of the Value action on Numeral, one for each grammar production that expands Numeral. Each definition of an action is allowed to call actions on the terminals and nonterminals on the right side of the expansion. For example, Value applied to the first Numeral production (the one that expands Numeral into Digits) simply applies the DecimalValue action to the expansion of the nonterminal Digits and returns the result. On the other hand, Value applied to the second Numeral production (the one that expands Numeral into Digits # Digits) performs a computation using the results of the DecimalValue and BaseValue applied to the two expansions of the Digits nonterminals. In this case there are two identical nonterminals Digits on the right side of the expansion, so we use subscripts to indicate on which one we're calling the actions DecimalValue and BaseValue.

The BaseValue action illustrates a syntactic sugar for defining an action that is a function; this syntactic sugar is analogous to that for defining global functions.

The Value action on Digit illustrates the direct use of a nonterminal in a semantic expression: digitValue(Digit). Here the Digit semantic expression evaluates to the character into which the Digit grammar rule expands.

We can fully evaluate the semantics on our sample inputs to get the following results:

Input    Semantic Result
37 37
33#4 15
30#2 syntaxError
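
The results in the table can be reproduced with a small Python sketch of the example's actions. It is illustrative only: the function and class names are hypothetical, a regular expression stands in for the grammar, and a Python exception stands in for the syntaxError result.

```python
import re
from typing import Union

# Numeral ⇒ Digits | Digits # Digits, recognized here with a regular expression.
NUMERAL = re.compile(r"([0-9]+)(?:#([0-9]+))?")


class SemanticSyntaxError(Exception):
    """Hypothetical stand-in for the syntaxError exception of the example."""


def value_digit(digit: str) -> int:
    # Value[Digit] = digitValue(Digit)
    return ord(digit) - ord("0")


def decimal_value(digits: str) -> int:
    # DecimalValue on Digits ⇒ Digit and on Digits ⇒ Digits1 Digit
    if len(digits) == 1:
        return value_digit(digits)
    return 10 * decimal_value(digits[:-1]) + value_digit(digits[-1])


def base_value(digits: str, base: int) -> int:
    # BaseValue on Digits ⇒ Digit and on Digits ⇒ Digits1 Digit
    d = value_digit(digits[-1])
    if d >= base:
        raise SemanticSyntaxError()
    if len(digits) == 1:
        return d
    return base * base_value(digits[:-1], base) + d


def value_numeral(text: str) -> int:
    m = NUMERAL.fullmatch(text)
    if m is None:
        raise ValueError("input does not satisfy the grammar")
    digits1, digits2 = m.group(1), m.group(2)
    if digits2 is None:                      # Numeral ⇒ Digits
        return decimal_value(digits1)
    base = decimal_value(digits2)            # Numeral ⇒ Digits1 # Digits2
    if 2 <= base <= 10:
        return base_value(digits1, base)
    raise SemanticSyntaxError()


def evaluate(text: str) -> Union[int, str]:
    try:
        return value_numeral(text)
    except SemanticSyntaxError:
        return "syntaxError"


for sample in ("37", "33#4", "30#2"):
    print(sample, evaluate(sample))
```

An input such as 1a is rejected before any action runs, which mirrors the distinction between grammar errors and errors detected by the semantics.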

Action Declarations

action Action[nonterminal] : type

This declaration states that action Action is defined on nonterminal nonterminal. Any reference to action Action[nonterminal] in a semantic expression returns a value of type type. The values of action Action must be defined using action definitions for each grammar production that has nonterminal on the left side.

Action Definitions

Action[nonterminal ⇒ expansion] = expr

This notation defines the value of action Action on nonterminal nonterminal in the case where nonterminal nonterminal expands to the given expansion. expansion can contain zero or more terminals and nonterminals (as well as other notations allowed on the right side of a grammar production). Furthermore, the terminals and nonterminals of expansion can be subscripted to allow them to be unambiguously referenced by action references or nonterminal references inside expr.

The type of action Action on nonterminal nonterminal must be declared using an action declaration. expr must have the type given by that action declaration.

nonterminal ⇒ expansion must be one of the productions in the grammar.

Action Function Definitions

Action[nonterminal ⇒ expansion](param1: type1, ... , paramn: typen) = body

This notation is a syntactic sugar for defining an action whose value is a function. This notation is equivalent to:

Action[nonterminal ⇒ expansion] =
    function(param1: type1, ... , paramn: typen) body

Combined Action Declarations and Definitions

action Action[nonterminal] : type = expr

This declaration is sometimes used when all expansions of nonterminal nonterminal share the same action semantics. This declaration states both the type type of action Action on nonterminal nonterminal as well as that action's value expr. Note that the expansions are not given between the square brackets, and expr can refer only to the nonterminal nonterminal on the left side of grammar productions. No additional action definitions are needed for nonterminal nonterminal.

See the Value action on Digit in the example above for an example of this declaration.


JavaScript 2.0
Formal Description
Stages

Thursday, November 11, 1999

This page is out of date

The source code is processed in the following stages:

  1. If necessary, convert the source code into the Unicode UTF-16 format, normalized form C.
  2. Split the source code into tokens using the lexer grammar and lexer semantics.
  3. Parse the resulting sequence of tokens using the parser grammar and evaluate it using the parser semantics [To be provided].

Lexing

Processing stage 2 is done as follows:

  1. Let tokens be an empty array of Token metalanguage records. (As defined in the lexer semantics, a Token can be either an identifier, a keyword, a punctuation symbol, a number, a number with a unit, a string, or the end token.)
  2. Let input be the input sequence of Unicode characters. Append a special placeholder End to the end of input.
  3. Let regExpMayFollow be a Boolean variable. Initialize it to true.
  4. Apply the lexer grammar to parse the longest possible prefix of input. If regExpMayFollow is true, use the start symbol NextTokenre. If regExpMayFollow is false, use the start symbol NextTokendiv. The result of the parse should be a parse tree T. If the parse failed, return a syntax error.
  5. Compute the action Token on T to obtain a Token t. If t is the end token, return the tokens array and go to the parse stage.
  6. Append t to the end of the tokens array.
  7. Compute the action RegExpMayFollow on T to obtain a Boolean value and assign that value to the regExpMayFollow variable.
  8. Remove the characters matched by T from input, leaving only the yet-unparsed suffix of input.
  9. Go to step 4.

If an implementation encounters an error while lexing, it is permitted to either report the error immediately or defer it until the affected token would actually be used by the parser. This flexibility allows an implementation to do lexing at the same time it parses the source program.
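
The lexing steps above can be sketched as a loop in Python. The sketch makes several loud assumptions: the token patterns are toy stand-ins that are far smaller than the real lexer grammar, identifiers are always treated as not allowing a following regular expression, and every name in the code is hypothetical. What it does try to show faithfully is the control flow: longest match, one of two start symbols chosen by regExpMayFollow, and the flag being updated after every token.

```python
import re
from typing import List, Tuple

WHITESPACE = re.compile(r"\s*")

# (token kind, pattern, may a regular expression follow this token?)
COMMON = [
    ("identifier", re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*"), False),
    ("number", re.compile(r"[0-9]+"), False),
    ("punctuator", re.compile(r"[)\]}]"), False),
    ("punctuator", re.compile(r"[(\[{=+\-*,;]"), True),
]
REGEXP = ("regularExpression",
          re.compile(r"/(?:\\.|[^\\/*\n])(?:\\.|[^\\/\n])*/[A-Za-z]*"), False)
DIVISION = ("punctuator", re.compile(r"/=?"), True)


def lex(source: str) -> List[Tuple[str, str]]:
    tokens = []                                   # step 1
    pos = 0                                       # step 2 (position replaces End)
    reg_exp_may_follow = True                     # step 3
    while True:
        pos = WHITESPACE.match(source, pos).end()
        if pos == len(source):                    # step 5: end token reached
            return tokens
        # Step 4: pick the start symbol according to the flag.
        candidates = COMMON + [REGEXP if reg_exp_may_follow else DIVISION]
        best = None
        for kind, pattern, may_follow in candidates:
            m = pattern.match(source, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m, may_follow)      # keep the longest match
        if best is None:
            raise SyntaxError(f"cannot lex input at offset {pos}")
        kind, m, reg_exp_may_follow = best        # step 7
        tokens.append((kind, m.group()))          # step 6
        pos = m.end()                             # step 8, then back to step 4


print(lex("a = b / c"))
print(lex("x = /ab+c/g"))
```

In the first input the slash follows an identifier and is lexed as division; in the second it follows = and starts a regular expression literal.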

Provide language prohibiting an identifier from immediately following a number. This will fall out of the revised definition of QuantityLiteral.

Show mapping from Token structures to parser grammar terminals (obvious, but needs to be written).

Parsing

To be provided


JavaScript 2.0
Formal Description
Lexer Grammar

Thursday, November 11, 1999

This LALR(1) grammar describes the lexer syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.

This document is also available as a Word 98 rtf file.

The start symbols are NextTokenre and NextTokendiv depending on whether a / should be interpreted as a regular expression or division.

Unicode Character Classes

UnicodeCharacter ⇒ Any Unicode character
UnicodeInitialAlphabetic ⇒ Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)
UnicodeAlphanumeric ⇒ Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
WhiteSpaceCharacter ⇒
   «TAB» | «VT» | «FF» | «SP» | «u00A0»
|  «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»
|  «u2008» | «u2009» | «u200A» | «u200B»
|  «u3000»
LineTerminator ⇒ «LF» | «CR» | «u2028» | «u2029»
ASCIIDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Comments

LineComment ⇒ / / LineCommentCharacters
LineCommentCharacters ⇒
   «empty»
|  LineCommentCharacters NonTerminator
NonTerminator ⇒ UnicodeCharacter except LineTerminator
BlockComment ⇒ / * BlockCommentCharacters * /
BlockCommentCharacters ⇒
   «empty»
|  BlockCommentCharacters NonSlash
|  PreSlashCharacters /
PreSlashCharacters ⇒
   «empty»
|  BlockCommentCharacters NonAsteriskOrSlash
|  PreSlashCharacters /
NonSlash ⇒ UnicodeCharacter except /
NonAsteriskOrSlash ⇒ UnicodeCharacter except * | /
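
The BlockCommentCharacters and PreSlashCharacters productions appear to derive exactly the character sequences that never contain an asterisk immediately followed by a slash, so a block comment ends at the first such pair after the opening delimiter. Under that reading, a Python regular expression with a non-greedy body recognizes the same comments. The names below are hypothetical.

```python
import re
from typing import Optional

# "/*", then as few characters as possible (line terminators included), then "*/".
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def match_block_comment(source: str, pos: int = 0) -> Optional[str]:
    """Return the block comment starting at pos, or None if there is none."""
    m = BLOCK_COMMENT.match(source, pos)
    return m.group() if m else None


print(match_block_comment("/* a */ b */"))
print(match_block_comment("/***/"))
print(match_block_comment("/*/"))
```

The last call returns None because the single asterisk cannot serve as part of both the opening and the closing delimiter.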

White space

WhiteSpace ⇒
   «empty»
|  WhiteSpace WhiteSpaceCharacter
|  WhiteSpace LineTerminator
|  WhiteSpace LineComment LineTerminator
|  WhiteSpace BlockComment

Tokens

t ∈ {re, div}
NextTokent ⇒ WhiteSpace Tokent
Tokenre ⇒
   IdentifierOrReservedWord
|  Punctuator
|  NumericLiteral
|  QuantityLiteral
|  StringLiteral
|  RegExpLiteral
|  EndOfInput
Tokendiv ⇒
   IdentifierOrReservedWord
|  Punctuator
|  DivisionPunctuator
|  NumericLiteral
|  QuantityLiteral
|  StringLiteral
|  EndOfInput
EndOfInput ⇒
   End
|  LineComment End

Keywords and identifiers

IdentifierName ⇒
   InitialIdentifierCharacter
|  IdentifierName ContinuingIdentifierCharacter
InitialIdentifierCharacter ⇒
   OrdinaryInitialIdentifierCharacter
|  \ HexEscape
OrdinaryInitialIdentifierCharacter ⇒ UnicodeInitialAlphabetic | $ | _
ContinuingIdentifierCharacter ⇒
   OrdinaryContinuingIdentifierCharacter
|  \ HexEscape
OrdinaryContinuingIdentifierCharacter ⇒ UnicodeAlphanumeric | $ | _
IdentifierOrReservedWord ⇒ IdentifierName

Punctuators

Punctuator ⇒
   PunctuatorRE
|  PunctuatorDiv
PunctuatorRE ⇒
   !
|  ! =
|  ! = =
|  #
|  %
|  % =
|  &
|  & &
|  & & =
|  & =
|  (
|  *
|  * =
|  +
|  + =
|  ,
|  -
|  - =
|  - >
|  .
|  . .
|  . . .
|  :
|  : :
|  ;
|  <
|  < <
|  < < =
|  < =
|  =
|  = =
|  = = =
|  >
|  > =
|  > >
|  > > =
|  > > >
|  > > > =
|  ?
|  @
|  [
|  ^
|  ^ =
|  ^ ^
|  ^ ^ =
|  {
|  |
|  | =
|  | |
|  | | =
|  ~
PunctuatorDiv ⇒
   )
|  + +
|  - -
|  ]
|  }
DivisionPunctuator ⇒
   /
|  / =

Numeric literals

NumericLiteral ⇒
   DecimalLiteral
|  HexIntegerLiteral [lookahead∉{HexDigit}]
DecimalLiteral ⇒
   Mantissa
|  Mantissa LetterE SignedInteger
LetterE ⇒ E | e
Mantissa ⇒
   DecimalIntegerLiteral
|  DecimalIntegerLiteral .
|  DecimalIntegerLiteral . Fraction
|  . Fraction
DecimalIntegerLiteral ⇒
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits ⇒
   NonZeroDigit
|  NonZeroDecimalDigits ASCIIDigit
NonZeroDigit ⇒ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Fraction ⇒ DecimalDigits
SignedInteger ⇒
   DecimalDigits
|  + DecimalDigits
|  - DecimalDigits
DecimalDigits ⇒
   ASCIIDigit
|  DecimalDigits ASCIIDigit
HexIntegerLiteral ⇒
   0 LetterX HexDigit
|  HexIntegerLiteral HexDigit
LetterX ⇒ X | x
HexDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f
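
As an informal cross-check, the numeric literal productions can be transcribed into a Python regular expression. This is a sketch with hypothetical names; it tests whole strings with fullmatch, so the lookahead restriction on HexIntegerLiteral is not modelled separately.

```python
import re

DECIMAL_INTEGER = r"(?:0|[1-9][0-9]*)"                       # DecimalIntegerLiteral
MANTISSA = rf"(?:{DECIMAL_INTEGER}(?:\.[0-9]*)?|\.[0-9]+)"   # Mantissa
DECIMAL_LITERAL = rf"{MANTISSA}(?:[Ee][+-]?[0-9]+)?"         # DecimalLiteral
HEX_INTEGER_LITERAL = r"0[Xx][0-9A-Fa-f]+"                   # HexIntegerLiteral
NUMERIC_LITERAL = re.compile(rf"(?:{HEX_INTEGER_LITERAL}|{DECIMAL_LITERAL})")


def is_numeric_literal(text: str) -> bool:
    """True if the whole of text is a NumericLiteral under this transcription."""
    return NUMERIC_LITERAL.fullmatch(text) is not None


for sample in ("0", "42", "3.", "3.14e-2", ".5E+10", "0x1F", "007", "1e", ".", "0x"):
    print(sample, is_numeric_literal(sample))
```

The rejected samples show two consequences of the grammar: a decimal integer part cannot have a leading zero (so there are no octal literals), and an exponent or hexadecimal prefix must be followed by at least one digit.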

Quantity literals

QuantityLiteral ⇒ NumericLiteral QuantityName
QuantityName ⇒ [lookahead∉{LetterE, LetterX}] IdentifierName

String literals

q ∈ {single, double}
StringLiteral ⇒
   ' StringCharssingle '
|  " StringCharsdouble "
StringCharsq ⇒
   «empty»
|  StringCharsq StringCharq
StringCharq ⇒
   LiteralStringCharq
|  \ StringEscape
LiteralStringCharsingle ⇒ UnicodeCharacter except ' | \ | LineTerminator
LiteralStringChardouble ⇒ UnicodeCharacter except " | \ | LineTerminator
StringEscape ⇒
   ControlEscape
|  ZeroEscape
|  HexEscape
|  IdentityEscape
IdentityEscape ⇒ NonTerminator except UnicodeAlphanumeric
ControlEscape ⇒
   b
|  f
|  n
|  r
|  t
|  v
ZeroEscape ⇒ 0 [lookahead∉{ASCIIDigit}]
HexEscape ⇒
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit

Regular expression literals

RegExpLiteral ⇒ RegExpBody RegExpFlags
RegExpFlags ⇒
   «empty»
|  RegExpFlags ContinuingIdentifierCharacter
RegExpBody ⇒ / RegExpFirstChar RegExpChars /
RegExpFirstChar ⇒
   OrdinaryRegExpFirstChar
|  \ NonTerminator
OrdinaryRegExpFirstChar ⇒ NonTerminator except \ | / | *
RegExpChars ⇒
   «empty»
|  RegExpChars RegExpChar
RegExpChar ⇒
   OrdinaryRegExpChar
|  \ NonTerminator
OrdinaryRegExpChar ⇒ NonTerminator except \ | /

JavaScript 2.0
Formal Description
Lexer Semantics

Thursday, November 11, 1999

The lexer semantics describe the actions the lexer takes in order to transform an input stream of Unicode characters into a stream of tokens. For convenience, the lexer grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word 98 rtf file.

The start symbols are NextTokenre and NextTokendiv depending on whether a / should be interpreted as a regular expression or division.

Semantics

type SemanticException = oneof {syntaxError}

Unicode Character Classes

Syntax

UnicodeCharacter ⇒ Any Unicode character
UnicodeInitialAlphabetic ⇒ Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)
UnicodeAlphanumeric ⇒ Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
WhiteSpaceCharacter ⇒
   «TAB» | «VT» | «FF» | «SP» | «u00A0»
|  «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»
|  «u2008» | «u2009» | «u200A» | «u200B»
|  «u3000»
LineTerminator ⇒ «LF» | «CR» | «u2028» | «u2029»
ASCIIDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

action DecimalValue[ASCIIDigit] : Integer = digitValue(ASCIIDigit)

Comments

Syntax

LineComment ⇒ / / LineCommentCharacters
LineCommentCharacters ⇒
   «empty»
|  LineCommentCharacters NonTerminator
NonTerminator ⇒ UnicodeCharacter except LineTerminator
BlockComment ⇒ / * BlockCommentCharacters * /
BlockCommentCharacters ⇒
   «empty»
|  BlockCommentCharacters NonSlash
|  PreSlashCharacters /
PreSlashCharacters ⇒
   «empty»
|  BlockCommentCharacters NonAsteriskOrSlash
|  PreSlashCharacters /
NonSlash ⇒ UnicodeCharacter except /
NonAsteriskOrSlash ⇒ UnicodeCharacter except * | /

White space

Syntax

WhiteSpace ⇒
   «empty»
|  WhiteSpace WhiteSpaceCharacter
|  WhiteSpace LineTerminator
|  WhiteSpace LineComment LineTerminator
|  WhiteSpace BlockComment

Tokens

Syntax

t ∈ {re, div}
NextTokent ⇒ WhiteSpace Tokent
Tokenre ⇒
   IdentifierOrReservedWord
|  Punctuator
|  NumericLiteral
|  QuantityLiteral
|  StringLiteral
|  RegExpLiteral
|  EndOfInput
Tokendiv ⇒
   IdentifierOrReservedWord
|  Punctuator
|  DivisionPunctuator
|  NumericLiteral
|  QuantityLiteral
|  StringLiteral
|  EndOfInput
EndOfInput ⇒
   End
|  LineComment End

Semantics

type RegExp = tuple {reBody: String; reFlags: String}

type Quantity = tuple {amount: Double; unit: String}

type Token
  = oneof {
           identifier: String;
           keyword: String;
           punctuator: String;
           number: Double;
           quantity: Quantity;
           string: String;
           regularExpression: RegExp;
           end}

action Token[NextTokent] : Token

Token[NextTokent ⇒ WhiteSpace Tokent] = Token[Tokent]

action RegExpMayFollow[NextTokent] : Boolean

RegExpMayFollow[NextTokent ⇒ WhiteSpace Tokent] = RegExpMayFollow[Tokent]

action Token[Tokent] : Token

Token[Tokent ⇒ IdentifierOrReservedWord] = Token[IdentifierOrReservedWord]

Token[Tokent ⇒ Punctuator] = Token[Punctuator]

Token[Tokendiv ⇒ DivisionPunctuator] = punctuator Punctuator[DivisionPunctuator]

Token[Tokent ⇒ NumericLiteral] = number DoubleValue[NumericLiteral]

Token[Tokent ⇒ QuantityLiteral] = quantity QuantityValue[QuantityLiteral]

Token[Tokent ⇒ StringLiteral] = string StringValue[StringLiteral]

Token[Tokenre ⇒ RegExpLiteral] = regularExpression REValue[RegExpLiteral]

Token[Tokent ⇒ EndOfInput] = end

action RegExpMayFollow[Tokent] : Boolean

RegExpMayFollow[Tokent ⇒ IdentifierOrReservedWord]
  = RegExpMayFollow[IdentifierOrReservedWord]

RegExpMayFollow[Tokent ⇒ Punctuator] = RegExpMayFollow[Punctuator]

RegExpMayFollow[Tokendiv ⇒ DivisionPunctuator] = true

RegExpMayFollow[Tokent ⇒ NumericLiteral] = false

RegExpMayFollow[Tokent ⇒ QuantityLiteral] = false

RegExpMayFollow[Tokent ⇒ StringLiteral] = false

RegExpMayFollow[Tokenre ⇒ RegExpLiteral] = false

RegExpMayFollow[Tokent ⇒ EndOfInput] = true

Keywords and identifiers

Syntax

IdentifierName ⇒
   InitialIdentifierCharacter
|  IdentifierName ContinuingIdentifierCharacter
InitialIdentifierCharacter ⇒
   OrdinaryInitialIdentifierCharacter
|  \ HexEscape
OrdinaryInitialIdentifierCharacter ⇒ UnicodeInitialAlphabetic | $ | _
ContinuingIdentifierCharacter ⇒
   OrdinaryContinuingIdentifierCharacter
|  \ HexEscape
OrdinaryContinuingIdentifierCharacter ⇒ UnicodeAlphanumeric | $ | _

Semantics

action Name[IdentifierName] : String

Name[IdentifierName  InitialIdentifierCharacter]
  = [CharacterValue[InitialIdentifierCharacter]]

Name[IdentifierName  IdentifierName1 ContinuingIdentifierCharacter]
  = Name[IdentifierName1 [CharacterValue[ContinuingIdentifierCharacter]]

action ContainsEscapes[IdentifierName] : Boolean

ContainsEscapes[IdentifierName  InitialIdentifierCharacter]
  = ContainsEscapes[InitialIdentifierCharacter]

ContainsEscapes[IdentifierName  IdentifierName1 ContinuingIdentifierCharacter]
  = ContainsEscapes[IdentifierName1or ContainsEscapes[ContinuingIdentifierCharacter]

action CharacterValue[InitialIdentifierCharacter] : Character

CharacterValue[InitialIdentifierCharacter  OrdinaryInitialIdentifierCharacter]
  = OrdinaryInitialIdentifierCharacter

CharacterValue[InitialIdentifierCharacter  \ HexEscape]
  = if isOrdinaryInitialIdentifierCharacter(CharacterValue[HexEscape])
     then CharacterValue[HexEscape]
     else throw syntaxError

action ContainsEscapes[InitialIdentifierCharacter] : Boolean

ContainsEscapes[InitialIdentifierCharacter  OrdinaryInitialIdentifierCharacter] = false

ContainsEscapes[InitialIdentifierCharacter  \ HexEscape] = true

action CharacterValue[ContinuingIdentifierCharacter] : Character

CharacterValue[ContinuingIdentifierCharacter  OrdinaryContinuingIdentifierCharacter]
  = OrdinaryContinuingIdentifierCharacter

CharacterValue[ContinuingIdentifierCharacter  \ HexEscape]
  = if isOrdinaryContinuingIdentifierCharacter(CharacterValue[HexEscape])
     then CharacterValue[HexEscape]
     else throw syntaxError

action ContainsEscapes[ContinuingIdentifierCharacter] : Boolean

ContainsEscapes[ContinuingIdentifierCharacter  OrdinaryContinuingIdentifierCharacter]
  = false

ContainsEscapes[ContinuingIdentifierCharacter  \ HexEscape] = true
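As an informal illustration of the Name and ContainsEscapes actions above, the following JavaScript sketch decodes an identifier that may be spelled with \x and \u escapes. The function name decodeIdentifierName is hypothetical. UnicodeInitialAlphabetic and UnicodeAlphanumeric are approximated here by the \p{L} and \p{L} plus \p{Nd} property classes, which is an assumption and not part of this proposal; characters outside the Basic Multilingual Plane are not handled.

```javascript
// Approximations of the character classes (assumption, see the text above).
const isInitial = (ch) => /^[\p{L}$_]$/u.test(ch);
const isContinuing = (ch) => /^[\p{L}\p{Nd}$_]$/u.test(ch);

// Returns the decoded name and whether any escape was used.
// Throws SyntaxError when an escape decodes to a disallowed character.
function decodeIdentifierName(src) {
  let name = "";
  let containsEscapes = false;
  let i = 0;
  while (i < src.length) {
    let ch;
    if (src[i] === "\\") {
      const m = /^\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4}))/.exec(src.slice(i));
      if (!m) throw new SyntaxError("malformed escape");
      ch = String.fromCharCode(parseInt(m[1] || m[2], 16));
      containsEscapes = true;
      i += m[0].length;
    } else {
      ch = src[i];
      i += 1;
    }
    // The first character must be an initial character, the rest continuing.
    const ok = name.length === 0 ? isInitial(ch) : isContinuing(ch);
    if (!ok) throw new SyntaxError("invalid identifier character");
    name += ch;
  }
  if (name.length === 0) throw new SyntaxError("empty identifier");
  return { name, containsEscapes };
}
```

Note that an escaped character is held to the same rule as an unescaped one: an escape cannot be used to smuggle a digit into the first position.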

reservedWordsRE : String[]
  = [“abstract”,
      “break”,
      “case”,
      “catch”,
      “class”,
      “const”,
      “continue”,
      “debugger”,
      “default”,
      “delete”,
      “do”,
      “else”,
      “enum”,
      “eval”,
      “export”,
      “extends”,
      “final”,
      “finally”,
      “for”,
      “function”,
      “goto”,
      “if”,
      “implements”,
      “import”,
      “in”,
      “instanceof”,
      “native”,
      “new”,
      “package”,
      “private”,
      “protected”,
      “public”,
      “return”,
      “static”,
      “switch”,
      “synchronized”,
      “throw”,
      “throws”,
      “transient”,
      “try”,
      “typeof”,
      “var”,
      “volatile”,
      “while”,
      “with”]

reservedWordsDiv : String[] = [“false”, “null”, “super”, “this”, “true”]

nonReservedWords : String[]
  = [“box”,
      “constructor”,
      “field”,
      “get”,
      “language”,
      “local”,
      “method”,
      “override”,
      “set”,
      “version”]

keywords : String[] = reservedWordsRE ⊕ reservedWordsDiv ⊕ nonReservedWords

member(id: String, list: String[]) : Boolean
  = if |list| = 0
     then false
     else if id = list[0]
     then true
     else member(id, list[1 ...])

Syntax

IdentifierOrReservedWord  IdentifierName

Semantics

action Token[IdentifierOrReservedWord] : Token

Token[IdentifierOrReservedWord  IdentifierName]
  = let id: String = Name[IdentifierName]
     in if member(id, keywords) and not ContainsEscapes[IdentifierName]
         then keyword id
         else identifier id

action RegExpMayFollow[IdentifierOrReservedWord] : Boolean

RegExpMayFollow[IdentifierOrReservedWord  IdentifierName]
  = let id: String = Name[IdentifierName]
     in member(id, reservedWordsRE) and not ContainsEscapes[IdentifierName]
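The two actions above can be sketched in JavaScript as follows. The function names classifyWord and regExpMayFollowWord are hypothetical, and the word lists are abbreviated for illustration; the full lists are the ones given in this section.

```javascript
// Abbreviated word lists (assumption: only a few entries are shown here).
const reservedWordsRE = ["delete", "in", "return", "typeof"];
const reservedWordsDiv = ["false", "null", "super", "this", "true"];
const nonReservedWords = ["get", "set"];
const keywords = [...reservedWordsRE, ...reservedWordsDiv, ...nonReservedWords];

// Mirrors Token[IdentifierOrReservedWord]: a name spelled with an escape is
// always an identifier, even if its characters spell a keyword.
function classifyWord(name, containsEscapes) {
  const isKeyword = keywords.includes(name) && !containsEscapes;
  return { kind: isKeyword ? "keyword" : "identifier", name };
}

// Mirrors RegExpMayFollow[IdentifierOrReservedWord]: only words from
// reservedWordsRE, spelled without escapes, may be followed by a regexp.
function regExpMayFollowWord(name, containsEscapes) {
  return reservedWordsRE.includes(name) && !containsEscapes;
}
```

Under this sketch a slash after return begins a regular expression literal, while a slash after this or after an ordinary identifier is a division operator.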

Punctuators

Syntax

Punctuator 
   PunctuatorRE
|  PunctuatorDiv
PunctuatorRE 
   !
|  ! =
|  ! = =
|  #
|  %
|  % =
|  &
|  & &
|  & & =
|  & =
|  (
|  *
|  * =
|  +
|  + =
|  ,
|  -
|  - =
|  - >
|  .
|  . .
|  . . .
|  :
|  : :
|  ;
|  <
|  < <
|  < < =
|  < =
|  =
|  = =
|  = = =
|  >
|  > =
|  > >
|  > > =
|  > > >
|  > > > =
|  ?
|  @
|  [
|  ^
|  ^ =
|  ^ ^
|  ^ ^ =
|  {
|  |
|  | =
|  | |
|  | | =
|  ~
PunctuatorDiv 
   )
|  + +
|  - -
|  ]
|  }
DivisionPunctuator 
   /
|  / =

Semantics

action Token[Punctuator] : Token

Token[Punctuator  PunctuatorRE] = punctuator Punctuator[PunctuatorRE]

Token[Punctuator  PunctuatorDiv] = punctuator Punctuator[PunctuatorDiv]

action RegExpMayFollow[Punctuator] : Boolean

RegExpMayFollow[Punctuator  PunctuatorRE] = true

RegExpMayFollow[Punctuator  PunctuatorDiv] = false
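For token kinds other than words, the RegExpMayFollow rules reduce to a small lookup. The sketch below is a JavaScript illustration only; the token object shape ({ kind, text }) and the kind names are hypothetical and are not defined by this proposal.

```javascript
// Punctuators after which a slash is a division operator (PunctuatorDiv).
const punctuatorDiv = new Set([")", "++", "--", "]", "}"]);

// Decides whether a regular expression literal may follow the given token.
function regExpMayFollow(token) {
  switch (token.kind) {
    case "punctuator":
      return !punctuatorDiv.has(token.text);
    case "divisionPunctuator":
    case "endOfInput":
      return true;
    case "number":
    case "quantity":
    case "string":
    case "regexp":
      return false;
    default:
      throw new Error("unhandled token kind: " + token.kind);
  }
}
```

A lexer would consult this after each token to decide which goal symbol to use when the next character is a slash.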

action Punctuator[PunctuatorRE] : String

Punctuator[PunctuatorRE  !] = “!”

Punctuator[PunctuatorRE  ! =] = “!=”

Punctuator[PunctuatorRE  ! = =] = “!==”

Punctuator[PunctuatorRE  #] = “#”

Punctuator[PunctuatorRE  %] = “%”

Punctuator[PunctuatorRE  % =] = “%=”

Punctuator[PunctuatorRE  &] = “&”

Punctuator[PunctuatorRE  & &] = “&&”

Punctuator[PunctuatorRE  & & =] = “&&=”

Punctuator[PunctuatorRE  & =] = “&=”

Punctuator[PunctuatorRE  (] = “(”

Punctuator[PunctuatorRE  *] = “*”

Punctuator[PunctuatorRE  * =] = “*=”

Punctuator[PunctuatorRE  +] = “+”

Punctuator[PunctuatorRE  + =] = “+=”

Punctuator[PunctuatorRE  ,] = “,”

Punctuator[PunctuatorRE  -] = “-”

Punctuator[PunctuatorRE  - =] = “-=”

Punctuator[PunctuatorRE  - >] = “->”

Punctuator[PunctuatorRE  .] = “.”

Punctuator[PunctuatorRE  . .] = “..”

Punctuator[PunctuatorRE  . . .] = “...”

Punctuator[PunctuatorRE  :] = “:”

Punctuator[PunctuatorRE  : :] = “::”

Punctuator[PunctuatorRE  ;] = “;”

Punctuator[PunctuatorRE  <] = “<”

Punctuator[PunctuatorRE  < <] = “<<”

Punctuator[PunctuatorRE  < < =] = “<<=”

Punctuator[PunctuatorRE  < =] = “<=”

Punctuator[PunctuatorRE  =] = “=”

Punctuator[PunctuatorRE  = =] = “==”

Punctuator[PunctuatorRE  = = =] = “===”

Punctuator[PunctuatorRE  >] = “>”

Punctuator[PunctuatorRE  > =] = “>=”

Punctuator[PunctuatorRE  > >] = “>>”

Punctuator[PunctuatorRE  > > =] = “>>=”

Punctuator[PunctuatorRE  > > >] = “>>>”

Punctuator[PunctuatorRE  > > > =] = “>>>=”

Punctuator[PunctuatorRE  ?] = “?”

Punctuator[PunctuatorRE  @] = “@”

Punctuator[PunctuatorRE  [] = “[”

Punctuator[PunctuatorRE  ^] = “^”

Punctuator[PunctuatorRE  ^ =] = “^=”

Punctuator[PunctuatorRE  ^ ^] = “^^”

Punctuator[PunctuatorRE  ^ ^ =] = “^^=”

Punctuator[PunctuatorRE  {] = “{”

Punctuator[PunctuatorRE  |] = “|”

Punctuator[PunctuatorRE  | =] = “|=”

Punctuator[PunctuatorRE  | |] = “||”

Punctuator[PunctuatorRE  | | =] = “||=”

Punctuator[PunctuatorRE  ~] = “~”

action Punctuator[PunctuatorDiv] : String

Punctuator[PunctuatorDiv  )] = “)”

Punctuator[PunctuatorDiv  + +] = “++”

Punctuator[PunctuatorDiv  - -] = “--”

Punctuator[PunctuatorDiv  ]] = “]”

Punctuator[PunctuatorDiv  }] = “}”

action Punctuator[DivisionPunctuator] : String

Punctuator[DivisionPunctuator  /] = “/”

Punctuator[DivisionPunctuator  / =] = “/=”

Numeric literals

Syntax

NumericLiteral 
   DecimalLiteral
|  HexIntegerLiteral [lookahead∉{HexDigit}]

Semantics

action DoubleValue[NumericLiteral] : Double

DoubleValue[NumericLiteral  DecimalLiteral]
  = rationalToDouble(RationalValue[DecimalLiteral])

DoubleValue[NumericLiteral  HexIntegerLiteral [lookahead∉{HexDigit}]]
  = rationalToDouble(IntegerValue[HexIntegerLiteral])

expt(base: Rational, exponent: Integer) : Rational
  = if exponent = 0
     then 1
     else if exponent < 0
     then 1/expt(base, -exponent)
     else base*expt(base, exponent - 1)
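The recursion in expt can be tried out directly. The JavaScript sketch below models a Rational as an unreduced { num, den } pair of BigInt values; that representation and the helper mul are assumptions made for illustration, not part of the semantic notation. A zero base with a negative exponent is not guarded against.

```javascript
// Multiply two unreduced rationals.
function mul(a, b) {
  return { num: a.num * b.num, den: a.den * b.den };
}

// Mirrors expt: exact exponentiation by repeated multiplication.
function expt(base, exponent) {
  if (exponent === 0) return { num: 1n, den: 1n };
  if (exponent < 0) {
    const p = expt(base, -exponent);
    return { num: p.den, den: p.num }; // 1/p
  }
  return mul(base, expt(base, exponent - 1));
}
```

Working with exact rationals and rounding once at the end, as rationalToDouble does, avoids accumulating floating-point error across the mantissa and exponent.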

Syntax

DecimalLiteral 
   Mantissa
|  Mantissa LetterE SignedInteger
LetterE  E | e
Mantissa 
   DecimalIntegerLiteral
|  DecimalIntegerLiteral .
|  DecimalIntegerLiteral . Fraction
|  . Fraction
DecimalIntegerLiteral 
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits 
   NonZeroDigit
|  NonZeroDecimalDigits ASCIIDigit
NonZeroDigit  1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Fraction  DecimalDigits

Semantics

action RationalValue[DecimalLiteral] : Rational

RationalValue[DecimalLiteral  Mantissa] = RationalValue[Mantissa]

RationalValue[DecimalLiteral  Mantissa LetterE SignedInteger]
  = RationalValue[Mantissa]*expt(10, IntegerValue[SignedInteger])

action RationalValue[Mantissa] : Rational

RationalValue[Mantissa  DecimalIntegerLiteral] = IntegerValue[DecimalIntegerLiteral]

RationalValue[Mantissa  DecimalIntegerLiteral .] = IntegerValue[DecimalIntegerLiteral]

RationalValue[Mantissa  DecimalIntegerLiteral . Fraction]
  = IntegerValue[DecimalIntegerLiteral] + RationalValue[Fraction]

RationalValue[Mantissa  . Fraction] = RationalValue[Fraction]

action IntegerValue[DecimalIntegerLiteral] : Integer

IntegerValue[DecimalIntegerLiteral  0] = 0

IntegerValue[DecimalIntegerLiteral  NonZeroDecimalDigits]
  = IntegerValue[NonZeroDecimalDigits]

action IntegerValue[NonZeroDecimalDigits] : Integer

IntegerValue[NonZeroDecimalDigits  NonZeroDigit] = DecimalValue[NonZeroDigit]

IntegerValue[NonZeroDecimalDigits  NonZeroDecimalDigits1 ASCIIDigit]
  = 10*IntegerValue[NonZeroDecimalDigits1] + DecimalValue[ASCIIDigit]

action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)

action RationalValue[Fraction] : Rational

RationalValue[Fraction  DecimalDigits]
  = IntegerValue[DecimalDigits]/expt(10, NDigits[DecimalDigits])

Syntax

SignedInteger 
   DecimalDigits
|  + DecimalDigits
|  - DecimalDigits

Semantics

action IntegerValue[SignedInteger] : Integer

IntegerValue[SignedInteger  DecimalDigits] = IntegerValue[DecimalDigits]

IntegerValue[SignedInteger  + DecimalDigits] = IntegerValue[DecimalDigits]

IntegerValue[SignedInteger  - DecimalDigits] = -IntegerValue[DecimalDigits]

Syntax

DecimalDigits 
   ASCIIDigit
|  DecimalDigits ASCIIDigit

Semantics

action IntegerValue[DecimalDigits] : Integer

IntegerValue[DecimalDigits  ASCIIDigit] = DecimalValue[ASCIIDigit]

IntegerValue[DecimalDigits  DecimalDigits1 ASCIIDigit]
  = 10*IntegerValue[DecimalDigits1] + DecimalValue[ASCIIDigit]

action NDigits[DecimalDigits] : Integer

NDigits[DecimalDigits  ASCIIDigit] = 1

NDigits[DecimalDigits  DecimalDigits1 ASCIIDigit] = NDigits[DecimalDigits1] + 1
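Taken together, the RationalValue, IntegerValue, and NDigits actions compute an exact rational for a decimal literal. The following JavaScript sketch follows the same decomposition; the function name decimalLiteralToRational and the unreduced { num, den } BigInt representation are assumptions made for illustration.

```javascript
// Returns the exact value of a DecimalLiteral as an unreduced fraction.
function decimalLiteralToRational(text) {
  // Mantissa forms: "12", "12.", "12.5", ".5"; optional exponent part.
  const m =
    /^(?:(0|[1-9][0-9]*)(?:\.([0-9]*))?|\.([0-9]+))(?:[eE]([+-]?[0-9]+))?$/.exec(text);
  if (!m) throw new SyntaxError("not a decimal literal: " + text);
  const intPart = m[1] ?? "0";
  const fraction = m[2] ?? m[3] ?? "";
  const exponent = m[4] ? parseInt(m[4], 10) : 0;
  // Integer part plus Fraction / 10^NDigits, over a common denominator.
  let num = BigInt(intPart + fraction);
  let den = 10n ** BigInt(fraction.length);
  // Multiply by expt(10, exponent).
  if (exponent >= 0) num *= 10n ** BigInt(exponent);
  else den *= 10n ** BigInt(-exponent);
  return { num, den };
}
```

Because DecimalIntegerLiteral is either 0 or a digit sequence starting with a nonzero digit, a spelling such as 01 is rejected rather than read as octal.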

Syntax

HexIntegerLiteral 
   0 LetterX HexDigit
|  HexIntegerLiteral HexDigit
LetterX  X | x
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

action IntegerValue[HexIntegerLiteral] : Integer

IntegerValue[HexIntegerLiteral  0 LetterX HexDigit] = HexValue[HexDigit]

IntegerValue[HexIntegerLiteral  HexIntegerLiteral1 HexDigit]
  = 16*IntegerValue[HexIntegerLiteral1] + HexValue[HexDigit]

action HexValue[HexDigit] : Integer = digitValue(HexDigit)
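The left-recursive IntegerValue rule for hexadecimal literals amounts to a left fold over the digits. A minimal JavaScript sketch, with the hypothetical function name hexIntegerValue and BigInt used so that long literals stay exact:

```javascript
// Returns the exact integer value of a HexIntegerLiteral such as "0xFF".
function hexIntegerValue(text) {
  if (!/^0[xX][0-9A-Fa-f]+$/.test(text)) {
    throw new SyntaxError("not a hex integer literal: " + text);
  }
  let value = 0n;
  for (const digit of text.slice(2)) {
    // 16*IntegerValue[HexIntegerLiteral1] + HexValue[HexDigit]
    value = 16n * value + BigInt(parseInt(digit, 16));
  }
  return value;
}
```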

Quantity literals

Syntax

QuantityLiteral  NumericLiteral QuantityName
QuantityName  [lookahead∉{LetterE, LetterX}] IdentifierName

Semantics

action QuantityValue[QuantityLiteral] : Quantity

QuantityValue[QuantityLiteral  NumericLiteral QuantityName]
  = amount DoubleValue[NumericLiteral], unit Name[QuantityName]

action Name[QuantityName] : String

Name[QuantityName  [lookahead∉{LetterE, LetterX}] IdentifierName]
  = Name[IdentifierName]

String literals

Syntax

q ∈ {single, double}
StringLiteral 
   ' StringCharssingle '
|  " StringCharsdouble "

Semantics

action StringValue[StringLiteral] : String

StringValue[StringLiteral  ' StringCharssingle '] = StringValue[StringCharssingle]

StringValue[StringLiteral  " StringCharsdouble "] = StringValue[StringCharsdouble]

Syntax

StringCharsq 
   «empty»
|  StringCharsq StringCharq
StringCharq 
   LiteralStringCharq
|  \ StringEscape
LiteralStringCharsingle  UnicodeCharacter except ' | \ | LineTerminator
LiteralStringChardouble  UnicodeCharacter except " | \ | LineTerminator

Semantics

action StringValue[StringCharsq] : String

StringValue[StringCharsq  «empty»] = “”

StringValue[StringCharsq  StringCharsq1 StringCharq]
  = StringValue[StringCharsq1] ⊕ [CharacterValue[StringCharq]]

action CharacterValue[StringCharq] : Character

CharacterValue[StringCharq  LiteralStringCharq] = LiteralStringCharq

CharacterValue[StringCharq  \ StringEscape] = CharacterValue[StringEscape]

Syntax

StringEscape 
   ControlEscape
|  ZeroEscape
|  HexEscape
|  IdentityEscape
IdentityEscape  NonTerminator except UnicodeAlphanumeric

Semantics

action CharacterValue[StringEscape] : Character

CharacterValue[StringEscape  ControlEscape] = CharacterValue[ControlEscape]

CharacterValue[StringEscape  ZeroEscape] = CharacterValue[ZeroEscape]

CharacterValue[StringEscape  HexEscape] = CharacterValue[HexEscape]

CharacterValue[StringEscape  IdentityEscape] = IdentityEscape

Syntax

ControlEscape 
   b
|  f
|  n
|  r
|  t
|  v

Semantics

action CharacterValue[ControlEscape] : Character

CharacterValue[ControlEscape  b] = ‘«BS»’

CharacterValue[ControlEscape  f] = ‘«FF»’

CharacterValue[ControlEscape  n] = ‘«LF»’

CharacterValue[ControlEscape  r] = ‘«CR»’

CharacterValue[ControlEscape  t] = ‘«TAB»’

CharacterValue[ControlEscape  v] = ‘«VT»’

Syntax

ZeroEscape  0 [lookahead∉{ASCIIDigit}]

Semantics

action CharacterValue[ZeroEscape] : Character

CharacterValue[ZeroEscape  0 [lookahead∉{ASCIIDigit}]] = ‘«NUL»’

Syntax

HexEscape 
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit

Semantics

action CharacterValue[HexEscape] : Character

CharacterValue[HexEscape  x HexDigit1 HexDigit2]
  = codeToCharacter(16*HexValue[HexDigit1] + HexValue[HexDigit2])

CharacterValue[HexEscape  u HexDigit1 HexDigit2 HexDigit3 HexDigit4]
  = codeToCharacter(
         4096*HexValue[HexDigit1] + 256*HexValue[HexDigit2] + 16*HexValue[HexDigit3] +
         HexValue[HexDigit4])
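The string escape rules above (ControlEscape, ZeroEscape, HexEscape, and IdentityEscape) can be exercised with a small decoder. This JavaScript sketch is an illustration under stated assumptions: the function name stringValue is hypothetical, its argument is the text between the quotes, UnicodeAlphanumeric is approximated by \p{L} plus \p{Nd}, and unescaped quotes or line terminators in the body are not checked.

```javascript
const controlEscapes = new Map([
  ["b", "\b"], ["f", "\f"], ["n", "\n"], ["r", "\r"], ["t", "\t"], ["v", "\v"],
]);
const lineTerminators = new Set(["\n", "\r", "\u2028", "\u2029"]);

// Decodes the characters between the quotes of a string literal.
function stringValue(body) {
  let out = "";
  let i = 0;
  while (i < body.length) {
    const ch = body[i++];
    if (ch !== "\\") { out += ch; continue; }
    if (i >= body.length) throw new SyntaxError("dangling backslash");
    const e = body[i++];
    if (controlEscapes.has(e)) {
      out += controlEscapes.get(e);
    } else if (e === "0") {
      // ZeroEscape: the 0 must not be followed by another ASCII digit.
      if (/[0-9]/.test(body[i] ?? "")) throw new SyntaxError("octal escapes are not allowed");
      out += "\0";
    } else if (e === "x" || e === "u") {
      const n = e === "x" ? 2 : 4;
      const digits = body.slice(i, i + n);
      if (!new RegExp(`^[0-9A-Fa-f]{${n}}$`).test(digits)) {
        throw new SyntaxError("malformed hex escape");
      }
      out += String.fromCharCode(parseInt(digits, 16));
      i += n;
    } else if (lineTerminators.has(e) || /[\p{L}\p{Nd}]/u.test(e)) {
      // IdentityEscape excludes line terminators and alphanumerics.
      throw new SyntaxError("invalid escape");
    } else {
      out += e; // IdentityEscape: the character stands for itself
    }
  }
  return out;
}
```

Parsing the digits with parseInt gives the same result as the positional sums 16*h1 + h2 and 4096*h1 + 256*h2 + 16*h3 + h4 in the HexEscape actions.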

Regular expression literals

Syntax

RegExpLiteral  RegExpBody RegExpFlags
RegExpFlags 
   «empty»
|  RegExpFlags ContinuingIdentifierCharacter
RegExpBody  / RegExpFirstChar RegExpChars /
RegExpFirstChar 
   OrdinaryRegExpFirstChar
|  \ NonTerminator
OrdinaryRegExpFirstChar  NonTerminator except \ | / | *
RegExpChars 
   «empty»
|  RegExpChars RegExpChar
RegExpChar 
   OrdinaryRegExpChar
|  \ NonTerminator
OrdinaryRegExpChar  NonTerminator except \ | /

Semantics

action REValue[RegExpLiteral] : RegExp

REValue[RegExpLiteral  RegExpBody RegExpFlags]
  = reBody REBody[RegExpBody], reFlags REFlags[RegExpFlags]

action REFlags[RegExpFlags] : String

REFlags[RegExpFlags  «empty»] = “”

REFlags[RegExpFlags  RegExpFlags1 ContinuingIdentifierCharacter]
  = REFlags[RegExpFlags1] ⊕ [CharacterValue[ContinuingIdentifierCharacter]]

action REBody[RegExpBody] : String

REBody[RegExpBody  / RegExpFirstChar RegExpChars /]
  = REBody[RegExpFirstChar] ⊕ REBody[RegExpChars]

action REBody[RegExpFirstChar] : String

REBody[RegExpFirstChar  OrdinaryRegExpFirstChar] = [OrdinaryRegExpFirstChar]

REBody[RegExpFirstChar  \ NonTerminator] = [‘\’, NonTerminator]

action REBody[RegExpChars] : String

REBody[RegExpChars  «empty»] = “”

REBody[RegExpChars  RegExpChars1 RegExpChar]
  = REBody[RegExpChars1] ⊕ REBody[RegExpChar]

action REBody[RegExpChar] : String

REBody[RegExpChar  OrdinaryRegExpChar] = [OrdinaryRegExpChar]

REBody[RegExpChar  \ NonTerminator] = [‘\’, NonTerminator]


JavaScript 2.0
Formal Description
Regular Expression Grammar

Thursday, November 11, 1999

This LR(1) grammar describes the regular expression syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.


Unicode Character Classes

UnicodeCharacter  Any Unicode character
UnicodeAlphanumeric  Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
LineTerminator  «LF» | «CR» | «u2028» | «u2029»

Regular Expression Definitions

Regular Expression Patterns

RegularExpressionPattern  Disjunction

Disjunctions

Disjunction 
   Alternative
|  Alternative | Disjunction

Alternatives

Alternative 
   «empty»
|  Alternative Term

Terms

Term 
   Assertion
|  Atom
|  Atom Quantifier
Quantifier 
   QuantifierPrefix
|  QuantifierPrefix ?
QuantifierPrefix 
   *
|  +
|  ?
|  { DecimalDigits }
|  { DecimalDigits , }
|  { DecimalDigits , DecimalDigits }
DecimalDigits 
   DecimalDigit
|  DecimalDigits DecimalDigit
DecimalDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Assertions

Assertion 
   ^
|  $
|  \ b
|  \ B

Atoms

Atom 
   PatternCharacter
|  .
|  \ AtomEscape
|  CharacterClass
|  ( Disjunction )
|  ( ? : Disjunction )
|  ( ? = Disjunction )
|  ( ? ! Disjunction )
PatternCharacter  UnicodeCharacter except ^ | $ | \ | . | * | + | ? | ( | ) | [ | ] | { | } | |

Escapes

AtomEscape 
   DecimalEscape
|  CharacterEscape
|  CharacterClassEscape
CharacterEscape 
   ControlEscape
|  c ControlLetter
|  HexEscape
|  IdentityEscape
ControlLetter 
   A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
|  a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
IdentityEscape  UnicodeCharacter except UnicodeAlphanumeric
ControlEscape 
   f
|  n
|  r
|  t
|  v

Decimal Escapes

DecimalEscape  DecimalIntegerLiteral [lookahead∉{DecimalDigit}]
DecimalIntegerLiteral 
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits 
   NonZeroDigit
|  NonZeroDecimalDigits DecimalDigit
NonZeroDigit  1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Hexadecimal Escapes

HexEscape 
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Character Class Escapes

CharacterClassEscape 
   s
|  S
|  d
|  D
|  w
|  W

User-Specified Character Classes

CharacterClass 
   [ [lookahead∉{^}] ClassRanges ]
|  [ ^ ClassRanges ]
ClassRanges 
   «empty»
|  NonemptyClassRangesdash
d ∈ {dash, noDash}
NonemptyClassRangesd 
   ClassAtomdash
|  ClassAtomd NonemptyClassRangesnoDash
|  ClassAtomd - ClassAtomdash ClassRanges

Character Class Range Atoms

ClassAtomd 
   ClassCharacterd
|  \ ClassEscape
ClassCharacterdash  UnicodeCharacter except \ | ]
ClassCharacternoDash  ClassCharacterdash except -
ClassEscape 
   DecimalEscape
|  b
|  CharacterEscape
|  CharacterClassEscape

JavaScript 2.0
Formal Description
Regular Expression Semantics

Thursday, November 11, 1999

The regular expression semantics describe the actions the regular expression engine takes in order to transform a regular expression pattern into a function for matching against input strings. For convenience, the regular expression grammar is repeated here. See also the description of the semantic notation.


The regular expression semantics below are working (except for case-insensitive matches) and have been tried on sample cases, but they could be formatted better.

Semantics

type SemanticException = oneof {syntaxError}

Unicode Character Classes

Syntax

UnicodeCharacter  Any Unicode character
UnicodeAlphanumeric  Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
LineTerminator  «LF» | «CR» | «u2028» | «u2029»

Semantics

lineTerminators : {Character} = {‘«LF»’, ‘«CR»’, ‘«u2028»’, ‘«u2029»’}

reWhitespaces : {Character} = {‘«FF»’, ‘«LF»’, ‘«CR»’, ‘«TAB»’, ‘«VT»’, ‘ ’}

reDigits : {Character} = {‘0’ ... ‘9’}

reWordCharacters : {Character} = {‘0’ ... ‘9’, ‘A’ ... ‘Z’, ‘a’ ... ‘z’, ‘_’}

Regular Expression Definitions

Semantics

type REInput = tuple {str: String; ignoreCase: Boolean; multiline: Boolean}

Field str is the input string. ignoreCase and multiline are the corresponding regular expression flags.

type REResult = oneof {success: REMatch; failure}

type REMatch = tuple {endIndex: Integer; captures: Capture[]}

An REMatch holds an intermediate state during the pattern-matching process. endIndex is the index of the next input character to be matched by the next component in a regular expression pattern. If we are at the end of the pattern, endIndex is one plus the index of the last matched input character. captures is a zero-based array of the strings captured so far by capturing parentheses.

type Capture = oneof {present: String; absent}

type Continuation = REMatch → REResult

A Continuation is a function that attempts to match the remaining portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. If a match is possible, it returns a success result that contains the final REMatch state; if no match is possible, it returns a failure result.

type Matcher = REInput × REMatch × Continuation → REResult

A Matcher is a function that attempts to match a middle portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. Since the remainder of the pattern heavily influences whether (and how) a middle portion will match, we must pass in a Continuation function that checks whether the rest of the pattern matched. If the continuation returns failure, the matcher function may call it repeatedly, trying various alternatives at pattern choice points.

The REInput parameter contains the input string and is merely passed down to subroutines.

type MatcherGenerator = Integer → Matcher

A MatcherGenerator is a function executed at the time the regular expression is compiled that returns a Matcher for a part of the pattern. The Integer parameter contains the number of capturing left parentheses seen so far in the pattern and is used to assign static, consecutive numbers to capturing parentheses.

characterSetMatcher(acceptanceSet: {Character}, invert: Boolean) : Matcher
  = function(t: REInput, x: REMatch, c: Continuation)
         let i: Integer = x.endIndex;
             s: String = t.str
         in if i = |s|
             then failure
             else if s[i] ∈ acceptanceSet xor invert
             then c(endIndex (i + 1), captures x.captures)
             else failure

characterSetMatcher returns a Matcher that matches a single input string character. If invert is false, the match succeeds if the character is a member of the acceptanceSet set of characters (possibly ignoring case). If invert is true, the match succeeds if the character is not a member of the acceptanceSet set of characters (possibly ignoring case).

characterMatcher(chCharacter) : Matcher = characterSetMatcher({ch}, false)

characterMatcher returns a Matcher that matches a single input string character. The match succeeds if the character is the same as ch (possibly ignoring case).
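The Matcher and Continuation types translate naturally into JavaScript closures. The sketch below is an illustration rather than a definitive implementation: null stands in for the failure result, a successful result is represented by the REMatch object itself, the acceptance set is a JavaScript Set, and case-insensitive matching is not modelled.

```javascript
const FAILURE = null; // stands in for the `failure` result

// Returns a matcher that consumes one character if it is (or, when invert is
// true, is not) a member of acceptanceSet, then calls the continuation.
function characterSetMatcher(acceptanceSet, invert) {
  return (t, x, c) => {
    const i = x.endIndex;
    const s = t.str;
    if (i === s.length) return FAILURE;
    if (acceptanceSet.has(s[i]) !== invert) { // boolean xor
      return c({ endIndex: i + 1, captures: x.captures });
    }
    return FAILURE;
  };
}

const characterMatcher = (ch) => characterSetMatcher(new Set([ch]), false);

// The continuation used at the end of a pattern: accept the current state.
const successContinuation = (x) => x;
```

For example, characterMatcher("a") applied to the input { str: "abc" } at endIndex 0 with successContinuation yields a state whose endIndex is 1.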

Regular Expression Patterns

Syntax

RegularExpressionPattern  Disjunction

Semantics

action Exec[RegularExpressionPattern] : REInput × Integer → REResult

Exec[RegularExpressionPattern  Disjunction]
  = let match: Matcher = GenMatcher[Disjunction](0)
     in function(t: REInput, index: Integer)
             match(
                 t,
                 endIndex index, captures fillCapture(CountParens[Disjunction]),
                 successContinuation)

successContinuation(x: REMatch) : REResult = success x

fillCapture(i: Integer) : Capture[]
  = if i = 0
     then []Capture
     else fillCapture(i - 1) ⊕ [absent]

Disjunctions

Syntax

Disjunction 
   Alternative
|  Alternative | Disjunction

Semantics

action GenMatcher[Disjunction] : MatcherGenerator

GenMatcher[Disjunction  Alternative] = GenMatcher[Alternative]

GenMatcher[Disjunction  Alternative | Disjunction1](parenIndex: Integer)
  = let match1: Matcher = GenMatcher[Alternative](parenIndex);
         match2: Matcher = GenMatcher[Disjunction1](parenIndex + CountParens[Alternative])
     in function(t: REInput, x: REMatch, c: Continuation)
             case match1(t, x, c) of
                success(y: REMatch): success y;
                failure: match2(t, x, c)
                end

action CountParens[Disjunction] : Integer

CountParens[Disjunction  Alternative] = CountParens[Alternative]

CountParens[Disjunction  Alternative | Disjunction1]
  = CountParens[Alternative] + CountParens[Disjunction1]
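The disjunction rule passes the same continuation to both alternatives, so a failure in the rest of the pattern causes the second alternative to be tried. A JavaScript sketch of that behaviour follows; the helpers lit, empty, and atEnd are hypothetical, null stands in for failure, and paren numbering is omitted.

```javascript
const FAILURE = null; // stands in for the `failure` result

// Hypothetical helpers: match one literal character, or match nothing.
const lit = (ch) => (t, x, c) =>
  t.str[x.endIndex] === ch
    ? c({ endIndex: x.endIndex + 1, captures: x.captures })
    : FAILURE;
const empty = (t, x, c) => c(x);

// Try match1 with the full continuation; fall back to match2 on failure.
function disjunctionMatcher(match1, match2) {
  return (t, x, c) => {
    const y = match1(t, x, c);
    return y !== FAILURE ? y : match2(t, x, c);
  };
}

// Continuation that accepts only when the whole input has been consumed.
const atEnd = (t) => (x) => (x.endIndex === t.str.length ? x : FAILURE);

const t = { str: "a" };
const m = disjunctionMatcher(empty, lit("a"));
const result = m(t, { endIndex: 0, captures: [] }, atEnd(t));
```

Here the empty alternative matches first, but the continuation rejects it because input remains; the matcher then tries the second alternative, and result has endIndex 1.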

Alternatives

Syntax

Alternative 
   «empty»
|  Alternative Term

Semantics

action GenMatcher[Alternative] : MatcherGenerator

GenMatcher[Alternative  «empty»](parenIndex: Integer)
  = function(t: REInput, x: REMatch, c: Continuation)
         c(x)

GenMatcher[Alternative  Alternative1 Term](parenIndex: Integer)
  = let match1: Matcher = GenMatcher[Alternative1](parenIndex);
         match2: Matcher = GenMatcher[Term](parenIndex + CountParens[Alternative1])
     in function(t: REInput, x: REMatch, c: Continuation)
             let d: Continuation
                     = function(y: REMatch)
                            match2(t, y, c)
             in match1(t, x, d)

action CountParens[Alternative] : Integer

CountParens[Alternative  «empty»] = 0

CountParens[Alternative  Alternative1 Term]
  = CountParens[Alternative1] + CountParens[Term]
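Sequencing is expressed by wrapping the second matcher in a new continuation that is handed to the first. The following JavaScript sketch shows only that chaining; the helper lit is hypothetical, null stands in for failure, and the parenIndex bookkeeping is left out.

```javascript
const FAILURE = null; // stands in for the `failure` result

// Hypothetical helper: match one literal character.
const lit = (ch) => (t, x, c) =>
  t.str[x.endIndex] === ch
    ? c({ endIndex: x.endIndex + 1, captures: x.captures })
    : FAILURE;

// match1 runs first; its continuation d runs match2 and then the original c.
function sequenceMatcher(match1, match2) {
  return (t, x, c) => {
    const d = (y) => match2(t, y, c);
    return match1(t, x, d);
  };
}

const ab = sequenceMatcher(lit("a"), lit("b"));
const start = { endIndex: 0, captures: [] };
const hit = ab({ str: "abc" }, start, (x) => x);
const miss = ab({ str: "acb" }, start, (x) => x);
```

With these inputs hit has endIndex 2 and miss is the failure value.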

Terms

Syntax

Term 
   Assertion
|  Atom
|  Atom Quantifier

Semantics

action GenMatcher[Term] : MatcherGenerator

GenMatcher[Term  Assertion](parenIndex: Integer)
  = function(t: REInput, x: REMatch, c: Continuation)
         if TestAssertion[Assertion](t, x)
         then c(x)
         else failure

GenMatcher[Term  Atom] = GenMatcher[Atom]

GenMatcher[Term  Atom Quantifier](parenIndex: Integer)
  = let match: Matcher = GenMatcher[Atom](parenIndex);
         min: Integer = Minimum[Quantifier];
         max: Limit = Maximum[Quantifier];
         greedy: Boolean = Greedy[Quantifier]
     in if
             (case max of
                finite(m: Integer): m < min;
                infinite: false
                end)
         then throw syntaxError
         else repeatMatcher(match, min, max, greedy, parenIndex, CountParens[Atom])

action CountParens[Term] : Integer

CountParens[Term  Assertion] = 0

CountParens[Term  Atom] = CountParens[Atom]

CountParens[Term  Atom Quantifier] = CountParens[Atom]

Syntax

Quantifier 
   QuantifierPrefix
|  QuantifierPrefix ?
QuantifierPrefix 
   *
|  +
|  ?
|  { DecimalDigits }
|  { DecimalDigits , }
|  { DecimalDigits , DecimalDigits }
DecimalDigits 
   DecimalDigit
|  DecimalDigits DecimalDigit
DecimalDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

type Limit = oneof {finite: Integer; infinite}

resetParens(x: REMatch, p: Integer, nParens: Integer) : REMatch
  = if nParens = 0
     then x
     else let y: REMatch = endIndex x.endIndex, captures x.captures[p ← absent]
           in resetParens(y, p + 1, nParens - 1)

repeatMatcher(body: Matcher, min: Integer, max: Limit, greedy: Boolean, parenIndex: Integer, nBodyParens: Integer)
  : Matcher
  = function(t: REInput, x: REMatch, c: Continuation)
         if
             (case max of
                finite(m: Integer): m = 0;
                infinite: false
                end)
         then c(x)
         else let d: Continuation
                       = function(y: REMatch)
                              if min = 0 and y.endIndex = x.endIndex
                              then failure
                              else let newMin: Integer
                                            = if min = 0
                                               then 0
                                               else min - 1;
                                        newMax: Limit
                                            = case max of
                                                  finite(m: Integer): finite (m - 1);
                                                  infinite: infinite
                                                  end
                                    in repeatMatcher(
                                            body,
                                            newMin,
                                            newMax,
                                            greedy,
                                            parenIndex,
                                            nBodyParens)(t, y, c);
                   xr: REMatch = resetParens(x, parenIndex, nBodyParens)
               in if min ≠ 0
                   then body(t, xr, d)
                   else if greedy
                   then case body(t, xr, d) of
                             success(z: REMatch): success z;
                             failure: c(x)
                             end
                   else case c(x) of
                            success(z: REMatch): success z;
                            failure: body(t, xr, d)
                            end
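The repeatMatcher logic can be transcribed into JavaScript almost line for line. In this sketch, which is an illustration under stated assumptions, Infinity models the infinite limit, undefined models an absent capture, null models failure, and lit is a hypothetical single-character matcher.

```javascript
const FAILURE = null; // stands in for the `failure` result

// Hypothetical helper: match one literal character.
const lit = (ch) => (t, x, c) =>
  t.str[x.endIndex] === ch
    ? c({ endIndex: x.endIndex + 1, captures: x.captures })
    : FAILURE;

// Clears the captures belonging to the repeated body before each iteration.
function resetParens(x, p, nParens) {
  const captures = x.captures.slice();
  for (let k = p; k < p + nParens; k++) captures[k] = undefined;
  return { endIndex: x.endIndex, captures };
}

function repeatMatcher(body, min, max, greedy, parenIndex, nBodyParens) {
  return (t, x, c) => {
    if (max === 0) return c(x);
    const d = (y) => {
      // An optional iteration that consumed nothing would loop forever.
      if (min === 0 && y.endIndex === x.endIndex) return FAILURE;
      const newMin = min === 0 ? 0 : min - 1;
      const newMax = max - 1; // Infinity - 1 is still Infinity
      return repeatMatcher(body, newMin, newMax, greedy, parenIndex, nBodyParens)(t, y, c);
    };
    const xr = resetParens(x, parenIndex, nBodyParens);
    if (min !== 0) return body(t, xr, d);
    if (greedy) {
      const z = body(t, xr, d);
      return z !== FAILURE ? z : c(x);
    }
    const z = c(x);
    return z !== FAILURE ? z : body(t, xr, d);
  };
}
```

The only difference between the greedy and non-greedy cases is the order in which the body and the continuation are tried once the minimum count has been met.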

action Minimum[Quantifier] : Integer

Minimum[Quantifier  QuantifierPrefix] = Minimum[QuantifierPrefix]

Minimum[Quantifier  QuantifierPrefix ?] = Minimum[QuantifierPrefix]

action Maximum[Quantifier] : Limit

Maximum[Quantifier  QuantifierPrefix] = Maximum[QuantifierPrefix]

Maximum[Quantifier  QuantifierPrefix ?] = Maximum[QuantifierPrefix]

action Greedy[Quantifier] : Boolean

Greedy[Quantifier  QuantifierPrefix] = true

Greedy[Quantifier  QuantifierPrefix ?] = false

action Minimum[QuantifierPrefix] : Integer

Minimum[QuantifierPrefix  *] = 0

Minimum[QuantifierPrefix  +] = 1

Minimum[QuantifierPrefix  ?] = 0

Minimum[QuantifierPrefix  { DecimalDigits }] = IntegerValue[DecimalDigits]

Minimum[QuantifierPrefix  { DecimalDigits , }] = IntegerValue[DecimalDigits]

Minimum[QuantifierPrefix  { DecimalDigits1 , DecimalDigits2 }]
  = IntegerValue[DecimalDigits1]

action Maximum[QuantifierPrefix] : Limit

Maximum[QuantifierPrefix  *] = infinite

Maximum[QuantifierPrefix  +] = infinite

Maximum[QuantifierPrefix  ?] = finite 1

Maximum[QuantifierPrefix  { DecimalDigits }] = finite IntegerValue[DecimalDigits]

Maximum[QuantifierPrefix  { DecimalDigits , }] = infinite

Maximum[QuantifierPrefix  { DecimalDigits1 , DecimalDigits2 }]
  = finite IntegerValue[DecimalDigits2]

action IntegerValue[DecimalDigits] : Integer

IntegerValue[DecimalDigits  DecimalDigit] = DecimalValue[DecimalDigit]

IntegerValue[DecimalDigits  DecimalDigits1 DecimalDigit]
  = 10*IntegerValue[DecimalDigits1] + DecimalValue[DecimalDigit]

action DecimalValue[DecimalDigit] : Integer = digitValue(DecimalDigit)

Assertions

Syntax

Assertion 
   ^
|  $
|  \ b
|  \ B

Semantics

action TestAssertion[Assertion] : REInput × REMatch → Boolean

TestAssertion[Assertion  ^](t: REInput, x: REMatch)
  = if x.endIndex = 0
     then true
     else t.multiline and t.str[x.endIndex - 1] ∈ lineTerminators

TestAssertion[Assertion  $](t: REInput, x: REMatch)
  = if x.endIndex = |t.str|
     then true
     else t.multiline and t.str[x.endIndex] ∈ lineTerminators

TestAssertion[Assertion  \ b](t: REInput, x: REMatch)
  = atWordBoundary(x.endIndex, t.str)

TestAssertion[Assertion  \ B](t: REInput, x: REMatch)
  = not atWordBoundary(x.endIndex, t.str)

atWordBoundary(i: Integer, s: String) : Boolean = inWord(i - 1, s) xor inWord(i, s)

inWord(i: Integer, s: String) : Boolean
  = if i = -1 or i = |s|
     then false
     else s[i] ∈ reWordCharacters
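The word-boundary test compares the characters on either side of a position, treating both ends of the string as non-word positions. A direct JavaScript transcription, with the hypothetical helper isWordChar standing in for membership in reWordCharacters:

```javascript
// Membership test for reWordCharacters: 0-9, A-Z, a-z, and underscore.
const isWordChar = (ch) => /^[0-9A-Za-z_]$/.test(ch);

function inWord(i, s) {
  if (i === -1 || i === s.length) return false;
  return isWordChar(s[i]);
}

// A boundary exists where exactly one neighbour is a word character.
function atWordBoundary(i, s) {
  return inWord(i - 1, s) !== inWord(i, s);
}
```

In the string "ab c" positions 0, 2, 3, and 4 are word boundaries and position 1 is not.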

Atoms

Syntax

Atom 
   PatternCharacter
|  .
|  \ AtomEscape
|  CharacterClass
|  ( Disjunction )
|  ( ? : Disjunction )
|  ( ? = Disjunction )
|  ( ? ! Disjunction )
PatternCharacter  UnicodeCharacter except ^ | $ | \ | . | * | + | ? | ( | ) | [ | ] | { | } | |

Semantics

action GenMatcher[Atom] : MatcherGenerator

GenMatcher[Atom  PatternCharacter](parenIndex: Integer)
  = characterMatcher(PatternCharacter)

GenMatcher[Atom  .](parenIndex: Integer) = characterSetMatcher(lineTerminators, true)

GenMatcher[Atom  \ AtomEscape] = GenMatcher[AtomEscape]

GenMatcher[Atom  CharacterClass](parenIndex: Integer)
  = let a: {Character} = AcceptanceSet[CharacterClass]
     in characterSetMatcher(a, Invert[CharacterClass])

GenMatcher[Atom  ( Disjunction )](parenIndex: Integer)
  = let match: Matcher = GenMatcher[Disjunction](parenIndex + 1)
     in function(t: REInput, x: REMatch, c: Continuation)
             let d: Continuation
                     = function(y: REMatch)
                            let updatedCaptures: Capture[]
                                    = y.captures[parenIndex ←
                                           present t.str[x.endIndex ... y.endIndex - 1]]
                            in c(endIndex y.endIndex, captures updatedCaptures)
             in match(t, x, d)

GenMatcher[Atom  ( ? : Disjunction )] = GenMatcher[Disjunction]

GenMatcher[Atom  ( ? = Disjunction )](parenIndex: Integer)
  = let match: Matcher = GenMatcher[Disjunction](parenIndex)
     in function(t: REInput, x: REMatch, c: Continuation)
             case match(t, x, successContinuation) of
                success(y: REMatch): c(endIndex x.endIndex, captures y.captures);
                failure: failure
                end

GenMatcher[Atom  ( ? ! Disjunction )](parenIndex: Integer)
  = let match: Matcher = GenMatcher[Disjunction](parenIndex)
     in function(t: REInput, x: REMatch, c: Continuation)
             case match(t, x, successContinuation) of
                success(y: REMatch): failure;
                failure: c(x)
                end

action CountParens[Atom] : Integer

CountParens[Atom  PatternCharacter] = 0

CountParens[Atom  .] = 0

CountParens[Atom  \ AtomEscape] = 0

CountParens[Atom  CharacterClass] = 0

CountParens[Atom  ( Disjunction )] = CountParens[Disjunction] + 1

CountParens[Atom  ( ? : Disjunction )] = CountParens[Disjunction]

CountParens[Atom  ( ? = Disjunction )] = CountParens[Disjunction]

CountParens[Atom  ( ? ! Disjunction )] = CountParens[Disjunction]
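Capturing parentheses and lookahead differ mainly in what they pass to the continuation. The JavaScript sketch below illustrates both; the names captureMatcher, lookaheadMatcher, and lit are hypothetical, null stands in for failure, and undefined stands in for an absent capture.

```javascript
const FAILURE = null; // stands in for the `failure` result
const successContinuation = (x) => x;

// Hypothetical helper: match one literal character.
const lit = (ch) => (t, x, c) =>
  t.str[x.endIndex] === ch
    ? c({ endIndex: x.endIndex + 1, captures: x.captures })
    : FAILURE;

// ( Disjunction ): record the matched substring before continuing.
function captureMatcher(match, parenIndex) {
  return (t, x, c) => {
    const d = (y) => {
      const captures = y.captures.slice();
      captures[parenIndex] = t.str.slice(x.endIndex, y.endIndex);
      return c({ endIndex: y.endIndex, captures });
    };
    return match(t, x, d);
  };
}

// (?= Disjunction ) when negate is false, (?! Disjunction ) when true.
// The inner match runs to completion on its own and consumes no input.
function lookaheadMatcher(match, negate) {
  return (t, x, c) => {
    const y = match(t, x, successContinuation);
    if (negate) return y === FAILURE ? c(x) : FAILURE;
    return y === FAILURE
      ? FAILURE
      : c({ endIndex: x.endIndex, captures: y.captures });
  };
}
```

Because the lookahead body is run with successContinuation rather than with c, a later failure in the rest of the pattern does not cause the lookahead body to be retried with a different choice.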

Escapes

Syntax

AtomEscape 
   DecimalEscape
|  CharacterEscape
|  CharacterClassEscape

Semantics

action GenMatcher[AtomEscape] : MatcherGenerator

GenMatcher[AtomEscape  DecimalEscape](parenIndexInteger)
  = let nInteger = EscapeValue[DecimalEscape]
     in if n = 0
         then characterMatcher(‘«NUL»’)
         else if n > parenIndex
         then throw syntaxError
         else backreferenceMatcher(n)

GenMatcher[AtomEscape  CharacterEscape](parenIndexInteger)
  = characterMatcher(CharacterValue[CharacterEscape])

GenMatcher[AtomEscape  CharacterClassEscape](parenIndexInteger)
  = characterSetMatcher(AcceptanceSet[CharacterClassEscape], false)

backreferenceMatcher(n: Integer) : Matcher
  = function(t: REInput, x: REMatch, c: Continuation)
         case nthBackreference(x, n) of
            present(ref: String):
                  let i: Integer = x.endIndex;
                      s: String = t.str
                  in let j: Integer = i + |ref|
                  in if j > |s|
                      then failure
                      else if s[i ... j - 1] = ref
                      then c(⟨endIndex j, captures x.captures⟩)
                      else failure;
            absent: c(x)
            end

nthBackreference(x: REMatch, n: Integer) : Capture = x.captures[n - 1]
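
A minimal JavaScript sketch of the backreference matcher follows. It assumes, for illustration only, that the input t is a plain string (standing in for t.str), that a state is an {endIndex, captures} record, that an absent capture is represented by undefined, and that failure is represented by null.

```javascript
function nthBackreference(x, n) {
  return x.captures[n - 1];
}

function backreferenceMatcher(n) {
  return (t, x, c) => {
    const ref = nthBackreference(x, n);
    if (ref === undefined) return c(x);   // absent: behaves like the empty string
    const i = x.endIndex;
    const j = i + ref.length;
    if (j > t.length) return null;        // not enough input left
    return t.slice(i, j) === ref
      ? c({ endIndex: j, captures: x.captures })
      : null;
  };
}

const identity = (y) => y;
const hit = backreferenceMatcher(1)("abab", { endIndex: 2, captures: ["ab"] }, identity);
const miss = backreferenceMatcher(1)("abac", { endIndex: 2, captures: ["ab"] }, identity);
const absent = backreferenceMatcher(1)("abab", { endIndex: 2, captures: [undefined] }, identity);
```

The absent case matters for patterns in which the referenced group did not participate in the match: the backreference then succeeds without consuming input.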

Syntax

CharacterEscape 
   ControlEscape
|  c ControlLetter
|  HexEscape
|  IdentityEscape
ControlLetter 
   A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
|  a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
IdentityEscape  UnicodeCharacter except UnicodeAlphanumeric
ControlEscape 
   f
|  n
|  r
|  t
|  v

Semantics

action CharacterValue[CharacterEscape] : Character

CharacterValue[CharacterEscape  ControlEscape] = CharacterValue[ControlEscape]

CharacterValue[CharacterEscape  c ControlLetter]
  = codeToCharacter(bitwiseAnd(characterToCode(ControlLetter), 31))

CharacterValue[CharacterEscape  HexEscape] = CharacterValue[HexEscape]

CharacterValue[CharacterEscape  IdentityEscape] = IdentityEscape

action CharacterValue[ControlEscape] : Character

CharacterValue[ControlEscape  f] = ‘«FF»’

CharacterValue[ControlEscape  n] = ‘«LF»’

CharacterValue[ControlEscape  r] = ‘«CR»’

CharacterValue[ControlEscape  t] = ‘«TAB»’

CharacterValue[ControlEscape  v] = ‘«VT»’
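
The c ControlLetter rule above is the least obvious of the character escapes, so here is a small JavaScript sketch of it. The helper name controlLetterValue is hypothetical; the bit mask is the bitwiseAnd(..., 31) from the rule.

```javascript
// \c followed by a letter denotes the character whose code is the letter's
// code with all but the low five bits cleared.
function controlLetterValue(letter) {
  return String.fromCharCode(letter.charCodeAt(0) & 31);
}

const lineFeed = controlLetterValue("J"); // "J" is 74; 74 & 31 is 10
const tab = controlLetterValue("i");      // "i" is 105; 105 & 31 is 9
```

Upper-case and lower-case letters give the same result because they differ only in a bit that the mask discards.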

Decimal Escapes

Syntax

DecimalEscape  DecimalIntegerLiteral [lookahead∉{DecimalDigit}]
DecimalIntegerLiteral 
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits 
   NonZeroDigit
|  NonZeroDecimalDigits DecimalDigit
NonZeroDigit  1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

action EscapeValue[DecimalEscape] : Integer

EscapeValue[DecimalEscape  DecimalIntegerLiteral [lookahead∉{DecimalDigit}]]
  = IntegerValue[DecimalIntegerLiteral]

action IntegerValue[DecimalIntegerLiteral] : Integer

IntegerValue[DecimalIntegerLiteral  0] = 0

IntegerValue[DecimalIntegerLiteral  NonZeroDecimalDigits]
  = IntegerValue[NonZeroDecimalDigits]

action IntegerValue[NonZeroDecimalDigits] : Integer

IntegerValue[NonZeroDecimalDigits  NonZeroDigit] = DecimalValue[NonZeroDigit]

IntegerValue[NonZeroDecimalDigits  NonZeroDecimalDigits1 DecimalDigit]
  = 10*IntegerValue[NonZeroDecimalDigits1] + DecimalValue[DecimalDigit]

action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)
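
The left-recursive IntegerValue rules amount to folding the digits from left to right. A sketch in JavaScript, with the hypothetical helper name integerValue and a string of digits as input:

```javascript
// Mirrors IntegerValue = 10*IntegerValue[prefix] + DecimalValue[last digit].
// The regular expression enforces the DecimalIntegerLiteral syntax: either a
// lone 0 or a nonzero digit followed by any digits.
function integerValue(digits) {
  if (!/^(0|[1-9][0-9]*)$/.test(digits)) {
    throw new SyntaxError("not a DecimalIntegerLiteral: " + digits);
  }
  let value = 0;
  for (const d of digits) {
    value = 10 * value + (d.charCodeAt(0) - "0".charCodeAt(0));
  }
  return value;
}

const twelve = integerValue("12");
```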

Hexadecimal Escapes

Syntax

HexEscape 
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

action CharacterValue[HexEscape] : Character

CharacterValue[HexEscape  x HexDigit1 HexDigit2]
  = codeToCharacter(16*HexValue[HexDigit1] + HexValue[HexDigit2])

CharacterValue[HexEscape  u HexDigit1 HexDigit2 HexDigit3 HexDigit4]
  = codeToCharacter(
         4096*HexValue[HexDigit1] + 256*HexValue[HexDigit2] + 16*HexValue[HexDigit3] +
         HexValue[HexDigit4])

action HexValue[HexDigit] : Integer = digitValue(HexDigit)
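
The two HexEscape rules can be sketched in JavaScript as follows. The helper names hexValue, hexEscape2 and hexEscape4 are hypothetical; the weights 16, 256 and 4096 come directly from the rules above.

```javascript
function hexValue(digit) {
  const v = parseInt(digit, 16);
  if (digit.length !== 1 || Number.isNaN(v)) {
    throw new SyntaxError("bad hex digit: " + digit);
  }
  return v;
}

// \xHH
function hexEscape2(h1, h2) {
  return String.fromCharCode(16 * hexValue(h1) + hexValue(h2));
}

// \uHHHH
function hexEscape4(h1, h2, h3, h4) {
  return String.fromCharCode(
    4096 * hexValue(h1) + 256 * hexValue(h2) + 16 * hexValue(h3) + hexValue(h4)
  );
}

const capitalA = hexEscape2("4", "1");
const euroSign = hexEscape4("2", "0", "A", "C");
```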

Character Class Escapes

Syntax

CharacterClassEscape 
   s
|  S
|  d
|  D
|  w
|  W

Semantics

action AcceptanceSet[CharacterClassEscape] : {Character}

AcceptanceSet[CharacterClassEscape  s] = reWhitespaces

AcceptanceSet[CharacterClassEscape  S] = {‘«NUL»’ ... ‘«uFFFF»’} - reWhitespaces

AcceptanceSet[CharacterClassEscape  d] = reDigits

AcceptanceSet[CharacterClassEscape  D] = {‘«NUL»’ ... ‘«uFFFF»’} - reDigits

AcceptanceSet[CharacterClassEscape  w] = reWordCharacters

AcceptanceSet[CharacterClassEscape  W] = {‘«NUL»’ ... ‘«uFFFF»’} - reWordCharacters

User-Specified Character Classes

Syntax

CharacterClass 
   [ [lookahead∉{^}] ClassRanges ]
|  [ ^ ClassRanges ]
ClassRanges 
   «empty»
|  NonemptyClassRangesdash
d ∈ {dash, noDash}
NonemptyClassRangesd 
   ClassAtomdash
|  ClassAtomd NonemptyClassRangesnoDash
|  ClassAtomd - ClassAtomdash ClassRanges

Semantics

action AcceptanceSet[CharacterClass] : {Character}

AcceptanceSet[CharacterClass  [ [lookahead∉{^}] ClassRanges ]]
  = AcceptanceSet[ClassRanges]

AcceptanceSet[CharacterClass  [ ^ ClassRanges ]] = AcceptanceSet[ClassRanges]

action Invert[CharacterClass] : Boolean

Invert[CharacterClass  [ [lookahead∉{^}] ClassRanges ]] = false

Invert[CharacterClass  [ ^ ClassRanges ]] = true

action AcceptanceSet[ClassRanges] : {Character}

AcceptanceSet[ClassRanges  «empty»] = {}Character

AcceptanceSet[ClassRanges  NonemptyClassRangesdash]
  = AcceptanceSet[NonemptyClassRangesdash]

action AcceptanceSet[NonemptyClassRangesd] : {Character}

AcceptanceSet[NonemptyClassRangesd  ClassAtomdash] = AcceptanceSet[ClassAtomdash]

AcceptanceSet[NonemptyClassRangesd  ClassAtomd NonemptyClassRangesnoDash1]
  = AcceptanceSet[ClassAtomd] ∪ AcceptanceSet[NonemptyClassRangesnoDash1]

AcceptanceSet[NonemptyClassRangesd  ClassAtomd1 - ClassAtomdash2 ClassRanges]
  = let range: {Character}
             = characterRange(AcceptanceSet[ClassAtomd1], AcceptanceSet[ClassAtomdash2])
     in range ∪ AcceptanceSet[ClassRanges]

characterRange(low: {Character}, high: {Character}) : {Character}
  = if |low| ≠ 1 or |high| ≠ 1
     then throw syntaxError
     else let l: Character = min low;
               h: Character = min high
           in if l ≤ h
               then {l ... h}
               else throw syntaxError
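
A JavaScript sketch of characterRange follows. It assumes, for illustration only, that a {Character} set is represented by a Set of one-character strings and that syntaxError corresponds to throwing a SyntaxError.

```javascript
function characterRange(low, high) {
  // Both endpoints must be single characters; a class escape such as \d
  // yields a larger set and is therefore rejected as a range endpoint.
  if (low.size !== 1 || high.size !== 1) {
    throw new SyntaxError("range endpoints must be single characters");
  }
  const l = [...low][0].charCodeAt(0);
  const h = [...high][0].charCodeAt(0);
  if (l > h) throw new SyntaxError("range out of order");
  const range = new Set();
  for (let code = l; code <= h; code++) range.add(String.fromCharCode(code));
  return range;
}

const aToE = characterRange(new Set(["a"]), new Set(["e"]));
```

The first check is what makes a class such as [\d-z] a syntax error under these rules, since AcceptanceSet for \d contains ten characters.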

Character Class Range Atoms

Syntax

ClassAtomd 
   ClassCharacterd
|  \ ClassEscape
ClassCharacterdash  UnicodeCharacter except \ | ]
ClassCharacternoDash  ClassCharacterdash except -
ClassEscape 
   DecimalEscape
|  b
|  CharacterEscape
|  CharacterClassEscape

Semantics

action AcceptanceSet[ClassAtomd] : {Character}

AcceptanceSet[ClassAtomd  ClassCharacterd] = {ClassCharacterd}

AcceptanceSet[ClassAtomd  \ ClassEscape] = AcceptanceSet[ClassEscape]

action AcceptanceSet[ClassEscape] : {Character}

AcceptanceSet[ClassEscape  DecimalEscape]
  = if EscapeValue[DecimalEscape] = 0
     then {‘«NUL»’}
     else throw syntaxError

AcceptanceSet[ClassEscape  b] = {‘«BS»’}

AcceptanceSet[ClassEscape  CharacterEscape] = {CharacterValue[CharacterEscape]}

AcceptanceSet[ClassEscape  CharacterClassEscape] = AcceptanceSet[CharacterClassEscape]
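
The ClassEscape rules give \b a different meaning inside a character class than outside it. The sketch below checks this against the RegExp engine of a present-day JavaScript implementation, used here only as a stand-in for the semantics above.

```javascript
const backspaceInClass = /[\b]/.test("\b");      // inside a class, \b is the BS character
const boundaryOutside = /\bcat\b/.test("a cat"); // outside a class, \b is a word boundary
const nulInClass = /[\0]/.test("\0");            // a DecimalEscape of 0 denotes NUL
```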


JavaScript 2.0
Formal Description
Parser Grammar

Thursday, November 11, 1999

This LALR(1) grammar describes the syntax of the JavaScript 2.0 proposal. The starting nonterminal is Program. See also the description of the grammar notation.

This document is also available as a Word 98 rtf file.

Terminals

General tokens: Identifier   Number   RegularExpression   String   VirtualSemicolon

Punctuation tokens: !   !=   !==   %   %=   &   &&   &&=   &=   (   )   *   *=   +   ++   +=   ,   -   --   -=   .   ...   /   /=   :   ::   ;   <   <<   <<=   <=   =   ==   ===   >   >=   >>   >>=   >>>   >>>=   ?   @   [   ]   ^   ^=   ^^   ^^=   {   |   |=   ||   ||=   }   ~

Future punctuation tokens: #   ->

Reserved words: break   case   catch   class   const   continue   default   delete   do   else   eval   extends   false   final   finally   for   function   if   in   instanceof   new   null   package   private   public   return   super   switch   this   throw   true   try   typeof   var   while   with

Future reserved words: abstract   debugger   enum   export   goto   implements   import   interface   native   protected   static   synchronized   throws   transient   volatile

Non-reserved words: box   constructor   field   get   language   local   method   override   set   version

Expressions

b ∈ {allowIn, noIn}

Identifiers

Identifier 
   Identifier
|  box
|  constructor
|  field
|  get
|  language
|  local
|  method
|  set
|  override
|  version
QualifiedIdentifier 
   Identifier
|  QualifiedIdentifier :: Identifier
|  ParenthesizedExpression :: Identifier

Primary Expressions

PrimaryExpression 
   null
|  true
|  false
|  Number
|  Number [no line break] String
|  String
|  this
|  super
|  QualifiedIdentifier
|  ? Identifier
|  RegularExpression
|  ParenthesizedExpression
|  ParenthesizedExpression [no line break] String
|  ArrayLiteral
|  ObjectLiteral
|  FunctionExpression
ParenthesizedExpression  ( ExpressionallowIn )

Function Expressions

FunctionExpression 
   AnonymousFunction
|  NamedFunction

Object Literals

ObjectLiteral 
   { }
|  { FieldList }
FieldList 
   LiteralField
|  FieldList , LiteralField
LiteralField  FieldName : AssignmentExpressionallowIn
FieldName 
   QualifiedIdentifier
|  String
|  Number

Array Literals

ArrayLiteral  [ ElementList ]
ElementList 
   LiteralElement
|  ElementList , LiteralElement
LiteralElement 
   «empty»
|  AssignmentExpressionallowIn

Postfix Unary Operators

PostfixExpression 
   FullPostfixExpression
|  ShortNewExpression
FullPostfixExpression 
   PrimaryExpression
|  FullNewExpression
|  FullPostfixExpression MemberOperator
|  FullPostfixExpression Arguments
|  PostfixExpression [no line break] ++
|  PostfixExpression [no line break] --
FullNewExpression  new FullNewSubexpression Arguments
ShortNewExpression  new ShortNewSubexpression
FullNewSubexpression 
   PrimaryExpression
|  FullNewSubexpression MemberOperator
|  FullNewExpression
ShortNewSubexpression 
   FullNewSubexpression
|  ShortNewExpression
MemberOperator 
   [ ArgumentList ]
|  . QualifiedIdentifier
|  . ParenthesizedExpression
|  @ QualifiedIdentifier
|  @ ParenthesizedExpression
Arguments  ( ArgumentList )
ArgumentList 
   «empty»
|  ArgumentListPrefix
|  NamedArgumentListPrefix
ArgumentListPrefix 
   AssignmentExpressionallowIn
|  ArgumentListPrefix , AssignmentExpressionallowIn
NamedArgumentListPrefix 
   LiteralField
|  ArgumentListPrefix , LiteralField
|  NamedArgumentListPrefix , LiteralField
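
The [no line break] constraint on postfix ++ and -- is inherited from earlier versions of the language, so its effect can be observed in a present-day JavaScript engine. The sketch below builds a function from source text so that the line breaks are explicit; the variable names are illustrative.

```javascript
// Three lines: "a", "++", "b". Because ++ may not follow its operand across a
// line break, the ++ attaches to b as a prefix operator instead.
const source = "var a = 1, b = 1;\na\n++\nb\nreturn [a, b];";
const result = new Function(source)();
```

The result leaves a unchanged and increments b.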

Prefix Unary Operators

UnaryExpression 
   PostfixExpression
|  delete PostfixExpression
|  typeof UnaryExpression
|  eval UnaryExpression
|  ++ PostfixExpression
|  -- PostfixExpression
|  + UnaryExpression
|  - UnaryExpression
|  ~ UnaryExpression
|  ! UnaryExpression

Multiplicative Operators

MultiplicativeExpression 
   UnaryExpression
|  MultiplicativeExpression * UnaryExpression
|  MultiplicativeExpression / UnaryExpression
|  MultiplicativeExpression % UnaryExpression

Additive Operators

AdditiveExpression 
   MultiplicativeExpression
|  AdditiveExpression + MultiplicativeExpression
|  AdditiveExpression - MultiplicativeExpression

Bitwise Shift Operators

ShiftExpression 
   AdditiveExpression
|  ShiftExpression << AdditiveExpression
|  ShiftExpression >> AdditiveExpression
|  ShiftExpression >>> AdditiveExpression

Relational Operators

RelationalExpressionallowIn 
   ShiftExpression
|  RelationalExpressionallowIn < ShiftExpression
|  RelationalExpressionallowIn > ShiftExpression
|  RelationalExpressionallowIn <= ShiftExpression
|  RelationalExpressionallowIn >= ShiftExpression
|  RelationalExpressionallowIn instanceof ShiftExpression
|  RelationalExpressionallowIn in ShiftExpression
RelationalExpressionnoIn 
   ShiftExpression
|  RelationalExpressionnoIn < ShiftExpression
|  RelationalExpressionnoIn > ShiftExpression
|  RelationalExpressionnoIn <= ShiftExpression
|  RelationalExpressionnoIn >= ShiftExpression
|  RelationalExpressionnoIn instanceof ShiftExpression
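
The noIn variant exists so that an in operator inside a for initializer is not confused with the in of a for-in statement. The same split exists in earlier versions of the language, so it can be observed with a present-day JavaScript engine. The helper parses is hypothetical and relies on the Function constructor throwing a SyntaxError for unparsable source.

```javascript
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The noIn grammar keeps a bare `in` out of the initializer expression, so
// this header does not parse as a three-part for statement.
const bare = parses('for (var i = "a" in {a: 1}; false;) {}');
// Parentheses switch back to the allowIn form of the expression grammar.
const wrapped = parses('for (var i = ("a" in {a: 1}); false;) {}');
```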

Equality Operators

EqualityExpressionb 
   RelationalExpressionb
|  EqualityExpressionb == RelationalExpressionb
|  EqualityExpressionb != RelationalExpressionb
|  EqualityExpressionb === RelationalExpressionb
|  EqualityExpressionb !== RelationalExpressionb

Binary Bitwise Operators

BitwiseAndExpressionb 
   EqualityExpressionb
|  BitwiseAndExpressionb & EqualityExpressionb
BitwiseXorExpressionb 
   BitwiseAndExpressionb
|  BitwiseXorExpressionb ^ BitwiseAndExpressionb
|  BitwiseXorExpressionb ^ *
|  BitwiseXorExpressionb ^ ?
BitwiseOrExpressionb 
   BitwiseXorExpressionb
|  BitwiseOrExpressionb | BitwiseXorExpressionb
|  BitwiseOrExpressionb | *
|  BitwiseOrExpressionb | ?

Binary Logical Operators

LogicalAndExpressionb 
   BitwiseOrExpressionb
|  LogicalAndExpressionb && BitwiseOrExpressionb
LogicalXorExpressionb 
   LogicalAndExpressionb
|  LogicalXorExpressionb ^^ LogicalAndExpressionb
LogicalOrExpressionb 
   LogicalXorExpressionb
|  LogicalOrExpressionb || LogicalXorExpressionb

Conditional Operator

ConditionalExpressionb 
   LogicalOrExpressionb
|  LogicalOrExpressionb ? AssignmentExpressionb : AssignmentExpressionb
NonAssignmentExpressionb 
   LogicalOrExpressionb
|  LogicalOrExpressionb ? NonAssignmentExpressionb : NonAssignmentExpressionb

Assignment Operators

AssignmentExpressionb 
   ConditionalExpressionb
|  PostfixExpression = AssignmentExpressionb
|  PostfixExpression CompoundAssignment AssignmentExpressionb
CompoundAssignment 
   *=
|  /=
|  %=
|  +=
|  -=
|  <<=
|  >>=
|  >>>=
|  &=
|  ^=
|  |=
|  &&=
|  ^^=
|  ||=

Expressions

Expressionb 
   AssignmentExpressionb
|  Expressionb , AssignmentExpressionb
OptionalExpression 
   ExpressionallowIn
|  «empty»

Type Expressions

TypeExpressionb  NonAssignmentExpressionb

Statements

w ∈ {abbrev, abbrevNonEmpty, abbrevNoShortIf, full}
TopStatementw 
   Statementw
|  LanguageDeclarationw
Statementw 
   AnnotatedDefinitionw
|  EmptyStatementw
|  ExpressionStatement Semicolonw
|  AnnotatedBlock
|  LabeledStatementw
|  IfStatementw
|  SwitchStatement
|  DoStatement Semicolonw
|  WhileStatementw
|  ForStatementw
|  WithStatementw
|  ContinueStatement Semicolonw
|  BreakStatement Semicolonw
|  ReturnStatement Semicolonw
|  ThrowStatement Semicolonw
|  TryStatement
Semicolonabbrev 
   ;
|  VirtualSemicolon
|  «empty»
SemicolonabbrevNonEmpty 
   ;
|  VirtualSemicolon
|  «empty»
SemicolonabbrevNoShortIf 
   ;
|  VirtualSemicolon
|  «empty»
Semicolonfull 
   ;
|  VirtualSemicolon

Empty Statement

EmptyStatementabbrev 
   ;
|  «empty»
EmptyStatementabbrevNonEmpty  ;
EmptyStatementabbrevNoShortIf  ;
EmptyStatementfull  ;

Expression Statement

ExpressionStatement  [lookahead∉{function, {}] ExpressionallowIn

Block

AnnotatedBlock 
   Block
|  Visibility [no line break] Block
Block  { TopStatements }
TopStatements 
   TopStatementabbrev
|  TopStatementsPrefix TopStatementabbrevNonEmpty
TopStatementsPrefix 
   TopStatementfull
|  TopStatementsPrefix TopStatementfull

Labeled Statements

LabeledStatementw  Identifier : Statementw

If Statement

IfStatementabbrev 
   if ParenthesizedExpression Statementabbrev
|  if ParenthesizedExpression StatementabbrevNoShortIf else Statementabbrev
IfStatementabbrevNonEmpty 
   if ParenthesizedExpression StatementabbrevNonEmpty
|  if ParenthesizedExpression StatementabbrevNoShortIf else StatementabbrevNonEmpty
IfStatementfull 
   if ParenthesizedExpression Statementfull
|  if ParenthesizedExpression StatementabbrevNoShortIf else Statementfull
IfStatementabbrevNoShortIf  if ParenthesizedExpression StatementabbrevNoShortIf else StatementabbrevNoShortIf

Switch Statement

SwitchStatement 
   switch ParenthesizedExpression { }
|  switch ParenthesizedExpression { CaseGroups LastCaseGroup }
CaseGroups 
   «empty»
|  CaseGroups CaseGroup
CaseGroup  CaseGuards CaseStatementsPrefix
LastCaseGroup  CaseGuards CaseStatements
CaseGuards 
   CaseGuard
|  CaseGuards CaseGuard
CaseGuard 
   case ExpressionallowIn :
|  default :
CaseStatements 
   Statementabbrev
|  CaseStatementsPrefix StatementabbrevNonEmpty
CaseStatementsPrefix 
   Statementfull
|  CaseStatementsPrefix Statementfull

Do-While Statement

DoStatement  do StatementabbrevNonEmpty while ParenthesizedExpression

While Statement

WhileStatementw  while ParenthesizedExpression Statementw

For Statements

ForStatementw 
   for ( ForInitializer ; OptionalExpression ; OptionalExpression ) Statementw
|  for ( ForInBinding in ExpressionallowIn ) Statementw
ForInitializer 
   «empty»
|  ExpressionnoIn
|  VariableDefinitionKind VariableBindingListnoIn
ForInBinding 
   PostfixExpression
|  VariableDefinitionKind VariableBindingnoIn

With Statement

WithStatementw  with ParenthesizedExpression Statementw

Continue and Break Statements

ContinueStatement  continue [no line break] OptionalLabel
BreakStatement  break [no line break] OptionalLabel
OptionalLabel 
   «empty»
|  Identifier

Return Statement

ReturnStatement  return [no line break] OptionalExpression
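
The same [no line break] restriction on return exists in earlier versions of the language, so its effect can be shown in a present-day JavaScript engine. The function names below are illustrative.

```javascript
function withBreak() {
  // The line break after return ends the statement; 42 becomes a separate,
  // unreachable expression statement.
  return
  42;
}

function withoutBreak() {
  return 42;
}

const broken = withBreak();
const intact = withoutBreak();
```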

Throw Statement

ThrowStatement  throw [no line break] ExpressionallowIn

Try Statement

TryStatement 
   try AnnotatedBlock CatchClauses
|  try AnnotatedBlock FinallyClause
|  try AnnotatedBlock CatchClauses FinallyClause
CatchClauses 
   CatchClause
|  CatchClauses CatchClause
CatchClause  catch ( TypedIdentifierallowIn ) AnnotatedBlock
FinallyClause  finally AnnotatedBlock

Definitions

AnnotatedDefinitionw 
   Visibility [no line break] Definitionw
|  Definitionw
Definitionw 
   VariableDefinition Semicolonw
|  FunctionDefinition
|  MemberDefinitionw
|  ClassDefinition

Visibility Specifications

Visibility 
   ParenthesizedExpression
|  local
|  box
|  private
|  package
|  public
|  Identifier

Variable Definition

VariableDefinition  VariableDefinitionKind VariableBindingListallowIn
VariableDefinitionKind 
   var
|  const
VariableBindingListb 
   VariableBindingb
|  VariableBindingListb , VariableBindingb
VariableBindingb  TypedIdentifierb VariableInitializerb
TypedIdentifierb 
   Identifier
|  Identifier : TypeExpressionb
VariableInitializerb 
   «empty»
|  = AssignmentExpressionb

Function Definition

FunctionDefinition 
   NamedFunction
|  AccessorFunction
AnonymousFunction  function FunctionSignature Block
NamedFunction  function Identifier FunctionSignature Block
AccessorFunction 
   function get [no line break] Identifier FunctionSignature Block
|  function set [no line break] Identifier FunctionSignature Block
FunctionSignature  ParameterSignature ResultSignature
ParameterSignature  ( Parameters )
Parameters 
   «empty»
|  RestParameter
|  RequiredParameters
|  OptionalParameters
|  RequiredParameters , RestParameter
|  OptionalParameters , RestParameter
RequiredParameters 
   RequiredParameter
|  RequiredParameters , RequiredParameter
OptionalParameters 
   OptionalParameter
|  RequiredParameters , OptionalParameter
|  OptionalParameters , OptionalParameter
RequiredParameter  TypedIdentifierallowIn
OptionalParameter  TypedIdentifierallowIn = AssignmentExpressionallowIn
RestParameter 
   ...
|  ... TypedIdentifierallowIn
|  ... TypedIdentifierallowIn = AssignmentExpressionallowIn
ResultSignature 
   «empty»
|  : TypeExpressionallowIn

Class Member Definitions

MemberDefinitionw 
   FieldDefinition Semicolonw
|  MethodDefinitionw
|  ConstructorDefinition
FieldDefinition  field [no line break] VariableBindingListallowIn
MethodDefinitionw 
   ConcreteMethodDefinition
|  AbstractMethodDefinitionw
ConcreteMethodDefinition  MethodPrefix [no line break] MethodName FunctionSignature Block
AbstractMethodDefinitionw  MethodPrefix [no line break] MethodName FunctionSignature Semicolonw
MethodPrefix 
   method
|  override [no line break] method
|  final [no line break] method
|  final [no line break] override [no line break] method
MethodName 
   Identifier
|  get [no line break] Identifier
|  set [no line break] Identifier
ConstructorDefinition  constructor [no line break] ConstructorName ParameterSignature Block
ConstructorName 
   new
|  Identifier

Class Definition

ClassDefinition 
   class Identifier Superclasses Block
|  class extends TypeExpressionallowIn Block
Superclasses 
   «empty»
|  extends TypeExpressionallowIn

Language Declaration

LanguageDeclarationw  language LanguageId LanguageIdList LanguageAlternatives LanguageSemicolonw
LanguageSemicolonabbrev 
   ;
|  «empty»
LanguageSemicolonabbrevNonEmpty 
   ;
|  «empty»
LanguageSemicolonfull  ;
LanguageAlternatives 
   «empty»
|  LanguageAlternatives | LanguageIdList
LanguageIdList 
   «empty»
|  LanguageIdList LanguageId
LanguageId 
   Identifier
|  Number

Programs

Program  TopStatements

JavaScript 2.0
Rationale

Thursday, November 11, 1999


This chapter discusses the decisions made in designing JavaScript 2.0. Rationales are presented together with descriptions of other alternatives that were or are being considered. Currently outstanding issues are in red.


JavaScript 2.0
Rationale
Syntax

Thursday, November 11, 1999

Semicolon Insertion

Definitions

The term semicolon insertion informally refers to the ability to write programs while omitting semicolons between statements. In both JavaScript 1.5 and JavaScript 2.0 there are two kinds of semicolon insertion:

Grammatical Semicolon Insertion
Semicolons before a closing } and the end of the program are optional in both JavaScript 1.5 and 2.0. In addition, the JavaScript 2.0 parser allows semicolons to be omitted before the else of an if-else statement and before the while of a do-while statement.
Line-Break Semicolon Insertion
If the first through the nth tokens of a JavaScript program are grammatically valid but the first through the n+1st tokens are not, and there is a line break between the nth token and the n+1st token, then the parser tries to parse the program again after inserting a VirtualSemicolon token between the nth and the n+1st tokens.

Grammatical semicolon insertion is implemented directly by the parser grammar's productions, which simply do not require a semicolon in the aforementioned cases. Line breaks in the source code are not relevant to grammatical semicolon insertion.

Line-break semicolon insertion cannot be easily implemented in the parser's grammar. This kind of semicolon insertion turns a syntactically incorrect program into a correct program and relies on line breaks in the source code.

Discussion

Grammatical semicolon insertion is harmless. On the other hand, line-break semicolon insertion suffers from the following problems:

  1. Line breaks are relevant in the program's source code
  2. The consequences of this kind of semicolon insertion appear inconsistent to users
  3. Existing program behavior can change unexpectedly when new syntax is introduced

The first problem presents difficulty for some preprocessors, such as the one for XML attributes, that turn line breaks into spaces. The second and third ones are more serious. Users are confused when they discover that the program

a = b + c
(d + e).print()

doesn't do what they expect:

a = b + c;
(d + e).print();

Instead, that program is parsed as:

a = b + c(d + e).print();
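
This behavior is shared with earlier versions of the language, so it can be reproduced in a present-day JavaScript engine. In the sketch below the program text is passed to the Function constructor so that the line break is explicit; toString() stands in for print(), which is not a standard method, and the recording function c is hypothetical.

```javascript
// c is a function here, so "c\n(d + e)" is a call to c rather than the end of
// one statement followed by a parenthesized expression.
const source = "var a = b + c\n(d + e).toString();\nreturn a;";
const run = new Function("b", "c", "d", "e", source);

const calls = [];
const c = (arg) => { calls.push(arg); return 100; };
const a = run(1, c, 2, 3);
```

The call list shows that c was invoked with d + e, and a ends up as the concatenation of b with the stringified call result instead of b + c.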

The third problem is the most serious. New features added to the language turn illegal syntax into legal syntax. If an existing program relies on the illegal syntax to trigger line-break semicolon insertion, then the program will silently change behavior once the feature is added. For example, the juxtaposition of a numeric literal followed by a string literal (such as 4 "in") is illegal in JavaScript 1.5. JavaScript 2.0 makes this legal syntax for expressions with units. This syntax extension has the unfortunate consequence of silently changing the meaning of the following JavaScript 1.5 program:

a = b + 4
"in".print()

from:

a = b + 4;
"in".print();

to:

a = b + 4"in".print();

JavaScript 2.0 gets around this incompatibility by adding a [no line break] restriction in the grammar that requires the numeric and string literals to be on the same line. Unfortunately, this compatibility is a double-edged sword. Due to JavaScript 1.5 compatibility, JavaScript 2.0 has to have a large number of these [no line break] restrictions. It is hard to remember all of them, and forgetting one of them often silently causes a JavaScript 2.0 program to be reinterpreted. Users will be dismayed to find that:

local
  function f(x) {return x*x}

turns into:

local;
function f(x) {return x*x}

(where local; is an expression statement) instead of:

local function f(x) {return x*x}

An earlier version of JavaScript 2.0 disallowed line-break semicolon insertion. The current version allows it but only in non-strict mode. Strict mode removes all [no line break] restrictions, simplifying the language again. As a side effect, it is possible to write a program that does different things in strict and non-strict modes (the last example above is one such program), but this is the price to pay to achieve simplicity.

Regular Expression Literals

JavaScript 2.0 retains compatibility with JavaScript 1.5 by adopting the same rules for detecting regular expression literals. This complicates the design of programs such as syntax-directed text editors and machine scanners because it makes it impossible to find all of the tokens in a JavaScript program without parsing the program.

Making JavaScript 2.0's lexical grammar independent of its syntactic grammar would have made it significantly easier for tools to process a JavaScript program and, say, escape all instances of </ to properly embed a JavaScript 2.0 or later program in an HTML page. As it stands, such tools need the full parser, which changes with each version of JavaScript. To illustrate the difficulties, compare such JavaScript 1.5 gems as:

for (var x = a in foo && "</x>" || mot ? z:/x:3;x<5;y</g/i) {xyz(x++);}
for (var x = a in foo && "</x>" || mot ? z/x:3;x<5;y</g/i) {xyz(x++);}

Alternate Regular Expression Syntax

One idea explored early in the design of JavaScript 2.0 was providing an alternate, unambiguous syntax for regular expressions and encouraging the use of the new syntax. A RegularExpression could have been specified unambiguously using « and » as its opening and closing delimiters instead of / and /. For example, «3*» would be a regular expression that matches zero or more 3's. Such a regular expression could be empty: «» is a regular expression that matches only the empty string, while // starts a comment. To write such a regular expression using the slash syntax one needs to write /(?:)/.
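
The slash-syntax workaround for the empty regular expression can be checked in a present-day JavaScript engine. The variable names are illustrative.

```javascript
const empty = /(?:)/;
const matched = empty.exec("abc")[0];         // the empty string, found at index 0
const viaConstructor = new RegExp("").source; // engines print the empty pattern as (?:)
```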

Syntactic Resynchronization

Syntactic resynchronization occurs when the lexer needs to find the end of a block (the matching }) in order to skip a portion of a program written in a future version of JavaScript. Ordinarily this would not be a problem, but regular expressions complicate matters because they make lexing dependent on parsing. The rules for recognizing regular expression literals must be changed for those portions of the program. The rule below might work, or a simplified parse might be performed on the input to determine the locations of regular expressions. This is an area that needs further work.

During syntactic resynchronization JavaScript 2.0 determines whether a / starts a regular expression or is a division (or /=) operator solely based on the previous token:

/ interpretation: / or /=
Previous token:
   Identifier   Number   RegularExpression   String
   )   ++   --   ]   }
   false   null   super   this   true
   constructor   getter   method   override   setter   traditional   version
   Any other punctuation

/ interpretation: RegularExpression
Previous token:
   !   !=   !==   #   %   %=   &   &&   &&=   &=   (   *   *=   +   +=   ,   -   -=   ->   .   ..   ...   /   /=   :   ::   ;   <   <<   <<=   <=   =   ==   ===   >   >=   >>   >>=   >>>   >>>=   ?   @   [   ^   ^=   ^^   ^^=   {   |   |=   ||   ||=   ~
   abstract   break   case   catch   class   const   continue   debugger   default   delete   do   else   enum   eval   export   extends   field   final   finally   for   function   goto   if   implements   import   in   instanceof   native   new   package   private   protected   public   return   static   switch   synchronized   throw   throws   transient   try   typeof   var   volatile   while   with

Regardless of the previous token, // is interpreted as the beginning of a comment.
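
The previous-token rule can be sketched as a lookup in JavaScript. The {kind, text} token shape, the kind names, and the function name slashStartsRegExp are assumptions made for illustration; the token lists are copied from the table above. The // comment case is assumed to be handled by the lexer before this function is consulted.

```javascript
const REGEXP_AFTER_PUNCTUATION = new Set([
  "!", "!=", "!==", "#", "%", "%=", "&", "&&", "&&=", "&=", "(", "*", "*=",
  "+", "+=", ",", "-", "-=", "->", ".", "..", "...", "/", "/=", ":", "::", ";",
  "<", "<<", "<<=", "<=", "=", "==", "===", ">", ">=", ">>", ">>=", ">>>",
  ">>>=", "?", "@", "[", "^", "^=", "^^", "^^=", "{", "|", "|=", "||", "||=", "~",
]);

const REGEXP_AFTER_WORDS = new Set([
  "abstract", "break", "case", "catch", "class", "const", "continue",
  "debugger", "default", "delete", "do", "else", "enum", "eval", "export",
  "extends", "field", "final", "finally", "for", "function", "goto", "if",
  "implements", "import", "in", "instanceof", "native", "new", "package",
  "private", "protected", "public", "return", "static", "switch",
  "synchronized", "throw", "throws", "transient", "try", "typeof", "var",
  "volatile", "while", "with",
]);

// Returns true if a / after the given token starts a RegularExpression and
// false if it is a division (or /=) operator.
function slashStartsRegExp(previous) {
  switch (previous.kind) {
    case "punctuation":
      // ) ++ -- ] } and any punctuation not listed fall through to division.
      return REGEXP_AFTER_PUNCTUATION.has(previous.text);
    case "word":
      // Ordinary identifiers and words such as this or null mean division.
      return REGEXP_AFTER_WORDS.has(previous.text);
    default:
      // Number, RegularExpression and String tokens mean division.
      return false;
  }
}

const afterCloseParen = slashStartsRegExp({ kind: "punctuation", text: ")" });
const afterReturn = slashStartsRegExp({ kind: "word", text: "return" });
```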

The only controversial choices are ) and }. A / after either a ) or } token can be either a division symbol (if the ) or } closes a subexpression or an object literal) or a regular expression token (if the ) or } closes a preceding statement or an if, while, or for expression). Having / be interpreted as a RegularExpression in expressions such as (x+y)/2 would be problematic, so it is interpreted as a division operator after ) or }. If one wants to place a regular expression literal at the very beginning of an expression statement, it's best to put the regular expression in parentheses. Fortunately, this is not common since one usually assigns the result of the regular expression operation to a variable.

Language Declarations

An alternative to language declarations that was considered early was to report syntax errors at the time the relevant statement was executed rather than at the time it was parsed. This way a single program could include parts written in a future version of JavaScript without getting an error unless it tries to execute those portions on a system that does not understand that version of JavaScript. If a program part that contains an error is never executed, the error never breaks the script. For example, the following function finishes successfully if whizBangFeature is false:

function move(integer x, integer y, integer d) {
  x += 10;
  y += 3;
  if (whizBangFeature) {
    simulate{@x and #y} along path
  } else {
    x += d; y += d;
  }
  return [x,y];
}

The code simulate{@x and #y} along path is a syntax error, but this error does not break the script unless the script attempts to execute that piece of code.
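
The effect of deferring syntax errors to execution time can be approximated in a present-day JavaScript engine by handing the questionable text to the Function constructor, which parses it only when that line runs. This is an emulation for illustration, not the mechanism that was considered; passing whizBangFeature as a parameter and dropping the integer annotations are adaptations made so that the sketch is runnable.

```javascript
function move(x, y, d, whizBangFeature) {
  x += 10;
  y += 3;
  if (whizBangFeature) {
    // The unparsable text reaches the parser only when this branch runs.
    new Function("simulate{@x and #y} along path")();
  } else {
    x += d; y += d;
  }
  return [x, y];
}

const moved = move(1, 2, 3, false);

let deferredError = null;
try {
  move(1, 2, 3, true);
} catch (e) {
  deferredError = e;
}
```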

One problem with this approach is that it frustrates debugging; a script author benefits from knowing about syntax errors at compile time rather than at run time.


JavaScript 2.0
Rationale
Execution Model

Thursday, November 11, 1999

Introduction

When does a declaration (of a value, function, type, class, method, pragma, etc.) take effect? When are expressions evaluated? The answers to these questions distinguish among major kinds of programming languages. Let's consider the following function definition in a language with C++ or Java-like syntax:

gadget f(widget x) {
  if ((gizmo)(x) != null)
    return (gizmo)(x);
  return x.owner;
}

In a static language such as Java or C++, all type expressions are evaluated at compile time. Thus, in this example widget and gadget would be evaluated at compile time. If gizmo were a type, then it too would be evaluated at compile time ((gizmo)(x) would become a type cast). Note that we must be able to statically distinguish identifiers used for variables from identifiers used for types so we can decide whether (gizmo)(x) is a one-argument function call (in which case gizmo would be evaluated at run time) or a type cast (in which case gizmo would be evaluated at compile time). In most cases, in a static language a declaration is visible throughout its enclosing scope, although there are exceptions that have been deemed too complicated for a compiler to handle such as the following C++:

typedef int *x;

class foo {
  typedef x *y;
  typedef char *x;
};

Many dynamic languages can construct, evaluate, and manipulate type expressions at run time. Some dynamic languages (such as Common Lisp) distinguish between compile time and run time and provide constructs (eval-when) to evaluate expressions early. The simplest dynamic languages (such as Scheme) process input in a single pass and do not distinguish between compile time and run time. If we evaluated the above function in such a simple language, widget and gadget would be evaluated at the time the function is called.

Challenges

JavaScript is a scripting language. Many programmers wish to write JavaScript scripts embedded in web pages that work in a variety of environments. Some of these environments may provide libraries that a script would like to use, while on other environments the script may have to emulate those libraries. Let's take a look at an example of something one would expect to be able to easily do in a scripting language:

Bob is writing a script for a web page that wants to take advantage of an optional package MacPack that is present on some environments (Macintoshes) but not on others. MacPack provides a class HyperWindoid from which Bob wants to subclass his own class BobWindoid. On other platforms Bob has to define an emulation class BobWindoid' that is implemented differently from BobWindoid -- it has a different set of private methods and fields. There also is a class WindoidGuide in Bob's package; the code and method signatures of classes BobWindoid and BobWindoid' refer to objects of type WindoidGuide, and class WindoidGuide's code refers to objects of type BobWindoid (or BobWindoid' as appropriate).

Were JavaScript to use a dynamic execution model (described below), declarations would take effect only when executed, and Bob could implement his package as shown below. The package keyword in front of both definitions of class BobWindoid lifts these definitions from the local if scope to the top level of Bob's package.

class WindoidGuide; // forward declaration

if (onMac()) {
  import "MacPack";

  package class BobWindoid extends HyperWindoid {
    private field x;
    field g:WindoidGuide;

    private method speck() {...};
    public method zoom(a:WindoidGuide, uncle:HyperWindoid = null):WindoidGuide {...};
  }
} else {
  // emulation class BobWindoid'
  package class BobWindoid {
    private field i:integer, j:integer;
    field g:WindoidGuide;

    private method advertise(h:WindoidGuide):WindoidGuide {...};
    private method subscribe(h:WindoidGuide):WindoidGuide {...};
    public method zoom(a:WindoidGuide):WindoidGuide {...};
  }
}

class WindoidGuide {
  field currentWindoid:BobWindoid;

  method introduce(arg:BobWindoid):BobWindoid {...};
}

On the other hand, if the language were static (meaning that types are compile-time expressions), Bob would run into problems. How could he declare the two alternatives for the class BobWindoid?

Bob's first thought was to split his package into three HTML SCRIPT tags (containing BobWindoid, BobWindoid', and WindoidGuide) and turn one of the first two off depending on the platform. Unfortunately this doesn't work because he gets type errors if he separates the definition of class BobWindoid (or BobWindoid') from the definition of WindoidGuide because these classes mutually refer to each other. Furthermore, Bob would like to share the script among many pages, so he'd like to have the entire script in a single BobUtilities.js file.

Note that this problem would be newly introduced by JavaScript 2.0 if it were to evaluate type expressions at compile time. JavaScript 1.5 does not suffer from this problem because it does not have a concept of evaluating an expression at compile time, and it is relatively easy to conditionally define a class (which is merely a function) by declaring a single global variable g and conditionally assigning either one or another anonymous function to it.
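The JavaScript 1.5 idiom mentioned above can be sketched as follows. This is plain JavaScript, not the proposed JavaScript 2.0 syntax. The names onMac and BobWindoid and the fields are assumptions borrowed from Bob's example for illustration; BobWindoid plays the role of the global variable g.

```javascript
// Hypothetical environment probe; a real script would inspect its host.
function onMac() {
  return false;
}

// A single global variable holds whichever constructor is chosen at run time.
var BobWindoid;

if (onMac()) {
  BobWindoid = function () {
    this.x = 0;            // field used only by the Mac version
  };
} else {
  BobWindoid = function () {
    this.i = 0;            // fields used only by the emulation version
    this.j = 0;
  };
}

// Code shared by both variants can be attached afterwards.
BobWindoid.prototype.zoom = function (guide) {
  return guide;
};

var w = new BobWindoid();
console.log("i" in w);     // true, because onMac() returned false
```

Because the choice is an ordinary assignment, nothing needs to be known about BobWindoid until the script actually runs.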

There exist other alternatives in between the dynamic execution model and the static model that also solve Bob's problem. One of them is described at the end of this chapter.

Dynamic Execution Model

In a pure dynamic execution model the entire program is processed in one pass. Declarations take effect only when they are executed. A declaration that is never executed is ignored. Scheme follows this model, as did early versions of Visual Basic.

The dynamic execution model considerably simplifies the language and allows an interpreter to treat programs read from a file identically to programs typed in via an interactive console. Also, a dynamic execution model interpreter or just-in-time compiler may start to execute a script even before it has finished downloading all of it.

One of the most significant advantages of the dynamic execution model is that it allows JavaScript 2.0 scripts to turn parts of themselves on and off based on dynamically obtained information. For example, a script or library could define additional functions and classes if it runs on an environment that supports CSS unit arithmetic while still working on environments that do not.

The dynamic execution model requires identifiers naming functions and variables to be defined before they are used. A use occurs when an identifier is read, written, or called, at which point that identifier is resolved to a variable or a function according to the scoping rules. A reference from within a control statement such as if and while located outside a function is resolved only when execution reaches the reference. References from within the body of a function are resolved only after the function is called; for efficiency, an implementation is allowed to resolve all references within a function or method that does not contain eval at the first time the function is called.

According to these rules, the following program is correct and would print 7:

function f(a:integer):integer {
  return a+b;
}

var b:integer = 4;
print(f(3));

Assuming that variable b is predefined by the host if featurePresent is true, this program would also work:

function f(a:integer):integer {
  return a+b;
}

if (!featurePresent) {
  package var b:integer = 4;
}

print(f(3));

On the other hand, the following program would produce an error because f is referenced before it is defined:

print(f(3));

function f(a:integer):integer {
  return a*2;
}

Defining mutually recursive functions is not a problem as long as one defines all of them before calling them.
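As a rough illustration in present-day JavaScript (not the proposed JavaScript 2.0 syntax), function expressions bound with const behave much like declarations under the dynamic model: a name becomes usable only once its definition has executed, while references inside a body are resolved when the body runs. The names isEven and isOdd are illustrative.

```javascript
// Function expressions bound with const are not hoisted, so each name
// becomes usable only once its definition has been executed.
const isEven = function (n) {
  return n === 0 ? true : isOdd(n - 1);   // isOdd is resolved when this runs
};

const isOdd = function (n) {
  return n === 0 ? false : isEven(n - 1);
};

// Both definitions have executed by now, so the calls succeed.
console.log(isEven(10));  // true
console.log(isOdd(7));    // true
```

Moving either call above the definition of isOdd would fail, which mirrors the rule that all mutually recursive functions must be defined before any of them is called.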

Hybrid Execution Model

JavaScript 1.5 does not follow the pure dynamic execution model, and, for reasons of compatibility, JavaScript 2.0 strays from that model as well, adopting a hybrid execution model instead. Specifically, JavaScript 2.0 inherits the following static execution model aspects from JavaScript 1.5:

In addition to the above, the evaluation of class declarations has special provisions for delayed evaluation to allow mutually-referencing classes.

The second condition above allows the following program to work in JavaScript 2.0:

const b:string = "Bee";

function square(a:integer):integer {
  b = a;   // Refers to local b defined below, not global b
  return b*a;
  var b:integer;
}

While allowed, using variables ahead of declaring them, such as in the above example, is considered bad style and may generate a warning.

The third condition above makes the last example from the pure dynamic execution model section work:

print(f(3));

function f(a:integer):integer {
  return a*2;
}

Again, actually calling a function at the top level before declaring it is considered bad style and may generate a warning. It also will not work with classes.
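The same behaviors can be observed in present-day JavaScript engines, which retain these JavaScript 1.5 aspects. The following is an untyped sketch rather than JavaScript 2.0 code; the names twice and Later are illustrative, and the behavior shown for class is that of later ECMAScript editions, not something defined by this proposal.

```javascript
const b = "Bee";

function square(a) {
  b = a;          // writes the local b hoisted from the var statement below
  return b * a;
  var b;          // unreachable, but the declaration still covers the whole function
}

console.log(square(3));   // 9
console.log(b);           // "Bee": the outer b was not touched

// A function declaration is usable before the line that declares it.
console.log(twice(3));    // 6
function twice(a) {
  return a * 2;
}

// A class declaration is not: using it early throws.
var earlyClassFailed = false;
try {
  new Later();
} catch (e) {
  earlyClassFailed = e instanceof ReferenceError;
}
class Later {}
console.log(earlyClassFailed);  // true
```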

Discussion

Compiling The Dynamic Execution Model

Perhaps the easiest way to compile a script under the dynamic execution model is to accumulate function definitions unprocessed and compile them only when they are first called. Many JITs do this anyway because this lets them avoid the overhead of compiling functions that are never called. This process does not impose any more of an overhead than the static model would because under the static model the compiler would need to either scan the source code twice or save all of it unprocessed during the first pass for processing in the second pass.
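A minimal sketch of this strategy in present-day JavaScript, assuming the built-in Function constructor as a stand-in for the real compiler. The helper defineLazily and its compilations counter are hypothetical names introduced only for illustration.

```javascript
// Keep a function's source text unprocessed and compile it on the first call.
function defineLazily(paramNames, bodySource) {
  let compiled = null;     // filled in on the first call
  let compilations = 0;

  function stub(...args) {
    if (compiled === null) {
      // The Function constructor stands in for the real compiler here.
      compiled = new Function(...paramNames, bodySource);
      compilations += 1;
    }
    return compiled.apply(this, args);
  }

  stub.compilations = () => compilations;
  return stub;
}

const twice = defineLazily(["a"], "return a * 2;");
console.log(twice.compilations());  // 0: nothing compiled yet
console.log(twice(3));              // 6
console.log(twice(4));              // 8
console.log(twice.compilations());  // 1: compiled once, on the first call
```

A function that is never called is never compiled, which is the saving described above.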

Compiling a dynamic execution model script off-line also does not present special difficulties as long as eval is restricted to not introduce additional declarations that shadow existing ones (if eval is allowed to do this, it would present problems for any execution model, including the static one). Under the dynamic execution model, once the compiler has reached the end of a scope it can assume that that scope is complete; at that point all identifiers inside that scope can be resolved to the same extent that they would be in the static model.

Conditional Compilation Alternative

Bob's problem could also be solved by using conditional compilation similar in spirit to C's preprocessor. If we do this, we have to ask about how expressive the conditional compilation meta-language should be. C's preprocessor is too weak. In JavaScript applications we'd often find that we need the full power of JavaScript so that we can inspect the DOM, the environment, etc. when deciding how to control compilation. Besides, using JavaScript as the meta-language would reduce the number of languages that a programmer would have to learn.

Here's one sketch of how this could be done:

Note that because variable initializers are not evaluated at compile time, one has to use #var a = int rather than var a = int to define an alias a for a type name int.

This sketch does not address many issues that would have to be resolved, such as how typed variables are handled after they are declared but before they are initialized (this problem doesn't arise in the dynamic execution model), how the lexical scopes of the run time pass would interact with scoping of the compile time pass, etc.

Comparing the Dynamic Execution Model with Conditional Compilation

Both approaches solve Bob's problem, but they differ in other areas. In the sequel "conditional compilation" refers to the conditional compilation alternative described above.


Rationale
Member Lookup

Thursday, November 11, 1999

Introduction

There has been much discussion in the TC39 subgroup about the meaning of a member lookup operation. Numerous considerations intersect here.

We will express a general unqualified member lookup operation as a.b, where a is an expression and b is an identifier. We will also consider qualified member lookup operations and write them as a.n::b, where n is an expression that evaluates to some namespace. In almost all cases we will be interested in the dynamic type Td of a. In one scheme we will also consider the static type Ts of the expression a. If the language is sound, we will always have Td ≤ Ts (that is, Td is either Ts itself or a subtype of Ts).

In the simplest approach, we treat an object as merely an association table of member names and member values. In this interpretation we simply look inside object a and check if there is a member named b. If there is, we return the member's value; if not, we return undefined or signal an error.

There are a number of difficulties with this simple approach, and most object-oriented languages have not adopted it:

Once we allow private or package-protected members, we must allow for the possibility that object a will have more than one member named b. Abstraction considerations require that users of a class C not be aware of C's private members, so, in particular, a user should be able to create a subclass D of C and add members to D without knowing the names of C's private members. Both C++ and Java allow this. We must also allow for the possibility that object a will have a member named b but we are not allowed to access it. We will assume that access control is specified by lexical scoping, as is traditional in modern languages.
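For illustration only, later ECMAScript editions provide private names written with a leading #, which are not part of this proposal but show the situation concretely: a single object can carry two unrelated members with the same name, each reachable only from the class that defines it. The class and method names below are illustrative.

```javascript
class C {
  #b = "C's private b";
  readFromC() {
    return this.#b;        // always C's own member
  }
}

class D extends C {
  #b = "D's private b";    // a second, unrelated member with the same name
  readFromD() {
    return this.#b;
  }
}

const d = new D();
console.log(d.readFromC());  // "C's private b"
console.log(d.readFromD());  // "D's private b"
```

The author of D did not need to know that C already had a private member named b, and D's member does not replace C's inside C's code.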

Desirable Criteria

Some of the criteria we would like the member lookup model to satisfy are:

  1. Safety. The lookup does not permit access to a private member outside the class where the member is defined, nor does it allow access to a package member outside the package where the member is defined. Furthermore, if a class C accesses its private member m, a hostile subclass D of C cannot silently substitute a member m' that would masquerade as m inside C's code.
  2. Abstraction. private and package members are invisible outside their respective classes or packages. For programming in the large, a class can provide several public versions to its importers, and public members of more recent versions are invisible to importers of older versions. This is needed to provide robust libraries.
  3. Robustness. We can make any of the following program changes without having to restructure the program:
    1. Add valid type annotations to variables and functions.
    2. Change a member's visibility to private, package, or public, assuming, of course, that that member is not used outside its new visibility.
    3. Split a complicated expression statement e into several statements that compute subexpressions of e, store them in local variables, and then combine them to compute e. We should be able to do this without intimate knowledge of what e does or calls.
    4. Rename a member to a different name, assuming, of course, that the new name does not cause conflicts and that we fix up all references to that member.
  4. Namespace independence. If one class C has a member named m, this should not place restrictions on an unrelated class D having an unrelated member with the same name m.
  5. Compatibility. A JavaScript 2.0 class should be usable from JavaScript 1.5 code and JavaScript 1.5 code minimally upgraded to JavaScript 2.0 without having to restructure the latter code. Achieving compatibility should not require the JavaScript 2.0 class itself to be restructured or give up any of the other desirable criteria. Code without type annotations works as expected.

Lookup Models

There are three main competing models for performing a general unqualified member lookup operation a.b. Let S be the set of members named b of the object obtained by evaluating expression a (hereafter shortened to just "object a") that are accessible via the visibility rules applied in the lexical scope where a.b is evaluated. All three models pick some member s ∈ S. Clearly, if the set S is empty, then the member lookup fails. In addition, the Spice and pure Static models may sometimes deliberately fail even when set S is not empty. Except for such deliberate failures, if the set S contains only one member s, all three models return that element s. If the set S contains multiple members, the three models will likely choose different members.

Another interesting (and useful) tidbit is that the Static and Dynamic models always agree on the interpretation of member lookup operations of the form this.b. All three models agree on the interpretation of member lookup operations of the form this.b in the case where b is a member defined in the current class.

A note about overriding: When a subclass D overrides a member m of its superclass C, then the definition of the member m is conceptually replaced in all instances of D. However, the three models are only concerned with the topmost class in which member m is declared. All three models handle overriding the way one would expect of an object-oriented language. They differ in the cases where class C has a member named m, subclass D of C has a member with the same name m, but D's m does not override C's m because C's m is not visible inside D (it's not well known, but such non-overriding does and must happen in C++ and Java as well).

Static Model

In the Static model we look at the static type Ts of expression a. Let S1 be the subset of S whose class is either Ts or one of Ts's ancestors. We pick the member in S1 with the most derived class.
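The selection rule can be sketched in present-day JavaScript as follows. The class hierarchy, the definedIn field, and the function names are assumptions made for illustration; the two members named b are taken to be distinct members (D's b does not override C's b).

```javascript
// Hypothetical class hierarchy: D extends C, and C extends Object.
const parentOf = { D: "C", C: "Object", Object: null };

// Ts followed by its ancestors, most derived first.
function selfAndAncestors(cls) {
  const chain = [];
  for (let c = cls; c != null; c = parentOf[c]) chain.push(c);
  return chain;
}

// S: accessible members named b of object a. Here a is dynamically a D.
const S = [
  { name: "b", definedIn: "C" },
  { name: "b", definedIn: "D" },
];

function staticLookup(S, Ts) {
  for (const cls of selfAndAncestors(Ts)) {
    const hit = S.find((m) => m.definedIn === cls);
    if (hit !== undefined) return hit;   // most derived member of S1
  }
  return null;                            // S1 is empty: the pure static lookup fails
}

console.log(staticLookup(S, "D").definedIn);  // "D"
console.log(staticLookup(S, "C").definedIn);  // "C", although a is really a D
console.log(staticLookup(S, "Object"));       // null
```

The last call corresponds to an expression whose static type has no member named b, which is the case the pure model cannot handle.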

The pure static model above is implemented by Java and C++. It would not work well in that form in JavaScript because many, if not most, expressions have type Any. Because type Any has no members, users would have to cast expression a to a given type T before they could access members of type T. Because of this we must extend the static model to handle the case where the subset S1 is empty, or, in other words, the static lookup fails. (Rather than doing this, we could extend the static model in the case where the static type Ts is some special type, but then we would have to decide which types are special and which ones are not. Any is clearly special. What about Object? What about Array? It's hard to draw the line consistently.)

In whichever way we extend the static model, we also have a choice of which member we choose. We could back off to the dynamic model, we could choose the most derived member in S, or perhaps we could choose some other approach.

Constraints:

Safety: Good within the pure static model. The extended static model has problems (a subclass could silently shadow a member) that could perhaps be addressed by warnings.
Abstraction: Good.
Robustness: Very bad. Updating the return type of a function or the type of a global variable silently changes the meaning of all code that uses that function or global variable; in a large project such a change would be quite difficult. It is also difficult to correctly split expressions into subexpressions.
Namespace independence: Good.
Compatibility: Bad within the pure static model (type casts needed everywhere). May be good in the extended static model, depending on the choice of how we extend it.
Other: This model may be difficult to compile well because the compiler may have difficulty in determining the intermediate types in compound expressions. Languages based on the static model have traditionally been compiled off-line, and such compilers tend to be difficult to write for on-line compilation without requiring the programmer to predeclare all of his data structures (if there are any forward-referenced ones, then the compiler doesn't know whether they should have a type or not). A more dynamic execution model may actually help because it defers compilation until more information is known.

Spice Model

In the Spice model we think of each member m defined in a class C as though it were a function definition for a (possibly overloaded) function whose first argument has type C. Definitions in an inner lexical scope shadow definitions in outer scopes. The Spice model does not consider the static type Ts of expression a.

Let L be the innermost lexical scope enclosing the member lookup expression a.b such that some member named b is defined in L. Let Lb be the set of all members named b defined in lexical scope L, and let S1 = S ∩ Lb (the intersection of S and Lb). If S1 is empty, we fail. If S1 contains exactly one member s, we use s. If S1 contains several members, we fail (this would only happen for import conflicts).
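A sketch of this rule in present-day JavaScript, under the assumption that a private member counts as defined in its class's scope and a package-visible member as defined in the package scope. The classes C and E, the scope layout, and the function name are illustrative.

```javascript
// Each member records the class that defines it.
const cB = { name: "b", definedIn: "C" };   // package-visible member of class C
const eB = { name: "b", definedIn: "E" };   // private member of unrelated class E

// Lexical scopes enclosing the expression a.b, innermost first.
// The expression appears inside class E, which sits inside the package.
const scopes = [
  [eB],        // class E's scope
  [cB],        // package scope
];

// S: accessible members named b of object a, where a is an instance of C.
const S = [cB];

function spiceLookup(S, scopes, name) {
  // L: innermost scope that defines any member with this name.
  const L = scopes.find((scope) => scope.some((m) => m.name === name));
  if (L === undefined) return null;
  const Lb = L.filter((m) => m.name === name);
  const S1 = Lb.filter((m) => S.includes(m));   // S1 = S ∩ Lb
  return S1.length === 1 ? S1[0] : null;        // empty or ambiguous: fail
}

console.log(spiceLookup(S, scopes, "b"));          // null: E's b shadows C's b
console.log(spiceLookup(S, [[cB]], "b") === cB);   // true outside class E
```

The first call fails only because the unrelated class E happens to define a member with the same name.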

Constraints:

Safety: Good.
Abstraction: Good.
Robustness: Poor. Renaming a package-visible member may break code outside the class that defines that member even if that code does not access that member. Converting a member from private to one of the other two visibilities also can introduce conflicts in other, unrelated classes in the same package that just happen to have an unrelated member with the same name. Fortunately these conflicts usually (but not always) result in errors rather than silent changes to the meaning of the program, so one can often find them by exhaustively testing the program after making a change.
Namespace independence: Bad. Members with the same name in unrelated classes often conflict.
Compatibility: Poor? Many existing programs rely on namespace independence and would have to be restructured.
Other: Most object-oriented programmers would be confused by a violation of namespace independence. Programming without this assumption requires a different point of view than most programmers are used to. (I am not talking about Lisp and Self programmers, who are familiar with that way of thinking.)

[There are numerous other variants of the Spice model as well.]

Dynamic Model

In the Dynamic model we pick the member s in S defined in the innermost lexical scope L enclosing the member lookup expression a.b. We fail if the innermost such lexical scope L contains more than one member in S (this would only happen for import conflicts).
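A sketch of this rule in present-day JavaScript, under the assumption that a private member counts as defined in its class's scope and a package-visible member as defined in the package scope. The classes C and E, the scope layout, and the function name are illustrative.

```javascript
const cB = { name: "b", definedIn: "C" };   // package-visible member of class C
const eB = { name: "b", definedIn: "E" };   // private member of unrelated class E

// Lexical scopes enclosing a.b, innermost first: class E, then the package.
const scopes = [[eB], [cB]];

// S: accessible members named b of object a, where a is an instance of C.
const S = [cB];

function dynamicLookup(S, scopes) {
  for (const scope of scopes) {
    const hits = scope.filter((m) => S.includes(m));
    if (hits.length === 1) return hits[0];   // innermost scope holding a member of S
    if (hits.length > 1) return null;        // ambiguous (import conflict)
  }
  return null;                               // no enclosing scope defines a member of S
}

console.log(dynamicLookup(S, scopes) === cB);   // true: E's unrelated b is ignored

// Two candidates imported into the same scope make the lookup fail.
const otherB = { name: "b", definedIn: "Imported" };
console.log(dynamicLookup([cB, otherB], [[cB, otherB]]));   // null
```

Scopes whose definitions are not members of object a are skipped, so an unrelated member named b in class E does not interfere.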

Constraints:

Safety: Good at the language level, but see "Other" below.
Abstraction: Good.
Robustness: Good. All of these changes are easy to do.
Namespace independence: Good.
Compatibility: Good.
Other: Packages using the dynamic model may be vulnerable to hijacking (coerced into doing something other than what the author intended) by a determined intruder. It is possible for a compiler to detect such vulnerabilities and warn about them.

Namespaces

The various models make it possible to get into situations where either there is no way to access a visible member of an object or it is not safe to do so (see member hijacking). In these cases we'd like to be able to explicitly choose one of several potential members with the same name. The :: namespace syntax allows this. The left operand of :: is an expression that evaluates to a package or class; we may also allow special keywords such as public, package, or private instead of an expression here, or omit the expression altogether. The right operand of :: is a name. The result is the name qualified by the namespace.

As we have seen, the name b in a member access expression a.b does not necessarily refer to a unique accessible member of object a. In a qualified member access expression a.n::b, the namespace n narrows the set of members considered, although it's possible that the set may still contain more than one member, in which case the lookup model again disambiguates. Let S be the set of members named b of object a that are accessible. The following table shows how a.n::b subsets set S depending on n:

n: Subset
None: Only the ad-hoc member named b, if any exists
A class C: The fixed member of C named b, if it exists; if not, try C's superclass instead, and so on up the chain
A package P: The subset of S containing all accessible members of P
private: The fixed member named b of the current class
package: The subset of S containing all accessible members that have package visibility
public: The subset of S containing all accessible members that have public visibility

The :: operator serves a different role from the . operator. The :: operator produces a qualified name, while the . operator produces a value. A qualified name can be used as the right operand of .; a value cannot. If a qualified name is used in a place where a value is expected, the qualified name is looked up using the lexical scoping rules to obtain the value (most likely a global variable).

Ad-Hoc Members

All of the models above address only access to fixed members of a class. JavaScript also allows one to dynamically add members to individual instances of a class. For simplicity we do not provide access control or versioning on these ad-hoc members -- all of them are public and open to everyone. Because of the safety criterion, a member lookup of a private or package-protected member must choose the private or package-protected member even if there is an ad-hoc member of the same name. To satisfy the robustness criterion, we should treat public members as similarly as possible to private or package-protected members, so we always give preference to a fixed member when there is an ad-hoc member of the same name.

To access an ad-hoc member that is shadowed by a fixed member, we can either prefix the member's name with :: or use an indirect member access.
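The precedence rule can be modeled in present-day JavaScript with two tables per object. The helpers makeObject, lookup, and lookupAdHoc are hypothetical names; lookup stands for a.b and lookupAdHoc for a.::b, and visibility checks are left out of this sketch.

```javascript
// Model an object as a table of fixed members plus a table of ad-hoc members.
function makeObject(fixed) {
  return { fixed: new Map(Object.entries(fixed)), adHoc: new Map() };
}

// Models a.b: a fixed member always wins over an ad-hoc one of the same name.
function lookup(obj, name) {
  if (obj.fixed.has(name)) return obj.fixed.get(name);
  return obj.adHoc.get(name);    // undefined when absent
}

// Models a.::b: only ad-hoc members are considered.
function lookupAdHoc(obj, name) {
  return obj.adHoc.get(name);
}

const a = makeObject({ b: "fixed b" });
a.adHoc.set("b", "ad-hoc b");
a.adHoc.set("c", "ad-hoc c");

console.log(lookup(a, "b"));       // "fixed b"
console.log(lookupAdHoc(a, "b"));  // "ad-hoc b"
console.log(lookup(a, "c"));       // "ad-hoc c"
```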

Indirect Member Access

How should we define the behavior of the expression a[b] (assuming the [] operator is not overridden by a's class)? There are a couple of possibilities:

  1. We could evaluate the expression b to some string "s" and treat a[b] as though it were a.s. This is essentially what JavaScript 1.5 does. Unfortunately it's hard to keep this behavior consistent with JavaScript 1.5 programs' expectations (they expect no more than one member with the same name, etc.), and this kind of indirection is also vulnerable to hijacking. It may be possible to solve the hijacking problem by devising restricted variants of the [] operator such as a.n::[b] that follow the rules given in the namespaces section above.
  2. We could evaluate the expression b to some string "s" and treat a[b] as though it were a.::s, thus limiting our selection to ad-hoc members. Ad-hoc members are well-behaved, but this kind of behavior would violate the compatibility criterion when JavaScript 1.5 scripts try to reflect a JavaScript 2.0 object using the [] operator.

In general it seems like it would be a bad idea to extend the syntax of the string "s" to allow :: operators inside the string. Such strings are too easily forged to play the role of pointers to members.

Member Hijacking

[explain security attacks]


Compatibility

Thursday, November 11, 1999

JavaScript 2.0 is intended to be upwards compatible with JavaScript 1.5 and earlier scripts. The following are the current compatibility issues:

JavaScript 2.0 is still evolving, and some of these compatibility issues may be addressed as the language matures. They are not expected to be a problem in practice because a browser could distinguish JavaScript 1.5 and earlier scripts from JavaScript 2.0 scripts and behave compatibly on the earlier ones.


Waldemar Horwat
Last modified Friday, November 12, 1999