JavaScript 2.0 Formal Description

Wednesday, May 12, 1999

Introduction

The following pages present the formal syntax and semantics of JavaScript 2.0. The syntax notation and semantic notation pages explain the notation used for this description. A simple metalanguage based on a typed lambda calculus is used to specify the semantics.

The syntax and semantics pages are available in both HTML 4.0 and Microsoft Word 98 RTF formats. In the HTML versions, each use of a grammar nonterminal or metalanguage value, type, or field is hyperlinked to its definition, which makes the HTML version preferable for browsing. The RTF version, on the other hand, looks much better when printed. The fonts, colors, and other formatting of the various grammar and semantic elements are all encoded as CSS (in HTML) or Word (in RTF) styles and can be altered if desired.

The syntax and semantics pages are machine-generated from code supplied to a small engine that can type-check and execute the semantics directly. This engine is in the CVS tree at mozilla/js/semantics; the input files are at mozilla/js/semantics/JS20.

Processing

The source code is processed in the following stages:

1. If necessary, convert the source code into the Unicode UTF-16 format, normalized form C.
2. Split the source code into tokens using the lexer grammar and lexer semantics.
3. Parse the resulting sequence of tokens using the parser grammar and evaluate it using the parser semantics [To be provided].
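
Stage 1 can be illustrated with a minimal Python sketch. It assumes the source is already available as a decoded Unicode string; the function name to_normalized_utf16 is hypothetical and is not part of the formal description.

```python
import unicodedata

def to_normalized_utf16(source: str) -> list[int]:
    """Normalize to form C, then return the text as UTF-16 code units."""
    normalized = unicodedata.normalize("NFC", source)
    data = normalized.encode("utf-16-le")
    # Combine each little-endian byte pair into one 16-bit code unit.
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]

# "e" followed by a combining acute accent (U+0301) composes to U+00E9.
units = to_normalized_utf16("e\u0301")
```

Characters outside the Basic Multilingual Plane come out as surrogate pairs, so the lexer in stage 2 sees two code units for each such character.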

Lexing

Processing stage 2 is done as follows:

1. Let tokens be an empty array of Token metalanguage records. (As defined in the lexer semantics, a Token can be either an identifier, a keyword, a punctuation symbol, a number, a number with a unit, a string, or the end token.)
2. Let input be the input sequence of Unicode characters. Append a special placeholder End to the end of input.
3. Let regExpMayFollow be a Boolean variable. Initialize it to true.
4. Apply the lexer grammar to parse the longest possible prefix of input. If regExpMayFollow is true, use the start symbol NextToken^re. If regExpMayFollow is false, use the start symbol NextToken^div. The result of the parse should be a parse tree T. If the parse failed, return a syntax error.
5. Compute the action Token on T to obtain a Token t. If t is the end token, return the tokens array and go to the parse stage.
6. Append t to the end of the tokens array.
7. Compute the action RegExpMayFollow on T to obtain a Boolean value and assign that value to the regExpMayFollow variable.
8. Remove the characters matched by T from input, leaving only the yet-unparsed suffix of input.
9. Go to step 4.
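
The loop above can be sketched in Python as follows. This is only an illustration under loud assumptions: the regular expressions are toy stand-ins for the real lexer grammar, the rule that decides regExpMayFollow is a simplification, a NUL character stands in for the End placeholder, and the names Token, next_token, and lex are hypothetical. Instead of removing matched characters from input, the sketch advances an offset, which has the same effect.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    kind: str  # "identifier", "number", "string", "regexp", "punct", or "end"
    text: str

END = "\x00"  # assumed stand-in for the special End placeholder
WHITESPACE = re.compile(r"\s*")

def _compile(rules):
    return [(kind, re.compile(pattern)) for kind, pattern in rules]

COMMON = [
    ("identifier", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("number", r"[0-9]+(?:\.[0-9]+)?"),
    ("string", r'"[^"\n]*"'),
    ("end", r"\x00"),
]
# Toy version of NextToken^re: a leading "/" starts a regular expression.
RE_RULES = _compile(COMMON + [("regexp", r"/[^/\n]+/[a-z]*"),
                              ("punct", r"[-+*=();,]")])
# Toy version of NextToken^div: a leading "/" is division punctuation.
DIV_RULES = _compile(COMMON + [("punct", r"/=|[-+*/=();,]")])

def next_token(text, pos, regexp_may_follow):
    """Steps 4, 5, 7, 8: match the longest prefix and compute both actions."""
    pos = WHITESPACE.match(text, pos).end()
    rules = RE_RULES if regexp_may_follow else DIV_RULES
    best = None
    for kind, pattern in rules:
        m = pattern.match(text, pos)
        if m and (best is None or m.end() > best[1].end()):
            best = (kind, m)
    if best is None:
        raise SyntaxError(f"unexpected character at offset {pos}")
    kind, m = best
    token = Token(kind, m.group())
    # Simplified RegExpMayFollow: true only after punctuation other than ")".
    may_follow = kind == "punct" and token.text != ")"
    return token, m.end(), may_follow

def lex(source):
    tokens = []               # step 1
    text = source + END       # step 2
    pos = 0
    regexp_may_follow = True  # step 3
    while True:               # step 9 returns to step 4
        token, pos, regexp_may_follow = next_token(text, pos, regexp_may_follow)
        if token.kind == "end":   # step 5
            return tokens
        tokens.append(token)      # step 6

# The first "/" follows "=", so it opens a regular expression;
# the second follows an identifier, so it is division.
tokens = lex("a = /b/g; c / d")
```

Here the token kinds come out as identifier, punct, regexp, punct, identifier, punct, identifier. Lexing "a /b/ c" instead yields two division tokens, because an identifier precedes the first slash.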

If an implementation encounters an error while lexing, it is permitted to either report the error immediately or defer it until the affected token would actually be used by the parser. This flexibility allows an implementation to do lexing at the same time it parses the source program.
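
One way an implementation might interleave lexing with parsing is to produce tokens on demand, so that a lexical error surfaces only when the parser asks for the affected token. The Python sketch below shows this with a generator. The token patterns are toy assumptions rather than the real lexer grammar, and lazy_tokens is a hypothetical name.

```python
import re
from typing import Iterator

WS = re.compile(r"\s*")
TOKEN = re.compile(
    r"(?P<identifier>[A-Za-z_$][A-Za-z0-9_$]*)"
    r"|(?P<number>[0-9]+)"
    r"|(?P<punct>[-+*/=();,])"
)

def lazy_tokens(source: str) -> Iterator[tuple[str, str]]:
    """Yield (kind, text) pairs on demand; raise only when a bad token is reached."""
    pos = WS.match(source, 0).end()
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        yield m.lastgroup, m.group()
        pos = WS.match(source, m.end()).end()

# The "@" is invalid, but the three tokens before it are delivered without error.
stream = lazy_tokens("x = 1 @ y")
first_three = [next(stream) for _ in range(3)]
```

Requesting a fourth token with next(stream) raises SyntaxError at that point. A parser that stops before consuming the bad token never sees the error.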

Provide language prohibiting an identifier from immediately following a number. This will fall out of the revised definition of QuantityLiteral.

Show mapping from Token structures to parser grammar terminals (obvious, but needs to be written).

Parsing

To be provided

Waldemar Horwat
Last modified Wednesday, May 12, 1999