JavaScript 2.0
Friday, November 12, 1999
JavaScript 2.0 is an experimental proposal maintained by waldemar for future changes in the JavaScript language. The eventual language may differ significantly from this proposal, but the goal is to move in the directions indicated here and do so via a coordinated plan rather than adding miscellaneous features ad hoc on a release-by-release basis.
JavaScript is Netscape's implementation of the ECMAScript standard. The development of JavaScript 2.0 is heavily coordinated with the ECMA TC39 modularity subgroup. The intent is to make JavaScript 2.0 and ECMAScript Edition 4 be the same language, and this document will evolve as necessary to accomplish this.
The following are recent major changes in this document:
| Date | Revisions |
|---|---|
| Nov 11, 1999 | Continuing major reorganization of this document.... |
| Nov 5, 1999 | Reorganized the document's structure into chapters. Structured the core language chapter more in the bottom-up style of the ECMAScript standard than in the previous issue-oriented style. Combined and moved rationales and issues into an appendix. Added introduction page. Removed or reworded many obsolete paragraphs throughout the document. |
| Nov 2, 1999 | Modified the parser grammar: added [no line break] constraints, removed version lists after public keywords, added box and user-defined visibility keywords, and added named function arguments. |
| Oct 29, 1999 | Revised the execution model based on recent ECMA modularity group discussions. JavaScript 2.0 now has a hybrid execution model instead of a pure dynamic one, which allows for better compatibility with JavaScript 1.5. |
| Oct 20, 1999 | Added throw and try-catch semantic operators to the semantic notation and used them to signal syntax errors detected by the semantics that would be impossible or too messy to detect in the grammars. Updated formal description pages to match recent ECMA TC39 subcommittee decisions: eliminated octal numbers and escapes (both in strings and in regular expressions) to match ECMAScript Edition 3, switched to using the Identifier : TypeExpression syntax for type declarations, and added local blocks and the local visibility specifier. Also simplified the parser grammar for definitions and removed the « and » syntax for regular expression literals. |
| Jul 26, 1999 | Wrote description of semantic notation. Updated grammar notation page to describe lookahead constraints. Updated regular expression semantics to match ECMA working group decisions for ECMAScript Edition 3; one of these included changing the behavior of (?= to not backtrack. |
| Jun 7, 1999 | Revised all grammars and semantics to simplify the grammars. Fixed several errors and omissions in the regular expression grammar and semantics. Added support for (?= and (?!. |
| May 16, 1999 | Added regular expression grammar and semantics. |
| May 12, 1999 | Added preliminary Formal Description chapter. |
| Mar 25, 1999 | Added Member Lookup page. Released second draft. |
| Mar 24, 1999 | Added many clarifications, discussion sections, and small changes throughout the pages. |
| Mar 23, 1999 | Rewrote Execution Model page and split it off from the Definitions page. Added discussion of float to Machine Types. |
| Mar 22, 1999 | Removed numbered versions from the Versions page; added motivation, discussion, and version aliasing using =. Removed angle brackets < and > from VersionsAndRenames. |
| Mar 16, 1999 | Rewrote Types page. Split off byte, ubyte, short, ushort, int, uint, long, ulong into an optional Machine Types library. |
| Feb 18, 1999 | Released first draft. |
JavaScript 2.0
Introduction
Thursday, November 11, 1999
JavaScript 2.0 is the next major step in the evolution of the JavaScript language. JavaScript 2.0 incorporates the following features in addition to those already found in JavaScript 1.5:

- const and final
- private, package, public, and user-defined access controls
- overridable basic operators such as + and [ ]
- machine types such as int for more faithful communication with other programming languages

These facilities reinforce each other while remaining fairly small and simple. Unlike in Java, the philosophy behind them is to provide the minimal necessary facilities that other parties can use to write packages that specialize the language for particular domains rather than define these packages as part of the language core.
The versioning and access control mechanisms make the language suitable for programming-in-the-large.
The language remains firmly in the dynamic camp. Classes can be declared statically or dynamically. JavaScript 2.0 provides introspection facilities. In some ways JavaScript 2.0 is more dynamic than JavaScript 1.5. For example, it is much easier to conditionally declare functions in JavaScript 2.0 than in 1.5: one simply defines a function inside a conditional.
The overridable basic operators can be used to implement numbers with attached units similar to the Spice proposals. Rather than implement the full unit model in the language core, JavaScript 2.0 provides the syntactic and semantic hooks to allow one to implement a unit library with whatever sophistication one's application requires.
JavaScript 2.0
Introduction
Motivation
Thursday, November 11, 1999
The main goals of JavaScript 2.0 are:
The following are specifically not goals of JavaScript 2.0:
JavaScript is not currently an all-purpose programming language. Its strengths are its quick execution from source (thus enabling it to be distributed in web pages in source form), its dynamism, and its interfaces to Java and other environments. JavaScript 2.0 is intended to improve upon these strengths, while adding others such as the abilities to reliably compose JavaScript programs out of components and libraries and to write object-oriented programs. On the other hand, it is not our intent to have JavaScript 2.0 supplant languages such as C++ and Java, which will still be more suitable for writing many kinds of applications, including very large, performance-critical, and low-level ones.
The proposed features are derived from the goals above. Consider, for example, the goals of writing modular and robust applications.
To achieve modularity we would like some kind of a library mechanism. The proposed package mechanism serves this purpose, but by itself it would not be enough. Unlike existing JavaScript programs which tend to be monolithic, packages and their clients are often written by different people at different times. Once we introduce packages, we encounter the problems of the author of a package not having access to all of its clients, or the author of a client not having access to all versions of the library it needs. If we add packages to the language without solving these problems, we will never be able to achieve robustness, so we must address these problems by creating facilities for defining abstractions between packages and clients.
To create these abstractions we make the language more disciplined by adding optional types and type-checking. We also introduce a coherent and disciplined syntax for defining classes and hierarchies and versioning of classes. Unlike JavaScript 1.5, the author of a class can guarantee invariants concerning its instances and can control access to its instances, making the package author's job tractable. The class syntax is also much more self-documenting than in JavaScript 1.5, making it easier to understand and use JavaScript 2.0 code. Defining subclasses is easy in JavaScript 2.0, while doing it robustly in JavaScript 1.5 is quite difficult.
To make packages work we need to make the language more robust in other areas as well. It would not be good if one package
redefined Object.toString or added methods to the Array prototype and thereby corrupted another
package. We can simplify the language by eliminating many idioms like these (except when running legacy programs, which would
not use packages) and provide better alternatives instead. This has the added advantage of speeding up the language's implementation
by eliminating thread synchronization points. Making the standard packages robust can also significantly reduce the memory
requirements and improve speed on servers by allowing packages to be shared among many different requests rather than having
to start with a clean set of packages for each request because some other request might have modified some property.
JavaScript 2.0 should interface with other languages even better than JavaScript 1.5 does. If the goal of integration is achieved, the user of an abstraction should not have to care much about whether the abstraction is written in JavaScript, Java, or another language. It should also be possible to make JavaScript abstractions that appear native to Java or other language users.
In order to achieve seamless interfacing with other languages, JavaScript should provide equivalents for the fundamental
data types of those languages. Details such as syntax do not have to be the same, but the concepts should be there. JavaScript
1.5 lacks support for integers, making it hard to interface with a Java method that expects a long.
JavaScript is appearing in a number of different application domains, many of which are evolving. Rather than support all of these domains in the core JavaScript, JavaScript 2.0 should provide flexible facilities that allow these application domains to define their own, evolving standards that are convenient to use without requiring continuous changes to the core of JavaScript. JavaScript 2.0 addresses this goal by letting user programs define facilities such as getters, setters, and alternative definitions of operators -- facilities that could only be done by the core of the language in JavaScript 1.5.
JavaScript 2.0
Introduction
Notation
Thursday, November 11, 1999
This proposal uses the following conventions to denote literal characters:
Printable ASCII literal characters (values 20 through 7E hexadecimal) are in a blue monospaced font. Other
characters are denoted by enclosing their four-digit hexadecimal Unicode value between «u
and ». For example, the non-breakable space character would be denoted in this
document as «u00A0». A few of the common control characters are represented
by name:
| Abbreviation | Unicode Value |
|---|---|
| «NUL» | «u0000» |
| «BS» | «u0008» |
| «TAB» | «u0009» |
| «LF» | «u000A» |
| «VT» | «u000B» |
| «FF» | «u000C» |
| «CR» | «u000D» |
| «SP» | «u0020» |
A space character is denoted in this document either by a blank space where it's obvious from the context or by «SP»
where the space might be confused with some other notation.
Each LR(1) parser grammar and lexer grammar rule consists of a nonterminal, a ⇒, and one or more expansions of the nonterminal separated by vertical bars (|). The expansions are usually listed on separate lines but may be listed on the same line if they are short. An empty expansion is denoted as «empty».
Consider the sample rule:

SampleList ⇒ «empty» | Identifier | ... Identifier | SampleList , ... Identifier

This rule states that the nonterminal SampleList can represent one of four kinds of sequences of input tokens:

- nothing;
- an expansion of the nonterminal Identifier;
- the token ... followed by some expansion of the nonterminal Identifier;
- an expansion of SampleList followed by the tokens , and ... and an expansion of the nonterminal Identifier.

Input tokens are characters (and the special End placeholder) in the lexer grammar and lexer tokens in the parser grammar. Spaces separate input tokens and nonterminals from each other. An input token that consists of a space character is denoted as «SP».
Other non-ASCII or non-printable characters are denoted by also using « and »,
as described in the character notation section.
If the phrase "[lookahead set]" appears in the expansion of a production, it indicates that the production may not be used if the immediately following input terminal is a member of the given set. That set can be written as a list of terminals enclosed in curly braces. For convenience, set can also be written as a nonterminal, in which case it represents the set of all terminals to which that nonterminal could expand.
For example, given the rules

DecimalDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
DecimalDigits ⇒ DecimalDigit | DecimalDigits DecimalDigit

the rule

n [lookahead∉{1, 3, 5, 7, 9}] DecimalDigits | DecimalDigit [lookahead∉DecimalDigit]

matches either the letter n followed by one or more decimal digits the first of which is even, or a decimal digit not followed by another decimal digit.
These lookahead constraints do not make the grammars more theoretically powerful than LR(1), but they do allow these grammars to be written more simply. The semantic engine compiles grammars with lookahead constraints into parse tables that have the same format as those produced from ordinary LR(1) or LALR(1) grammars.
Many rules in the grammars occur in groups of analogous rules. Rather than list them individually, these groups have been summarized using the shorthand illustrated by the example below:
Metadefinitions introduce grammar arguments such as a and b, each of which ranges over a small set of variants. If these arguments later parametrize the nonterminal on the left side of a rule, that rule is implicitly replicated into a set of rules in each of which a grammar argument is consistently substituted by one of its variants. For example, a sample rule whose right side ends in

= AssignmentExpressionnormal,b

(where b's variants are allowIn and noIn) expands into the following four rules, one for each combination of variants of a and b; their right sides end in

= AssignmentExpressionnormal,allowIn
= AssignmentExpressionnormal,noIn
= AssignmentExpressionnormal,allowIn
= AssignmentExpressionnormal,noIn

AssignmentExpressionnormal,allowIn is now an unparametrized nonterminal and is processed normally by the grammar.
Some of the expanded rules (such as the fourth one in the example above) may be unreachable from the grammar's starting nonterminal; these are ignored.
A few lexer rules have too many expansions to be practically listed. These are specified by descriptive text instead of a list of expansions after the ⇒.
Some lexer rules contain the metaword except. These rules match any expansion that is listed before the except
but that does not match any expansion after the except. All of these rules ultimately expand into single characters.
For example, the rule below matches any single UnicodeCharacter except the * and
/ characters:
A few parts of the main body of this proposal still use an informal syntax to describe language constructs, although this syntax is being phased out. An example is the following:

VersionsAndRenames: [< VersionRange [: Identifier] , ... , VersionRange [: Identifier] >]
VersionRange: Version | [Version] .. [Version]

VersionsAndRenames and VersionRange are the names of the grammar rules. The black square brackets represent optional items, and the black ... together with its neighbors represents optional repetition of zero or more items, so a VersionsAndRenames can have zero or more sets of VersionRange [: Identifier] separated by commas. A black | indicates that either its left or right alternative may be present, but not both; |'s have the lowest metasymbol precedence. Syntactic tokens to be typed literally are in a bold blue monospaced font. Grammar nonterminals are in green italic and correspond to the nonterminals in the parser grammar or lexer grammar.
JavaScript 2.0
Core Language
Thursday, November 11, 1999
This chapter presents an informal description of the core language. The exact syntax and semantics are specified in the formal description. Libraries are also specified in a separate library chapter.
JavaScript 2.0
Core Language
Concepts
Thursday, November 11, 1999
The words type and class are used interchangeably in this specification. A type represents a possibly
infinite set of values. A value can be a member of multiple such sets, so a value can have more than one type. A value
may not have an intrinsic most specific type -- one can ask whether the value v is a member of a given type t,
but this does not prevent the value v from also being a member of some unrelated type s. For example,
null is a member of type Array as well as type Function, but neither Array
nor Function is a subtype of the other.
On the other hand, a variable does have a particular type. If one declares a variable x of type Array,
then whatever value is held in x is guaranteed to have type Array, and one can assign any value of
type Array to x.
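As a brief sketch of this rule (the variable names and values here are only illustrative):

var x:Array = [1, 2, 3];   // any value of type Array can be stored in x
x = null;                  // also allowed: null is a member of type Array
var f:Function = null;     // likewise allowed: null is a member of type Function as well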
JavaScript 2.0
Core Language
Lexer
Thursday, November 11, 1999
This section presents an informal overview of the JavaScript 2.0 lexer. See the stages and lexer semantics sections in the formal description chapter for the details.
The JavaScript 2.0 lexer behaves in the same way as the JavaScript 1.5 lexer except for the following:

- Semicolon insertion differs slightly: a semicolon may be omitted before a closing }. In addition, the JavaScript 2.0 parser allows semicolons to be omitted before the else of an if-else statement and before the while of a do-while statement.

JavaScript 2.0 source text consists of a sequence of UTF-16 Unicode version 2.1 or later characters normalized to Unicode Normalized Form C (canonical composition), as described in Unicode Technical Report #15.
Comments and white space behave just like in JavaScript 1.5.
The following JavaScript 1.5 punctuation tokens are recognized in JavaScript 2.0:
! != !==
% %= &
&& &= (
) * *=
+ ++ +=
, - --
-= . /
/= : ::
; < <<
<<= <= =
== === >
>= >> >>=
>>> >>>= ?
[ ] ^
^= { |
|= || }
~
The following punctuation tokens are new in JavaScript 2.0:
# &&= ->
.. ... @
^^ ^^= ||=
The following reserved words are used in JavaScript 2.0:
break case catch
class const continue
default delete do
else eval extends
false final finally
for function if
in instanceof new
null package private
public return super
switch this throw
true try typeof
var while with
Out of these, the only word that was not reserved in JavaScript 1.5 is eval.
The following reserved words are reserved for future expansion:
abstract debugger enum
export goto implements
import interface native
protected static synchronized
throws transient volatile
The following words have special meaning in some contexts in JavaScript 2.0 but are not reserved and may be used as identifiers:
box constructor field
get language local
method override set
version
The following words name predefined types but are not reserved and may be used as identifiers (although this is not recommended):
Any Array array
boolean character Function
integer Null number
Object object string
Type type void
The JavaScript 2.0 grammar explicitly makes semicolons optional in the following situations:

- before a closing }
- before the else of an if-else statement
- before the while of a do-while statement (but not before the while of a while statement)

Semicolons are optional in these situations even if they would construct empty statements. Strict mode has no effect on semicolon insertion in the above cases.
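For illustration, a sketch of statements that rely on these rules (the variable names are arbitrary):

{ x = 1 }                   // no semicolon needed before the closing }
if (a) x = 1 else x = 2;    // no semicolon needed before the else
do x++ while (x < 10);      // no semicolon needed before the while of the do-while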
In addition, sometimes line breaks in the input stream are turned into VirtualSemicolon tokens. Specifically, if the first through the nth tokens of a JavaScript program are grammatically valid but the first through the n+1st tokens are not, and there is a line break (or a comment including a line break) between the nth and the n+1st tokens, then the parser tries to parse the program again after inserting a VirtualSemicolon token between the nth and the n+1st tokens. This kind of VirtualSemicolon insertion does not occur in strict mode.
See also the semicolon insertion syntax rationale.
Regular expression literals begin with a slash (/) character not immediately followed by another slash (two
slashes start a line comment). Like in JavaScript 1.5, regular expression literals are ambiguous with the division (/)
or division-assignment (/=) tokens. The lexer treats a / or /= as a division or division-assignment
token if either of these tokens would be allowed by the syntactic grammar as the next token; otherwise, the lexer treats a
/ or /= as starting a regular expression.
This unfortunate dependence of lexical parsing on grammatical parsing is inherited from JavaScript 1.5. See the regular expression syntax rationale for a discussion of the issues.
When a numeric literal is immediately followed by an optional underscore and an identifier, the lexer drops the underscore if it is present and converts the identifier to a string literal. The parser then treats the number and string as a unit expression. There are no reserved word restrictions on the identifier in this case; any identifier that begins with a letter will work, even if it is a reserved word.
For example, 3in and 3_in are both converted to 3 "in". 5xena
is converted to 5 "xena". On the other hand, 0xena is converted to 0xe "na".
It is unwise to define unit names that begin with the letters e or E either alone or followed by
a decimal digit, or x or X followed by a hexadecimal digit because of potential ambiguities with
exponential or hexadecimal notation.
JavaScript 2.0
Core Language
Expressions
Thursday, November 11, 1999
Most of the behavior of expressions is the same as in JavaScript 1.5. Differences are highlighted below. One general difference is that most expression operators can be overridden via operator overloading.
box constructor field get language local method set override version

The above keywords are not reserved and may be used in identifiers.
Just like in ECMAScript Edition 3, an identifier evaluates to an internal data structure called a reference. However, JavaScript 2.0 references have several additional attributes, one of which is a namespace. The namespace is set to the value of the ParenthesizedExpression. If the ParenthesizedExpression is a simple Identifier or QualifiedIdentifier then the parentheses may be omitted.
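A rough sketch of qualification using a namespace (the package name Pkg and the function pickNamespace are invented for illustration; the exact grammar is given in the formal description):

var a = Pkg::x;                // the reference to x is qualified by the namespace Pkg
var b = (pickNamespace())::x;  // a general ParenthesizedExpression supplies the namespace and requires the parentheses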
null
true
false
this
super
? Identifier

A Number literal or ParenthesizedExpression
followed by a String literal is a unit expression. The unit object specified by the String
is looked up; the result is called as a function and passed two arguments: the numeric value of the Number
literal or ParenthesizedExpression, and either null
(if a ParenthesizedExpression was provided) or the original
Number literal expressed as a string.
The string representation allows user-defined unit classes to define extended syntaxes for numbers. For instance, a long-integer
package might define a unit called "L" that treats the Number literal as
a full 64-bit number without rounding it to a double first.
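As a sketch of the evaluation rule above, assume some package has defined a unit object named "cm":

var w = 10"cm";       // looks up the "cm" unit object and calls it with the arguments 10 and "10"
var h = (3 + 4)"cm";  // calls the "cm" unit object with the arguments 7 and null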
A ? Identifier expression
is used to access scope information.
++
--

The @ operator performs a type cast. The second operand specifies the type. Both the
. and the @ operators accept either a QualifiedIdentifier
or a ParenthesizedExpression as the second operand.
If it is a ParenthesizedExpression, the second operand
of . must evaluate to a string. a.(x) is a synonym for a[x]
except that the latter can be overridden via operator overloading.
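A brief sketch of these operators (the names obj, key, and C are illustrative):

var n = obj @ C;   // casts the value of obj to the type named by C
var key = "x";
obj.(key);         // same as obj[key], except that obj[key] can be overridden via operator overloading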
The [] operator can take multiple (or even named) arguments. This allows users to define
data structures such as multidimensional arrays via operator overloading.
An ArgumentList can contain both positional and named arguments. Named arguments use the same syntax as object literals.
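A hedged sketch of such calls; the function and argument names are invented, and the named-argument syntax is assumed to follow the object-literal style mentioned above:

m[2, 3];                   // [] with two arguments, as a user-defined matrix class might support
plot(data, color: "red");  // one positional argument followed by one named argument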
delete PostfixExpression
typeof UnaryExpression
eval UnaryExpression
++ PostfixExpression
-- PostfixExpression
+ UnaryExpression
- UnaryExpression
~ UnaryExpression
! UnaryExpression

The ^^ operator is a logical exclusive-or operator. It evaluates both operands. If
they both convert to true or both convert to false, then ^^ returns false; otherwise ^^
returns the unconverted value of whichever argument converted to true.
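A few illustrative evaluations under this rule:

1 ^^ 0;      // returns 1 (only the first operand converts to true)
0 ^^ "abc";  // returns "abc"
2 ^^ 3;      // returns false (both operands convert to true)
0 ^^ "";     // returns false (both operands convert to false)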
JavaScript 2.0
Core Language
Statements
Thursday, November 11, 1999
Most of the behavior of statements is the same as in JavaScript 1.5. Differences are highlighted below.
A box has the syntax:

box { Statement ... Statement }

A box behaves like a regular block except that it forms its own scope. Variable and function definitions without a Visibility prefix inside the box belong to that box instead of the global scope or the enclosing function, class, or box.
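A short sketch of a box (the names are illustrative):

box {
  var counter = 0;                     // belongs to this box, not to the global scope
  function next() {return ++counter}  // likewise belongs to this box
}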
A block can be annotated with a Visibility prefix as follows:
Visibility { Statement ... Statement }

Such a block behaves like a regular block except that every declaration inside that block (but not inside any enclosed function, class, box, or nested visibility-specifying block) that does not have an explicit Visibility prefix uses the Visibility prefix given by the block.
Visibility-specifying blocks are useful to define several items without having to repeat a Visibility prefix for each one. For example,
class foo {
field z:integer;
public field a;
private field b;
public method f() {}
public method g(x:integer) {}
}
is equivalent to:
class foo {
field z:integer;
public {
field a;
private field b;
method f() {}
method g(x:integer) {}
}
}
if ParenthesizedExpression StatementabbrevNoShortIf else StatementabbrevNoShortIf

The semicolon is optional before the else.
The semicolon is optional before the closing while.
JavaScript 2.0
Core Language
Definitions
Thursday, November 11, 1999
Any definition can have a Visibility prefix. That prefix specifies the following:
A Visibility prefix can be one of the prefixes in the table below, or it can be user-defined. User-defined Visibility prefixes allow the author of a package P to control definition visibility based on the version by which a client package imports P. User-defined Visibility prefixes also allow definition access to be controlled by the manner in which a client attempts to reference the definition.
The following are the predefined Visibility prefixes. The access privileges they provide are
described in more detail in the next section. Unless overridden, the default Visibility is box.
| Visibility | Access allowed from |
|---|---|
| local | only within current block |
| box | only within current package (when applied to a class member), function, or box |
| private | only within current class |
| package | only within current package |
| public | within any package that imports this package |
To understand the scope to which a definition applies we need to define a few terms. In the definitions below D represents a variable, function, member, or class definition.
- The containing box of D is the innermost class, function, or box Block lexically enclosing D. If there is no such block, then the containing box of D is the package scope.
- The containing visibility specifier of D is the innermost class, function, box Block, or visibility-specifying block lexically enclosing D. If there is no such block, then the containing visibility specifier of D is the package scope.

To determine the scope S to which a definition D applies, we look up the definition's Visibility prefix in the table below. A definition without a Visibility prefix uses its default visibility prefix.
| Visibility | Scope where entity is declared |
|---|---|
| local | D's containing block |
| box | D's containing box |
| private | D's containing class |
| package | D's containing class |
| public | D's containing class |
| User-defined | D's containing class |
The scope S is not the scope in which the definition is accessible; rather, it is the scope into which the declared entity is inserted.
If S is a class and Visibility is not local,
then the declared entity will appear as a member of class S. If S is a class and Visibility
is local, then the declared entity will only be created inside class S's block
without becoming a member of class S; it is an error if this case arises for a method or field definition.
Once the scope S is known, the accessibility of definition D is determined by the table below. P is the lexically enclosing package.
| Visibility prefix | S is a package P | S is a class C | S is a function F | S is a box B | S is a block B |
|---|---|---|---|---|---|
| local | Package P | Class C | Function F | Box B | Block B |
| box | Package P | Package P | Function F | Box B | |
| private | Package P | Class C | | | |
| package | Package P | Package P | | | |
| public | Any package | Any package | | | |
| User-defined | User-defined | User-defined | | | |
All of these definitions share several common scoping rules:
Rules 3 and 4 state that once an identifier is resolved to a variable or function in a scope, that resolution cannot be changed. This permits efficient compilation and avoids confusion with programs such as:
const b:integer = 7;
function f():integer {
function g():integer {return b}
var a = g();
const b:integer = 8;
return g() - a;
}
Definitions at the top level of a Program or at the top level of a ClassDefinition's
Block may omit Visibility, in which case they are treated as if
they had package visibility. When used outside a ClassDefinition's
Block, private is equivalent to package.
A definition with a Visibility prefix other than local does not apply to the current Block. Instead, it declares either an entity at the top level of the current package (if outside a ClassDefinition's Block) or a member of the current class (if inside a ClassDefinition's Block). In addition to lifting the definition out of the current scope in this way, Visibility also specifies the definition's visibility from other packages or classes. Visibility can take one of the following forms:
Most lexical scopes are established by Block productions in the grammar. Lexical scopes nest, and a definition in an inner scope can shadow definitions in outer ones.
In the example below the comments indicate the scope and visibility of each definition:
var a0; // Package-visible global variable
private var a1 = true; // Package-visible global variable
package var a2; // Package-visible global variable
public var a3; // Public global variable
if (a1) {
var b0; // Local to this block
private var b1; // Package-visible global variable
package var b2; // Package-visible global variable
public var b3; // Public global variable
}
public function F() { // Public global function
var c0; // Local to this function
private var c1; // Package-visible global variable
package var c2; // Package-visible global variable
public var c3; // Public global variable
}
function G() { // Package-visible global function
var d0; // Never defined because G isn't called
private var d1; // Never defined because G isn't called
package var d2; // Never defined because G isn't called
public var d3; // Never defined because G isn't called
}
class C { // Package-visible global class
var e0; // Package-visible class variable
private var e1; // Class-visible class variable
package var e2; // Package-visible class variable
public var e3; // Public class variable
field e4; // Package-visible instance variable
private field e5; // Class-visible instance variable
package field e6; // Package-visible instance variable
public field e7; // Public instance variable
function H() { // Package-visible class function
var f0; // Local to this function
private var f1; // Class-visible class variable
package var f2; // Package-visible class variable
public var f3; // Public class variable
private field f4; // Class-visible instance variable
package field f5; // Package-visible instance variable
public field f6; // Public instance variable
}
public method I() {} // Public class method
H();
}
F();
A public definition's identifier is exported to other packages. To help avoid accidental collisions between
identifiers declared in different packages, identifiers can be selectively exported depending on the version requested by
an importing package. An identifier definition with a version number newer than that requested by the importer will not be
seen by that importer. The versioning facilities also include additional facilities that allow
robust removal and renaming of identifiers.
VersionsAndRenames describes the set of versions in which an identifier is exported, together with a possible alias for the identifier:
VersionRange [: Identifier] , ... , VersionRange [: Identifier]
VersionRange: Version | [Version] .. [Version]

Suppose a client package C imports version V of package P that exports identifier N with some VersionsAndRenames. If the VersionsAndRenames's VersionRange includes version V, then package C can use the corresponding Identifier alias to access package P's N. If the Identifier alias is omitted, then package C can use N to access package P's N. Multiple VersionRanges operate independently.
In most cases VersionsAndRenames is just a Version name (a string):
public "1.2" const z = 3;
If VersionsAndRenames is omitted, the default version "" is assumed.
Do we want to collapse all block scopes into one inside functions? On one hand this complicates the language conceptually and surprises Java and C++ programmers. On the other hand, this would match JavaScript 1.5 better and simplify closure creation when a closure is created nested inside several blocks in a function.
Should we make private illegal outside a class rather than making it
equivalent to package?
Should we introduce a local Visibility prefix
that explicitly means that the definition is visible locally? This wouldn't provide any additional functionality but it
would provide a convenient name for talking about the four kinds of visibility prefixes.
What should the default visibilities be? The current defaults are loosely modeled after Java:
| Definition Location | Default visibility |
|---|---|
| Package top level | package (equivalent to local in this case) |
| Inside a statement outside a function or class | local |
| Function or method code's top level | local |
| Inside a statement inside a function or method | local |
| Class definition block's top level | package |
| Inside a statement inside a class definition block | local |
Should we have a protected Visibility? It has been omitted
for now to keep the language simple, but there does not appear to be any fundamental reason why it could not be supported.
If we do support it, should we choose the C++ protected concept (visible only in class and subclasses) or the
Java protected concept (visible in class, subclasses, and the original class's package)?
JavaScript 2.0
Core Language
Variables
Thursday, November 11, 1999
The general syntax for defining variables is:
var Identifier [: TypeExpression] [= AssignmentExpression] , ... , Identifier [: TypeExpression] [= AssignmentExpression] ;
const Identifier [: TypeExpression] = AssignmentExpression , ... , Identifier [: TypeExpression] = AssignmentExpression ;

A variable defined with var can be modified, while one defined with const
cannot. Identifier is the name of the variable and TypeExpression
is its type. Identifier can be any non-reserved identifier. TypeExpression
is evaluated at the time the variable definition is evaluated and should evaluate to a type t.
If provided, AssignmentExpression gives the variable's initial value v. If not,
undefined is assumed; an error occurs if undefined cannot be coerced
to type t. AssignmentExpression is evaluated just after the TypeExpression
is evaluated. The value v is then coerced to the variable's type t and stored in the variable. If the
variable is defined using var, any values subsequently assigned to the variable are
also coerced to type t at the time of each such assignment.
Multiple variables separated by commas can be defined in the same VariableDefinition. The values of earlier variables are available in the TypeExpressions and AssignmentExpressions of later variables.
If omitted, TypeExpression defaults to type any. Thus, the definition
var a, b=3, c:integer=7, d, e:type=boolean, f:number, g:e, h:int;
is equivalent to:
var a:Any=undefined;
var b:Any=3;
var c:integer=7;
var d:integer=undefined;  // coerced to +0
var e:type=boolean;
var f:number=undefined;   // coerced to +0
var g:boolean=undefined;  // coerced to false
var h:int=undefined;      // coerced to int(0)
const Definitions

const means that Identifier cannot be written after
it is defined. It does not mean that Identifier will have the same value the next time it is
bound. For example, the following is legal; a new j binding is created each time through the loop:
var k = 0;
for (var i = 0; i < 10; i++) {
local const j = i;
k += j;
}
JavaScript 2.0
Core Language
Functions
Thursday, November 11, 1999
To define a function we use the following syntax:

[Visibility] function [get | set] Identifier ( Parameters ) [: TypeExpression] Block

If Visibility is absent, the above declaration defines a local function within the current Block scope. If Visibility is present, the above declaration declares either a global function (if outside a ClassDefinition's Block) or a class function (if inside a ClassDefinition's Block) according to the declaration scope rules.
The function's result type is TypeExpression, which defaults to type Any if not
given. If the function does not return a value, it's good practice to set TypeExpression to void
to document this fact.
Block contains the function body and is evaluated only when the function is called.
Parameters has one of the following forms:
RequiredParameter , ... , RequiredParameter [, OptionalParameter , ... , OptionalParameter] [, ... [Identifier]]
... [Identifier]

If the ... is present, the function accepts more arguments than just the listed parameters.
If an Identifier is given after the ..., then that Identifier
is bound to an array of arguments given after the listed parameters. That Identifier is
declared locally as though by the declaration const array Identifier.
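A sketch of a function with a rest parameter, following the rules above (the names are illustrative, and the usual length property of arrays is assumed):

function average(first:number, ...more):number {
  var total:number = first;
  for (var i = 0; i < more.length; i++)
    total += more[i];
  return total/(1 + more.length);
}
average(1, 2, 3, 4);   // more is bound to the array [2, 3, 4]; returns 2.5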
Individual parameters have the forms:
Identifier [: TypeExpression]
Identifier [: TypeExpression] = AssignmentExpression

TypeExpression gives the parameter's type and defaults to type Any. If the parameter
name Identifier is followed by a =, then that parameter is
optional. If the nth parameter is optional and a call to this function provides fewer than n arguments,
then the nth parameter is set to the value of its AssignmentExpression, coerced to
the nth parameter's type if necessary. The nth parameter's AssignmentExpression
is evaluated only if fewer than n arguments are given in a call.
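A small sketch of an optional parameter under these rules (the names are illustrative):

function greet(name:string, greeting:string = "Hello"):string {
  return greeting + ", " + name;
}
greet("world");           // returns "Hello, world"
greet("world", "Howdy");  // returns "Howdy, world"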
A RequiredParameter may not follow an OptionalParameter. If a
function has n RequiredParameters and m OptionalParameters
and no ... in its parameter list, then any call of that function must supply at least
n arguments and at most n+m arguments. If this function has a ...
in its parameter list, then any call of that function must supply at least n arguments. These restrictions do not
apply to traditional functions.
The parameters' Identifiers are local variables with types given by the corresponding TypeExpressions inside the function's Block. Code in the Block may read and write these variables. Arguments are passed by value, so writes to these variables do not affect the passed arguments' values in the caller.
In addition to local variables generated by the parameters' Identifiers, each function also
has a predefined arguments local variable which holds an array (of type const array) of all
arguments passed to this function.
When a function is called, the following list indicates the order of evaluation of the various expressions in a FunctionDefinition. These steps are taken only after all of the arguments have been evaluated.
- If the parameter list contains a ... followed by an Identifier, bind that Identifier to an array comprised of the zero or more leftover arguments not already bound to a parameter.

Note that later TypeExpressions and AssignmentExpressions can refer to previously bound arguments. Thus, the following is legal:
function choice(boolean a, type b, b c, b d=) b {
return a ? c : d;
}
The call choice(true,integer,8,4) would return 8, while choice(false,integer,6) would return
0 (undefined coerced to type integer).
Unless the function is a traditional function, the function definition using the above
syntax does not define a class; the function's name cannot be used in a new expression, and the function
does not have a this parameter. Any attempt to use this inside the function's body is an error.
To define a method that can access this, use the method
keyword.
If a FunctionDefinition is located at a class scope (either because it is located at the top
level of a ClassDefinition's Block
or it has a Visibility prefix and is located inside a ClassDefinition's
Block), then the function is a static
method of the class. Unlike C++ or Java, JavaScript 2.0 does not use the static keyword to indicate such functions;
instead, instance methods (i.e. non-static methods) are defined using the method
keyword.
If a FunctionDefinition contains the keyword get or set,
then the defined function is a getter or a setter.
A getter must not take any parameters and cannot have a ... in its Parameters
list. Unlike an ordinary function, a getter is invoked by merely mentioning its name without an Arguments
list in any expression except as the destination of an assignment. For example, the following code returns the string <2,3,1>:
var x:integer = 0;
function get serialNumber():integer {return ++x}
var y = serialNumber;
return "<" + serialNumber + "," + serialNumber + "," + y + ">";
A setter must take exactly one required parameter and cannot have a ... in its Parameters
list. Unlike an ordinary function, a setter is invoked by merely mentioning its name (without an Arguments
list) on the left side of an assignment or as the target of a mutator such as ++ or --. The result
of the setter becomes the result of the assignment. For example, the following code returns the string <1,2,43>:
var x:integer = 0;
function get serialNumber():integer {return ++x}
function set serialNumber(n:integer):integer {return x=n}
var s = "<" + serialNumber + "," + serialNumber;
serialNumber = 42;
return s + "," + serialNumber + ">";
A setter can have the same name as a getter in the same lexical scope. A getter or setter cannot be extracted from its variable, so the notion of the type of a getter or setter is vacuous; a getter or setter can only be called.
Contrast the following:
var x:integer = 0;
function f():integer {return ++x}
function g():Function {return f}
function get h():Function {return f}
f; // Evaluates to function f
g; // Evaluates to function g
h; // Evaluates to function f (not h)
f(); // Evaluates to 1
g(); // Evaluates to function f
h(); // Evaluates to 2
g()(); // Evaluates to 3
We can use a getter and a setter to create an alias to another variable, as in:
function get myAlias() {return Pkg::var}
function set myAlias(x) {return Pkg::var = x}
myAlias = myAlias+4;
Traditional function definitions are provided for compatibility with JavaScript 1.5. The syntax is as follows:
traditional function Identifier ( Identifier , ... , Identifier ) Block

A function declared with the traditional keyword cannot have any argument or result
type declarations, optional arguments, or getter or setter
keyword. Such a function is treated as though every argument were optional and more arguments than just the listed ones were
allowed. Thus, the definition
traditional function Identifier ( Identifier , ... , Identifier ) Block
behaves like the following function definition:
function Identifier ( Identifier = , ... , Identifier = , ... ) Block
Furthermore, a traditional function defines its own class and treats this in the same manner as JavaScript
1.5.
Every function (except a getter or a setter) is also a value and has type Function. Like other values, it can
be stored in a variable, passed as an argument, and returned as a result. The identifiers in a function are all lexically
scoped.
We can use a variant of a function definition to define a function inside an expression. The syntax is:
function [Identifier] ( Parameters ) [: TypeExpression] Block

This expression defines a function and returns it as a value of type Function. The function can be named by
providing the Identifier, but this name is only accessible from inside the function's Block.
To avoid confusion between a FunctionDefinition and a FunctionExpression, a Statement (and a few other grammar nonterminals) may not begin with a FunctionExpression. To place a FunctionExpression at the beginning of a Statement, enclose it in parentheses.
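A sketch of function expressions as described above (the names are illustrative):

var double = function (x:integer):integer {return x*2};                   // anonymous function expression
var fact = function f(n:integer):integer {return n <= 1 ? 1 : n*f(n-1)};  // the name f is visible only inside the body
(function (msg:string):string {return msg})("hi");                        // parenthesized because it begins a statement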
A FunctionDefinition is merely convenient syntax for a const variable definition
and a FunctionExpression:
[Visibility] function Identifier ( Parameters ) [: TypeExpression] Block
is equivalent to:
[Visibility] const Identifier : Function = function Identifier ( Parameters ) [: TypeExpression] Block ;
Unless a function is a getter or a setter, we call that function by listing its arguments in parentheses after the function expression, just as in JavaScript 1.5:
( AssignmentExpression , ... , AssignmentExpression )

By consensus in the ECMA TC39 modularity subcommittee, we decided to use the above syntax for getters and setters instead of:
[getter | setter] function Identifier ( Parameters ) [: TypeExpression] Block

The decision was based on aesthetics; neither syntax is more difficult to implement than the other.
Do we want to have a named rest parameter (as in the proposal above), or only support the arguments
special local variable as in JavaScript 1.5? The main difference is in the handling of fixed arguments -- they must be added
to the arguments array but can be omitted from the rest array.
The traditional keyword is ugly, so let's take a look at some alternatives. Unless we want to continue to
make each function into a class (as JavaScript 1.5 does), we need some way to indicate which functions are also classes
and which ones are not. Also, we'd like to be able to indicate which functions can be called with more or fewer than the
desired number of arguments and which cannot.
One possibility would be to state that any function that uses a type annotation in its signature (either the parameter
list or the result type) is a new-style function and does not define a class; other functions would declare classes. Furthermore,
new-style functions would have to be called with the exact number of arguments unless some parameters are optional or a
... is present in the parameter list. These are analogous to the rules that ANSI C used to distinguish new-style
functions from traditional C functions. As with ANSI C, we have somewhat of a difficulty with functions that take no parameters;
such functions would need to specify a return type to be considered new-style.
C++ did away with the ANSI C treatment of traditional C functions. We could do the same by having a pragma (analogous
to Perl's use pragmas) that could indicate that all functions are to be considered new-style unless prefixed
by the traditional keyword. If we do this, we should decide whether the default setting of this pragma would
be on or off.
JavaScript 2.0
Core Language
Classes
Thursday, November 11, 1999
method
override [no line break] method
final [no line break] method
final [no line break] override [no line break] method

In JavaScript 2.0 we define classes using the class keyword. Limited classes can also
be defined via JavaScript 1.5-style functions, but doing so is discouraged
for new code.
class Identifier [extends TypeExpression] Block
class extends TypeExpression Block

The first format declares a class with the name Identifier, binding Identifier
to this class in the scope specified by the Visibility
prefix (which usually includes the ClassDefinition's Block). Identifier
is a constant variable with type type and can be used anywhere a type expression is allowed.
When the first ClassDefinition format is evaluated, the following steps take place:

- If extends TypeExpression is given, TypeExpression is evaluated to obtain a type s, which must be another class. If extends TypeExpression is absent, type s defaults to the class Object.
- A new class t with superclass s is created, and Identifier is bound to t in the scope specified by the Visibility prefix.
- The ClassDefinition's Block is evaluated. All const, var, function, constructor, and class declarations evaluated at its top level (or placed at its top level by the scope rules) become class members of type t. All field and method declarations evaluated at the Block's top level (or placed at its top level by the scope rules) become instance members of type t.

A ClassDefinition's Block is evaluated just like any other Block, so it can contain expressions, statements, loops, etc. Such statements that do not contain declarations do not contribute members to the class being declared, but they are evaluated when the class is declared.
If a ClassDefinition omits the class name Identifier, it extends
the original class rather than creating a subclass. A class extension may define new methods and class constants and variables,
but it does not have special privileges in accessing the original class definition's private members (or package
members if in a separate package). A class extension may not override methods, and it may not define constructors or instance
variables.
Each instance of the original class is automatically also an instance of the extended class. Several extensions can apply to the same class.
An extension is useful to add methods to system classes, as in the following code in some user package P:
class extends string {
public method scramble() string {...}
public method unscramble() string {...}
}
var x = "abc".scramble();
Once the class extension is evaluated, methods scramble and unscramble become available on all
strings. There is no possibility of name clashes with extensions of class string in other, unrelated packages
because the names scramble and unscramble belong to package P and not the system package
that defines string. Any packages that import package P will also be able to call scramble
and unscramble on strings, but other packages will not.
A class has an associated set of class members and another set of instance members. Class members are properties of the class itself, while instance members are properties of each instance object of this class and have independent values for different instance objects.
Class members are one of the following:

- class constants, defined with the const keyword.
- class variables, defined with the var keyword.
- class functions, defined with the function keyword.
- constructors, defined with the constructor keyword.
- classes, defined with the class keyword.

Instance members are one of the following:

- instance variables (fields), defined with the field keyword.
- instance methods, defined with the method keyword.

Members can only be defined within the intersection of the lexical and dynamic extent of a ClassDefinition's Block. A few examples illustrate this rule.
The code
var bool extended = false;
function callIt(x) {return x()}
class C {
extended = true;
public function square(integer x) integer {return x*x}
if (extended) {
public function cube(integer x) integer {return x*x*x}
} else {
public function reciprocal(number x) number {return 1/x}
}
field string firstName, lastName;
method name() string {return firstName + lastName}
public function genMethod(boolean b) {
if (b) {
public field time = 0;
} else {
public field date = 0;
}
}
genMethod(true);
}
defines class C with members square (a class function), cube (a class function),
firstName (an instance variable), lastName (an instance variable), name (an instance
method), and genMethod (a class function).
On the other hand, executing the following code after the above example would be illegal due to three different errors:
genMethod(false); // Field date declared outside of C's block's dynamic extent
public field color; // Field declared outside a class's block
function genField() {
public field style;
}
class D {
genField(); // Field style declared outside D's block's lexical extent
}
While a ClassDefinition's Block is being evaluated, the already defined class members (other than constructors) are visible and usable by the code in that Block. Afterwards members can be accessed in one of several ways:
- Code within the current package (if a member's Visibility is package or omitted), or anywhere within the current package or any package that imports the appropriate version of the current package (if a member's Visibility is public), can access class members by using the . operator on the class.
- Code within the current package (if a member's Visibility is package or omitted), or anywhere within the current package or any package that imports the appropriate version of the current package (if a member's Visibility is public), can access instance members by using the . operator on any of the class's instances.

A subclass inherits all members except constructors from its superclass. Class variables have only one global value, not one value per subclass. A subclass may override visible methods, but it may not override or shadow any other visible members. On the other hand, imports and versioning can hide members' names from some or all users in importing packages, including subclasses in importing packages.
We have already seen the definition syntax for variables and constants, functions, and classes. Any of these defined at a ClassDefinition's Block's top level (or placed at its top level by the scope rules) become class members of the class.
Fields, methods, and constructor definitions have their own syntax described below. These definitions must be lexically enclosed by a ClassDefinition's Block.
field Identifier [: TypeExpression] [= AssignmentExpression] , ... , Identifier [: TypeExpression] [= AssignmentExpression] ;A FieldDefinition is similar to a VariableDefinition except that it defines an instance variable of the lexically enclosing class. Each new instance of the class contains a new, independent set of instance variables initialized to the values given by the AssignmentExpressions in the FieldDefinition.
Identifier is the name of the instance variable and TypeExpression is its type. Identifier can be any non-reserved identifier. TypeExpression is evaluated at the time the variable definition is evaluated and should evaluate to a type t. The TypeExpressions and AssignmentExpressions are evaluated once, at the time the FieldDefinition is evaluated, rather than every time an instance of the class is constructed; their values are saved for use in constructors.
If omitted, TypeExpression defaults to type any.
If provided, AssignmentExpression gives the instance variable's initial value v.
If not, undefined is assumed; an error occurs if undefined cannot be coerced
to type t. AssignmentExpression is evaluated just after the TypeExpression
is evaluated. The value v is then coerced to the variable's type t and stored in the instance variable.
Any values subsequently assigned to the instance variable are also coerced to type t at the time of each such assignment.
Multiple instance variables separated by commas can be defined in the same FieldDefinition.
A field cannot be overridden in a subclass.
[final] [override] method [get | set] Identifier ( Parameters ) [: TypeExpression] Block
[final] [override] method [get | set] Identifier ( Parameters ) [: TypeExpression] ;

A MethodDefinition is similar to a FunctionDefinition except that it defines an instance method of the lexically enclosing class. Parameters, the result TypeExpression, and the body Block behave just like for function definitions, with the following differences:
- The body Block has access to a local variable named this that refers to the instance object of the method's class on which the method was called.
- Reading a method from an instance with the . operator produces a function (more specifically, a closure) that is already dispatched and has this bound to the left operand of the . operator.
- There is no traditional syntax for methods. Optional parameters must be specified explicitly.

We call a regular method by combining the . operator with a function call. For example:
class C {
  field x:integer = 3;
  method m() {return x}
  method n(x) {return x+4}
}
var c = new C;
c.m();                 // returns 3
c.n(7);                // returns 11
var f:Function = c.m;  // f is a zero-argument function with this bound to c
f();                   // returns 3
c.x = 8;
f();                   // returns 8
A class c may override a method m defined in its superclass s. To do this, c
should define a method m' with the same name as m and use the override
keyword in the definition of m'. Overriding a method without using the override
keyword or using the override keyword when not overriding a method results in a warning
intended to catch misspelled method names. The warning is not an error to allow subclass c to either define a method
if it is not present in s or override it if it is present in s -- this situation can arise when s
is imported from a different package and provides several versions.
The overriding method m' does not have to have the same number or type of parameters as the overridden method m. In fact, since parameter types can be arbitrary expressions and are evaluated only during a call, checking for parameter type compatibility when the overriding method m is declared would require solving the halting problem. Moreover, defining overriding methods that are more general than overridden methods is useful.
A method defined with the final keyword cannot
be overridden (or further overridden) in subclasses.
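A sketch of method overriding under these rules (the classes are illustrative):

class Shape {
  method area():number {return 0}
}
class Square extends Shape {
  field side:number = 1;
  final override method area():number {return side*side}  // overrides Shape's area and cannot be overridden further
}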
If a MethodDefinition contains the keyword get or set,
then the defined method is a getter or a setter. These are analogous to getter
and setter functions in that they are invoked without listing the parentheses after the method name.
A getter or setter method cannot be overridden. We could relax this restriction, but then we'd also
have to allow overriding of fields by getters, setters, or other fields, and, as a corollary, allow fields to be declared
final.
constructor Identifier ( Parameters ) Block

A constructor is a class function that creates a new instance of the lexically enclosing class c. A constructor's
body Block is required to call one of c's superclass's constructors (when
and how?). Afterwards it may access the instance object under construction via the this local variable.
A constructor should not return a value with a return statement; the newly created object is returned automatically.
A constructor can have any non-reserved name, in which case we would invoke it as though it were a class function. In addition,
a constructor's Identifier can have the special name new, in which case we invoke
it using the new prefix operator syntax as in JavaScript 1.5.
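For example (a sketch; the required call to one of the superclass's constructors is shown only as a comment because its exact form is still an open issue above):
class Point {
  field x:number;
  field y:number;
  constructor new(px:number, py:number) {
    // ...call one of the superclass's constructors here...
    this.x = px;
    this.y = py;
  }
}
var p = new Point(3.0, 4.0);   // invokes the constructor named new via the new operator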
|
JavaScript 2.0
Core Language
Packages
|
Thursday, November 11, 1999
Packages are an abstraction mechanism for grouping and distributing related code. Packages are designed to be linked at run time to allow a program to take advantage of packages written elsewhere or provided by the embedding environment. JavaScript 2.0 offers a number of facilities, described below, to make packages robust for dynamic linking.
A package is a file (or analogous container) of JavaScript 2.0 code. There is no specific JavaScript statement that introduces or names a package -- every file is presumed to be a package. A package itself has no name, but it has a specific URI by which other packages can import it.
A package P typically starts with import statements that import other packages used by package
P. A package that is meant to be used by other packages typically has one or more version
declarations that declare versions available for export.
A package's body is described by the Program grammar nonterminal. A package is loaded (its body is evaluated) when the package is first imported or invoked directly (if, for example, the package is on an HTML web page). Some standard packages may also be loaded when the JavaScript engine first starts up.
Two attempts to load the same package in the same environment result in sharing of that package. What constitutes an environment is necessarily application-dependent. However, if package P1 loads packages P2 and P3, both of which load package P4, then P4 is loaded only once and thereafter its code and data is shared by P2 and P3.
When a package is loaded, all of its statements are evaluated in order, which may cause other packages to be loaded along
the way when import statements are encountered. A package's symbols are available for export to other packages
only after the package's body has been successfully evaluated. Unlike in Java, circularities are not allowed in the graph
of package imports.
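A minimal sketch of a package body (the URI and the symbol names are illustrative):
// Body of the package at some URI, say "http://example.org/geometry"
import "http://example.org/algebra";        // the algebra package is loaded here if it has not been loaded already
public function area(r) {return pi*r*r}     // pi is assumed to be a public symbol exported by the algebra package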
To create packages A and B that access each others' symbols, we need to instead define a hidden package C that consists of all of the code that would have gone into A and B. Package C should define versions verA and verB and tag the symbols it exports with either verA or verB to indicate whether these symbols belong in package A or B. Package A should then be empty except for a directive (or several directives if there are multiple versions of A and verA) that reexports C's symbols tagged with verA. Similarly, package B should reexport C's symbols tagged with verB. To make this work we need a reexport directive. Is this really necessary? Also, do we want a mechanism for hiding package C from general view so that users can only use it through A or B?
We can export a symbol in a package by giving it public
Visibility.
To import symbols from a package we use the import statement:
import ImportList ;
import ImportList Block
import ImportList Block else CodeStatement
ImportList: ImportItem , ... , ImportItem
ImportItem: [protected] [Identifier =] NonAssignmentExpression [: Version]
The first form of the import statement (without a Block) imports symbols into
the current lexical scope. The second and third forms import symbols into the lexical scope of the Block.
If the imports are unsuccessful, the first two forms of the import statement throw an exception, while the last
form executes the CodeStatement after the else keyword.
An import statement can import one or more packages separated by commas. Each ImportItem
specifies one package to be imported. The NonAssignmentExpression should evaluate to a string
that contains a URI where the package may be found. If present, Version indicates the version
of the package's exports to be imported; if not present, Version defaults to version 1.
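For example (the URI and version string are illustrative), the Block form with an else clause lets a script degrade gracefully when a suitable package cannot be found:
import c = "http://example.org/charting" : "2.0" {
  c::drawChart();               // version "2.0" of the charting package is available inside this block
} else
  useBuiltinTable();            // hypothetical fallback defined elsewhere in the importing package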
An ImportItem can introduce a name for the imported package if the NonAssignmentExpression
is preceded by Identifier =. Identifier
becomes bound (either in the current lexical scope or in the Block's scope) to the imported package
as a whole. Individual symbols can be extracted from the package by using Identifier with the
:: operator. For example, if package at URI P has public symbols
a and b, then after the statement
import x=P;
P's symbols can be referenced as either a, b, x::a, or x::b.
If an ImportItem contains the keyword protected, then
the imported symbols can only be accessed using the :: operator. If we were to import
package P using
import protected x=P;
then we'd have to access P's symbols using either x::a or x::b.
If two imports in the same scope import packages with clashing symbols, then neither symbol is accessible unless qualified
using the :: operator. If an imported symbol clashes with a symbol declared in the same
scope, then the declared symbol shadows the imported symbol. Scope rules 3 and
4 apply here as well, so the following code is illegal because a is referenced and then redefined:
import x=P;
var y=a;      // References P's a
const a=17;   // Redefines a in same scope
Version names cannot be imported.
Do we want to use URIs to locate packages, or do we want to invent our own, separate mechanism to do this?
Should we make private illegal outside a class rather than making it equivalent to
package?
Should we introduce a local Visibility prefix that explicitly
means that the declaration is visible locally? This wouldn't provide any additional functionality but it would provide a
convenient name for talking about the four kinds of visibility prefixes.
What should the default visibilities be? The current defaults are loosely modeled after Java:
| Definition Location | Default visibility |
|---|---|
| Package top level | package (equivalent to local in this case) |
| Inside a statement outside a function or class | local |
| Function or method code's top level | local |
| Inside a statement inside a function or method | local |
| Class declaration block's top level | package |
| Inside a statement inside a class declaration block | local |
|
JavaScript 2.0
Core Language
Language Declarations
|
Thursday, November 11, 1999
Language declarations allow a script writer to select the language to use for a script or a particular section of a script. A language denotes either a major language such as JavaScript 2.0 or a variation such as strict mode.
Developers often find it desirable to be able to write a single script that takes advantage of the latest features in a host environment such as a browser while at the same time working in older host environments that do not support these features. JavaScript 2.0's language declarations enable one to easily write such scripts. One may still need to use techniques such as the LANGUAGE HTML attribute to support pre-JavaScript 2.0 environments, but at least the number of such environments that will need to be special-cased will not increase in the future.
Language declarations are a dual of versioning: language declarations let a script run under a variety of historical hosts, while versioning lets a host run a variety of historical scripts.
language LanguageAlternative | ... | LanguageAlternative ;
A language declaration uses the syntax above. The keyword language is followed by one
or more language alternatives separated by vertical bars. Each language alternative consists of zero or more LanguageIds,
or more language alternatives separated by vertical bars. Each language alternative consists of zero or more LanguageIds,
which are either identifiers or numbers. The first language alternative must contain at least one LanguageId.
The semicolon at the end of the LanguageDeclaration cannot
be inserted by line-break semicolon insertion.
When a JavaScript environment is lexing and parsing a JavaScript program and it encounters a language
declaration, it checks whether any of the language alternatives can be satisfied. If at least one can, the environment picks
the first language alternative that can be satisfied and processes the rest of the containing block (until the closing }
or until the end of the program if at the top level) using that language. A subsequent language
declaration in the same block can further change the language.
If no language alternatives can be satisfied, then the JavaScript environment skips to the end of the containing block
(until the closing matching } or until the end of the program if at the top level). Further
language declarations in the same block are ignored. No error occurs unless the failing
language declaration is executed as a statement, in which case it throws a syntax error.
[See rationale for a discussion of some of the issues here.]
The following LanguageIds are currently defined:
| LanguageId | Language |
|---|---|
| 1.0 | JavaScript 1.0 |
| 1.1 | JavaScript 1.1 |
| 1.2 | JavaScript 1.2 |
| 1.3 | JavaScript 1.3 |
| 1.4 | JavaScript 1.4 |
| 1.5 | JavaScript 1.5 (ECMAScript Edition 3) |
| 2.0 | JavaScript 2.0 |
| strict | Strict mode |
| traditional | Traditional mode (default) |
It is meaningless to combine two or more numeric LanguageIds in the same alternative:
language 1.0 2.0;
will always fail. On the other hand, it is meaningful and useful to separate them with vertical bars. For example, one can indicate that one prefers JavaScript 2.1 but is willing to accept JavaScript 2.0 if 2.1 is not available:
language 2.1 | 2.0;
An empty alternative will always succeed. One can use it to indicate a preference for strict mode but willingness to work without it:
language strict |;
Language declarations are always lexically scoped and never extend past the end of the enclosing block.
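For example, a script might opt into strict mode for a single block only (a sketch):
{
  language strict |;    // prefer strict mode; fall back to the default if it is unavailable
  // statements here are processed in strict mode when the host supports it
}
// the declaration has no effect past the closing brace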
This document specifies the 2.0 language and its strict and traditional modes. The consequences
of mixing in other languages are implementation-defined, but implementations are encouraged to do something reasonable.
Many parts of JavaScript 2.0 are relaxed or unduly convoluted due to compatibility requirements with JavaScript 1.5. Strict mode sacrifices some of this compatibility for simplicity and additional error checking. Strict mode is intended to be used in newly written JavaScript 2.0 programs, although existing JavaScript 1.5 programs may be retrofitted.
The opposite of strict mode is traditional mode, which is the default. A program can readily mix strict and traditional portions.
Strict mode has the following effects:
- The arguments object is available only in traditional functions and in functions that explicitly allow a variable number of arguments. (The mode of the call site does not matter.)
See also rationale.
|
JavaScript 2.0
Libraries
|
Thursday, November 11, 1999
This chapter presents the libraries that accompany the core language.
For the time being, only the libraries new to JavaScript 2.0 are described. The basic libraries such as String,
Array, etc. carry over from JavaScript 1.5.
|
JavaScript 2.0
Libraries
Types
|
Thursday, November 11, 1999
The following types are predefined in JavaScript 2.0:
| Type | Set of Values |
|---|---|
| void | undefined |
| Null | null |
| boolean | true and false |
| integer | Double-precision IEEE floating-point numbers that are mathematical integers, including positive and negative zeroes but excluding infinities and NaN |
| number | Double-precision IEEE floating-point numbers, including positive and negative zeroes and infinities and NaN |
| character | Single 16-bit unicode characters |
| string | Immutable strings of unicode characters |
| Function | All functions and null |
| array | All arrays |
| Array | All arrays and null |
| type | All types |
| Type | All types and null |
| object | All values except undefined and null |
| Object | All values except undefined |
| Any | All values |
By convention, predefined types whose names start with an upper-case letter include the value null, while
predefined types whose names start with a lower-case letter do not include null. User-defined type names do not
have to follow this convention.
Unlike in JavaScript 1.5, there is no distinction between objects and primitive values. All values can have methods. Some values can be sealed, which disallows addition of ad-hoc properties. User-defined classes can be made to behave like primitives.
The above type names are not reserved words. They are considered to be defined in a scope that encloses a package's global scope, so a package could use these type names as identifiers. However, defining these identifiers for other uses might be confusing because it would shadow the corresponding type names (the types themselves would continue to exist, but they could not be accessed by name).
The names Boolean, Number, and String have been deliberately left unused to enable
implementations to use them to emulate the behavior of the JavaScript 1.5 Boolean, Number, and String
wrapper objects. These are not part of JavaScript 2.0, but an implementation may support them for compatibility.
The name function could not be used to mean "all functions" because it is a reserved word. Use Function^*
instead.
A literal number that has an integral value has type integer; otherwise it has type number. integer
is a subtype of number, so every integer value is also a number value. A literal string
that has exactly one 16-bit unicode character has type character; otherwise it has type string.
character is a subtype of string, so every character value is also a string
value.
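A short illustration (the declarations follow the syntax used in this chapter; the initializers are only for exposition):
var integer i = 3;        // the literal 3 has type integer
var number n = i;         // legal because integer is a subtype of number
var character c = "A";    // a one-character string literal has type character
var string s = c;         // legal because character is a subtype of string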
Any class defined using the class declaration is also a type that denotes the set of all of its and its descendants'
instances. These include the predefined classes, so Object, Date, etc. are all types. null
is an instance of a user-defined class c if it is an instance of any of c's superclasses.
We can use the following operators to construct more complex types. t and u are type expressions in the expressions below.
| Type | Values |
|---|---|
| t \| * | null and all values of type t |
| t ^ * | All values of type t except null |
| t \| ? | undefined and all values of type t |
| t ^ ? | All values of type t except undefined |
| t \| u | All values belonging to either type t or type u or both |
| t & u | All values simultaneously belonging to both type t and type u |
The language does not syntactically distinguish type expressions from value expressions, so a type expression can also
use any other value operators such as !, +, and . (member access). Except for parentheses,
most of them are not very useful, though.
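A few illustrative declarations using these operators (the names are invented for the example):
const type OptString = string | Null;    // a type is a value, so it can be bound to a constant
var OptString s = null;                  // legal: OptString includes null
var Any a = undefined;                   // Any includes every value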
We write a ⊆ b to denote that a is a subtype of b. Subtyping is transitive, so if a ⊆ b and b ⊆ c then a ⊆ c is also true. Subtyping is also reflexive: a ⊆ a.
The following subtype and type equivalence relations hold. t, u, and v represent arbitrary types.
t ⊆ t | u
t & u ⊆ t
t | t = t
t & t = t
t | u = u | t
t & u = u & t
(t | u) | v = t | (u | v)
(t & u) & v = t & (u & v)
t | * = t | Null
t | ? = t | void
integer ⊆ number ⊆ object
character ⊆ string ⊆ object
boolean ⊆ object
array ⊆ object
type ⊆ object
Array = array | Null
Type = type | Null
Object = object | Null
t ⊆ Any
We write v ∈ t to indicate that v is a value that is a member of type t. The following subtyping rule holds: if v ∈ t and t ⊆ s, then v ∈ s holds as well. Any particular value v is simultaneously a member of many types.
Types are generally used to restrict the set of objects that can be held in a variable or passed as a function argument. For example, the declaration
var integer x;
restricts the values that can be held in variable x to be integers.
A type declaration never affects the semantics of reading the variable or accessing one of its members. Thus, as
long as expression new MyType() returns a value of type MyType, the following two code snippets
are equivalent:
var MyType x = new MyType(); x.foo();
var x = new MyType(); x.foo();
This equivalence always holds, even if these snippets are inside the declaration of class MyType and foo
is a private field of that class. As a corollary, adding true type annotations does not change the meaning of a program.
A type is also a value (whose type is type) and can be used in expressions, assigned to variables, passed
to functions, etc. For example, the code
const type Z = integer;
function abs_val(Z i) Z {
return i<0 ? -i : i;
}
is equivalent to:
function abs_val(integer i) integer {
return i<0 ? -i : i;
}
As another example, the following method takes a type and returns an instance of that type:
method QueryInterface(type t) t { ... }
Coercions can take place in the following situations:
- when a value is stored into a variable or field whose declared type is t;
- when a value is passed as an argument for a function parameter whose declared type is t;
- when a value is returned from a function whose declared result type is t;
- when a value is explicitly coerced using the @t operator.
In any of these cases, if v ∈ t, then v is passed unchanged. If v ∉ t, then an error occurs unless v is undefined, in which case the following coercions are tried, in order:
- If Null ⊆ t, then null is used instead of undefined.
- If boolean ⊆ t, then false is used instead of undefined.
- If integer ⊆ t, then +0.0 is used instead of undefined.
- If string ⊆ t, then "" is used instead of undefined.
If none of the coercions succeeds, an error occurs.
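For instance (using the declaration syntax of this chapter; the variable names are illustrative):
var Object o = undefined;      // Null ⊆ Object, so o receives null
var boolean b = undefined;     // b receives false
var string s = undefined;      // only the string ⊆ t coercion applies, so s receives ""
var character c = undefined;   // error: none of the coercions above applies to character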
Some types such as machine integers define additional coercions. These are listed along with descriptions of these types.
@ Operator
One can explicitly request a coercion in an expression by using the @ operator. This operator has the same
precedence as . and coerces its left operand to the right operand, which must be a type. ... v@t ...
can be used in an expression and has the same effect as:
function coerce_to_t(t a) t {return a}   // Declared at the top level
... coerce_to_t(v) ...
assuming that coerce_to_t is an identifier not used anywhere else. The @
operator is useful as a type assertion as in w@Window. It's a postfix operator to simplify cascading expressions:
w@Window.child@Window.pos
is equivalent to:
(((w@Window).child)@Window).pos
A type cast performs more aggressive transformations than a type coercion. To cast a value to a given type, we use the type as a function, passing it the value as an argument:
type(value)
For example, integer(258.1) returns the integer 258, and string(2+2==4) returns
the string "true".
Need to specify the semantics of type casts. They are intended to mimic the current ToNumber, ToString, etc. methods.
Would we rather have the colon syntax for declaring types? Two sample declarations would be:
var x:integer = 7;
function f(a:integer, b:Object):number {...}
A few considerations:
- The colon is already used in object literal syntax (as in {a:17, b:33}). The latter would present a conundrum if we ever wanted to declare field types in an object literal. Some users have been using these as a convenient facility for passing named arguments to functions.
- Whether field would need to be a reserved word.
Do we want to make type expressions have a distinct syntax from value expressions? I have not heard any "pro" arguments. Here are the "con" arguments:
- In a construct such as (expr1)(expr2), is expr1 a type or a value expression? If the two have the same syntax, it doesn't matter.
|
JavaScript 2.0
Libraries
Versions
|
Thursday, November 11, 1999
As a package evolves over time it often becomes necessary to change its exported interface. Most of these changes involve adding symbols (global and class members), although occasionally a symbol may be deleted or renamed. In a monolithic environment where all JavaScript source code comes preassembled from the same source, this is not a problem. On the other hand, if packages are dynamically linked from several sources then versioning problems are likely to arise.
One of the most common avoidable problems is collision of symbols. Unless we solve this problem, an author of a library will not be able to add even one symbol in a future version of his library because that symbol could already be in use by some client or some other library that a client also links with. This problem occurs both in the global namespace and in the namespaces within classes from which clients are allowed to inherit.
Here's an example of how such a collision can arise. Suppose that a library provider creates a library called BitTracker
that exports a class Data. This library becomes so successful that it is bundled with all web browsers produced
by the BrowsersRUs company:
package BitTracker;
public class Data {
public field author;
public field contents;
function save() {...}
};
function store(d) {
...
storeOnFastDisk(d);
}
Now someone else writes a web page W that takes advantage of BitTracker. The class Picture
derives from Data and adds, among other things, a method called size that returns the dimensions
of the picture:
import BitTracker;
class Picture extends Data {
public method size() {...}
field palette;
};
function orientation(d) {
if (d.size().h >= d.size().v)
return "Landscape";
else
return "Portrait";
}
The author of the BitTracker library, who hasn't seen W, decides in response to customer requests
to add a method called size that returns the number of bytes of data in a Data object. He then releases
the new and improved BitTracker library. BrowsersRUs includes this library with its latest NavigatorForInternetComputing
17.0 browser:
package BitTracker;
public class Data {
public field author;
public field contents;
public method size() {...}
function save() {...}
};
function store(d) {
...
if (d.size() > limit)
storeOnSlowDisk(d);
else
storeOnFastDisk(d);
}
An unsuspecting user U upgrades his old BrowsersRUs browser to the latest NavigatorForInternetComputing 17.0
browser and a week later is dismayed to find that page W doesn't work anymore. U's granddaughter Alyssa
P. Hacker tries to explain to U that he's experiencing a name conflict on the size methods, but U
has no idea what she is talking about. U attempts to contact the author of W, but she has moved on to
other pursuits and is on a self-discovery mission to sub-Saharan Africa. Now U is steaming at BrowsersRUs, which
in turn is pointing its finger at the author of BitTracker.
How could the author of BitTracker have avoided this problem? Simply choosing a name other than size
wouldn't work, because there could be some other page W2 that conflicts with the new name. There are several possible
approaches:
- Have each package author qualify exported names with a unique prefix, so that, for example, Netscape's objects would provide a com_netscape_length method while MIT's objects used the edu_mit_length method.
- Have each package declare explicit versions of its exported API and have clients state the version they expect when importing the package.
The last approach appears to be the most desirable because it places the smallest burden on casual users of the language, who merely have to import the packages they use and supply the current version numbers in the import statements. A package author has to be careful not to disturb the set of visible prior-version symbols when releasing an updated package, but authors of dynamically linkable packages are assumed to be more sophisticated users of the language and could be supplied with tools to automatically check updated packages' consistency.
The versioning system in JavaScript 2.0 only affects exports of symbols. The concept of a version does not apply to a package's internal code; it is up to package developers to ensure that newer releases of their packages continue to behave compatibly with older ones.
A version describes the API of a package. A release refers to the entirety of a package, including its code. One release can export many versions of its API. A package developer should make sure that multiple releases of a package that export version V export exactly the same set of symbols in version V.
As an example, suppose that a developer wrote a sorting package P with functions sort and merge
that called bubble sort in version "1.0". In the next release the developer adds a function called
stablesort and includes it in version "2.0". In a subsequent release the developer changes
the sort algorithm to a quicksort that calls stablesort as a subroutine. That last release of the
package might look like:
const V1_0 = new Version("1.0",""); // The "" makes version "1.0" be the default
const V2_0 = new Version("2.0","1.0");
public var serialNumber;
public function sort(compare: Function, array: any[]):any[] {...}
public function merge(compare: Function, array1: any[], array2: any[]):any[] {...}
V2_0 function stablesort(compare: Function, array: any[]):any[] {...}
Suppose, further, that client package C1 imports version "1.0" of P, client
package C2 simultaneously imports version "2.0" of P, and a search for P
yields the latest release described above. There would be only one instance of P running -- the latest release.
Both clients would get the same sort and merge functions, and both would see the same serialNumber
variable (in particular, if client C1 wrote to serialNumber, then client C2 would see the
updated value), but only client package C2 would see the stablesort function. Both clients would get
the quicksort release of sort. If client package C1 defined its own stablesort function,
then that function would not conflict with P's stablesort; furthermore, P's sort
would still refer to P's stablesort in its internal subroutine call.
Had only the first release of P been available, client package C2 would obtain an error because version
2 of P's API would not be available. Client C1 could run normally, although the sort function
it calls would use bubble sort instead of the quicksort.
Note that the last release of P did not change the API so it did not need a new version. Of course, it could define a new version if for some reason it wanted clients to be able to demand the last release of P even though its API is the same as the second release.
A version name Version is a quoted string literal such as "1.2" or
"Private Interface 2.0". Two version names are equal if their strings are equal. A special version
whose name is the empty string "" is called the default version.
A package must declare every version it uses except "", which is declared by default if not explicitly
declared. A version must be declared before its first use. A given version name may be declared only once per package. A package
declares a version name Version using the version declaration:
[Visibility] version Version [> VersionList] ;
[Visibility] version Version [= Version] ;
VersionList: Version , ... , Version
A version declaration cannot be nested inside a ClassDefinition's Block.
If Visibility is present, it must be either private, package,
or public (without VersionsAndRenames). Unlike in other declarations,
the default is public, which makes Version accessible by
other packages. A private or package Visibility
hides its Version from other packages; such a Version can be used
only by being included in the VersionList of another Version. Also
unlike other declarations, all Version declarations are global.
If the Version being declared is followed by a > and
a VersionList, then the Version is said to be greater than
all of the Versions in the VersionList. We write v1 :>
v2 to indicate that v1 is greater than v2 and v1 :≥ v2 to indicate that either v1 and v2 are the same version or v1 :> v2.
Order is transitive, which means that if v1 :> v2 and v2 :> v3, then v1
:> v3. This order induces a partial order on the set of all versions. It is possible for two versions to be
unordered with respect to each other, in which case they are not equal and neither is greater than the other.
If the Version v1 being declared is followed by a =
and another Version v2, then v1 becomes an alias for v2, and
they may be used interchangeably.
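For example, a package might declare its versions as follows (the version names are illustrative):
version "1.0";                 // declares version "1.0"
version "2.0" > "1.0";         // "2.0" is greater than "1.0"
version "2.0 beta" = "2.0";    // "2.0 beta" is an alias for "2.0"
private version "internal";    // hidden from other packages; usable only in another version's VersionList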
A VersionRange specifies a subset of all versions. This subset contains all versions that are both greater than or equal to a given Version1 and less than or equal to a given Version2. A VersionRange can have either of the following forms:
Version
[Version1] .. [Version2]
The first form specifies the one-element set {Version}. The second form specifies the set of all Versions v such that v :≥ Version1 and Version2 :≥ v. If Version1 is omitted, the condition v :≥ Version1 is dropped. If Version2 is omitted, the condition Version2 :≥ v is dropped.
The original version of this specification allowed both strings and numbers as Version names.
Two version names were equal if their toString representations were identical, so version names 2.0
and "2" were identical but 2.0 and "2.0" were not. In addition, numbered versions
had an implicit order: For any two versions v1 and v2 whose names could be represented as numbers,
v1 :> v2 if and only if v1 was numerically greater than v2. Additionally,
every version except 0 was greater than version 0. It was an error to define explicit version
containment relations that would violate this default order, directly or indirectly.
Numbered Version names were dropped for simplicity and to avoid confusion with versions
such as 1.2.3 (which would be a syntax error unless quoted).
Another, simpler, approach is to require all Version names to be nonnegative integers (without quotes). Versions would not need to be declared, and all versions would be totally ordered in numerical order. A disadvantage of this approach is that the total order keeps versions from being branched.
Currently version definitions are fixed. These could be turned into function calls that define versions and list their
relationships. If we can get a variable or constant to hold a set of version names, then we could use these variables rather
than specific version names in the VersionsAndRenames lists after public keywords.
This would provide another level of abstraction and flexibility.
Yet another approach is to consolidate all of the information in VersionsAndRenames into
a set of export statements, say, at the top of the file rather than being interspersed throughout a package
along with public declarations. This would make it easier to see all of the identifiers exported by a particular
version of the package, but it would also likely lead to inconsistencies when someone forgets to update an export
statement after inserting another variable, function, field, or method definition. Such errors would likely be caught after
a package has been released.
|
JavaScript 2.0
Libraries
Machine Types
|
Thursday, November 11, 1999
The machine types library is an optional library that provides additional low-level types for use in JavaScript 2.0 programs.
On implementations that support this library, these types provide faster, Java-style integer operations that are useful for
communicating between JavaScript 2.0 and other programming languages and for performance-critical code. These types are not
intended to replace number and integer for general-purpose scripting.
When the machine types library is imported via an import of "machine-types" version 1, the following types
become available:
| Type | Values |
|---|---|
| byte | Machine integers between -128 and 127 inclusive |
| ubyte | Machine integers between 0 and 255 inclusive |
| short | Machine integers between -32768 and 32767 inclusive |
| ushort | Machine integers between 0 and 65535 inclusive |
| int | Machine integers between -2147483648 and 2147483647 inclusive |
| uint | Machine integers between 0 and 4294967295 inclusive |
| long | Machine integers between -9223372036854775808 and 9223372036854775807 inclusive |
| ulong | Machine integers between 0 and 18446744073709551615 inclusive |
Values belonging to the eight machine integer types above are distinct from each other and from values of type integer.
Thus, byte(7) is distinct from int(7), which in turn is distinct from the plain integer 7.
However, the coercions listed below usually hide these distinctions.
No subtype relations hold between the machine types.
The above type names are not reserved words.
The following coercions take place:
- An integer value v can be coerced to one of the machine integer types M if v is within range of the target type M. Both +0 and -0 coerce to the machine integer 0. Note that non-integer numbers are not coerced to any of the machine types.
- A machine integer m can be coerced to type integer or number as long as m can be represented exactly using the IEEE double-precision floating-point format. 0 always becomes +0.
In the rules below, |M| denotes the number of values of machine type M: |byte| = |ubyte| = 256, |short| = |ushort| = 65536, |int| = |uint| = 2^32, and |long| = |ulong| = 2^64.
Machine integers support the arithmetic operators +, -, *, /, %,
comparisons ==, !=, <, >, <=, >=,
and bitwise logical operations ~, &, |, ^, <<,
>>. If supplied two operands of different machine integer types M1
and M2, all of these binary operators except << and >>
first coerce both operands to the same type M. If M1 appears before M2
in the list byte, ubyte, short, ushort, int, uint,
long, ulong, then M is M2; otherwise M
is M1. Then these operators perform the operation and finally return the result as a
value of type M. If the result is not within range of the target type M, it is treated modulo |M|.
If one of the operands is a machine integer of type M and the other is an integer value v,
then v is first coerced to type M.
The result type of a shift expression (<< or >>) is the same as the type of its first
operand. The second operand's type does not affect the type of the result. Right shifts are signed if the first operand has
type byte, short, int, or long, and unsigned if it has type ubyte,
ushort, uint, or ulong.
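A few illustrative computations (assuming the machine types library has been imported; the variable names are arbitrary):
var i = int(2000000000) + int(2000000000);   // both operands are ints, so the sum wraps modulo |int|: int(-294967296)
var m = int(5) + long(10);                   // int precedes long in the list above, so both operands become long: long(15)
var s = short(-4) >> 1;                      // signed right shift because the first operand is a short: short(-2)
var u = ushort(65532) >> 1;                  // unsigned right shift because the first operand is a ushort: ushort(32766)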
These rules are designed to permit machine integer operations to be implemented as single instructions on most processor
architectures yet give predictable results. Overflows wrap around instead of signaling errors because such behavior is useful
for many bit-manipulation algorithms and permits much better optimization of performance-critical code. Code that is concerned
about overflows should be using regular integer instead of the machine integer types.
Why are values of the eight machine integer types distinct? This was done because of a desire to allow arithmetic operators
to only support 32 bits when operating on int values. Let's take a look at the alternative:
Suppose we unify the values of all eight machine types so that int(2000000000) is indistinguishable from
long(2000000000). To what precision should an operator like + calculate its results? Clearly,
if we're adding two long values and the result is within the range of long values, then we'd
expect to get the right result. In particular, long(2000000000) + long(2000000000)
should yield long(4000000000). However, long(2000000000) is indistinguishable from int(2000000000),
so int(2000000000) + int(2000000000) should also yield long(4000000000),
which is not representable as an int value. Thus, even if both operands are known to be int
values, the + operator has to use 64-bit arithmetic.
If a has type int and we compute a+1, then we have to use 64-bit arithmetic
because the result could be 2147483648. However, if we compute var int r = a+1 instead, then a smart compiler
could make do with 32-bit arithmetic because the result is treated modulo 2^32. However, this trick would not
work with an expression such as var boolean b = a+1 > 0.
The alternative is viable but it leads to more demand for 64-bit arithmetic. It does have the advantage that one does not need to worry about intermediate overflows as long as the values don't approach 2^64.
Do we want to support a float type for holding single-precision IEEE floating-point numbers? This type may
be useful for:
- Reproducing computations on floats originally written in another language such as C++ or Java that one would want to replicate exactly in JavaScript; without support for the float type the JavaScript version would give different answers from the original.
One difficulty with supporting float is deciding what the coercion rules should be. If we invoke +
with one number operand and one float operand, should the result be a float or a
number? One might expect number, but this makes adding constants to floats using
single-precision arithmetic awkward since every constant is a number. If s is a float,
the expression s+1 would yield a number instead of a float because 1
is a number. One would have to write s+float(1) instead.
|
JavaScript 2.0
Libraries
Operator Overloading
|
Thursday, November 11, 1999
Operator overloading is useful to implement Spice-style units without having to add units to the core of the JavaScript 2.0 language. Operator overloading is done via an optional library that, when imported, exposes several additional methods of the Object class. This library is analogous to the internationalization library in that it does not have to be present on all implementations of JavaScript 2.0; implementations without this library do not support operator overloading.
|
JavaScript 2.0
Formal Description
|
Thursday, November 11, 1999
This chapter presents the formal syntax and semantics of JavaScript 2.0. The syntax notation and semantic notation sections explain the notation used for this description. A simple metalanguage based on a typed lambda calculus is used to specify the semantics.
The syntax and semantic sections are available in both HTML 4.0 and Microsoft Word 98 RTF formats. In the HTML versions each use of a grammar nonterminal or metalanguage value, type, or field is hyperlinked to its definition, making the HTML version preferred for browsing. On the other hand, the RTF version looks much better when printed. The fonts, colors, and other formatting of the various grammar and semantic elements are all encoded as CSS (in HTML) or Word (in RTF) styles and can be altered if desired.
The syntax and semantics sections are machine-generated from code supplied to a small engine that can type-check and execute the semantics directly. This engine is in the CVS tree at mozilla/js/semantics; the input files are at mozilla/js/semantics/JS20.
|
JavaScript 2.0
Formal Description
Semantic Notation
|
Thursday, November 11, 1999
To precisely specify the semantics of JavaScript 2.0, we use the notation described below to define the behavior of all JavaScript 2.0 constructs and their interactions.
The semantics describe the meaning of a JavaScript 2.0 program in terms of operations on simpler objects borrowed from mathematics collectively called semantic values. Semantic values can be held in semantic variables and passed to semantic functions. The kinds of semantic values used in this specification are summarized in the table below and explained in the next few sections:
| Semantic Value Examples | Description |
|---|---|
| ⊥ | The result of a nonterminating computation |
| syntaxError | The result of a computation that returns by throwing a semantic exception |
| The result of a semantic function that does not return a useful value | |
| true, false | Booleans |
| -3, 0, 1, 2, 93 | Mathematical integers |
| 1/2, -12/7 | Mathematical rational numbers |
| 1.0, 3.5, 2.0e-10, -0.0, -∞, NaN | Double-precision IEEE floating-point numbers |
| ‘A’, ‘b’, ‘«LF»’, ‘«uFFFF»’ | Characters (Unicode 16-bit code points) |
| [value0, ... , valuen-1] | Vectors indexed lists of semantic values |
| “”, “abc”, “1«TAB»5” | Strings |
| {value1, value2, ... , valuen} | Mathematical sets of semantic values |
| name1 value1, name2 value2, ... , namen valuen | Tuples with named member semantic values |
| name or name value | Tagged semantic values |
| function(n: Integer) n*n | Semantic functions |
There is a special semantic value ⊥ (pronounced as "bottom") that represents the result of an inconsistent or nonterminating computation. Unless specified otherwise, applying any semantic operator (such as +, *, etc.) to ⊥ or calling a semantic function with ⊥ as any argument also yields ⊥ without evaluating any remaining operands or arguments (in technical terms, semantic functions and operators are strict in all of their arguments unless specified otherwise).
If interpreting a JavaScript program according to the semantics here gives ⊥ as a result, an actual implementation executing that JavaScript program will either fail to terminate or throw an exception because it runs out of memory or stack space.
Semantic values of the form value represents the result of a computation that throws a semantic exception. value is the exception's value (which must be a member of the SemanticException semantic type). Unless specified otherwise, applying any semantic operator (such as +, *, etc.) to value or calling a semantic function with value as any argument also yields value (with the same value) without evaluating any remaining operands or arguments.
The throw statement takes a value v and returns v. The catch statement converts v back to v.
Semantic functions that do not return a useful value return the semantic value . There are no operations defined on .
The semantic values true and false are booleans. The not, and, or, and xor operators operate on booleans. Like most other operators, and, or, and xor evaluate both operands before returning a result; these operators do not short-circuit.
Unless specified otherwise, numbers in the semantics written without a slash or decimal point are mathematical integers: ..., -3, -2, -1, 0, 1, 2, 3, .... The usual mathematical operators +, -, *, and unary - can be used on integers. Integers can be compared using =, ≠, <, ≤, >, and ≥.
Numbers in the semantics written with a slash are mathematical rational numbers. Every integer is also a rational. Rational numbers include, for example, 0, 1, 2, -1, 1/2, -12/7, and -24/14; the last two are different ways of writing the same rational number. The usual mathematical operators +, -, *, /, and unary - can be used on rationals. Rationals can be compared using =, ≠, <, ≤, >, and ≥.
Numbers in the semantics written with a decimal point are double-precision IEEE floating-point numbers (often abbreviated as doubles), including distinct +0.0, -0.0, +∞, -∞, and NaN. Doubles are distinct from integers and rationals; when writing doubles in the semantics, we always include a decimal point to distinguish them from integers and rationals.
Doubles other than +∞, -∞, and NaN are called finite. We define the significand of a finite double d as follows:
Characters are single Unicode 16-bit code points. We write them enclosed in single quotes ‘ and ’. There are exactly 65536 characters: ‘«u0000»’, ‘«u0001»’, ..., ‘A’, ‘B’, ‘C’, ..., ‘«uFFFF»’ (see also notation for non-ASCII characters). Unicode surrogates are considered to be pairs of characters for the purpose of this specification.
The characterToCode and codeToCharacter semantic functions convert between characters and their integer Unicode values.
A semantic vector contains zero or more elements indexed by integers starting from zero. We write a vector value by enclosing a comma-separated list of values inside bold brackets:
[element0, element1, ... , elementn-1]
For example, the following semantic value is a vector whose elements are four strings:
[parsley, sage, rosemary, thyme]
The empty vector is written as [].
Let u = [e0, e1, ... , en-1] and v = [f0, f1, ... , fm-1] be vectors, i and j be integers, and x be a value. The following notations describe common operations on vectors:
| Notation | Result Value |
|---|---|
| u v | The concatenated vector [e0, e1, ... , en-1, f0, f1, ... , fm-1] |
| |u| | The length n of the vector |
| u[i] | The ith element ei, or ⊥ if i<0 or i≥n |
| u[i ... j] | The vector slice [ei, ei+1, ... , ej] consisting of all elements of u between the ith and the jth, inclusive, or ⊥ if i<0, j≥n, or j<i-1. The result is the empty vector [] if j=i-1. |
| u[i ...] | The vector slice [ei, ei+1, ... , en-1] consisting of all elements of u between the ith and the end, or ⊥ if i<0 or i>n. The result is the empty vector [] if i=n. |
| u[i ← x] | The vector [e0, ... , ei-1, x, ei+1, ... , en-1] with the ith element replaced by the value x and the other elements unchanged, or ⊥ if i<0 or i≥n |
Semantic vectors are functional; there is no notation for modifying a semantic vector in place.
A semantic string is merely a vector of characters. For notational convenience we can write a string literal as zero or more characters enclosed in double quotes. Thus,
“Wonder«LF»”
is equivalent to:
[‘W’, ‘o’, ‘n’, ‘d’, ‘e’, ‘r’, ‘«LF»’]
In addition to all of the other vector operations, we can use =, ≠, <, ≤, >, and ≥ to compare two strings.
A semantic set is an unordered collection of values. Each value may occur at most once in a set. There must be a well-defined = semantic operator defined on all pairs of values in the set, and that operator must induce an equivalence relation.
A semantic set is denoted by enclosing a comma-separated list of values inside braces:
{element1, element2, ... , elementn}
The empty set is written as {}.
For example, the following set contains seven integers:
{3, 0, 10, 11, 12, 13, -5}
When using elements such as integers and characters that have an obvious total order, we can also write sets by using the ... range operator. For example, we can rewrite the above set as:
{0, -5, 3 ... 3, 10 ... 13}
If the beginning of the range is equal to the end of the range, then the range consists of only one element: {7 ... 7} is the same as {7}. If the end of the range is one "less" than the beginning, then the range contains no elements: {7 ... 6} is the same as {}. If the end of the range is more than one "less" than the beginning, then the set is ⊥.
Let A and B be sets and x be a value. The following notations describe common operations on sets:
| Notation | Result Value |
|---|---|
| |A| | The number of elements in the set A; ∞ if A has infinitely many elements |
| min A | If there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≥ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite descending sequence of elements with no lower bound in A) |
| max A | If there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≤ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite ascending sequence of elements with no upper bound in A) |
| A ∩ B | The intersection of sets A and B (the set of all values that are present both in A and in B) |
| A ∪ B | The union of sets A and B (the set of all values that are present in at least one of A or B) |
| A - B | The difference of sets A and B (the set of all values that are present in A but not B) |
| x ∈ A | Return true if x is an element of set A and false if not |
| A = B | Return true if the two sets A and B are equal and false otherwise. Sets A and B are equal if every element of A is also in B and every element of B is also in A. |
min and max are only defined for sets whose elements can be compared with <.
A semantic tuple is an aggregate of several named semantic values. Tuples are sometimes called records or structures in other languages. A tuple is denoted by a comma-separated list of names and values between bold triangular brackets:
name1 value1, name2 value2, ... , namen valuen
Each namei valuei pair is called a field. The order of fields in a tuple is irrelevant, so x 3, y 4 is the same as y 4, x 3. A tuple's names must all be distinct.
Let w be an expression that evaluates to a tuple name1 value1, name2 value2, ... , namen valuen. We can extract the value of the field named namei from w by using the notation w.namei. w is required to have this field. For example, x 3, y 4.x is 3.
In the HTML versions of the semantics, each use of namei is linked back to its tuple type's definition.
A semantic oneof is a pair consisting of a name (called the tag) and a value. Oneofs are sometimes called variants or tagged unions in other languages. A oneof is denoted by writing the tag followed by the value:
name value
For brevity, when value is , we can omit it altogether, so red is the same as red .
Let o be an expression that evaluates to some oneof n v. We can perform the following operations on o:
| Notation | Result Value |
|---|---|
| o.name | The value v if n is name; otherwise |
| o is name | true if n is name; false otherwise |
For example, (red 5) is blue evaluates to false, while (red 5) is red evaluates to true. (red 5).red evaluates to 5.
In addition to the operators above, the case statement evaluates one of several expressions based on a oneof tag.
In the HTML versions of the semantics, each use of name is linked back to its oneof type's definition.
A semantic function receives zero or more arguments, performs computations, and returns a result. We write a semantic function as follows:
function(param1: type1, ... , paramn: typen) body
Here param1 through paramn are the function's parameters, type1 through typen are the parameters' respective semantic types, and body is an expression that computes the function's result. When the function is called with argument values v1 through vn, the function's body is evaluated and the resulting value returned to the caller. body can refer to the parameters param1 through paramn; each reference to a parameter parami evaluates to the corresponding argument value vi. Arguments are passed by value (which in this language is equivalent to passing them by reference because there is no way to write to a parameter).
Function parameters are statically scoped. When functions are nested and an inner function f defines a parameter with the same name as a parameter of an outer function g, then f's parameter shadows g's parameter inside f.
The only operation allowed on a semantic function f is calling it, which we do using the f(arg1, ..., argn) syntax. In the presence of side effects, f is evaluated first, followed by the argument expressions arg1 through argn, in left-to-right order. If the result of evaluating f or any of the argument expressions is , then the call immediately returns without evaluating the following argument expressions, if any. If the result of evaluating f or any of the argument expressions is v for some value v, then the call immediately returns that v without evaluating the following argument expressions, if any. Otherwise, f's body is evaluated and the resulting value returned to the caller.
A semantic type is a possibly infinite set of semantic values. Names of semantic types are shown in Capitalized Red Small Caps, and compound semantic type expressions are in red.
We use semantic types to make the semantics more readable by declaring the semantic type of each semantic variable (including function argument variables). Each such declaration states that the only values that will be stored in a semantic variable will be members of that variable's semantic type. These declarations can be proven statically. The JavaScript semantics have been machine type-checked to ensure that every type declaration holds, so, for example, if the semantics state that variable x has type Integer then there does not exist any place that could assign the value true to x.
Semantic type annotations allow us to restrict the description of each semantic operator and function to only describe its behavior on arguments that are members of the arguments' semantic types. Thus, for example, we need not describe the behavior of the + semantic operator when passed the semantic values true and as operands because we can prove that this case cannot arise.
Every semantic type includes the values and v for all values v whose semantic type is SemanticException. For brevity we do not list and v in the tables below.
The following are the basic semantic types:
The type Rational includes Integer as a subtype because every integer is also a rational number. Except for and v, the types Rational and Double are disjoint.
We can construct compound semantic types using the notation below. Here t, t1, t2, ..., tn represent some existing semantic types.
| Type | Set of Values |
|---|---|
| t[] | All vectors [v0, ... , vn-1] all of whose elements v0, ... , vn-1 have type t. Note that the empty vector [] is a member of every vector type t[]. |
| {t} | All sets {v1, v2, ... , vn} all of whose elements v1, ... , vn have type t. Note that the empty set {} is a member of every set type {t}. |
| tuple {name1: t1; ... ; namen: tn} | All tuples name1 v1, ... , namen vn for which each vi has type ti for 1 i n. The namei's must be distinct; the order in which the namei: ti fields are listed does not matter. |
| oneof {name1: t1; ... ; namen: tn} | All oneofs of the form namei v, where 1 i n and v has type ti. If tk is Void, then namek: tk can be abbreviated as simply namek in the oneof semantic type syntax. The namei's must be distinct; the order in which the namei: ti alternatives are listed does not matter. |
| t1 × t2 × ... × tn → t | Some* functions that take n arguments of types t1
through tn respectively and produce a result of type t.
If n is zero (the function takes no arguments), we write this type as () → t. * Technically speaking, this semantic type includes only functions that are continuous in the domain-theoretical sense; this avoids set-theoretical paradoxes. |
| () → t |
The type constructors earlier in the table bind tighter than ones later in the table, so, for example, Integer[] → Rational[] is equivalent to (Integer[]) → (Rational[]) (a function that takes a vector of Integers and returns a vector of Rationals) rather than ((Integer[]) → Rational)[] (a vector of functions, each of which takes a vector of Integers and returns a Rational). In the rare cases where this is needed, parentheses are used to override precedence.
The table below lists the semantic operators in order from the highest precedence (tightest-binding) to the lowest precedence (loosest-binding). Operators under the same heading of the table have the same precedence and associate left-to-right, so, for example, 7-3+2-1 is interpreted as ((7-3)+2)-1 instead of 7-(3+(2-1)) or (7-(3+2))-1. When needed, parentheses can be used to group expressions.
The type signatures of the operators are also listed. Some operators are polymorphic; t, t1, t2, ..., and tn can represent any semantic types. The types of some operators are underdetermined; for example, [] can have type t[] for any type t. In these cases the particular choice of type is inferred from the context.
Each operator in the table below is strict: it evaluates all of its operands left-to-right, and if any operand evaluates to ⊥, then the operator immediately returns ⊥ without evaluating the following operands, if any. If any operand evaluates to v for some value v, then the operator immediately returns that v without evaluating the following operands, if any.
| Operator | Signatures | Description |
|---|---|---|
| Nonassociative Operators | ||
| (x) | t t | Return x. Parentheses are used to override operator precedence. |
| |u| | t[] Integer | u is a vector [e0, e1, ... , en-1]. Return the length n of that vector. |
| {t} Integer | The number of elements in the set u; if u has infinitely many elements | |
| [x0, x1, ... , xn-1] | t ... t t[] | Return a vector with the elements x0, x1, ... , xn-1. |
| {x1, x2, ... , xn} | t ... t {t} | Return a set with the elements x1, x2, ... , xn. Any duplicate elements are included only once in the set. When t is Integer or Character, we can also replace any of the xi's by a range xi ... yi that contains all integers or characters greater than or equal to xi and less than or equal to yi. yi must not be less than xi "minus" one. |
| name1 x1, ... , namen xn | t1 ... tn tuple {name1: t1; ... ; namen: tn} | Return a tuple with the fields name1 x1, ... , namen xn. |
| name | oneof {name; name2: t2; ... ; namen: tn} | Return a oneof value with tag name and value . |
| Action[nonterminali] | Determined by Action's declaration | This notation can only be used inside an action definition for a grammar production that has nonterminal nonterminal on the production's right side. Return the value of action Action invoked on the ith instance of nonterminal nonterminal on the right side of . The subscript i can be omitted if there is only one instance of nonterminal nonterminal in . |
| nonterminali | Character | This notation can only be used inside an action definition for a grammar production that has
nonterminal nonterminal on
the production's left or right side. Furthermore, every complete expansion of grammar nonterminal nonterminal must
expand it into a single character. Return the character to which the ith instance of nonterminal nonterminal on the right side of expands. The subscript i can be omitted if there is only one instance of nonterminal nonterminal in . If the subscript is omitted and nonterminal nonterminal appears on the left side of , then this expression returns the single character to which this whole production expands. |
| Suffix Operators | ||
| u[i] | t[] × Integer → t | u is a vector [e0, e1, ... , en-1]. Return the ith element ei, or ⊥ if i<0 or i≥n. |
| u[i ... j] | t[] × Integer × Integer → t[] | u is a vector [e0, e1, ... , en-1]. Return the vector slice [ei, ei+1, ... , ej] consisting of all elements of u between the ith and the jth, inclusive, or ⊥ if i<0, j≥n, or j<i-1. The result is the empty vector [] if j=i-1. |
| u[i ...] | t[] × Integer → t[] | u is a vector [e0, e1, ... , en-1]. Return the vector slice [ei, ei+1, ... , en-1] consisting of all elements of u between the ith and the end, or ⊥ if i<0 or i>n. The result is the empty vector [] if i=n. |
| u[i ← x] | t[] × Integer × t → t[] | u is a vector [e0, e1, ... , en-1]. Return the vector [e0, ... , ei-1, x, ei+1, ... , en-1] with the ith element replaced by the value x and the other elements unchanged, or ⊥ if i<0 or i≥n. |
| w.namei | tuple {name1: t1; ... ; namen: tn} → ti | w is a tuple name1 v1, ... , namen vn. Return the value vi of w's field named namei. |
| oneof {name1: t1; ... ; namen: tn} → ti | w is a oneof namek v for some k between 1 and n inclusive. Return the value v if namei is namek, or ⊥ if not. |
| f(x1, ..., xn) | (t1 × ... × tn → t) × t1 × ... × tn → t | Call the function f with the arguments x1 through xn and return the result. |
| Prefix Operators | ||
| -x | Integer Integer
or Rational Rational |
The mathematical negation of x |
| min A | {t} → t | Return the minimal element of set A. Specifically, if there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≥ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite descending sequence of elements with no lower bound in A). The type t must have = and < operations that define a total order. |
| max A | {t} → t | Return the maximal element of set A. Specifically, if there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≤ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite ascending sequence of elements with no upper bound in A). The type t must have = and < operations that define a total order. |
| name x | t oneof {name: t; name2: t2; ... ; namen: tn} | Return a oneof value with tag name and value x. |
| Multiplicative Operators | ||
| x * y | Integer × Integer → Integer
or Rational × Rational → Rational |
The mathematical product of x and y |
| x / y | Rational × Rational → Rational | The mathematical quotient of x and y; ⊥ if y=0 |
| A ∩ B | {t} × {t} → {t} | The intersection of sets A and B (the set of all values that are present both in A and in B) |
| Additive Operators | ||
| x + y | Integer × Integer → Integer
or Rational × Rational → Rational |
The mathematical sum of x and y |
| x - y | | The mathematical difference of x and y |
| u ⊕ v | t[] × t[] → t[] | u is a vector [e0, e1, ... , en-1] and v is a vector [f0, f1, ... , fm-1]. Return the concatenated vector [e0, e1, ... , en-1, f0, f1, ... , fm-1]. |
| A ∪ B | {t} × {t} → {t} | The union of sets A and B (the set of all values that are present in at least one of A or B) |
| A - B | {t} × {t} → {t} | The difference of sets A and B (the set of all values that are present in A but not B) |
| Comparison Operators | ||
| x = y | Rational × Rational → Boolean
or Character × Character → Boolean or String × String → Boolean or {t} × {t} → Boolean |
Comparisons return true if the relation holds or false
if not. Rationals are compared mathematically. Characters are compared according to their code points. Two strings are equal when they have the same lengths and contain exactly the same sequences of characters. A string x is less than string y when either x is the empty string and y is not empty, the first character of x is less than the first character of y, or the first character of x is equal to the first character of y and the rest of string x is less than the rest of string y. Two sets x and y are equal if every element of x is also in y and every element of y is also in x. Only = and ≠ can be used to compare sets. |
| x ≠ y | ||
| x < y | ||
| x ≤ y | ||
| x > y | ||
| x ≥ y | ||
| x ∈ A | t × {t} → Boolean | Return true if x is an element of set A and false if not |
| o is namei | oneof {name1: t1; ... ; namen: tn} → Boolean | o is a oneof namek v for some k between 1 and n inclusive. Return true if namei is namek, or false otherwise. |
| Logical Negation | ||
| not a | Boolean → Boolean | true if a is false; false if a is true |
| Logical Conjunction | ||
| a and b | Boolean × Boolean → Boolean | true if both a and b are true; false if at least one of a and b is false |
| Logical Disjunction | ||
| a or b | Boolean × Boolean → Boolean | true if at least one of a and b is true; false if both a and b are false |
| a xor b | true if a is true and b is false or a is false and b is true; false if both a and b are true or both a and b are false | |
Semantic statements are similar to the semantic operators above in that they are also used to construct expressions, take zero or more operands, and return a value. Unlike other semantic operators, semantic statements are usually non-strict: they do not always evaluate all of their operands. Semantic statements have lower precedence than any of the semantic operators above.
Some semantic statements are syntactic sugars, which means that they are defined as macros that expand into other, simpler statements and operators.
function(param1: type1, ... , paramn: typen) body
See the description of function values.
let var1: type1 = expr1; ... ; varn: typen = exprn in body
Evaluate expr1 through exprn in order and save the results. If any expri evaluates to ⊥, then immediately return ⊥ without evaluating the following expr's. If any expri evaluates to throw v for some value v, then immediately return that throw v without evaluating the following expr's. Otherwise evaluate body with new local variable bindings of var1 through varn bound to the saved results of evaluating expr1 through exprn, respectively. Return the result of evaluating body.
type1 through typen are the local variables' respective semantic types. The type of the entire let expression is the type of its body.
The let expression above is syntactic sugar for:
(function(var1: type1, ... , varn: typen) body)(expr1, ... , exprn)
if expr then bodytrue else bodyfalse
Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If expr evaluates to throw v for some value v, then immediately return that throw v. Otherwise expr must evaluate to either true or false. If it evaluated to true, then evaluate bodytrue and return its result. If expr evaluated to false, then evaluate bodyfalse and return its result.
expr must have type Boolean. The entire if expression has any type t such that both bodytrue has type t and bodyfalse has type t.
case expr of
name1(var1: type1): body1;
...
namen(varn: typen): bodyn;
end
Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If expr evaluates to throw v for some value v, then immediately return that throw v. Otherwise expr must evaluate to a oneof name v where name matches namei for some i between 1 and n inclusive. Evaluate the corresponding bodyi with a new local variable vari bound to v. Return bodyi's result.
If we are not interested in using the oneof's value for a particular bodyi, we can shorten that bodyi's clause from:
namei(vari: typei): bodyi
to:
namei: bodyi
In this case no local variable is bound while evaluating bodyi.
expr must have type oneof {name1: type1; ... ; namen: typen}. The entire case expression has any type t such that all of its bodyi's have type t. The namei's must be distinct. The order in which the case clauses are listed does not matter.
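As an informal illustration only (not part of the semantic notation), a oneof value and a case statement can be modeled in JavaScript with tagged objects; the helper names oneof and caseOf below are invented for this example:
// Hypothetical helpers for illustration; the semantic notation itself is abstract.
function oneof(tag, value) { return {tag: tag, value: value}; }
function caseOf(v, clauses) {
  // Dispatch on the oneof's tag, binding its value in the chosen clause.
  if (!(v.tag in clauses)) throw new Error("no matching clause for tag " + v.tag);
  return clauses[v.tag](v.value);
}
const limit = oneof("finite", 3);
const result = caseOf(limit, {
  finite: (m) => m - 1,       // plays the role of "finite(m: Integer): m - 1"
  infinite: () => Infinity    // plays the role of "infinite: ..."
});
// result is 2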
throw expr
Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If expr evaluates to throw v for some value v, then immediately return that throw v. Otherwise expr must evaluate to some value v, in which case return throw v.
expr must have type SemanticException. The entire throw expression has any type whatsoever (because a throw v result is a member of every semantic type).
try
bodytry
catch (var: SemanticException)
bodyhandler
Evaluate bodytry to obtain a value w. If w does not have the form throw v for some value v, then return w. Otherwise w is throw v for some value v. In this case evaluate bodyhandler with a new local variable var bound to v and return bodyhandler's result.
The type of var is always SemanticException. The entire try-catch expression has any type t such that both bodytry has type t and bodyhandler has type t.
The sections below list the predefined semantic functions, their type signatures, and short descriptions. All functions are strict and evaluate their arguments left-to-right.
These functions perform bitwise operations on integers. The integers are treated as though they were written in binary notation, with each 1 bit representing true and 0 bit representing false. The integers must be nonnegative.
| Function | Signature | Description |
|---|---|---|
| rationalToDouble(r) | Rational → Double | The rational number r rounded to the nearest IEEE double-precision floating-point value as follows: Consider the set of all doubles, with -0.0, +∞, -∞, and NaN removed and with two additional values added to it that are not representable as doubles, namely 2^1024 and -2^1024. Choose the member of this set that is closest in value to r. If two values of the set are equally close, choose the one with an even significand; for this purpose, the two extra values 2^1024 and -2^1024 are considered to have even significands. Finally, if 2^1024 was chosen, replace it with +∞; if -2^1024 was chosen, replace it with -∞; if +0.0 was chosen, replace it with -0.0 if and only if r < 0; any other chosen value is used unchanged. The result is the value of rationalToDouble(r). This procedure corresponds exactly to the behavior of the IEEE 754 "round to nearest" mode. |
| Function | Signature | Description |
|---|---|---|
| characterToCode(c) | Character → Integer | The number of the Unicode code point of the character c |
| codeToCharacter(i) | Integer → Character | The character with Unicode code point number i, or ⊥ if i<0 or i>65535 |
The function digitValue is defined as follows:
digitValue(c: Character) : Integer
= if c ∈ {‘0’ ... ‘9’}
then characterToCode(c) - characterToCode(‘0’)
else if c ∈ {‘A’ ... ‘Z’}
then characterToCode(c) - characterToCode(‘A’) + 10
else if c ∈ {‘a’ ... ‘z’}
then characterToCode(c) - characterToCode(‘a’) + 10
else ⊥
| Function | Signature | Description |
|---|---|---|
| isOrdinaryInitialIdentifierCharacter(c) | Character → Boolean | Return true if the nonterminal OrdinaryInitialIdentifierCharacter can expand into c and false otherwise |
| isOrdinaryContinuingIdentifierCharacter(c) | Character → Boolean | Return true if the nonterminal OrdinaryContinuingIdentifierCharacter can expand into c and false otherwise |
We can define a global semantic constant named var as follows:
var : type = expr
expr should evaluate to a value of type type. expr should not have side effects, and it should not evaluate to ⊥.
In the HTML versions of the semantics, each reference to the global semantic constant var is linked to var's definition.
We can define a global semantic function named f as follows:
f(param1: type1, ... , paramn: typen) : type = body
param1 through paramn are the function's parameters, type1 through typen are the parameters' respective semantic types, type is the function result's semantic type, and body is an expression that computes the function's result.
The above definition is syntactic sugar for the global constant definition:
f : type1 × type2 × ... × typen → type = function(param1: type1, ... , paramn: typen) body
In the HTML versions of the semantics, each reference to the global semantic function f is linked to f's definition.
For example, the function definition
square(x: Integer) : Integer = x*x
defines a function named square that takes an Integer parameter x and returns an Integer that is the square of x. This is equivalent to the following global definition:
square : Integer → Integer = function(x: Integer) x*x
We can give a new name to a semantic type t by using the type definition, which has the form:
type name = t
For example, the following notation defines RegExp as a shorthand for tuple {reBody: String; reFlags: String}:
type RegExp = tuple {reBody: String; reFlags: String}
In the HTML versions of the semantics, each reference to the semantic type name name is linked to name's definition.
Semantic actions tie together the grammar and the semantics. A semantic action ascribes semantic meaning to a grammar production.
To illustrate the use of semantic actions, we shall look at an example, followed by a detailed description of the notation for specifying semantic actions.
Consider the following grammar, with the start nonterminal Numeral:
Digit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Digits ⇒ Digit | Digits Digit
Numeral ⇒ Digits | Digits # Digits
This grammar defines the syntax of an acceptable input: 37,
33#4
and 30#2
are acceptable syntactically, while 1a
is not. However, the grammar does not indicate what these various inputs mean. That is the job of the semantics, which are
defined in terms of actions on the parse tree of grammar rule expansions. Consider the following sample set of actions defined
on this grammar, with a starting Numeral action called (in this example)
Value:
type SemanticException = oneof {syntaxError}
action Value[Digit] : Integer = digitValue(Digit)
action DecimalValue[Digits] : Integer
DecimalValue[Digits Digit] = Value[Digit]
DecimalValue[Digits Digits1 Digit] = 10*DecimalValue[Digits1] + Value[Digit]
action BaseValue[Digits] : Integer → Integer
BaseValue[Digits Digit](base: Integer)
= let d: Integer = Value[Digit]
in if d < base
then d
else throw syntaxError
BaseValue[Digits Digits1 Digit](base: Integer)
= let d: Integer = Value[Digit]
in if d < base
then base*BaseValue[Digits1](base) + d
else throw syntaxError
action Value[Numeral] : Integer
Value[Numeral Digits] = DecimalValue[Digits]
Value[Numeral Digits1 # Digits2]
= let base: Integer = DecimalValue[Digits2]
in if base ≥ 2 and base ≤ 10
then BaseValue[Digits1](base)
else throw syntaxError
Action names are written in violet cursive type. The last action
definition in the example above states that the action Value can be applied to any expansion
of the nonterminal Numeral, and the result is an Integer.
This action maps all acceptable inputs to integers or syntaxError.
If the result is syntaxError, then the input satisfies the grammar but
contains an error detected by the semantics; this is the case for the input 30#2.
A result of ⊥ would indicate a nonterminating computation; this
cannot happen in this example.
There are two definitions of the Value action on Numeral,
one for each grammar production that expands Numeral. Each definition
of an action is allowed to call actions on the terminals and nonterminals on the right side of the expansion. For example,
Value applied to the first Numeral production
(the one that expands Numeral into Digits)
simply applies the DecimalValue action to the expansion of the nonterminal Digits
and returns the result. On the other hand, Value applied to the second Numeral
production (the one that expands Numeral into Digits # Digits)
performs a computation using the results of the DecimalValue and BaseValue actions
applied to the two expansions of the Digits nonterminal. In this case
there are two identical nonterminals Digits on the right side of the
expansion, so we use subscripts to indicate on which one we're calling the actions DecimalValue
and BaseValue.
The BaseValue action illustrates a syntactic sugar for defining an action that is a function; this syntactic sugar is analogous to that for defining global functions.
The Value action on Digit illustrates the direct use of a nonterminal in a semantic expression: digitValue(Digit). Here the Digit semantic expression evaluates to the character into which the Digit grammar rule expands.
We can fully evaluate the semantics on our sample inputs to get the following results:
| Input | Semantic Result |
|---|---|
| 37 | 37 |
| 33#4 | 15 |
| 30#2 | syntaxError |
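The following JavaScript sketch is an illustration only, not part of the formal semantics; it computes the same results as the Value action for inputs that satisfy the Numeral grammar:
// Illustrative JavaScript version of the Value/DecimalValue/BaseValue actions.
function evaluateNumeral(input) {
  const match = /^([0-9]+)(?:#([0-9]+))?$/.exec(input);
  if (match === null) throw new Error("input does not satisfy the grammar");
  const digitsToInteger = (digits) =>
    [...digits].reduce((n, d) => 10*n + (d.charCodeAt(0) - 48), 0);
  if (match[2] === undefined) return digitsToInteger(match[1]);  // Numeral expands to Digits
  const base = digitsToInteger(match[2]);                        // Numeral expands to Digits # Digits
  if (base < 2 || base > 10) throw new Error("syntaxError");
  return [...match[1]].reduce((n, d) => {
    const v = d.charCodeAt(0) - 48;
    if (v >= base) throw new Error("syntaxError");               // the BaseValue digit check
    return base*n + v;
  }, 0);
}
// evaluateNumeral("37") is 37, evaluateNumeral("33#4") is 15, and evaluateNumeral("30#2") throws syntaxError.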
action Action[nonterminal] : type
This declaration states that action Action is defined on nonterminal nonterminal. Any reference to action Action[nonterminal] in a semantic expression returns a value of type type. The values of action Action must be defined using action definitions for each grammar production that has nonterminal on the left side.
Action[nonterminal expansion] = expr
This notation defines the value of action Action on nonterminal nonterminal in the case where nonterminal nonterminal expands to the given expansion. expansion can contain zero or more terminals and nonterminals (as well as other notations allowed on the right side of a grammar production). Furthermore, the terminals and nonterminals of expansion can be subscripted to allow them to be unambiguously referenced by action references or nonterminal references inside expr.
The type of action Action on nonterminal nonterminal must be declared using an action declaration. expr must have the type given by that action declaration.
nonterminal expansion must be one of the productions in the grammar.
Action[nonterminal expansion](param1: type1, ... , paramn: typen) = body
This notation is a syntactic sugar for defining an action whose value is a function. This notation is equivalent to:
Action[nonterminal expansion] =
function(param1: type1, ... , paramn: typen) body
action Action[nonterminal] : type = expr
This declaration is sometimes used when all expansions of nonterminal nonterminal share the same action semantics. This declaration states both the type type of action Action on nonterminal nonterminal as well as that action's value expr. Note that the expansions are not given between the square brackets, and expr can refer only to the nonterminal nonterminal on the left side of grammar productions. No additional action definitions are needed for nonterminal nonterminal.
See the Value action on Digit in the example above for an example of this declaration.
|
JavaScript 2.0
Formal Description
Stages
|
Thursday, November 11, 1999
The source code is processed in the following stages:
Processing stage 2 is done as follows:
If an implementation encounters an error while lexing, it is permitted to either report the error immediately or defer it until the affected token would actually be used by the parser. This flexibility allows an implementation to do lexing at the same time it parses the source program.
Provide language prohibiting an identifier from immediately following a number. This will fall out of the revised definition of QuantityLiteral.
Show mapping from Token structures to parser grammar terminals (obvious, but needs to be written).
To be provided
|
JavaScript 2.0
Formal Description
Lexer Grammar
|
Thursday, November 11, 1999
This LALR(1) grammar describes the lexer syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.
This document is also available as a Word 98 rtf file.
The start symbols are
NextTokenre
and
NextTokendiv
depending on whether a / should be interpreted as a regular expression or division.
[The lexer grammar productions appear here: white-space characters («TAB», «VT», «FF», «SP», «u00A0», «u2000»–«u200B», «u3000»), line breaks, comments, identifiers, the punctuators ! != !== # % %= & && &&= &= ( * *= + += , - -= -> . .. ... : :: ; < << <<= <= = == === > >= >> >>= >>> >>>= ? @ [ ^ ^= ^^ ^^= { | |= || ||= ~, the division punctuators / and /=, and numeric, quantity, string, and regular expression literals.]
|
JavaScript 2.0
Formal Description
Lexer Semantics
|
Thursday, November 11, 1999
The lexer semantics describe the actions the lexer takes in order to transform an input stream of Unicode characters into a stream of tokens. For convenience, the lexer grammar is repeated here. See also the description of the semantic notation.
This document is also available as a Word 98 rtf file.
The start symbols are
NextTokenre
and
NextTokendiv
depending on whether a / should be interpreted as a regular expression or division.
type SemanticException = oneof {syntaxError}
[The white-space and line-break productions of the lexer grammar are repeated here.]
action DecimalValue[ASCIIDigit] : Integer = digitValue(ASCIIDigit)
type RegExp = tuple {reBody: String; reFlags: String}
type Quantity = tuple {amount: Double; unit: String}
type Token
= oneof {
identifier: String;
keyword: String;
punctuator: String;
number: Double;
quantity: Quantity;
string: String;
regularExpression: RegExp;
end}
action Token[NextTokent] : Token
Token[NextTokent WhiteSpace Tokent] = Token[Tokent]
action RegExpMayFollow[NextTokent] : Boolean
RegExpMayFollow[NextTokent WhiteSpace Tokent] = RegExpMayFollow[Tokent]
Token[Tokent IdentifierOrReservedWord] = Token[IdentifierOrReservedWord]
Token[Tokent Punctuator] = Token[Punctuator]
Token[Tokendiv DivisionPunctuator] = punctuator Punctuator[DivisionPunctuator]
Token[Tokent NumericLiteral] = number DoubleValue[NumericLiteral]
Token[Tokent QuantityLiteral] = quantity QuantityValue[QuantityLiteral]
Token[Tokent StringLiteral] = string StringValue[StringLiteral]
Token[Tokenre RegExpLiteral] = regularExpression REValue[RegExpLiteral]
Token[Tokent EndOfInput] = end
action RegExpMayFollow[Tokent] : Boolean
RegExpMayFollow[Tokent IdentifierOrReservedWord]
= RegExpMayFollow[IdentifierOrReservedWord]
RegExpMayFollow[Tokent Punctuator] = RegExpMayFollow[Punctuator]
RegExpMayFollow[Tokendiv DivisionPunctuator] = true
RegExpMayFollow[Tokent NumericLiteral] = false
RegExpMayFollow[Tokent QuantityLiteral] = false
RegExpMayFollow[Tokent StringLiteral] = false
RegExpMayFollow[Tokenre RegExpLiteral] = false
RegExpMayFollow[Tokent EndOfInput] = true
action Name[IdentifierName] : String
Name[IdentifierName InitialIdentifierCharacter]
= [CharacterValue[InitialIdentifierCharacter]]
Name[IdentifierName IdentifierName1 ContinuingIdentifierCharacter]
= Name[IdentifierName1] [CharacterValue[ContinuingIdentifierCharacter]]
action ContainsEscapes[IdentifierName] : Boolean
ContainsEscapes[IdentifierName InitialIdentifierCharacter]
= ContainsEscapes[InitialIdentifierCharacter]
ContainsEscapes[IdentifierName IdentifierName1 ContinuingIdentifierCharacter]
= ContainsEscapes[IdentifierName1] or ContainsEscapes[ContinuingIdentifierCharacter]
action CharacterValue[InitialIdentifierCharacter] : Character
CharacterValue[InitialIdentifierCharacter OrdinaryInitialIdentifierCharacter]
= OrdinaryInitialIdentifierCharacter
CharacterValue[InitialIdentifierCharacter \ HexEscape]
= if isOrdinaryInitialIdentifierCharacter(CharacterValue[HexEscape])
then CharacterValue[HexEscape]
else throw syntaxError
action ContainsEscapes[InitialIdentifierCharacter] : Boolean
ContainsEscapes[InitialIdentifierCharacter OrdinaryInitialIdentifierCharacter] = false
ContainsEscapes[InitialIdentifierCharacter \ HexEscape] = true
action CharacterValue[ContinuingIdentifierCharacter] : Character
CharacterValue[ContinuingIdentifierCharacter OrdinaryContinuingIdentifierCharacter]
= OrdinaryContinuingIdentifierCharacter
CharacterValue[ContinuingIdentifierCharacter \ HexEscape]
= if isOrdinaryContinuingIdentifierCharacter(CharacterValue[HexEscape])
then CharacterValue[HexEscape]
else throw syntaxError
action ContainsEscapes[ContinuingIdentifierCharacter] : Boolean
ContainsEscapes[ContinuingIdentifierCharacter OrdinaryContinuingIdentifierCharacter]
= false
ContainsEscapes[ContinuingIdentifierCharacter \ HexEscape] = true
reservedWordsRE : String[]
= [“abstract”,
“break”,
“case”,
“catch”,
“class”,
“const”,
“continue”,
“debugger”,
“default”,
“delete”,
“do”,
“else”,
“enum”,
“eval”,
“export”,
“extends”,
“final”,
“finally”,
“for”,
“function”,
“goto”,
“if”,
“implements”,
“import”,
“in”,
“instanceof”,
“native”,
“new”,
“package”,
“private”,
“protected”,
“public”,
“return”,
“static”,
“switch”,
“synchronized”,
“throw”,
“throws”,
“transient”,
“try”,
“typeof”,
“var”,
“volatile”,
“while”,
“with”]
reservedWordsDiv : String[] = [“false”, “null”, “super”, “this”, “true”]
nonReservedWords : String[]
= [“box”,
“constructor”,
“field”,
“get”,
“language”,
“local”,
“method”,
“override”,
“set”,
“version”]
keywords : String[] = reservedWordsRE reservedWordsDiv nonReservedWords
member(id: String, list: String[]) : Boolean
= if |list| = 0
then false
else if id = list[0]
then true
else member(id, list[1 ...])
action Token[IdentifierOrReservedWord] : Token
Token[IdentifierOrReservedWord IdentifierName]
= let id: String = Name[IdentifierName]
in if member(id, keywords) and not ContainsEscapes[IdentifierName]
then keyword id
else identifier id
action RegExpMayFollow[IdentifierOrReservedWord] : Boolean
RegExpMayFollow[IdentifierOrReservedWord IdentifierName]
= let id: String = Name[IdentifierName]
in member(id, reservedWordsRE) and not ContainsEscapes[IdentifierName]
[The punctuator productions of the lexer grammar (PunctuatorRE and PunctuatorDiv) are repeated here; the individual punctuators are enumerated by the Punctuator actions below.]
action Token[Punctuator] : Token
Token[Punctuator PunctuatorRE] = punctuator Punctuator[PunctuatorRE]
Token[Punctuator PunctuatorDiv] = punctuator Punctuator[PunctuatorDiv]
action RegExpMayFollow[Punctuator] : Boolean
RegExpMayFollow[Punctuator PunctuatorRE] = true
RegExpMayFollow[Punctuator PunctuatorDiv] = false
action Punctuator[PunctuatorRE] : String
Punctuator[PunctuatorRE !] = “!”
Punctuator[PunctuatorRE ! =] = “!=”
Punctuator[PunctuatorRE ! = =] = “!==”
Punctuator[PunctuatorRE #] = “#”
Punctuator[PunctuatorRE %] = “%”
Punctuator[PunctuatorRE % =] = “%=”
Punctuator[PunctuatorRE &] = “&”
Punctuator[PunctuatorRE & &] = “&&”
Punctuator[PunctuatorRE & & =] = “&&=”
Punctuator[PunctuatorRE & =] = “&=”
Punctuator[PunctuatorRE (] = “(”
Punctuator[PunctuatorRE *] = “*”
Punctuator[PunctuatorRE * =] = “*=”
Punctuator[PunctuatorRE +] = “+”
Punctuator[PunctuatorRE + =] = “+=”
Punctuator[PunctuatorRE ,] = “,”
Punctuator[PunctuatorRE -] = “-”
Punctuator[PunctuatorRE - =] = “-=”
Punctuator[PunctuatorRE - >] = “->”
Punctuator[PunctuatorRE .] = “.”
Punctuator[PunctuatorRE . .] = “..”
Punctuator[PunctuatorRE . . .] = “...”
Punctuator[PunctuatorRE :] = “:”
Punctuator[PunctuatorRE : :] = “::”
Punctuator[PunctuatorRE ;] = “;”
Punctuator[PunctuatorRE <] = “<”
Punctuator[PunctuatorRE < <] = “<<”
Punctuator[PunctuatorRE < < =] = “<<=”
Punctuator[PunctuatorRE < =] = “<=”
Punctuator[PunctuatorRE =] = “=”
Punctuator[PunctuatorRE = =] = “==”
Punctuator[PunctuatorRE = = =] = “===”
Punctuator[PunctuatorRE >] = “>”
Punctuator[PunctuatorRE > =] = “>=”
Punctuator[PunctuatorRE > >] = “>>”
Punctuator[PunctuatorRE > > =] = “>>=”
Punctuator[PunctuatorRE > > >] = “>>>”
Punctuator[PunctuatorRE > > > =] = “>>>=”
Punctuator[PunctuatorRE ?] = “?”
Punctuator[PunctuatorRE @] = “@”
Punctuator[PunctuatorRE [] = “[”
Punctuator[PunctuatorRE ^] = “^”
Punctuator[PunctuatorRE ^ =] = “^=”
Punctuator[PunctuatorRE ^ ^] = “^^”
Punctuator[PunctuatorRE ^ ^ =] = “^^=”
Punctuator[PunctuatorRE {] = “{”
Punctuator[PunctuatorRE |] = “|”
Punctuator[PunctuatorRE | =] = “|=”
Punctuator[PunctuatorRE | |] = “||”
Punctuator[PunctuatorRE | | =] = “||=”
Punctuator[PunctuatorRE ~] = “~”
action Punctuator[PunctuatorDiv] : String
Punctuator[PunctuatorDiv )] = “)”
Punctuator[PunctuatorDiv + +] = “++”
Punctuator[PunctuatorDiv - -] = “--”
Punctuator[PunctuatorDiv ]] = “]”
Punctuator[PunctuatorDiv }] = “}”
action Punctuator[DivisionPunctuator] : String
Punctuator[DivisionPunctuator /] = “/”
Punctuator[DivisionPunctuator / =] = “/=”
action DoubleValue[NumericLiteral] : Double
DoubleValue[NumericLiteral DecimalLiteral]
= rationalToDouble(RationalValue[DecimalLiteral])
DoubleValue[NumericLiteral HexIntegerLiteral [lookahead∉{HexDigit}]]
= rationalToDouble(IntegerValue[HexIntegerLiteral])
expt(base: Rational, exponent: Integer) : Rational
= if exponent = 0
then 1
else if exponent < 0
then 1/expt(base, -exponent)
else base*expt(base, exponent - 1)
action RationalValue[DecimalLiteral] : Rational
RationalValue[DecimalLiteral Mantissa] = RationalValue[Mantissa]
RationalValue[DecimalLiteral Mantissa LetterE SignedInteger]
= RationalValue[Mantissa]*expt(10, IntegerValue[SignedInteger])
action RationalValue[Mantissa] : Rational
RationalValue[Mantissa DecimalIntegerLiteral] = IntegerValue[DecimalIntegerLiteral]
RationalValue[Mantissa DecimalIntegerLiteral .] = IntegerValue[DecimalIntegerLiteral]
RationalValue[Mantissa DecimalIntegerLiteral . Fraction]
= IntegerValue[DecimalIntegerLiteral] + RationalValue[Fraction]
RationalValue[Mantissa . Fraction] = RationalValue[Fraction]
action IntegerValue[DecimalIntegerLiteral] : Integer
IntegerValue[DecimalIntegerLiteral 0] = 0
IntegerValue[DecimalIntegerLiteral NonZeroDecimalDigits]
= IntegerValue[NonZeroDecimalDigits]
action IntegerValue[NonZeroDecimalDigits] : Integer
IntegerValue[NonZeroDecimalDigits NonZeroDigit] = DecimalValue[NonZeroDigit]
IntegerValue[NonZeroDecimalDigits NonZeroDecimalDigits1 ASCIIDigit]
= 10*IntegerValue[NonZeroDecimalDigits1] + DecimalValue[ASCIIDigit]
action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)
action RationalValue[Fraction] : Rational
RationalValue[Fraction DecimalDigits]
= IntegerValue[DecimalDigits]/expt(10, NDigits[DecimalDigits])
action IntegerValue[SignedInteger] : Integer
IntegerValue[SignedInteger DecimalDigits] = IntegerValue[DecimalDigits]
IntegerValue[SignedInteger + DecimalDigits] = IntegerValue[DecimalDigits]
IntegerValue[SignedInteger - DecimalDigits] = -IntegerValue[DecimalDigits]
action IntegerValue[DecimalDigits] : Integer
IntegerValue[DecimalDigits ASCIIDigit] = DecimalValue[ASCIIDigit]
IntegerValue[DecimalDigits DecimalDigits1 ASCIIDigit]
= 10*IntegerValue[DecimalDigits1] + DecimalValue[ASCIIDigit]
action NDigits[DecimalDigits] : Integer
NDigits[DecimalDigits ASCIIDigit] = 1
NDigits[DecimalDigits DecimalDigits1 ASCIIDigit] = NDigits[DecimalDigits1] + 1
action IntegerValue[HexIntegerLiteral] : Integer
IntegerValue[HexIntegerLiteral 0 LetterX HexDigit] = HexValue[HexDigit]
IntegerValue[HexIntegerLiteral HexIntegerLiteral1 HexDigit]
= 16*IntegerValue[HexIntegerLiteral1] + HexValue[HexDigit]
action HexValue[HexDigit] : Integer = digitValue(HexDigit)
action QuantityValue[QuantityLiteral] : Quantity
QuantityValue[QuantityLiteral NumericLiteral QuantityName]
= amount DoubleValue[NumericLiteral], unit Name[QuantityName]
action Name[QuantityName] : String
Name[QuantityName [lookahead∉{LetterE, LetterX}] IdentifierName]
= Name[IdentifierName]
action StringValue[StringLiteral] : String
StringValue[StringLiteral ' StringCharssingle '] = StringValue[StringCharssingle]
StringValue[StringLiteral " StringCharsdouble "] = StringValue[StringCharsdouble]
action StringValue[StringCharsq] : String
StringValue[StringCharsq «empty»] = “”
StringValue[StringCharsq StringCharsq1 StringCharq]
= StringValue[StringCharsq1] [CharacterValue[StringCharq]]
action CharacterValue[StringCharq] : Character
CharacterValue[StringCharq LiteralStringCharq] = LiteralStringCharq
CharacterValue[StringCharq \ StringEscape] = CharacterValue[StringEscape]
action CharacterValue[StringEscape] : Character
CharacterValue[StringEscape ControlEscape] = CharacterValue[ControlEscape]
CharacterValue[StringEscape ZeroEscape] = CharacterValue[ZeroEscape]
CharacterValue[StringEscape HexEscape] = CharacterValue[HexEscape]
CharacterValue[StringEscape IdentityEscape] = IdentityEscape
action CharacterValue[ControlEscape] : Character
CharacterValue[ControlEscape b] = ‘«BS»’
CharacterValue[ControlEscape f] = ‘«FF»’
CharacterValue[ControlEscape n] = ‘«LF»’
CharacterValue[ControlEscape r] = ‘«CR»’
CharacterValue[ControlEscape t] = ‘«TAB»’
CharacterValue[ControlEscape v] = ‘«VT»’
action CharacterValue[ZeroEscape] : Character
CharacterValue[ZeroEscape 0 [lookahead∉{ASCIIDigit}]] = ‘«NUL»’
action CharacterValue[HexEscape] : Character
CharacterValue[HexEscape x HexDigit1 HexDigit2]
= codeToCharacter(16*HexValue[HexDigit1] + HexValue[HexDigit2])
CharacterValue[HexEscape u HexDigit1 HexDigit2 HexDigit3 HexDigit4]
= codeToCharacter(
4096*HexValue[HexDigit1] + 256*HexValue[HexDigit2] + 16*HexValue[HexDigit3] +
HexValue[HexDigit4])
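For example (a worked illustration), the escape \u0041 yields 4096·0 + 256·0 + 16·4 + 1 = 65, and codeToCharacter(65) is the character ‘A’.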
action REValue[RegExpLiteral] : RegExp
REValue[RegExpLiteral RegExpBody RegExpFlags]
= reBody REBody[RegExpBody], reFlags REFlags[RegExpFlags]
action REFlags[RegExpFlags] : String
REFlags[RegExpFlags «empty»] = “”
REFlags[RegExpFlags RegExpFlags1 ContinuingIdentifierCharacter]
= REFlags[RegExpFlags1] [CharacterValue[ContinuingIdentifierCharacter]]
action REBody[RegExpBody] : String
REBody[RegExpBody / RegExpFirstChar RegExpChars /]
= REBody[RegExpFirstChar] REBody[RegExpChars]
action REBody[RegExpFirstChar] : String
REBody[RegExpFirstChar OrdinaryRegExpFirstChar] = [OrdinaryRegExpFirstChar]
REBody[RegExpFirstChar \ NonTerminator] = [‘\’, NonTerminator]
action REBody[RegExpChars] : String
REBody[RegExpChars «empty»] = “”
REBody[RegExpChars RegExpChars1 RegExpChar]
= REBody[RegExpChars1] REBody[RegExpChar]
action REBody[RegExpChar] : String
REBody[RegExpChar OrdinaryRegExpChar] = [OrdinaryRegExpChar]
REBody[RegExpChar \ NonTerminator] = [‘\’, NonTerminator]
|
JavaScript 2.0
Formal Description
Regular Expression Grammar
|
Thursday, November 11, 1999
This LR(1) grammar describes the regular expression syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.
This document is also available as a Word 98 rtf file.
[The regular expression grammar productions appear here: RegularExpressionPattern, Disjunction, Alternative, Term, Assertion, Quantifier and QuantifierPrefix (* + ? and the { } forms), Atom, AtomEscape, CharacterEscape, ControlEscape, ControlLetter (A–Z, a–z), DecimalEscape, CharacterClassEscape, CharacterClass, ClassRanges, NonemptyClassRanges, ClassAtom, and ClassEscape.]
|
JavaScript 2.0
Formal Description
Regular Expression Semantics
|
Thursday, November 11, 1999
The regular expression semantics describe the actions the regular expression engine takes in order to transform a regular expression pattern into a function for matching against input strings. For convenience, the regular expression grammar is repeated here. See also the description of the semantic notation.
This document is also available as a Word 98 rtf file.
The regular expression semantics below are working (except for case-insensitive matches) and have been tried on sample cases, but they could be formatted better.
type SemanticException = oneof {syntaxError}
lineTerminators : {Character} = {‘«LF»’, ‘«CR»’, ‘«u2028»’, ‘«u2029»’}
reWhitespaces : {Character} = {‘«FF»’, ‘«LF»’, ‘«CR»’, ‘«TAB»’, ‘«VT»’, ‘ ’}
reDigits : {Character} = {‘0’ ... ‘9’}
reWordCharacters : {Character} = {‘0’ ... ‘9’, ‘A’ ... ‘Z’, ‘a’ ... ‘z’, ‘_’}
type REInput = tuple {str: String; ignoreCase: Boolean; multiline: Boolean}
Field str is the input string. ignoreCase and multiline are the corresponding regular expression flags.
type REResult = oneof {success: REMatch; failure}
type REMatch = tuple {endIndex: Integer; captures: Capture[]}
A REMatch holds an intermediate state during the pattern-matching process. endIndex is the index of the next input character to be matched by the next component in a regular expression pattern. If we are at the end of the pattern, endIndex is one plus the index of the last matched input character. captures is a zero-based array of the strings captured so far by capturing parentheses.
type Capture = oneof {present: String; absent}
type Continuation = REMatch REResult
A Continuation is a function that attempts to match the remaining portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. If a match is possible, it returns a success result that contains the final REMatch state; if no match is possible, it returns a failure result.
type Matcher = REInput REMatch Continuation REResult
A Matcher is a function that attempts to match a middle portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. Since the remainder of the pattern heavily influences whether (and how) a middle portion will match, we must pass in a Continuation function that checks whether the rest of the pattern matched. If the continuation returns failure, the matcher function may call it repeatedly, trying various alternatives at pattern choice points.
The REInput parameter contains the input string and is merely passed down to subroutines.
type MatcherGenerator = Integer Matcher
A MatcherGenerator is a function executed at the time the regular expression is compiled that returns a Matcher for a part of the pattern. The Integer parameter contains the number of capturing left parentheses seen so far in the pattern and is used to assign static, consecutive numbers to capturing parentheses.
characterSetMatcher(acceptanceSet: {Character}, invert: Boolean) : Matcher
= function(t: REInput, x: REMatch, c: Continuation)
let i: Integer = x.endIndex;
s: String = t.str
in if i = |s|
then failure
else if s[i] ∈ acceptanceSet xor invert
then c(endIndex (i + 1), captures x.captures)
else failure
characterSetMatcher returns a Matcher that matches a single input string character. If invert is false, the match succeeds if the character is a member of the acceptanceSet set of characters (possibly ignoring case). If invert is true, the match succeeds if the character is not a member of the acceptanceSet set of characters (possibly ignoring case).
characterMatcher(ch: Character) : Matcher = characterSetMatcher({ch}, false)
characterMatcher returns a Matcher that matches a single input string character. The match succeeds if the character is the same as ch (possibly ignoring case).
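For readers who prefer running code, here is a hedged JavaScript sketch of the same continuation-passing idea. It is not the normative definition; it uses null for failure and a plain object {endIndex, captures} for an REMatch, and acceptanceSet is an ordinary JavaScript Set:
// Illustrative sketch of characterSetMatcher (case sensitivity ignored).
function characterSetMatcher(acceptanceSet, invert) {
  return function (t, x, c) {
    const i = x.endIndex, s = t.str;
    if (i === s.length) return null;                        // failure: no character left to match
    const inSet = acceptanceSet.has(s[i]);
    if (inSet !== invert)                                    // member-of-set xor invert
      return c({endIndex: i + 1, captures: x.captures});     // hand the advanced state to the continuation
    return null;                                             // failure
  };
}
const matchA = characterSetMatcher(new Set(["a"]), false);
// matchA({str: "abc"}, {endIndex: 0, captures: []}, (y) => y) returns {endIndex: 1, captures: []}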
action Exec[RegularExpressionPattern] : REInput Integer REResult
Exec[RegularExpressionPattern Disjunction]
= let match: Matcher = GenMatcher[Disjunction](0)
in function(t: REInput, index: Integer)
match(
t,
endIndex index, captures fillCapture(CountParens[Disjunction]),
successContinuation)
successContinuation(x: REMatch) : REResult = success x
fillCapture(i: Integer) : Capture[]
= if i = 0
then []Capture
else fillCapture(i - 1) [absent]
action GenMatcher[Disjunction] : MatcherGenerator
GenMatcher[Disjunction Alternative] = GenMatcher[Alternative]
GenMatcher[Disjunction Alternative | Disjunction1](parenIndex: Integer)
= let match1: Matcher = GenMatcher[Alternative](parenIndex);
match2: Matcher = GenMatcher[Disjunction1](parenIndex + CountParens[Alternative])
in function(t: REInput, x: REMatch, c: Continuation)
case match1(t, x, c) of
success(y: REMatch): success y;
failure: match2(t, x, c)
end
action CountParens[Disjunction] : Integer
CountParens[Disjunction Alternative] = CountParens[Alternative]
CountParens[Disjunction Alternative | Disjunction1]
= CountParens[Alternative] + CountParens[Disjunction1]
action GenMatcher[Alternative] : MatcherGenerator
GenMatcher[Alternative «empty»](parenIndex: Integer)
= function(t: REInput, x: REMatch, c: Continuation)
c(x)
GenMatcher[Alternative Alternative1 Term](parenIndex: Integer)
= let match1: Matcher = GenMatcher[Alternative1](parenIndex);
match2: Matcher = GenMatcher[Term](parenIndex + CountParens[Alternative1])
in function(t: REInput, x: REMatch, c: Continuation)
let d: Continuation
= function(y: REMatch)
match2(t, y, c)
in match1(t, x, d)
action CountParens[Alternative] : Integer
CountParens[Alternative «empty»] = 0
CountParens[Alternative Alternative1 Term]
= CountParens[Alternative1] + CountParens[Term]
action GenMatcher[Term] : MatcherGenerator
GenMatcher[Term Assertion](parenIndex: Integer)
= function(t: REInput, x: REMatch, c: Continuation)
if TestAssertion[Assertion](t, x)
then c(x)
else failure
GenMatcher[Term Atom] = GenMatcher[Atom]
GenMatcher[Term Atom Quantifier](parenIndex: Integer)
= let match: Matcher = GenMatcher[Atom](parenIndex);
min: Integer = Minimum[Quantifier];
max: Limit = Maximum[Quantifier];
greedy: Boolean = Greedy[Quantifier]
in if
(case max of
finite(m: Integer): m < min;
infinite: false
end)
then throw syntaxError
else repeatMatcher(match, min, max, greedy, parenIndex, CountParens[Atom])
action CountParens[Term] : Integer
CountParens[Term Assertion] = 0
CountParens[Term Atom] = CountParens[Atom]
CountParens[Term Atom Quantifier] = CountParens[Atom]
type Limit = oneof {finite: Integer; infinite}
resetParens(x: REMatch, p: Integer, nParens: Integer) : REMatch
= if nParens = 0
then x
else let y: REMatch = endIndex x.endIndex, captures x.captures[p ← absent]
in resetParens(y, p + 1, nParens - 1)
repeatMatcher(body: Matcher, min: Integer, max: Limit, greedy: Boolean, parenIndex: Integer, nBodyParens: Integer)
: Matcher
= function(t: REInput, x: REMatch, c: Continuation)
if
(case max of
finite(m: Integer): m = 0;
infinite: false
end)
then c(x)
else let d: Continuation
= function(y: REMatch)
if min = 0 and y.endIndex = x.endIndex
then failure
else let newMin: Integer
= if min = 0
then 0
else min - 1;
newMax: Limit
= case max of
finite(m: Integer): finite (m - 1);
infinite: infinite
end
in repeatMatcher(
body,
newMin,
newMax,
greedy,
parenIndex,
nBodyParens)(t, y, c);
xr: REMatch = resetParens(x, parenIndex, nBodyParens)
in if min ≠ 0
then body(t, xr, d)
else if greedy
then case body(t, xr, d) of
success(z: REMatch): success z;
failure: c(x)
end
else case c(x) of
success(z: REMatch): success z;
failure: body(t, xr, d)
end
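The heart of repeatMatcher is the choice between trying the body first (greedy) and trying the continuation first (non-greedy). Below is a hedged JavaScript sketch of just that choice, using the same null-for-failure convention as the earlier sketch; optionalMatcher is an invented name and corresponds roughly to the ? and ?? quantifiers with no capturing parentheses:
// Greedy: prefer consuming the body, fall back to skipping it if the rest of the pattern fails.
// Non-greedy: prefer skipping the body, only try it if the rest of the pattern fails.
function optionalMatcher(body, greedy) {
  return (t, x, c) => {
    if (greedy) return body(t, x, c) ?? c(x);
    return c(x) ?? body(t, x, c);
  };
}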
action Minimum[Quantifier] : Integer
Minimum[Quantifier QuantifierPrefix] = Minimum[QuantifierPrefix]
Minimum[Quantifier QuantifierPrefix ?] = Minimum[QuantifierPrefix]
action Maximum[Quantifier] : Limit
Maximum[Quantifier QuantifierPrefix] = Maximum[QuantifierPrefix]
Maximum[Quantifier QuantifierPrefix ?] = Maximum[QuantifierPrefix]
action Greedy[Quantifier] : Boolean
Greedy[Quantifier QuantifierPrefix] = true
Greedy[Quantifier QuantifierPrefix ?] = false
action Minimum[QuantifierPrefix] : Integer
Minimum[QuantifierPrefix *] = 0
Minimum[QuantifierPrefix +] = 1
Minimum[QuantifierPrefix ?] = 0
Minimum[QuantifierPrefix { DecimalDigits }] = IntegerValue[DecimalDigits]
Minimum[QuantifierPrefix { DecimalDigits , }] = IntegerValue[DecimalDigits]
Minimum[QuantifierPrefix { DecimalDigits1 , DecimalDigits2 }]
= IntegerValue[DecimalDigits1]
action Maximum[QuantifierPrefix] : Limit
Maximum[QuantifierPrefix *] = infinite
Maximum[QuantifierPrefix +] = infinite
Maximum[QuantifierPrefix ?] = finite 1
Maximum[QuantifierPrefix { DecimalDigits }] = finite IntegerValue[DecimalDigits]
Maximum[QuantifierPrefix { DecimalDigits , }] = infinite
Maximum[QuantifierPrefix { DecimalDigits1 , DecimalDigits2 }]
= finite IntegerValue[DecimalDigits2]
action IntegerValue[DecimalDigits] : Integer
IntegerValue[DecimalDigits DecimalDigit] = DecimalValue[DecimalDigit]
IntegerValue[DecimalDigits DecimalDigits1 DecimalDigit]
= 10*IntegerValue[DecimalDigits1] + DecimalValue[DecimalDigit]
action DecimalValue[DecimalDigit] : Integer = digitValue(DecimalDigit)
action TestAssertion[Assertion] : REInput REMatch Boolean
TestAssertion[Assertion ^](t: REInput, x: REMatch)
= if x.endIndex = 0
then true
else t.multiline and t.str[x.endIndex - 1] ∈ lineTerminators
TestAssertion[Assertion $](t: REInput, x: REMatch)
= if x.endIndex = |t.str|
then true
else t.multiline and t.str[x.endIndex] ∈ lineTerminators
TestAssertion[Assertion \ b](t: REInput, x: REMatch)
= atWordBoundary(x.endIndex, t.str)
TestAssertion[Assertion \ B](t: REInput, x: REMatch)
= not atWordBoundary(x.endIndex, t.str)
atWordBoundary(i: Integer, s: String) : Boolean = inWord(i - 1, s) xor inWord(i, s)
inWord(i: Integer, s: String) : Boolean
= if i = -1 or i = |s|
then false
else s[i] ∈ reWordCharacters
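A hedged JavaScript rendering of these two helpers (same names, but merely an illustration of the definitions above):
// Characters 0-9, A-Z, a-z, and _ count as word characters; positions outside the string do not.
const reWordCharacters = new Set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_");
function inWord(i, s) {
  if (i === -1 || i === s.length) return false;
  return reWordCharacters.has(s[i]);
}
function atWordBoundary(i, s) {
  return inWord(i - 1, s) !== inWord(i, s);   // xor
}
// atWordBoundary(0, "ab cd") is true, atWordBoundary(1, "ab cd") is false, atWordBoundary(2, "ab cd") is true.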
action GenMatcher[Atom] : MatcherGenerator
GenMatcher[Atom PatternCharacter](parenIndex: Integer)
= characterMatcher(PatternCharacter)
GenMatcher[Atom .](parenIndex: Integer) = characterSetMatcher(lineTerminators, true)
GenMatcher[Atom \ AtomEscape] = GenMatcher[AtomEscape]
GenMatcher[Atom CharacterClass](parenIndex: Integer)
= let a: {Character} = AcceptanceSet[CharacterClass]
in characterSetMatcher(a, Invert[CharacterClass])
GenMatcher[Atom ( Disjunction )](parenIndex: Integer)
= let match: Matcher = GenMatcher[Disjunction](parenIndex + 1)
in function(t: REInput, x: REMatch, c: Continuation)
let d: Continuation
= function(y: REMatch)
let updatedCaptures: Capture[]
= y.captures[parenIndex ←
present t.str[x.endIndex ... y.endIndex - 1]]
in c(endIndex y.endIndex, captures updatedCaptures)
in match(t, x, d)
GenMatcher[Atom ( ? : Disjunction )] = GenMatcher[Disjunction]
GenMatcher[Atom ( ? = Disjunction )](parenIndex: Integer)
= let match: Matcher = GenMatcher[Disjunction](parenIndex)
in function(t: REInput, x: REMatch, c: Continuation)
case match(t, x, successContinuation) of
success(y: REMatch): c(endIndex x.endIndex, captures y.captures);
failure: failure
end
GenMatcher[Atom ( ? ! Disjunction )](parenIndex: Integer)
= let match: Matcher = GenMatcher[Disjunction](parenIndex)
in function(t: REInput, x: REMatch, c: Continuation)
case match(t, x, successContinuation) of
success(y: REMatch): failure;
failure: c(x)
end
action CountParens[Atom] : Integer
CountParens[Atom PatternCharacter] = 0
CountParens[Atom .] = 0
CountParens[Atom \ AtomEscape] = 0
CountParens[Atom CharacterClass] = 0
CountParens[Atom ( Disjunction )] = CountParens[Disjunction] + 1
CountParens[Atom ( ? : Disjunction )] = CountParens[Disjunction]
CountParens[Atom ( ? = Disjunction )] = CountParens[Disjunction]
CountParens[Atom ( ? ! Disjunction )] = CountParens[Disjunction]
action GenMatcher[AtomEscape] : MatcherGenerator
GenMatcher[AtomEscape DecimalEscape](parenIndex: Integer)
= let n: Integer = EscapeValue[DecimalEscape]
in if n = 0
then characterMatcher(‘«NUL»’)
else if n > parenIndex
then throw syntaxError
else backreferenceMatcher(n)
GenMatcher[AtomEscape CharacterEscape](parenIndex: Integer)
= characterMatcher(CharacterValue[CharacterEscape])
GenMatcher[AtomEscape CharacterClassEscape](parenIndex: Integer)
= characterSetMatcher(AcceptanceSet[CharacterClassEscape], false)
backreferenceMatcher(n: Integer) : Matcher
= function(t: REInput, x: REMatch, c: Continuation)
case nthBackreference(x, n) of
present(ref: String):
let i: Integer = x.endIndex;
s: String = t.str
in let j: Integer = i + |ref|
in if j > |s|
then failure
else if s[i ... j - 1] = ref
then c(endIndex j, captures x.captures)
else failure;
absent: c(x)
end
nthBackreference(x: REMatch, n: Integer) : Capture = x.captures[n - 1]
[The ControlLetter production (any of the letters A–Z or a–z) is repeated here.]
action CharacterValue[CharacterEscape] : Character
CharacterValue[CharacterEscape ControlEscape] = CharacterValue[ControlEscape]
CharacterValue[CharacterEscape c ControlLetter]
= codeToCharacter(bitwiseAnd(characterToCode(ControlLetter), 31))
CharacterValue[CharacterEscape HexEscape] = CharacterValue[HexEscape]
CharacterValue[CharacterEscape IdentityEscape] = IdentityEscape
action CharacterValue[ControlEscape] : Character
CharacterValue[ControlEscape f] = ‘«FF»’
CharacterValue[ControlEscape n] = ‘«LF»’
CharacterValue[ControlEscape r] = ‘«CR»’
CharacterValue[ControlEscape t] = ‘«TAB»’
CharacterValue[ControlEscape v] = ‘«VT»’
action EscapeValue[DecimalEscape] : Integer
EscapeValue[DecimalEscape DecimalIntegerLiteral [lookahead∉{DecimalDigit}]]
= IntegerValue[DecimalIntegerLiteral]
action IntegerValue[DecimalIntegerLiteral] : Integer
IntegerValue[DecimalIntegerLiteral 0] = 0
IntegerValue[DecimalIntegerLiteral NonZeroDecimalDigits]
= IntegerValue[NonZeroDecimalDigits]
action IntegerValue[NonZeroDecimalDigits] : Integer
IntegerValue[NonZeroDecimalDigits NonZeroDigit] = DecimalValue[NonZeroDigit]
IntegerValue[NonZeroDecimalDigits NonZeroDecimalDigits1 DecimalDigit]
= 10*IntegerValue[NonZeroDecimalDigits1] + DecimalValue[DecimalDigit]
action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)
action CharacterValue[HexEscape] : Character
CharacterValue[HexEscape x HexDigit1 HexDigit2]
= codeToCharacter(16*HexValue[HexDigit1] + HexValue[HexDigit2])
CharacterValue[HexEscape u HexDigit1 HexDigit2 HexDigit3 HexDigit4]
= codeToCharacter(
4096*HexValue[HexDigit1] + 256*HexValue[HexDigit2] + 16*HexValue[HexDigit3] +
HexValue[HexDigit4])
action HexValue[HexDigit] : Integer = digitValue(HexDigit)
action AcceptanceSet[CharacterClassEscape] : {Character}
AcceptanceSet[CharacterClassEscape s] = reWhitespaces
AcceptanceSet[CharacterClassEscape S] = {‘«NUL»’ ... ‘«uFFFF»’} - reWhitespaces
AcceptanceSet[CharacterClassEscape d] = reDigits
AcceptanceSet[CharacterClassEscape D] = {‘«NUL»’ ... ‘«uFFFF»’} - reDigits
AcceptanceSet[CharacterClassEscape w] = reWordCharacters
AcceptanceSet[CharacterClassEscape W] = {‘«NUL»’ ... ‘«uFFFF»’} - reWordCharacters
action AcceptanceSet[CharacterClass] : {Character}
AcceptanceSet[CharacterClass [ [lookahead∉{^}] ClassRanges ]]
= AcceptanceSet[ClassRanges]
AcceptanceSet[CharacterClass [ ^ ClassRanges ]] = AcceptanceSet[ClassRanges]
action Invert[CharacterClass] : Boolean
Invert[CharacterClass [ [lookahead∉{^}] ClassRanges ]] = false
Invert[CharacterClass [ ^ ClassRanges ]] = true
action AcceptanceSet[ClassRanges] : {Character}
AcceptanceSet[ClassRanges «empty»] = {}Character
AcceptanceSet[ClassRanges NonemptyClassRangesdash]
= AcceptanceSet[NonemptyClassRangesdash]
action AcceptanceSet[NonemptyClassRangesd] : {Character}
AcceptanceSet[NonemptyClassRangesd ClassAtomdash] = AcceptanceSet[ClassAtomdash]
AcceptanceSet[NonemptyClassRangesd ClassAtomd NonemptyClassRangesnoDash1]
= AcceptanceSet[ClassAtomd] ∪ AcceptanceSet[NonemptyClassRangesnoDash1]
AcceptanceSet[NonemptyClassRangesd ClassAtomd1 - ClassAtomdash2 ClassRanges]
= let range: {Character}
= characterRange(AcceptanceSet[ClassAtomd1], AcceptanceSet[ClassAtomdash2])
in range ∪ AcceptanceSet[ClassRanges]
characterRange(low: {Character}, high: {Character}) : {Character}
= if |low| ≠ 1 or |high| ≠ 1
then throw syntaxError
else let l: Character = min low;
h: Character = min high
in if l ≤ h
then {l ... h}
else throw syntaxError
action AcceptanceSet[ClassAtomd] : {Character}
AcceptanceSet[ClassAtomd ClassCharacterd] = {ClassCharacterd}
AcceptanceSet[ClassAtomd \ ClassEscape] = AcceptanceSet[ClassEscape]
action AcceptanceSet[ClassEscape] : {Character}
AcceptanceSet[ClassEscape DecimalEscape]
= if EscapeValue[DecimalEscape] = 0
then {‘«NUL»’}
else throw syntaxError
AcceptanceSet[ClassEscape b] = {‘«BS»’}
AcceptanceSet[ClassEscape CharacterEscape] = {CharacterValue[CharacterEscape]}
AcceptanceSet[ClassEscape CharacterClassEscape] = AcceptanceSet[CharacterClassEscape]
|
JavaScript 2.0
Formal Description
Parser Grammar
|
Thursday, November 11, 1999
This LALR(1) grammar describes the syntax of the JavaScript 2.0 proposal. The starting nonterminal is Program. See also the description of the grammar notation.
This document is also available as a Word 98 rtf file.
General tokens: Identifier Number RegularExpression String VirtualSemicolon
Punctuation tokens: !
!= !==
% %=
& &&
&&=
&= (
) *
*= +
++ +=
, -
-- -=
. ...
/ /=
: ::
; <
<< <<=
<= =
== ===
> >=
>> >>=
>>>
>>>= ?
@ [
] ^
^= ^^
^^= {
| |=
|| ||=
} ~
Future punctuation tokens: #
->
Reserved words: break
case catch
class const
continue default
delete do
else eval
extends false
final finally
for function
if in
instanceof new
null package
private public
return super
switch this
throw true
try typeof
var while
with
Future reserved words: abstract
debugger enum
export goto
implements import
interface
native protected
static
synchronized throws
transient
volatile
Non-reserved words: box
constructor field
get language
local method
override set
version
[The parser grammar productions appear here, covering the non-reserved words box, constructor, field, get, language, local, method, override, set, and version; the primary expressions null, true, false, this, super, and ? Identifier; the postfix and unary operators ++, --, delete, typeof, eval, +, -, ~, and !; statements, including the abbreviated if-else forms; and attribute combinations such as method, override [no line break] method, final [no line break] method, and final [no line break] override [no line break] method.]
|
JavaScript 2.0
Rationale
|
Thursday, November 11, 1999
This chapter discusses the decisions made in designing JavaScript 2.0. Rationales are presented together with descriptions of other alternatives that were or are being considered. Currently outstanding issues are in red.
|
JavaScript 2.0
Rationale
Syntax
|
Thursday, November 11, 1999
The term semicolon insertion informally refers to the ability to write programs while omitting semicolons between statements. In both JavaScript 1.5 and JavaScript 2.0 there are two kinds of semicolon insertion:
Grammatical semicolon insertion: semicolons before a closing } and at the end of the program are optional in both JavaScript 1.5 and 2.0. In addition, the JavaScript 2.0 parser allows semicolons to be omitted before the else of an if-else statement and before the while of a do-while statement. Grammatical semicolon insertion is implemented directly by the parser grammar's productions, which simply do not require a semicolon in the aforementioned cases. Line breaks in the source code are not relevant to grammatical semicolon insertion.
Line-break semicolon insertion: a semicolon may be inserted at a line break when the program would otherwise be syntactically incorrect. This kind of semicolon insertion cannot be easily implemented in the parser's grammar; it turns a syntactically incorrect program into a correct program and relies on line breaks in the source code.
Grammatical semicolon insertion is harmless. On the other hand, line-break semicolon insertion suffers from the following problems: it makes line breaks significant, so rearranging them can change a program's meaning; it often fails to insert semicolons where users expect them; and extending the language's syntax can silently change the meaning of existing programs that rely on line-break semicolon insertion.
The first problem presents difficulty for some preprocessors, such as the one for XML attributes, which turn line breaks into spaces. The second and third problems are more serious. Users are confused when they discover that the program
a = b + c (d + e).print()
doesn't do what they expect:
a = b + c; (d + e).print();
Instead, that program is parsed as:
a = b + c(d + e).print();
The third problem is the most serious. New features added to the language can turn illegal syntax into legal syntax. If
an existing program relies on the illegal syntax to trigger line-break semicolon insertion, then the program will silently
change behavior once the feature is added. For example, the juxtaposition of a numeric literal followed by a string literal
(such as 4 "in") is illegal in JavaScript 1.5. JavaScript 2.0 makes this legal syntax for expressions with
units. This syntax extension has the unfortunate consequence of silently changing the meaning of the following JavaScript
1.5 program:
a = b + 4 "in".print()
from:
a = b + 4; "in".print();
to:
a = b + 4"in".print();
JavaScript 2.0 gets around this incompatibility by adding a [no line break] restriction in the grammar that requires the numeric and string literals to be on the same line. Unfortunately, this compatibility is a double-edged sword. Due to JavaScript 1.5 compatibility, JavaScript 2.0 has to have a large number of these [no line break] restrictions. It is hard to remember all of them, and forgetting one of them often silently causes a JavaScript 2.0 program to be reinterpreted. Users will be dismayed to find that:
local
function f(x) {return x*x}
turns into:
local;
function f(x) {return x*x}
(where local; is an expression statement) instead of:
local function f(x) {return x*x}
An earlier version of JavaScript 2.0 disallowed line-break semicolon insertion. The current version allows it but only in non-strict mode. Strict mode removes all [no line break] restrictions, simplifying the language again. As a side effect, it is possible to write a program that does different things in strict and non-strict modes (the last example above is one such program), but this is the price to pay to achieve simplicity.
JavaScript 2.0 retains compatibility with JavaScript 1.5 by adopting the same rules for detecting regular expression literals. This complicates the design of programs such as syntax-directed text editors and machine scanners because it makes it impossible to find all of the tokens in a JavaScript program without parsing the program.
Making JavaScript 2.0's lexical grammar independent of its syntactic grammar would have allowed tools to
easily process a JavaScript program and escape all instances of, say, </ to properly embed a JavaScript 2.0
or later program in an HTML page. The full parser, by contrast, changes with each version of JavaScript. To illustrate the difficulties,
compare such JavaScript 1.5 gems as:
for (var x = a in foo && "</x>" || mot ? z:/x:3;x<5;y</g/i) {xyz(x++);}
for (var x = a in foo && "</x>" || mot ? z/x:3;x<5;y</g/i) {xyz(x++);}
One idea explored early in the design of JavaScript 2.0 was providing an alternate, unambiguous syntax for regular expressions
and encouraging the use of the new syntax. A RegularExpression could have been specified unambiguously
using « and » as its opening and closing delimiters instead of / and /.
For example, «3*» would be a regular expression that matches zero or more 3's. Such
a regular expression could be empty: «» is a regular expression that matches only the empty string,
while // starts a comment. To write such a regular expression using the slash syntax one needs to write /(?:)/.
Syntactic resynchronization occurs when the lexer needs to find the end of a block (the matching })
in order to skip a portion of a program written in a future version of JavaScript. Ordinarily this would not be a problem,
but regular expressions complicate matters because they make lexing dependent on parsing. The rules for recognizing regular
expression literals must be changed for those portions of the program. The rule below might work, or a simplified parse might
be performed on the input to determine the locations of regular expressions. This is an area that needs
further work.
During syntax resynchronization JavaScript 2.0 determines whether a / starts a regular expression or is a
division (or /=) operator solely based on the previous token:
| / interpretation | Previous token |
|---|---|
| / or /= | Identifier Number RegularExpression String ) ++ -- ] } false null super this true constructor getter method override setter traditional version |
| RegularExpression | Any other punctuation token: ! != !== # % %= & && &&= &= ( * *= + += , - -= -> . .. ... / /= : :: ; < << <<= <= = == === > >= >> >>= >>> >>>= ? @ [ ^ ^= ^^ ^^= { | |= || ||= ~ — or any of the reserved words: abstract break case catch class const continue debugger default delete do else enum eval export extends field final finally for function goto if implements import in instanceof native new package private protected public return static switch synchronized throw throws transient try typeof var volatile while with |
Regardless of the previous token, // is interpreted as the beginning of a comment.
The only controversial choices are ) and }. A /
after either a ) or } token can be either a division
symbol (if the ) or } closes a subexpression or an
object literal) or a regular expression token (if the ) or }
closes a preceding statement or an if, while, or for expression). Having /
be interpreted as a RegularExpression in expressions such as (x+y)/2 would be problematic,
so it is interpreted as a division operator after ) or }.
If one wants to place a regular expression literal at the very beginning of an expression statement, it's best to put the
regular expression in parentheses. Fortunately, this is not common since one usually assigns the result of the regular expression
operation to a variable.
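For example, consider the hypothetical snippet below (the names done, cleanUp, s, and r are invented for illustration). After the } that closes the preceding statement, a bare / would be read as a division operator, so the regular expression is parenthesized; in the more common assignment style the / follows an = token and needs no parentheses:
if (done) { cleanUp(); }
(/x+/).exec(s);        // parenthesized: the / unambiguously begins a regular expression literal
var r = /x+/.exec(s);  // usual style: the / follows =, so it already begins a regular expression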
An alternative to language declarations that was considered early was to report syntax errors at the time the relevant
statement was executed rather than at the time it was parsed. This way a single program could include parts written in a future
version of JavaScript without getting an error unless it tries to execute those portions on a system that does not understand
that version of JavaScript. If a program part that contains an error is never executed, the error never breaks the script.
For example, the following function finishes successfully if whizBangFeature is false:
function move(x:integer, y:integer, d:integer) {
  x += 10;
  y += 3;
  if (whizBangFeature) {
    simulate{@x and #y} along path
  } else {
    x += d; y += d;
  }
  return [x,y];
}
The code simulate{@x and #y} along path is a syntax error, but this error does not break the script unless
the script attempts to execute that piece of code.
One problem with this approach is that it frustrates debugging; a script author benefits from knowing about syntax errors at compile time rather than at run time.
|
Rationale
Execution Model
|
When does a declaration (of a value, function, type, class, method, pragma, etc.) take effect? When are expressions evaluated? The answers to these questions distinguish among major kinds of programming languages. Let's consider the following function definition in a language with C++ or Java-like syntax:
gadget f(widget x) {
  if ((gizmo)(x) != null)
    return (gizmo)(x);
  return x.owner;
}
In a static language such as Java or C++, all type expressions are evaluated at compile time. Thus, in this example widget
and gadget would be evaluated at compile time. If gizmo were a type, then it too would be evaluated
at compile time ((gizmo)(x) would become a type cast). Note that we must be able to statically distinguish identifiers
used for variables from identifiers used for types so we can decide whether (gizmo)(x) is a one-argument function
call (in which case gizmo would be evaluated at run time) or a type cast (in which case gizmo would
be evaluated at compile time). In most cases, in a static language a declaration is visible throughout its enclosing scope,
although there are exceptions that have been deemed too complicated for a compiler to handle, such as the following C++:
typedef int *x;
class foo {
  typedef x *y;
  typedef char *x;
};
Many dynamic languages can construct, evaluate, and manipulate type expressions at run time. Some dynamic languages (such
as Common Lisp) distinguish between compile time and run time and provide constructs (eval-when) to evaluate
expressions early. The simplest dynamic languages (such as Scheme) process input in a single pass and do not distinguish between
compile time and run time. If we evaluated the above function in such a simple language, widget and gadget
would be evaluated at the time the function is called.
JavaScript is a scripting language. Many programmers wish to write JavaScript scripts embedded in web pages that work in a variety of environments. Some of these environments may provide libraries that a script would like to use, while on other environments the script may have to emulate those libraries. Let's take a look at an example of something one would expect to be able to easily do in a scripting language:
Bob is writing a script for a web page that wants to take advantage of an optional package MacPack that is
present on some environments (Macintoshes) but not on others. MacPack provides a class HyperWindoid
from which Bob wants to subclass his own class BobWindoid. On other platforms Bob has to define an emulation
class BobWindoid' that is implemented differently from BobWindoid -- it has a different set of private
methods and fields. There also is a class WindoidGuide in Bob's package; the code and method signatures of classes
BobWindoid and BobWindoid' refer to objects of type WindoidGuide, and class WindoidGuide's
code refers to objects of type BobWindoid (or BobWindoid' as appropriate).
Were JavaScript to use a dynamic execution model (described below), declarations would take effect only when executed, and Bob
could implement his package as shown below. The package keyword in front of both definitions of class BobWindoid
lifts these definitions from the local if scope to the top level of Bob's package.
class WindoidGuide;   // forward declaration
if (onMac()) {
  import "MacPack";
  package class BobWindoid extends HyperWindoid {
    private field x;
    field g:WindoidGuide;
    private method speck() {...};
    public method zoom(a:WindoidGuide, uncle:HyperWindoid = null):WindoidGuide {...};
  }
} else {
  // emulation class BobWindoid'
  package class BobWindoid {
    private field i:integer, j:integer;
    field g:WindoidGuide;
    private method advertise(h:WindoidGuide):WindoidGuide {...};
    private method subscribe(h:WindoidGuide):WindoidGuide {...};
    public method zoom(a:WindoidGuide):WindoidGuide {...};
  }
}
class WindoidGuide {
  field currentWindoid:BobWindoid;
  method introduce(arg:BobWindoid):BobWindoid {...};
}
On the other hand, if the language were static (meaning that types are compile-time expressions), Bob would run into problems.
How could he declare the two alternatives for the class BobWindoid?
Bob's first thought was to split his package into three HTML SCRIPT tags (containing BobWindoid,
BobWindoid', and WindoidGuide) and turn one of the first two off depending on the platform. Unfortunately
this doesn't work because he gets type errors if he separates the definition of class BobWindoid (or BobWindoid')
from the definition of WindoidGuide because these classes mutually refer to each other. Furthermore, Bob would
like to share the script among many pages, so he'd like to have the entire script in a single BobUtilities.js file.
Note that this problem would be newly introduced by JavaScript 2.0 if it were to evaluate type expressions at compile time. JavaScript 1.5 does not suffer from this problem because it does not have a concept of evaluating an expression at compile time, and it is relatively easy to conditionally define a class (which is merely a function) by declaring a single global variable g and conditionally assigning either one or another anonymous function to it.
There exist other alternatives in between the dynamic execution model and the static model that also solve Bob's problem. One of them is described at the end of this chapter.
In a pure dynamic execution model the entire program is processed in one pass. Declarations take effect only when they are executed. A declaration that is never executed is ignored. Scheme follows this model, as did early versions of Visual Basic.
The dynamic execution model considerably simplifies the language and allows an interpreter to treat programs read from a file identically to programs typed in via an interactive console. Also, a dynamic execution model interpreter or just-in-time compiler may start to execute a script even before it has finished downloading all of it.
One of the most significant advantages of the dynamic execution model is that it allows JavaScript 2.0 scripts to turn parts of themselves on and off based on dynamically obtained information. For example, a script or library could define additional functions and classes if it runs on an environment that supports CSS unit arithmetic while still working on environments that do not.
The dynamic execution model requires identifiers naming functions and variables to be defined before they are used. A
use occurs when an identifier is read, written, or called, at which point that identifier is resolved to a variable or a function
according to the scoping rules. A reference from within a control statement such as if and while
located outside a function is resolved only when execution reaches the reference. References from within the body of a function
are resolved only after the function is called; for efficiency, an implementation is allowed to resolve all references within
a function or method that does not contain eval at the time the function is first called.
According to these rules, the following program is correct and would print 7:
function f(a:integer):integer {
  return a+b;
}
var b:integer = 4;
print(f(3));
Assuming that variable b is predefined by the host if featurePresent is true, this program would
also work:
function f(a:integer):integer {
  return a+b;
}
if (!featurePresent) {
  package var b:integer = 4;
}
print(f(3));
On the other hand, the following program would produce an error because f is referenced before it is defined:
print(f(3));
function f(a:integer):integer {
  return a*2;
}
Defining mutually recursive functions is not a problem as long as one defines all of them before calling them.
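For instance (a minimal sketch; isEven and isOdd are invented names), the following works under the dynamic model because neither function is called until both definitions have been evaluated:
function isEven(n:integer):integer {
  return n == 0 ? 1 : isOdd(n-1);   // isOdd is not resolved until isEven is first called
}
function isOdd(n:integer):integer {
  return n == 0 ? 0 : isEven(n-1);
}
print(isEven(10));   // prints 1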
JavaScript 1.5 does not follow the pure dynamic execution model, and, for reasons of compatibility, JavaScript 2.0 strays from that model as well, adopting a hybrid execution model instead. Specifically, JavaScript 2.0 inherits the following static execution model aspects from JavaScript 1.5:
1. Unless marked with the local prefix, variable declarations of variables at the global scope cause the variables to be created at the time the program is entered rather than at the time the declarations are evaluated.
2. Unless marked with the local prefix, variable declarations of local variables inside a function cause the variables to be created at the time the function is entered rather than at the time the declarations are evaluated.
3. Function declarations take effect at the time their enclosing scope is entered rather than at the time the declarations are evaluated.

In addition to the above, the evaluation of class declarations has special provisions for delayed evaluation to allow mutually-referencing classes.
The second condition above allows the following program to work in JavaScript 2.0:
const b:string = "Bee";
function square(a:integer):integer {
b = a; // Refers to local b defined below, not global b
return b*a;
var b:integer;
}
While allowed, using variables ahead of declaring them, such as in the above example, is considered bad style and may generate a warning.
The third condition above makes the last example from the pure execution model section work:
print(f(3));
function f(a:integer):integer {
  return a*2;
}
Again, actually calling a function at the top level before declaring it is considered bad style and may generate a warning. It also will not work with classes.
Perhaps the easiest way to compile a script under the dynamic execution model is to accumulate function definitions unprocessed and compile them only when they are first called. Many JITs do this anyway because this lets them avoid the overhead of compiling functions that are never called. This process does not impose any more of an overhead than the static model would because under the static model the compiler would need to either scan the source code twice or save all of it unprocessed during the first pass for processing in the second pass.
Compiling a dynamic execution model script off-line also does not present special difficulties as long as eval is
restricted to not introduce additional declarations that shadow existing ones (if eval is allowed to do this,
it would present problems for any execution model, including the static one). Under the dynamic execution model, once
the compiler has reached the end of a scope it can assume that that scope is complete; at that point all identifiers inside
that scope can be resolved to the same extent that they would be in the static model.
Bob's problem could also be solved by using conditional compilation similar in spirit to C's preprocessor. If we do this, we have to ask about how expressive the conditional compilation meta-language should be. C's preprocessor is too weak. In JavaScript applications we'd often find that we need the full power of JavaScript so that we can inspect the DOM, the environment, etc. when deciding how to control compilation. Besides, using JavaScript as the meta-language would reduce the number of languages that a programmer would have to learn.
Here's one sketch of how this could be done:
- Types are compile-time values, which lets the compiler determine whether (x)(y) is a function call of function x or a cast of y to type x.
- Compile-time code is marked with the # symbol. For example, #{var x:int = 3} defines a compile-time constant x and initializes it to 3. One can also lift a var, const, or function declaration directly by preceding it with a # symbol, so #var x:int = 3; would accomplish the same thing.
- TypeExpressions are evaluated at compile time; int in the preceding example is such a TypeExpression.
- Compile-time code may itself contain # constructs; a nested construct (such as the inner #var in #{#var x:int = 3}) is evaluated at compile-compile time, and so forth.
- Conditional compilation is written as # if ( Expression ) Statements [# else if ( Expression ) Statements] ... [# else Statements] # end if. The #'s can appear anywhere on a line. One can use #if to conditionally exclude compile-time code, etc.

Note that because variable initializers are not evaluated at compile time, one has to use #var a = int rather than var a = int to define an alias a for a type name int.
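As an illustration only (a sketch reusing Bob's example; the directive syntax above was never finalized, and the class bodies are elided), Bob's conditional class definition might then look like:
# if (onMac())
  import "MacPack";
  class BobWindoid extends HyperWindoid { ... }
# else
  class BobWindoid { ... }   // the emulation class
# end if
class WindoidGuide { ... }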
This sketch does not address many issues that would have to be resolved, such as how typed variables are handled after they are declared but before they are initialized (this problem doesn't arise in the dynamic execution model), how the lexical scopes of the run time pass would interact with scoping of the compile time pass, etc.
Both approaches solve Bob's problem, but they differ in other areas. In the sequel "conditional compilation" refers to the conditional compilation alternative described above.
|
Rationale
Member Lookup
|
There has been much discussion in the TC39 subgroup about the meaning of a member lookup operation. Numerous considerations intersect here.
We will express a general unqualified member lookup operation as a.b, where a
is an expression and b is an identifier. We will also consider qualified member lookup operations and write them
as a.n::b, where n is an expression that evaluates to
some namespace. In almost all cases we will be interested in the dynamic type Td of a. In one scheme
we will also consider the static type Ts of the expression a. If the language is sound, the dynamic type
Td will always be a subtype of the static type Ts.
In the simplest approach, we treat an object as merely an association table of member names and member values. In this
interpretation we simply look inside object a and check if there is a member named b. If there is, we return the
member's value; if not, we return undefined or signal an error.
There are a number of difficulties with this simple approach, and most object-oriented languages have not adopted it:
- A class often needs to hide some of its members from its clients by declaring them private or package-protected.
- Once we allow private or package-protected members, we must allow for the possibility that object a will have more than one member named b -- abstraction considerations require that users of a class C not be aware of C's private members, so, in particular, a user should be able to create a subclass D of C and add members to D without knowing the names of C's private members. Both C++ and Java allow this.
- We must also allow for the possibility that object a will have a member named b but we are not allowed to access it.

We will assume that access control is specified by lexical scoping, as is traditional in modern languages.
Some of the criteria we would like the member lookup model to satisfy are:
- Safety. The model does not allow access to a private member outside the class where the member is defined, nor does it allow access to a package member outside the package where the member is defined. Furthermore, if a class C accesses its private member m, a hostile subclass D of C cannot silently substitute a member m' that would masquerade as m inside C's code.
- Abstraction. Members marked private and package are invisible outside their respective classes or packages. For programming in the large, a class can provide several public versions to its importers, and public members of more recent versions are invisible to importers of older versions. This is needed to provide robust libraries.
- Robustness. Routine maintenance changes should not silently break unrelated code; in particular, it should be possible to change a member's visibility among private, package, or public, assuming, of course, that that member is not used outside its new visibility.
- Namespace independence. Members with the same name defined in unrelated classes should not conflict with each other.
- Compatibility. The model should remain compatible with the behavior of existing JavaScript programs.

There are three main competing models for performing a general unqualified member lookup operation a.b.
Let S be the set of members named b of the object obtained by evaluating expression a (hereafter
shortened to just "object a") that are accessible via the visibility
rules applied in the lexical scope where a.b is evaluated. All three models pick some
member s ∈ S. Clearly, if the
set S is empty, then the member lookup fails. In addition, the Spice and pure Static models may sometimes deliberately
fail even when set S is not empty. Except for such deliberate failures, if the set S contains only one
member s, all three models return that element s. If the set S contains multiple members,
the three models will likely choose different members.
Another interesting (and useful) tidbit is that the Static and Dynamic models always agree on the interpretation of member
lookup operations of the form this.b. All three models agree on the interpretation of member lookup
operations of the form this.b in the case where b is a member defined in the current class.
A note about overriding: When a subclass D overrides a member m of its superclass C, then the definition of the member m is conceptually replaced in all instances of D. However, the three models are only concerned with the topmost class in which member m is declared. All three models handle overriding the way one would expect of an object-oriented language. They differ in the cases where class C has a member named m, subclass D of C has a member with the same name m, but D's m does not override C's m because C's m is not visible inside D (it's not well known, but such non-overriding does and must happen in C++ and Java as well).
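A minimal sketch of that non-overriding case (the class and member names here are invented for illustration):
class C {
  private field m:integer;                   // visible only inside C
  method getM():integer { return this.m; }   // refers to C's private m
}
class D extends C {
  field m:string;   // does not override C's m because C's m is not visible inside D
}
An instance of D then carries two members named m. Under the rules above, code outside these classes sees only D's m in the set S (C's m is inaccessible there), so, barring the deliberate failures noted earlier, all three models return it; meanwhile C's own code continues to see its private m through this.m.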
In the Static model we look at the static type Ts of expression a. Let S1 be the subset of S whose class is either Ts or one of Ts's ancestors. We pick the member in S1 with the most derived class.
The pure static model above is implemented by Java and C++. It would not work well in that form in JavaScript because many,
if not most, expressions have type Any. Because type Any has no members, users would have to cast
expression a to a given type T before they could access members of type T. Because of this
we must extend the static model to handle the case where the subset S1 is empty, or, in other words, the static
lookup fails. (Rather than doing this, we could extend the static model in the case where the static type Ts is
some special type, but then we would have to decide which types are special and which ones are not. Any is clearly
special. What about Object? What about Array? It's hard to draw the line consistently.)
In whichever way we extend the static model, we also have a choice of which member we choose. We could back off to the dynamic model, we could choose the most derived member in S, or perhaps we could choose some other approach.
Constraints:
| Safety | Good within the pure static model. Problems in the extended static model (a subclass could silently shadow a member) that could perhaps be addressed by warnings. |
| Abstraction | Good. |
| Robustness | Very bad. Updating a function's return type or a global variable's type silently changes the meaning of all code that uses that function or global variable; in a large project such a change would be quite difficult. It is also difficult to correctly split expressions into subexpressions (introducing a named temporary for a subexpression can change the code's meaning). |
| Namespace independence | Good. |
| Compatibility | Bad within the pure static model (type casts needed everywhere). May be good in the extended static model, depending on the choice of how we extend it. |
| Other |
This model may be difficult to compile well because the compiler may have difficulty in determining the intermediate types in compound expressions. Languages based on the static model have traditionally been compiled off-line, and such compilers tend to be difficult to write for on-line compilation without requiring the programmer to predeclare all of his data structures (if there are any forward-referenced ones, then the compiler doesn't know whether they should have a type or not). A more dynamic execution model may actually help because it defers compilation until more information is known. |
In the Spice model we think of each member m defined in a class C as though it were a function definition for a (possibly overloaded) function whose first argument has type C. Definitions in an inner lexical scope shadow definitions in outer scopes. The Spice model does not consider the static type Ts of expression a.
Let L be the innermost lexical scope enclosing the member lookup expression a.b
such that some member named b is defined in L. Let Lb be the set of all members named b
defined in lexical scope L, and let S1 = S ∩ Lb
(the intersection of S and Lb). If S1 is empty, we fail. If S1 contains exactly
one member s, we use s. If S1 contains several members, we fail (this would only happen for
import conflicts).
Constraints:
| Safety | Good. |
| Abstraction | Good. |
| Robustness | Poor. Renaming a package-visible member may break code outside the class that defines that
member even if that code does not access that member. Converting a member from private to one of the other
two visibilities also can introduce conflicts in other, unrelated classes in the same package that just happen to have
an unrelated member with the same name. Fortunately these conflicts usually (but not always) result in errors rather
than silent changes to the meaning of the program, so one can often find them by exhaustively testing the program after
making a change. |
| Namespace independence | Bad. Members with the same name in unrelated classes often conflict. |
| Compatibility | Poor? Many existing programs rely on namespace independence and would have to be restructured. |
| Other |
Most object-oriented programmers would be confused by a violation of namespace independence. Programming without this assumption requires a different point of view than most programmers are used to. (I am not talking about Lisp and Self programmers, who are familiar with that way of thinking.) |
[There are numerous other variants of the Spice model as well.]
In the Dynamic model we pick the member s in S defined in the innermost lexical scope L
enclosing the member lookup expression a.b. We fail if the innermost such lexical
scope L contains more than one member in S (this would only happen for import conflicts).
Constraints:
| Safety | Good at the language level, but see "other" below. |
| Abstraction | Good. |
| Robustness | Good. All of these changes are easy to do. |
| Namespace independence | Good. |
| Compatibility | Good. |
| Other |
Packages using the dynamic model may be vulnerable to hijacking (coerced into doing something other than what the author intended) by a determined intruder. It is possible for a compiler to detect such vulnerabilities and warn about them. |
The various models make it possible to get into situations where either there is no way to access a visible member of an
object or it is not safe to do so (see member hijacking). In these cases we'd like to be able to
explicitly choose one of several potential members with the same name. The :: namespace syntax allows this. The
left operand of :: is an expression that evaluates to a package or class; we may also allow special keywords
such as public, package, or private instead of an expression here, or omit the expression
altogether. The right operand of :: is a name. The result is the name qualified by the namespace.
As we have seen, the name b in a member access expression a.b does not necessarily
refer to a unique accessible member of object a. In a qualified member access expression a.n::b,
the namespace n narrows the set of members considered, although it's possible that the set may still contain more
than one member, in which case the lookup model again disambiguates. Let S be the set of members named b
of object a that are accessible. The following table shows how a.n::b
subsets set S depending on n:
| n | Subset |
|---|---|
| None | Only the ad-hoc member named b, if any exists |
| A class C | The fixed member of C named b, if it exists; if not, try C's superclass instead, and so on up the chain |
| A package P | The subset of S containing all accessible members of P |
| private | The fixed member named b of the current class |
| package | The subset of S containing all accessible members that have package visibility |
| public | The subset of S containing all accessible members that have public visibility |
The :: operator serves a different role from the . operator. The :: operator produces
a qualified name, while the . operator produces a value. A qualified name can be used as
the right operand of .; a value cannot. If a qualified name is used in a place where a value is expected, the
qualified name is looked up using the lexical scoping rules to obtain the value (most likely a global variable).
All of the models above address only access to fixed members of a class. JavaScript also allows one to dynamically add
members to individual instances of a class. For simplicity we do not provide access control or versioning on these ad-hoc
members -- all of them are public and open to everyone. Because of the safety criterion, a member lookup
of a private or package-protected member must choose the private or package-protected
member even if there is an ad-hoc member of the same name. To satisfy the robustness criterion,
we should treat public members as similarly as possible to private or package-protected
members, so we always give preference to a fixed member when there is an ad-hoc member of the same name.
To access an ad-hoc member that is shadowed by a fixed member, we can either prefix the member's name with ::
or use an indirect member access.
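As a rough sketch (a, C, and m are hypothetical names; assume class C defines a fixed member m and an ad-hoc member named m has also been added to the instance a):
a.m          // the fixed member is preferred over the ad-hoc one; the lookup model picks among the accessible fixed members
a.C::m       // the fixed member m of class C, or of the nearest superclass of C that defines one
a.public::m  // restricts the lookup to accessible members named m that have public visibility
a.::m        // the ad-hoc member m added to this particular instance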
How should we define the behavior of the expression a[b] (assuming the
[] operator is not overridden by a's class)? There are a couple
of possibilities:
"s" and
treat a[b] as though it were a.s. This
is essentially what JavaScript 1.5 does. Unfortunately it's hard to keep this behavior consistent with JavaScript 1.5
programs' expectations (they expect no more than one member with the same name, etc.), and this kind of indirection is
also vulnerable to hijacking. It may be possible to solve the hijacking problem by devising restricted
variants of the [] operator such as a.n::[b]
that follow the rules given in the namespaces section above."s" and
treat a[b] as though it were a.::s,
thus limiting our selection to ad-hoc members. Ad-hoc members are well-behaved, but this kind of behavior would violate
the compatibility criterion when JavaScript 1.5 scripts try to reflect a JavaScript 2.0 object
using the [] operator.In general it seems like it would be a bad idea to extend the syntax of the string "s"
to allow :: operators inside the string. Such strings are too easily forged to play the role of pointers to members.
[explain security attacks]
|
Compatibility
|
JavaScript 2.0 is intended to be upwards compatible with JavaScript 1.5 and earlier scripts. The following are the current compatibility issues:
- Replace void expr by void(expr).
- Replace expr[expr, expr] by expr[(expr, expr)] because commas are now significant inside brackets (see the example below).
- Some uses of eval for identifiers may no longer work.
- Some uses of Object and String may not work.

JavaScript 2.0 is still evolving, and some of these compatibility issues may be addressed as the language matures. They are not expected to be a problem in practice because a browser could distinguish JavaScript 1.5 and earlier scripts from JavaScript 2.0 scripts and behave compatibly on the earlier ones.
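To make the bracket item concrete (a, i, and j are hypothetical names):
a[i, j]     // JavaScript 1.5: the comma operator applies, so i is evaluated and discarded and this reads a[j]
a[(i, j)]   // JavaScript 2.0 spelling that preserves the old comma-operator meaning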
|
Waldemar Horwat Last modified Friday, November 12, 1999 |