JavaScript 2.0
Wednesday, February 16, 2000
JavaScript 2.0 is an experimental proposal maintained by waldemar for future changes in the JavaScript language. The eventual language may differ significantly from this proposal, but the goal is to move in the directions indicated here and do so via a coordinated plan rather than adding miscellaneous features ad hoc on a release-by-release basis.
JavaScript is Netscape's implementation of the ECMAScript standard. The development of JavaScript 2.0 is heavily coordinated with the ECMA TC39 modularity subgroup. The intent is to make JavaScript 2.0 and ECMAScript Edition 4 be the same language, and this document will evolve as necessary to accomplish this.
The following are recent major changes in this document:
| Date | Revisions |
|---|---|
| Feb 16, 2000 | Updated machine type and operator overloading pages. |
| Feb 15, 2000 | Updated grammar and discussions of concepts, types, expressions, statements, definitions, and variables, as well as the syntax rationale. |
| Dec 7, 1999 | Removed field, method, and constructor from the semantics and replaced with creative uses of the static prefix. |
| Nov 11, 1999 | Continuing major reorganization of this document. |
| Nov 5, 1999 | Reorganized the document's structure into chapters. Structured the core language chapter more in the bottom-up style of the ECMAScript standard than in the previous issue-oriented style. Combined and moved rationales and issues into an appendix. Added introduction page. Removed or reworded many obsolete paragraphs throughout the document. |
| Nov 2, 1999 | Modified the parser grammar: added [no line break] constraints, removed version lists after public keywords, added box and user-defined visibility keywords, and added named function arguments. |
| Oct 29, 1999 | Revised the execution model based on recent ECMA modularity group discussions. JavaScript 2.0 now has a hybrid execution model instead of a pure dynamic one, which allows for better compatibility with JavaScript 1.5. |
| Oct 20, 1999 | Added throw and try-catch semantic operators to the semantic notation and used them to signal syntax errors detected by the semantics that would be impossible or too messy to detect in the grammars. Updated formal description pages to match recent ECMA TC39 subcommittee decisions: eliminated octal numbers and escapes (both in strings and in regular expressions) to match ECMAScript Edition 3, switched to using the Identifier : TypeExpression syntax for type declarations, and added local blocks and the local visibility specifier. Also simplified the parser grammar for definitions and removed the « and » syntax for regular expression literals. |
| Jul 26, 1999 | Wrote description of semantic notation. Updated grammar notation page to describe lookahead constraints. Updated regular expression semantics to match ECMA working group decisions for ECMAScript Edition 3; one of these included changing the behavior of (?= to not backtrack. |
| Jun 7, 1999 | Revised all grammars and semantics to simplify the grammars. Fixed several errors and omissions in the regular expression grammar and semantics. Added support for (?= and (?!. |
| May 16, 1999 | Added regular expression grammar and semantics. |
| May 12, 1999 | Added preliminary Formal Description chapter. |
| Mar 25, 1999 | Added Member Lookup page. Released second draft. |
| Mar 24, 1999 | Added many clarifications, discussion sections, and small changes throughout the pages. |
| Mar 23, 1999 | Rewrote Execution Model page and split it off from the Definitions page. Added discussion of float to Machine Types. |
| Mar 22, 1999 | Removed numbered versions from the Versions page; added motivation, discussion, and version aliasing using =. Removed angle brackets < and > from VersionsAndRenames. |
| Mar 16, 1999 | Rewrote Types page. Split off byte, ubyte, short, ushort, int, uint, long, ulong into an optional Machine Types library. |
| Feb 18, 1999 | Released first draft. |
Older drafts are also available.
JavaScript 2.0
Introduction
Thursday, November 11, 1999
JavaScript 2.0 is the next major step in the evolution of the JavaScript language. JavaScript 2.0 incorporates the following features in addition to those already found in JavaScript 1.5:
- const and final
- private, package, public, and user-defined access controls
- overridable operators such as + and [ ]
- machine types such as int for more faithful communication with other programming languages

These facilities reinforce each other while remaining fairly small and simple. Unlike in Java, the philosophy behind them is to provide the minimal necessary facilities that other parties can use to write packages that specialize the language for particular domains rather than define these packages as part of the language core.
The versioning and access control mechanisms make the language suitable for programming-in-the-large.
The language remains firmly in the dynamic camp. Classes can be declared statically or dynamically. JavaScript 2.0 provides introspection facilities. In some ways JavaScript 2.0 is more dynamic than JavaScript 1.5. For example, it is much easier to conditionally declare functions in JavaScript 2.0 than in 1.5: one simply defines a function inside a conditional.
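As an illustrative sketch of conditional function definition (the debugEnabled flag and log function are hypothetical names):

var debugEnabled = true;
if (debugEnabled) {
  // In JavaScript 2.0 the function definition is evaluated only if this
  // branch executes, so log exists only when debugEnabled is true.
  function log(msg) {
    // body elided
  }
}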
The overridable basic operators can be used to implement numbers with attached units similar to the Spice proposals. Rather than implement the full unit model in the language core, JavaScript 2.0 provides the syntactic and semantic hooks to allow one to implement a unit library with whatever sophistication one's application requires.
JavaScript 2.0
Introduction
Motivation
Thursday, November 11, 1999
The main goals of JavaScript 2.0 are:
The following are specifically not goals of JavaScript 2.0:
JavaScript is not currently an all-purpose programming language. Its strengths are its quick execution from source (thus enabling it to be distributed in web pages in source form), its dynamism, and its interfaces to Java and other environments. JavaScript 2.0 is intended to improve upon these strengths, while adding others such as the abilities to reliably compose JavaScript programs out of components and libraries and to write object-oriented programs. On the other hand, it is not our intent to have JavaScript 2.0 supplant languages such as C++ and Java, which will still be more suitable for writing many kinds of applications, including very large, performance-critical, and low-level ones.
The proposed features are derived from the goals above. Consider, for example, the goals of writing modular and robust applications.
To achieve modularity we would like some kind of a library mechanism. The proposed package mechanism serves this purpose, but by itself it would not be enough. Unlike existing JavaScript programs which tend to be monolithic, packages and their clients are often written by different people at different times. Once we introduce packages, we encounter the problems of the author of a package not having access to all of its clients, or the author of a client not having access to all versions of the library it needs. If we add packages to the language without solving these problems, we will never be able to achieve robustness, so we must address these problems by creating facilities for defining abstractions between packages and clients.
To create these abstractions we make the language more disciplined by adding optional types and type-checking. We also introduce a coherent and disciplined syntax for defining classes, hierarchies, and versioning of classes. Unlike in JavaScript 1.5, the author of a class can guarantee invariants concerning its instances and can control access to its instances, making the package author's job tractable. The class syntax is also much more self-documenting than in JavaScript 1.5, making it easier to understand and use JavaScript 2.0 code. Defining subclasses is easy in JavaScript 2.0, while doing it robustly in JavaScript 1.5 is quite difficult.
To make packages work we need to make the language more robust in other areas as well. It would not be good if one package
redefined Object.toString or added methods to the Array prototype and thereby corrupted another
package. We can simplify the language by eliminating many idioms like these (except when running legacy programs, which would
not use packages) and provide better alternatives instead. This has the added advantage of speeding up the language's implementation
by eliminating thread synchronization points. Making the standard packages robust can also significantly reduce the memory
requirements and improve speed on servers by allowing packages to be shared among many different requests rather than having
to start with a clean set of packages for each request because some other request might have modified some property.
JavaScript 2.0 should interface with other languages even better than JavaScript 1.5 does. If the goal of integration is achieved, the user of an abstraction should not have to care much about whether the abstraction is written in JavaScript, Java, or another language. It should also be possible to make JavaScript abstractions that appear native to Java or other language users.
In order to achieve seamless interfacing with other languages, JavaScript should provide equivalents for the fundamental
data types of those languages. Details such as syntax do not have to be the same, but the concepts should be there. JavaScript
1.5 lacks support for integers, making it hard to interface with a Java method that expects a long.
JavaScript is appearing in a number of different application domains, many of which are evolving. Rather than support all of these domains in the core JavaScript, JavaScript 2.0 should provide flexible facilities that allow these application domains to define their own, evolving standards that are convenient to use without requiring continuous changes to the core of JavaScript. JavaScript 2.0 addresses this goal by letting user programs define facilities such as getters, setters, and alternative definitions of operators -- facilities that could only be done by the core of the language in JavaScript 1.5.
JavaScript 2.0
Introduction
Notation
Thursday, November 11, 1999
This proposal uses the following conventions to denote literal characters:
Printable ASCII literal characters (values 20 through 7E hexadecimal) are in a blue monospaced font. Other
characters are denoted by enclosing their four-digit hexadecimal Unicode value between «u
and ». For example, the non-breakable space character would be denoted in this
document as «u00A0». A few of the common control characters are represented
by name:
| Abbreviation | Unicode Value |
|---|---|
| «NUL» | «u0000» |
| «BS» | «u0008» |
| «TAB» | «u0009» |
| «LF» | «u000A» |
| «VT» | «u000B» |
| «FF» | «u000C» |
| «CR» | «u000D» |
| «SP» | «u0020» |
A space character is denoted in this document either by a blank space where it's obvious from the context or by «SP»
where the space might be confused with some other notation.
Each LR(1) parser grammar and lexer grammar rule consists of a nonterminal, a ⇒, and one or more expansions of the nonterminal separated by vertical bars (|). The expansions are usually listed on separate lines but may be listed on the same line if they are short. An empty expansion is denoted as «empty».
Consider a sample rule for the nonterminal SampleList. Such a rule states that the nonterminal SampleList can represent one of four kinds of sequences of input tokens, for example an expansion of the nonterminal Identifier on its own or a comma-separated sequence of such expansions ending with an expansion of the nonterminal Identifier.
Input tokens are characters (and the special End placeholder) in the lexer grammar and lexer tokens in the parser grammar. Spaces separate input tokens and nonterminals from each other. An input token that consists of a space character is denoted as «SP».
Other non-ASCII or non-printable characters are denoted by also using « and »,
as described in the character notation section.
If the phrase "[lookahead ∉ set]" appears in the expansion of a production, it indicates that the production may not be used if the immediately following input terminal is a member of the given set. That set can be written as a list of terminals enclosed in curly braces. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all terminals to which that nonterminal could expand.
For example, given the rules

DecimalDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
DecimalDigits ⇒ DecimalDigit | DecimalDigits DecimalDigit

the rule

LookaheadExample ⇒ n [lookahead ∉ {1, 3, 5, 7, 9}] DecimalDigits | DecimalDigit [lookahead ∉ DecimalDigit]

matches either the letter n followed by one or more decimal digits the first of which is even, or a decimal digit not followed by another decimal digit.
These lookahead constraints do not make the grammars more theoretically powerful than LR(1), but they do allow these grammars to be written more simply. The semantic engine compiles grammars with lookahead constraints into parse tables that have the same format as those produced from ordinary LR(1) or LALR(1) grammars.
Many rules in the grammars occur in groups of analogous rules. Rather than list them individually, these groups have been summarized using the shorthand illustrated by the example below:
Metadefinitions such as

a ∈ {normal, initial}
b ∈ {allowIn, noIn}

introduce grammar arguments a and b. If these arguments later parametrize the nonterminal on the left side of a rule, that rule is implicitly replicated into a set of rules in each of which a grammar argument is consistently substituted by one of its variants. For example, the sample rule

AssignmentExpression_a,b ⇒ LeftSideExpression_a = AssignmentExpression_normal,b

expands into the following four rules:

AssignmentExpression_normal,allowIn ⇒ LeftSideExpression_normal = AssignmentExpression_normal,allowIn
AssignmentExpression_normal,noIn ⇒ LeftSideExpression_normal = AssignmentExpression_normal,noIn
AssignmentExpression_initial,allowIn ⇒ LeftSideExpression_initial = AssignmentExpression_normal,allowIn
AssignmentExpression_initial,noIn ⇒ LeftSideExpression_initial = AssignmentExpression_normal,noIn

AssignmentExpression_normal,allowIn is now an unparametrized nonterminal and is processed normally by the grammar.
Some of the expanded rules (such as the fourth one in the example above) may be unreachable from the grammar's starting nonterminal; these are ignored.
A few lexer rules have too many expansions to be practically listed. These are specified by descriptive text instead of a list of expansions after the ⇒.
Some lexer rules contain the metaword except. These rules match any expansion that is listed before the except
but that does not match any expansion after the except. All of these rules ultimately expand into single characters.
For example, the rule below matches any single UnicodeCharacter except the * and
/ characters:
A few parts of the main body of this proposal still use an informal syntax to describe language constructs, although this syntax is being phased out. An example is the following:

VersionsAndRenames ⇒ [< VersionRange [: Identifier] , ... , VersionRange [: Identifier] >]
VersionRange ⇒ Version | [Version] .. [Version]

VersionsAndRenames and VersionRange are the names of the grammar
rules. The black square brackets represent optional items, and the black ... together with its neighbors represents optional
repetition of zero or more items, so a VersionsAndRenames can have zero or more sets of VersionRange [: Identifier]
separated by commas. A black | indicates that either its left or right alternative may be present, but not both; |'s have
the lowest metasymbol precedence. Syntactic tokens to be typed literally are in a bold blue monospaced
font. Grammar nonterminals are in green italic and correspond to the nonterminals in the
parser grammar or lexer grammar.
JavaScript 2.0
Core Language
Thursday, November 11, 1999
This chapter presents an informal description of the core language. The exact syntax and semantics are specified in the formal description. Libraries are also specified in a separate library chapter.
JavaScript 2.0
Core Language
Concepts
Tuesday, February 15, 2000
A value is an entity that can be stored in a variable, passed to a function, or returned from a function. Sample values include:
- undefined
- null
- 5 (a number)
- true (a boolean)
- "Kilopi" (a string)
- [1, 5, false] (a three-element array)
- {a:3, b:7} (an object with two properties)
- function (x) {return x*x} (a function)
- String (a class, a function, and a type)

A type t represents two things: a set S of values and a mapping M used to coerce values to type t.
The set S indicates which values are considered to be members of type t. We write v ∈ t to indicate that value v is a member of type t. The mapping M indicates how values may be coerced to type t. For each value v already in S, the mapping M must map v to itself.
A value can be a member of multiple sets, and, in general, a value belongs to more than one type. Thus, it is generally not useful to ask about the type of a value; one may ask instead whether a value belongs to some given type. There can also exist two different types with the same set of values but different coercion mappings.
On the other hand, a variable does have a particular type. If we declare a variable x of type t, then whatever value is held in x is guaranteed to have type t, and we can assign any value of type t to x. We may also be able to assign a value v ∉ t to x if type t's mapping specifies a coercion for value v; in this case the coerced value is stored in x.
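A small sketch of these rules, assuming a type named Integer whose mapping M coerces the Number 3.0 to the Integer 3:

var count:Integer = 0;  // 0 is a member of type Integer, so it is stored unchanged
count = 3.0;            // 3.0 is not a member of Integer; Integer's mapping M
                        // coerces it, and the coerced value 3 is stored in count
count = "seven";        // error if Integer's mapping specifies no coercion for this value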
Every type represents some set of values but not every set of values is represented by some type (this is required for logical consistency -- there are uncountably infinitely many sets of values but only countably infinitely many types).
Every type is also itself a value -- we can store a type in a variable, pass it to a function, or return it from a function.
If type a's set of values is a subset of type b's set of values, then we say that type a is a subtype of type b. We denote this as a ⊆ b.
Subtyping is transitive, so if a ⊆ b and b ⊆ c, then a ⊆ c. Subtyping is also reflexive: a ⊆ a. Also, if v ∈ t and t ⊆ s, then v ∈ s.
The set of all values is represented by the type any, which is the supertype of all types. A variable with
type any can hold any value. The set of no values is represented by the type none, which is the
subtype of all types. A function with the return type none cannot return.
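For example (a sketch; the names v and fail are hypothetical):

var v:any = "anything";   // a variable of type any can hold any value
v = 3.14;
v = Boolean;              // even a type, since types are themselves values

function fail(message):none {
  throw message;          // a function whose result type is none can only exit
                          // by throwing, since it cannot return
}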
A class is a template for creating similar values, often called objects or instances. These instances generally share characteristics such as common methods and properties.
Every class is also a type and a value. When used as a type, a class represents the set of all possible instances of that class.
A class C can be derived from a superclass S. Class C can then inherit characteristics of class S. Every instance of C is also an instance of S, but not vice versa, which, by the definition of subtyping above, implies that C ⊆ S when we consider C and S as types.
The subclass relation imposes a hierarchy relation on the set of all classes. JavaScript 2.0 currently does not support multiple inheritance, although this is a possible future direction. If multiple inheritance were allowed, the subclass relation would impose a partial order on the set of all classes.
A scope represents a region of JavaScript source code. The JavaScript statements or expressions package,
class, function, and scope { } define scopes in the source code. The top level
of a JavaScript program is also a scope, called the global scope. A scope is a static entity that does not change while a
JavaScript program is running (except that if the program calls eval then new JavaScript source code will be
created which may share existing scopes or create its own scopes).
A scope may be contained inside another scope. If two scopes overlap, one must be contained entirely within the other,
so scopes form a hierarchy. There is a scope, called public, that encloses all other scopes, including global
scopes.
Scope information is used at run time to help with variable and property lookups and visibility checks.
A scope should not be confused with an activation frame, which is a runtime binding of local variables to values. A scope should also not be confused with a namespace, which is a binding of names to values.
A namespace maps names to values. When looking up property p of object o, the object's namespace is consulted for a binding of p. An object may have several different namespaces which are selected based on scope information (some properties of o may only be visible from the scope where o's class is defined) and whether a property is being read or written.
An activation frame contains a simple namespace that maps names of local variables to their getters and setters.
JavaScript 2.0
Core Language
Lexer
Tuesday, February 15, 2000
This section presents an informal overview of the JavaScript 2.0 lexer. See the stages and lexer semantics sections in the formal description chapter for the details.
The JavaScript 2.0 lexer behaves in the same way as the JavaScript 1.5 lexer except for the following:
Semicolons may be omitted before a closing }. In addition, the JavaScript 2.0 parser allows semicolons to be omitted before the else of an if-else statement and before the while of a do-while statement. The remaining differences are described below.
the while of a do-while statement.JavaScript 2.0 source text consists of a sequence of UTF-16 Unicode version 2.1 or later characters normalized to Unicode Normalized Form C (canonical composition), as described in the Unicode Technical Report #15.
Comments and white space behave just like in JavaScript 1.5.
The following JavaScript 1.5 punctuation tokens are recognized in JavaScript 2.0:
! != !==
% %= &
&& &= (
) * *=
+ ++ +=
, - --
-= . /
/= : ::
; < <<
<<= <= =
== === >
>= >> >>=
>>> >>>= ?
[ ] ^
^= { |
|= || }
~
The following punctuation tokens are new in JavaScript 2.0:
# &&= ->
.. ... @
^^ ^^= ||=
The following reserved words are used in JavaScript 2.0:
break case catch
class const continue
default delete do
else eval extends
false final finally
for function if
in instanceof new
null package private
public return super
switch this throw
true try typeof
var while with
Out of these, the only word that was not reserved in JavaScript 1.5 is eval.
The following reserved words are reserved for future expansion:
abstract debugger enum
export goto implements
import interface native
protected synchronized throws
transient volatile
The following words have special meaning in some contexts in JavaScript 2.0 but are not reserved and may be used as identifiers:
get language set
The JavaScript 2.0 grammar explicitly makes semicolons optional in the following situations:
- before a closing }
- before the else of an if-else statement
- before the while of a do-while statement (but not before the while of a while statement)

Semicolons are optional in these situations even if they would construct empty statements. Strict mode has no effect on semicolon insertion in the above cases.
In addition, sometimes line breaks in the input stream are turned into VirtualSemicolon tokens. Specifically, if the first through the nth tokens of a JavaScript program are grammatically valid but the first through the n+1st tokens are not, and there is a line break (or a comment including a line break) between the nth token and the n+1st token, then the parser tries to parse the program again after inserting a VirtualSemicolon token between the nth and the n+1st tokens. This kind of VirtualSemicolon insertion does not occur in strict mode.
See also the semicolon insertion syntax rationale.
Regular expression literals begin with a slash (/) character not immediately followed by another slash (two
slashes start a line comment). Like in JavaScript 1.5, regular expression literals are ambiguous with the division (/)
or division-assignment (/=) tokens. The lexer treats a / or /= as a division or division-assignment
token if either of these tokens would be allowed by the syntactic grammar as the next token; otherwise, the lexer treats a
/ or /= as starting a regular expression.
This unfortunate dependence of lexical parsing on grammatical parsing is inherited from JavaScript 1.5. See the regular expression syntax rationale for a discussion of the issues.
When a numeric literal is immediately followed by an optional underscore and an identifier, the lexer drops the underscore if it is present and converts the identifier to a string literal. The parser then treats the number and string as a unit expression. There are no reserved word restrictions on the identifier in this case; any identifier that begins with a letter will work, even if it is a reserved word.
For example, 3in and 3_in are both converted to 3 "in". 5xena
is converted to 5 "xena". On the other hand, 0xena is converted to 0xe "na".
It is unwise to define unit names that begin with the letters e or E either alone or followed by
a decimal digit, or x or X followed by a hexadecimal digit because of potential ambiguities with
exponential or hexadecimal notation.
JavaScript 2.0
Core Language
Expressions
Tuesday, February 15, 2000
Most of the behavior of expressions is the same as in JavaScript 1.5. Differences are highlighted below. One general difference is that most expression operators can be overridden via operator overloading.
The above keywords are not reserved and may be used in identifiers.
Just like in ECMAScript Edition 3, an identifier evaluates to an internal data structure called a reference. However, JavaScript 2.0 references have several additional attributes, one of which is a namespace. The namespace is set to the value of the ParenthesizedExpression. If the ParenthesizedExpression is a simple Identifier or QualifiedIdentifier then the parentheses may be omitted.
- null
- true
- false
- this
- super

A Number literal or ParenthesizedExpression
followed by a String literal is a unit expression. The unit object specified by the String
is looked up; the result is called as a function and passed two arguments: the numeric value of the Number
literal or ParenthesizedExpression, and either null
(if a ParenthesizedExpression was provided) or the original
Number literal expressed as a string.
The string representation allows user-defined unit classes to define extended syntaxes for numbers. For instance, a long-integer
package might define a unit called "L" that treats the Number literal as
a full 64-bit number without rounding it to a double first.
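As a conceptual sketch of this evaluation (the unit name "L" and the variable names are hypothetical; the lookup of the unit object itself is described above):

var a = 123456789012345 "L";  // the unit object registered under "L" is called with
                              // the arguments (123456789012345, "123456789012345")
var b = (2+2) "L";            // the unit object is called with (4, null) -- no
                              // literal text is available for a ParenthesizedExpression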
- ++
- --

The @ operator performs a type cast. The second operand specifies the type. Both the
. and the @ operators accept either a QualifiedIdentifier
or a ParenthesizedExpression as the second operand.
If it is a ParenthesizedExpression, the second operand
of . must evaluate to a string. a.(x) is a synonym for a[x]
except that the latter can be overridden via operator overloading.
The [] operator can take multiple (or even named) arguments. This allows users to define
data structures such as multidimensional arrays via operator overloading.
An ArgumentList can contain both positional and named arguments. Named arguments use the same syntax as object literals.
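A few illustrative uses of these forms (m, row, col, and drawLine are hypothetical user-defined names):

var i = x @ Integer;           // cast the value of x to type Integer
var n = a.("length");          // same as a["length"]
var e = m[2, 3];               // [] with two arguments, assuming m's class overloads []
var f = m[row: 2, col: 3];     // [] with named arguments
drawLine(10, 20, width: 3);    // a call mixing positional and named arguments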
- delete PostfixExpression
- typeof UnaryExpression
- eval UnaryExpression
- ++ PostfixExpression
- -- PostfixExpression
- + UnaryExpression
- - UnaryExpression
- ~ UnaryExpression
- ! UnaryExpression

The ^^ operator is a logical exclusive-or operator. It evaluates both operands. If
they both convert to true or both convert to false, then ^^ returns false; otherwise ^^
returns the unconverted value of whichever argument converted to true.
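For example, under these rules:

3 ^^ 0        // evaluates to 3 (only the first operand converts to true)
0 ^^ "abc"    // evaluates to "abc"
3 ^^ 4        // evaluates to false (both operands convert to true)
false ^^ 0    // evaluates to false (both operands convert to false)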
JavaScript 2.0
Core Language
Statements
Tuesday, February 15, 2000
Most of the behavior of statements is the same as in JavaScript 1.5. Differences are highlighted below.
;;;;;;A block can be annotated with attributes as follows:
Attribute ... Attribute { Statement ... Statement }

Such a block behaves like a regular block except that every declaration inside that block (but not inside any enclosed scope) by default uses the attributes given by the block.
Annotated blocks are useful to define several items without having to repeat attributes for each one. For example,
class foo {
field z:Integer;
public var a;
private var b;
public function f() {}
public function g(x:Integer):Boolean {}
}
is equivalent to:
class foo {
var z:Integer;
public {
var a;
private var b;
function f() {}
function g(x:Integer):Boolean {}
}
}
A scope block has the syntax:
scope { Statement ... Statement }

A scope block behaves like a regular block except that it forms its own scope. Variable and function definitions without a Visibility prefix inside the scope block belong to that block instead of the enclosing scope.
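A sketch:

var x = 1;
scope {
  var x = 2;   // belongs to the scope block, not the enclosing scope
  // x is 2 here
}
// x is 1 again here; the scope block's x is no longer visible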
A compiler block has the syntax:
compile { Statement ... Statement }

The compile attribute is a hint that the block may be (but does not have to be) evaluated early. The statements inside this block should depend only on each other, on the results of earlier compiler blocks, and on properties of the environment that are designated as being available early. Other than perhaps being evaluated early, compiler blocks respect all of the scope rules and semantics of the enclosing program. Any definitions introduced by a compiler block are saved and reintroduced at normal evaluation time. On the other hand, side effects may or may not be reintroduced at normal evaluation time, so compiler blocks should not rely on side effects.
compile is an attribute, so it may also be applied to individual definitions without
enclosing them in a block.
As an example, after defining
compile var x = 2;
function f1() {
compile {
var y = 5;
var x = 1;
while (y) x *= y--;
}
return ++x;
}
function f2() {
compile {
var y = x;
}
return x+y;
}
the value of global x will still be 2, calling f1() will always return 121,
and calling f2() will return 4. If the statement x=5 is then evaluated at the global
level, f1() will still return 121 because it uses its own local x. On the other hand,
calling f2() may return either 7 or 10 at the implementation's discretion -- 7
if the implementation evaluated the compile block early and saved the value of y
or 10 if it didn't. As this example illustrates, it is poor technique to define variables inside compiler blocks;
constants are usually better.
A fully dynamic implementation of JavaScript 2.0 may choose to ignore the compile attribute
and evaluate all compiler blocks at normal evaluation time. A fully static implementation may require that all user-defined
types and attributes be defined inside compiler blocks.
Should const definitions with simple constant expressions such as const four = 2+2
be treated as though they were implicitly compiler definitions (compile const four = 2+2)?
if ParenthesizedExpression Statement_abbrevNoShortIf else Statement_abbrevNoShortIf

The semicolon is optional before the else.
The semicolon is optional before the closing while.
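For example, both of the following are accepted without the usually required semicolons:

if (a) x = 1 else x = 2;        // no semicolon needed before else
do x++ while (x < 10);          // no semicolon needed before the closing while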
JavaScript 2.0
Core Language
Definitions
Tuesday, February 15, 2000
Definitions introduce new constants, variables, functions, and classes. All definitions can be preceded by zero or more attributes using the following syntax:
A definition attribute is an identifier that modifies the definition. Attributes can specify a definition's visibility, semantics, and other hints. A JavaScript program may also define and subsequently use its own attributes.
The table below summarizes the predefined attributes.
| Category | Attribute | Behavior |
|---|---|---|
| Visibility | local | The definition is local in the enclosing block. |
| Visibility | scope | The definition applies to the enclosing scope. |
| Visibility | global | The definition applies to the enclosing package and is visible only inside this package. |
| Visibility | private | The definition creates a member of the enclosing class. The defined member is visible only inside that class. If there is no enclosing class, private is the same as global. |
| Visibility | package | The definition creates a member of the enclosing class. The defined member is visible only inside the enclosing package. If there is no enclosing class, package is the same as global. |
| Visibility | public | The definition creates a member of the enclosing class. The defined member is visible anywhere. If there is no enclosing class, the definition applies to the enclosing package and is visible in any package that imports this package. |
| Semantic | static | The definition creates a global member (rather than an instance member) of the enclosing class. |
| Semantic | instance | The definition creates an instance member (rather than a global member) of the enclosing class. |
| Semantic | final | The definition cannot be overridden in subclasses. |
| Hint | override | The definition overrides a member of a superclass. |
| Hint | mayOverride | The definition may override a member of a superclass. |
| Hint | compile | Compiler hint that the definition may be processed at compile time. |
| Hint | unused | Compiler hint that the definition is not used. |
A visibility attribute describes the scope to which a definition applies as well as the definition's visibility outside that scope. A visibility attribute may be user-defined, in which case it can also indicate that the definition is visible in other packages only when those packages import a specific version of this package.
The local attribute applies the definition to the enclosing Block.
If the enclosing block is a class, the definition does not appear as a member of that class.
The scope attribute applies the definition to the enclosing scope. If the
enclosing scope is a class, the definition will appear as a member of that class; that member will be visible only inside
the enclosing package (as though it had package visibility).
The global attribute applies the definition to the enclosing package.
The private, package, public, and user-defined version attributes apply the definition
to the enclosing class or to the current package if there is no enclosing class.
The default visibility is scope.
There is a slight syntactic ambiguity between using package as a block attribute and defining
a new package.
The static attribute makes the definition create a global member rather than an instance member of the enclosing
class. The instance attribute reverses this -- it makes the definition create an instance member of the enclosing
class. The final attribute prevents subclasses from overriding this definition.
These three attributes may only be used on definitions that apply to a class. They cannot be used on definitions that, for instance, create local variables inside a function.
The override and mayOverride attributes control warnings. Normally defining a class member with
the same name as a visible member of a superclass generates a warning. The override attribute reverses the sense
of the warning so that the warning will be generated if there is no visible member of a superclass with the same name. The
mayOverride attribute turns off this warning altogether.
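A sketch of the effect on warnings, using hypothetical classes A and B (the class and function syntax is described in the Classes chapter):

class A {
  public function f() {}
}
class B extends A {
  override public function f() {}       // ok: A has a visible member named f
  override public function g() {}       // warning: no visible superclass member named g
  mayOverride public function h() {}    // no warning whether or not a superclass defines h
}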
The compile attribute is a hint that the definition may be evaluated early. See compiler
blocks.
The unused attribute is a hint that the definition is not referenced anywhere. Referencing it will generate
a warning.
Any constant defined in an enclosing scope is also a potential attribute. That constant's value must be an attribute object,
which can be obtained either from another attribute or by calling one of the attribute-creating functions such as Version.
For example, the following code creates aliases priv and loc of the attributes private
and local:
compile {
const priv = private;
const loc = local;
const V1 = Version("1.0","");
const V2 = Version("2.0","1.0");
}
class C {
priv var x;
V1 var simple;
V2 var complicated;
priv static const a:Array = new Array(10);
loc var i;
for (i = 0; i != 10; i++) a[i] = i;
}
An implementation may require that user-defined attributes be defined early (in compiler
blocks or using the compile attribute).
Each definition has a particular static and dynamic extent. The static extent of a definition is the region of source code where the definition is visible. The dynamic extent is the time interval during which the defined constant, variable, function, or class may be accessed.
The rules for determining the extent of a definition differ depending on whether the defined entity is a class member or not.
The static extent of a definition D is specified by its visibility attribute, which designates a scope (or set of scopes) A where the definition is visible. If there is a subscope B in A that defines an entity E with the same name as D and the definition E is actually executed, then the inner definition E shadows the outer one and definition D is not visible inside B.
In general, the dynamic extent of a definition D begins when the definition is executed and ends when its static extent scope is exited. There are a couple of exceptions to this rule for compatibility with JavaScript 1.5:
- function definitions at the top level of a scope have a dynamic extent which includes the entire scope.
- var definitions without a type or attributes have a dynamic extent which includes the entire scope.

Situations may arise where an inner definition will shadow an outer definition but the inner definition's dynamic extent
has not yet begun. In the example below, function f shadows the global b but tries to access the
inner b before its dynamic extent begins (at the time the const b:Integer = 8 statement
is executed). This is illegal, but an implementation is not required to diagnose such an error (which may be difficult, especially
if the inner b is defined conditionally). The effects of executing such a program are undefined.
const b:Integer = 7;
function f():Integer {
function g():Integer {return b}
var a = g();
const b:Integer = 8;
return g() - a;
}
In general, it is not legal to define the same entity twice within a scope A without exiting A in the interim. There are a couple of exceptions:
- The same var or const definition may be executed repeatedly. Here "the same" means that the definition is in the same location in the source code, which can happen if the definition is located inside a loop. Moreover, the definition's type, if any, must not change each time the definition is executed, and, if the definition is of a const, then its value may not change either.
- var definitions without a type or attributes may be executed repeatedly on the same variable.

In the example below the comments indicate the scope and visibility of each definition:
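A sketch of the second exception (a plain, untyped var definition re-executed on the same variable):

function f(n) {
  for (var i = 0; i < n; i++) {
    var x = i;   // the same untyped, attribute-free var definition is executed
                 // repeatedly on the same variable x, which is allowed
  }
  return x;      // x's dynamic extent covers the entire function scope
}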
var a0; // Package-visible global variable
local var a1; // Package-visible global variable
private var a2 = true; // Package-visible global variable
package var a3; // Package-visible global variable
public var a4; // Public global variable
if (a1) {
var b0; // Package-visible global variable
local var b1; // Local to this block
private var b2; // Package-visible global variable
package var b3; // Package-visible global variable
public var b4; // Public global variable
}
public function F() { // Public global function
var c0; // Local to this function
local var c1; // Local to this function
private var c2; // Package-visible global variable
package var c3; // Package-visible global variable
public var c4; // Public global variable
}
function G() { // Package-visible global function
var d0; // Never defined because G isn't called
private var d1; // Never defined because G isn't called
package var d2; // Never defined because G isn't called
public var d3; // Never defined because G isn't called
}
class C { // Package-visible global class
var e0; // Package-visible class instance variable
private var e1; // Class-visible class instance variable
package var e2; // Package-visible class instance variable
public var e3; // Public class instance variable
static var e4; // Package-visible class-global variable
private static var e5;// Class-visible class-global variable
package static var e6;// Package-visible class-global variable
public static var e7; // Public class-global variable
local var e8; // Local to class C's block
function H() { // Package-visible class function
var f0; // Local to this function
private var f1; // Class-visible class variable
package var f2; // Package-visible class variable
public var f3; // Public class variable
}
public function I() {}// Public class method
H();
}
F();
A static subset of JavaScript 2.0 may disallow definitions inside a function F that define entities in a scope
outside F. This would disallow functions F, G, and H above.
Should we have a protected Visibility? It has been omitted
for now to keep the language simple, but there does not appear to be any fundamental reason why it could not be supported.
If we do support it, should we choose the C++ protected concept (visible only in class and subclasses) or the
Java protected concept (visible in class, subclasses, and the original class's package)?
JavaScript 2.0
Core Language
Variables
Wednesday, February 16, 2000
A variable defined with var can be modified, while one defined with const
is read-only. Identifier is the name of the variable
and TypeExpression is its type. Identifier
can be any non-reserved identifier. TypeExpression
is evaluated at the time the variable definition is evaluated and should evaluate to a type t.
If provided, AssignmentExpression gives
the variable's initial value v. If AssignmentExpression
is not provided in a var definition, then undefined is assumed; if undefined
cannot be coerced to type t then any attempt to read the variable
prior to writing a valid value into it will result in an error. AssignmentExpression
is evaluated just after the TypeExpression is
evaluated. The value v is then coerced to the variable's type t and stored in the variable. If the variable
is defined using var, any values subsequently assigned to the variable are also coerced
to type t at the time of each such assignment.
Multiple variables separated by commas can be defined in the same VariableDefinition. The values of earlier variables are available in the TypeExpressions and AssignmentExpressions of later variables.
If omitted, TypeExpression defaults to type
any. Thus, the definition
var a, b=3, c:Integer=7, d, e:Type=Boolean, f:Number, g:e, h:int;
is equivalent to:
var a:any=undefined;
var b:any=3;
var c:Integer=7;
var d:Integer=undefined; // coerced to NaN
var e:Type=Boolean;
var f:Number=undefined;  // coerced to NaN
var g:Boolean=undefined; // coerced to false
var h:int=undefined;     // coerced to int(0)
const means that Identifier
cannot be written after its value is set. Its value can be set by an AssignmentExpression
if one is provided. If one is not provided then the constant can be written exactly once using a regular assignment statement;
any attempt to read the constant prior to writing its value will result in an error. For example:
const c:Integer;
function f(x) {return x+c}
f(3); // error: c's value is not defined
c = 5;
f(3); // returns 8
c = 5; // error: redefining c
Just like any other definition, a constant may be rebound after leaving its scope. For example, the following is legal;
j is local to the block, so a new j binding is created each time through the loop:
var k = 0;
for (var i = 0; i < 10; i++) {
local const j = i;
k += j;
}
JavaScript 2.0
Core Language
Functions
Friday, February 11, 2000
- get [no line break] Identifier
- set [no line break] Identifier
- new [no line break] Identifier
- new

To define a function we use the following syntax:
[Visibility] function [get | set] Identifier ( Parameters ) [: TypeExpression] Block

If Visibility is absent, the above declaration defines a local function within the current Block scope. If Visibility is present, the above declaration declares either a global function (if outside a ClassDefinition's Block) or a class function (if inside a ClassDefinition's Block) according to the declaration scope rules.
The function's result type is TypeExpression, which defaults to type Any if not
given. If the function does not return a value, it's good practice to set TypeExpression to void
to document this fact.
Block contains the function body and is evaluated only when the function is called.
Parameters has one of the following forms:
RequiredParameter , ... , RequiredParameter [, OptionalParameter , ... , OptionalParameter] [, ... [Identifier]]
... [Identifier]

If the ... is present, the function accepts more arguments than just the listed parameters.
If an Identifier is given after the ..., then that Identifier
is bound to an array of arguments given after the listed parameters. That Identifier is
declared locally as though by the declaration const array Identifier.
Individual parameters have the forms:
Identifier [: TypeExpression]
Identifier [: TypeExpression] = AssignmentExpression

TypeExpression gives the parameter's type and defaults to type Any. If the parameter
name Identifier is followed by a =, then that parameter is
optional. If the nth parameter is optional and a call to this function provides fewer than n arguments,
then the nth parameter is set to the value of its AssignmentExpression, coerced to
the nth parameter's type if necessary. The nth parameter's AssignmentExpression
is evaluated only if fewer than n arguments are given in a call.
A RequiredParameter may not follow an OptionalParameter. If a
function has n RequiredParameters and m OptionalParameters
and no ... in its parameter list, then any call of that function must supply at least
n arguments and at most n+m arguments. If this function has a ...
in its parameter list, then any call of that function must supply at least n arguments. These restrictions do not
apply to traditional functions.
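A sketch of these restrictions (pad, s, width, fill, and extra are hypothetical names):

function pad(s, width:Integer, fill = " ", ... extra) {
  // s and width are required; fill is optional; extra is bound to an array
  // of any leftover arguments
  return s;  // body elided
}

pad("x", 3);              // ok: both required arguments supplied
pad("x", 3, "*", 1, 2);   // ok: the leftover arguments 1 and 2 land in extra
pad("x");                 // error: fewer than the two required arguments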
The parameters' Identifiers are local variables with types given by the corresponding TypeExpressions inside the function's Block. Code in the Block may read and write these variables. Arguments are passed by value, so writes to these variables do not affect the passed arguments' values in the caller.
In addition to local variables generated by the parameters' Identifiers, each function also
has a predefined arguments local variable which holds an array (of type const array) of all
arguments passed to this function.
When a function is called, the following list indicates the order of evaluation of the various expressions in a FunctionDefinition. These steps are taken only after all of the arguments have been evaluated.
Among other steps, if the parameter list ends with a ... followed by an Identifier, that Identifier is bound to an array comprised of the zero or more leftover arguments not already bound to a parameter.

Note that later TypeExpressions and AssignmentExpressions can refer to previously bound arguments. Thus, the following is legal:
function choice(boolean a, type b, b c, b d=) b {
return a ? c : d;
}
The call choice(true,integer,8,4) would return 8, while choice(false,integer,6) would return
0 (undefined coerced to type integer).
Unless the function is a traditional function, the function definition using the above
syntax does not define a class; the function's name cannot be used in a new expression, and the function
does not have a this parameter. Any attempt to use this inside the function's body is an error.
To define a method that can access this, use the method
keyword.
If a FunctionDefinition is located at a class scope (either because it is located at the top
level of a ClassDefinition's Block
or it has a Visibility prefix and is located inside a ClassDefinition's
Block), then the function is a static
method of the class. Unlike C++ or Java, JavaScript 2.0 does not use the static keyword to indicate such functions;
instead, instance methods (i.e. non-static methods) are defined using the method
keyword.
If a FunctionDefinition contains the keyword get or set,
then the defined function is a getter or a setter.
A getter must not take any parameters and cannot have a ... in its Parameters
list. Unlike an ordinary function, a getter is invoked by merely mentioning its name without an Arguments
list in any expression except as the destination of an assignment. For example, the following code returns the string “<2,3,1>”:
var x:integer = 0;
function get serialNumber():integer {return ++x}
var y = serialNumber;
return "<" + serialNumber + "," + serialNumber + "," + y + ">";
A setter must take exactly one required parameter and cannot have a ... in its Parameters
list. Unlike an ordinary function, a setter is invoked by merely mentioning its name (without an Arguments
list) on the left side of an assignment or as the target of a mutator such as ++ or --. The result
of the setter becomes the result of the assignment. For example, the following code returns the string “<1,2,43>”:
var x:integer = 0;
function get serialNumber():integer {return ++x}
function set serialNumber(n:integer):integer {return x=n}
var s = "<" + serialNumber + "," + serialNumber;
serialNumber = 42;
return s + "," + serialNumber + ">";
A setter can have the same name as a getter in the same lexical scope. A getter or setter cannot be extracted from its variable, so the notion of the type of a getter or setter is vacuous; a getter or setter can only be called.
Contrast the following:
var x:integer = 0;
function f():integer {return ++x}
function g():Function {return f}
function get h():Function {return f}
f; // Evaluates to function f
g; // Evaluates to function g
h; // Evaluates to function f (not h)
f(); // Evaluates to 1
g(); // Evaluates to function f
h(); // Evaluates to 2
g()(); // Evaluates to 3
We can use a getter and a setter to create an alias to another variable, as in:
function get myAlias() {return Pkg::var}
function set myAlias(x) {return Pkg::var = x}
myAlias = myAlias+4;
Traditional function definitions are provided for compatibility with JavaScript 1.5. The syntax is as follows:
traditional function Identifier ( Identifier , ... , Identifier ) Block

A function declared with the traditional keyword cannot have any argument or result
type declarations, optional arguments, or getter or setter
keyword. Such a function is treated as though every argument were optional and more arguments than just the listed ones were
allowed. Thus, the definition
traditional function Identifier ( Identifier , ... , Identifier ) Block
behaves like the following function definition:
function Identifier ( Identifier = , ... , Identifier = , ... ) Block
Furthermore, a traditional function defines its own class and treats this in the same manner as JavaScript
1.5.
Every function (except a getter or a setter) is also a value and has type Function. Like other values, it can
be stored in a variable, passed as an argument, and returned as a result. The identifiers in a function are all lexically
scoped.
We can use a variant of a function definition to define a function inside an expression. The syntax is:
function [Identifier] ( Parameters ) [: TypeExpression] Block

This expression defines a function and returns it as a value of type Function. The function can be named by
providing the Identifier, but this name is only accessible from inside the function's Block.
To avoid confusion between a FunctionDefinition and a FunctionExpression, a Statement (and a few other grammar nonterminals) may not begin with a FunctionExpression. To place a FunctionExpression at the beginning of a Statement, enclose it in parentheses.
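For example (square and fact are hypothetical names):

var square = function(x:Integer):Integer {return x*x};
var fact = function f(n:Integer):Integer {return n <= 1 ? 1 : n*f(n-1)};
                                // f is visible only inside the function's own body
(function(x) {return x+1})(4);  // parenthesized so that the statement does not
                                // begin with a FunctionExpression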
A FunctionDefinition is merely convenient syntax for a const variable definition
and a FunctionExpression:
[Visibility] function Identifier ( Parameters ) [: TypeExpression] Block
is equivalent to:
[Visibility] const Identifier : Function = function Identifier ( Parameters ) [: TypeExpression] Block ;
Unless a function is a getter or a setter, we call that function by listing its arguments in parentheses after the function expression, just as in JavaScript 1.5:
( AssignmentExpression , ... , AssignmentExpression )

By consensus in the ECMA TC39 modularity subcommittee, we decided to use the above syntax for getters and setters instead of:
[getter | setter] function Identifier ( Parameters ) [: TypeExpression] Block

The decision was based on aesthetics; neither syntax is more difficult to implement than the other.
Do we want to have a named rest parameter (as in the proposal above), or only support the arguments
special local variable as in JavaScript 1.5? The main difference is in the handling of fixed arguments -- they must be added
to the arguments array but can be omitted from the rest array.
The traditional keyword is ugly, so let's take a look at some alternatives. Unless we want to continue to
make each function into a class (as JavaScript 1.5 does), we need some way to indicate which functions are also classes
and which ones are not. Also, we'd like to be able to indicate which functions can be called with more or fewer than the
desired number of arguments and which cannot.
One possibility would be to state that any function that uses a type annotation in its signature (either the parameter
list or the result type) is a new-style function and does not define a class; other functions would declare classes. Furthermore,
new-style functions would have to be called with the exact number of arguments unless some parameters are optional or a
... is present in the parameter list. These are analogous to the rules that ANSI C used to distinguish new-style
functions from traditional C functions. As with ANSI C, we have somewhat of a difficulty with functions that take no parameters;
such functions would need to specify a return type to be considered new-style.
C++ did away with the ANSI C treatment of traditional C functions. We could do the same by having a pragma (analogous
to Perl's use pragmas) that could indicate that all functions are to be considered new-style unless prefixed
by the traditional keyword. If we do this, we should decide whether the default setting of this pragma would
be on or off.
JavaScript 2.0
Core Language
Classes
Monday, February 14, 2000
In JavaScript 2.0 we define classes using the class keyword. Limited classes can also
be defined via JavaScript 1.5-style functions, but doing so is discouraged
for new code.
[Visibility] class Identifier [extends TypeExpression] Block
[Visibility] class extends TypeExpression Block

The first format declares a class with the name Identifier, binding Identifier
to this class in the scope specified by the Visibility
prefix (which usually includes the ClassDefinition's Block). Identifier
is a constant variable with type type and can be used anywhere a type expression is allowed.
When the first ClassDefinition format is evaluated, the following steps take place:
- If extends TypeExpression is given, TypeExpression is evaluated to obtain a type s, which must be another class. If extends TypeExpression is absent, type s defaults to the class Object.
- A new class t is created with superclass s, and Identifier is bound to t in the scope specified by the Visibility prefix.
- The ClassDefinition's Block is evaluated. All const, var, function, constructor, and class declarations evaluated at its top level (or placed at its top level by the scope rules) become class members of type t. All field and method declarations evaluated at the Block's top level (or placed at its top level by the scope rules) become instance members of type t.

A ClassDefinition's Block is evaluated just like any other Block, so it can contain expressions, statements, loops, etc. Such statements that do not contain declarations do not contribute members to the class being declared, but they are evaluated when the class is declared.
If a ClassDefinition omits the class name Identifier, it extends
the original class rather than creating a subclass. A class extension may define new methods and class constants and variables,
but it does not have special privileges in accessing the original class definition's private members (or package
members if in a separate package). A class extension may not override methods, and it may not define constructors or instance
variables.
Each instance of the original class is automatically also an instance of the extended class. Several extensions can apply to the same class.
An extension is useful to add methods to system classes, as in the following code in some user package P:
class extends string {
public method scramble() string {...}
public method unscramble() string {...}
}
var x = "abc".scramble();
Once the class extension is evaluated, methods scramble and unscramble become available on all
strings. There is no possibility of name clashes with extensions of class string in other, unrelated packages
because the names scramble and unscramble belong to package P and not the system package
that defines string. Any packages that import package P will also be able to call scramble
and unscramble on strings, but other packages will not.
A class has an associated set of class members and another set of instance members. Class members are properties of the class itself, while instance members are properties of each instance object of this class and have independent values for different instance objects.
Class members are one of the following:
- class constants, defined using the const keyword
- class variables, defined using the var keyword
- class functions, defined using the function keyword
- constructors, defined using the constructor keyword
- classes, defined using the class keyword

Instance members are one of the following:

- instance variables, defined using the field keyword
- instance methods, defined using the method keyword

Members can only be defined within the intersection of the lexical and dynamic extent of a ClassDefinition's Block. A few examples illustrate this rule.
The code
var bool extended = false;
function callIt(x) {return x()}
class C {
extended = true;
public function square(integer x) integer {return x*x}
if (extended) {
public function cube(integer x) integer {return x*x*x}
} else {
public function reciprocal(number x) number {return 1/x}
}
field string firstName, lastName;
method name() string {return firstName + lastName}
public function genMethod(boolean b) {
if (b) {
public field time = 0;
} else {
public field date = 0;
}
}
genMethod(true);
}
defines class C with members square (a class function), cube (a class function),
firstName (an instance variable), lastName (an instance variable), name (an instance
method), and genMethod (a class function).
On the other hand, executing the following code after the above example would be illegal due to three different errors:
genMethod(false); // Field date declared outside of C's block's dynamic extent
public field color; // Field declared outside a class's block
function genField() {
public field style;
}
class D {
genField(); // Field style declared outside D's block's lexical extent
}
While a ClassDefinition's Block is being evaluated, the already defined class members (other than constructors) are visible and usable by the code in that Block. Afterwards members can be accessed in one of several ways:
- Code located anywhere within the current package (if the member's Visibility is package or omitted), or anywhere within the current package or any package that imports the appropriate version of the current package (if the member's Visibility is public), can access class members by using the . operator on the class.
- Code located anywhere within the current package (if the member's Visibility is package or omitted), or anywhere within the current package or any package that imports the appropriate version of the current package (if the member's Visibility is public), can access instance members by using the . operator on any of the class's instances.

A subclass inherits all members except constructors from its superclass. Class variables have only one global value, not one value per subclass. A subclass may override visible methods, but it may not override or shadow any other visible members. On the other hand, imports and versioning can hide members' names from some or all users in importing packages, including subclasses in importing packages.
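To make the two kinds of access concrete, here is a brief sketch (the class Counter and its members are hypothetical, not part of this specification):

class Counter {
  var total:Integer = 0;     // class variable: one value held by the class itself
  field count:Integer = 0;   // instance variable: one value per instance
  method bump() {count = count + 1}
}
var a = new Counter;
var b = new Counter;
a.bump();
a.count;        // 1: instance members are reached through an instance
b.count;        // 0: each instance has its own count
Counter.total;  // 0: class members are reached through the class itself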
We have already seen the definition syntax for variables and constants, functions, and classes. Any of these defined at a ClassDefinition's Block's top level (or placed at its top level by the scope rules) become class members of the class.
Fields, methods, and constructor definitions have their own syntax described below. These definitions must be lexically enclosed by a ClassDefinition's Block.
field Identifier [: TypeExpression] [= AssignmentExpression] , ... , Identifier [: TypeExpression] [= AssignmentExpression] ;

A FieldDefinition is similar to a VariableDefinition except that it defines an instance variable of the lexically enclosing class. Each new instance of the class contains a new, independent set of instance variables initialized to the values given by the AssignmentExpressions in the FieldDefinition.
Identifier is the name of the instance variable and TypeExpression is its type. Identifier can be any non-reserved identifier. TypeExpression is evaluated at the time the variable definition is evaluated and should evaluate to a type t. The TypeExpressions and AssignmentExpressions are evaluated once, at the time the FieldDefinition is evaluated, rather than every time an instance of the class is constructed; their values are saved for use in constructors.
If omitted, TypeExpression defaults to type any.
If provided, AssignmentExpression gives the instance variable's initial value v.
If not, undefined is assumed; an error occurs if undefined cannot be coerced
to type t. AssignmentExpression is evaluated just after the TypeExpression
is evaluated. The value v is then coerced to the variable's type t and stored in the instance variable.
Any values subsequently assigned to the instance variable are also coerced to type t at the time of each such assignment.
Multiple instance variables separated by commas can be defined in the same FieldDefinition.
A field cannot be overridden in a subclass.
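As a sketch of the syntax above (the class and field names are hypothetical):

class Point {
  field x:Number = 0, y:Number = 0;  // two instance variables, each initialized to 0
  field label;                       // type defaults to any, initial value undefined
}
var p = new Point;
var q = new Point;
p.x = 3;   // p's x is now 3
q.x;       // still 0: q has its own independent x

As described above, the TypeExpressions and initializers here are evaluated once, when the FieldDefinition itself is evaluated, and their saved values are used each time an instance is constructed.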
[final] [override] method [get | set] Identifier ( Parameters ) [: TypeExpression] Block
[final] [override] method [get | set] Identifier ( Parameters ) [: TypeExpression] ;

A MethodDefinition is similar to a FunctionDefinition except that it defines an instance method of the lexically enclosing class. Parameters, the result TypeExpression, and the body Block behave just like for function definitions, with the following differences:
- Within the body there is a variable this that refers to the instance object of the method's class on which the method was called.
- Reading a method from an instance with the . operator produces a function (more specifically, a closure) that is already dispatched and has this bound to the left operand of the . operator.
- There is no traditional syntax for methods. Optional parameters must be specified explicitly.

We call a regular method by combining the . operator with a function call. For example:
class C {
  field x:integer = 3;
  method m() {return x}
  method n(x) {return x+4}
}
var c = new C;
c.m();                 // returns 3
c.n(7);                // returns 11
var f:Function = c.m;  // f is a zero-argument function with this bound to c
f();                   // returns 3
c.x = 8;
f();                   // returns 8
A class c may override a method m defined in its superclass s. To do this, c
should define a method m' with the same name as m and use the override
keyword in the definition of m'. Overriding a method without using the override
keyword or using the override keyword when not overriding a method results in a warning
intended to catch misspelled method names. The warning is not an error to allow subclass c to either define a method
if it is not present in s or override it if it is present in s -- this situation can arise when s
is imported from a different package and provides several versions.
The overriding method m' does not have to have the same number or type of parameters as the overridden method m. In fact, since parameter types can be arbitrary expressions and are evaluated only during a call, checking for parameter type compatibility when the overriding method m' is declared would require solving the halting problem. Moreover, defining overriding methods that are more general than overridden methods is useful.
A method defined with the final keyword cannot
be overridden (or further overridden) in subclasses.
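A sketch of overriding (classes B and D below are hypothetical):

class B {
  method describe() {return "base"}
  final method id() {return 42}                   // cannot be overridden in subclasses
}
class D extends B {
  override method describe() {return "derived"}  // overrides B's describe
  method extra() {return "new in D"}              // no override keyword: D adds a new method
}
var d = new D;
d.describe();   // "derived"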
If a MethodDefinition contains the keyword get or set,
then the defined method is a getter or a setter. These are analogous to getter
and setter functions in that they are invoked without listing the parentheses after the method name.
A getter or setter method cannot be overridden. We could relax this restriction, but then we'd also
have to allow overriding of fields by getters, setters, or other fields, and, as a corollary, allow fields to be declared
final.
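A sketch of a getter/setter pair (the class and member names are hypothetical):

class Temperature {
  field celsius:Number = 0;
  method get fahrenheit() {return celsius*1.8 + 32}
  method set fahrenheit(f) {celsius = (f - 32)/1.8}
}
var t = new Temperature;
t.fahrenheit = 212;   // invokes the setter; no parentheses are written
t.celsius;            // 100
t.fahrenheit;         // 212, computed by the getter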
constructor Identifier ( Parameters ) Block

A constructor is a class function that creates a new instance of the lexically enclosing class c. A constructor's
body Block is required to call one of c's superclass's constructors (when
and how?). Afterwards it may access the instance object under construction via the this local variable.
A constructor should not return a value with a return statement; the newly created object is returned automatically.
A constructor can have any non-reserved name, in which case we would invoke it as though it were a class function. In addition,
a constructor's Identifier can have the special name new, in which case we invoke
it using the new prefix operator syntax as in JavaScript 1.5.
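A sketch of both constructor styles (class Point is hypothetical; the call to the superclass's constructor is shown only as a comment because the exact mechanism is left open above):

class Point {
  field x:Number = 0, y:Number = 0;
  constructor new(a, b) {
    // ... call one of Object's constructors here ...
    this.x = a;
    this.y = b;
  }
  constructor origin() {
    // ... call one of Object's constructors here ...
  }
}
var p = new Point(3, 4);   // uses the constructor named new
var q = Point.origin();    // a named constructor invoked like a class function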
|
JavaScript 2.0
Core Language
Packages
|
Wednesday, February 16, 2000
Packages are an abstraction mechanism for grouping and distributing related code. Packages are designed to be linked at run time to allow a program to take advantage of packages written elsewhere or provided by the embedding environment. JavaScript 2.0 offers a number of facilities to make packages robust for dynamic linking:
A package is a file (or analogous container) of JavaScript 2.0 code. There is no specific JavaScript statement that introduces or names a package -- every file is presumed to be a package. A package itself has no name, but it has a specific URI by which other packages can import it.
A package P typically starts with import statements that import other packages used by package
P. A package that is meant to be used by other packages typically has one or more version
declarations that declare versions available for export.
A package's body is described by the Program grammar nonterminal. A package is loaded (its body is evaluated) when the package is first imported or invoked directly (if, for example, the package is on an HTML web page). Some standard packages may also be loaded when the JavaScript engine first starts up.
Two attempts to load the same package in the same environment result in sharing of that package. What constitutes an environment is necessarily application-dependent. However, if package P1 loads packages P2 and P3, both of which load package P4, then P4 is loaded only once and thereafter its code and data is shared by P2 and P3.
When a package is loaded, all of its statements are evaluated in order, which may cause other packages to be loaded along
the way when import statements are encountered. A package's symbols are available for export to other packages
only after the package's body has been successfully evaluated. Unlike in Java, circularities are not allowed in the graph
of package imports.
To create packages A and B that access each others' symbols, we need to instead define a hidden package C that consists of all of the code that would have gone into A and B. Package C should define versions verA and verB and tag the symbols it exports with either verA or verB to indicate whether these symbols belong in package A or B. Package A should then be empty except for a directive (or several directives if there are multiple versions of A and verA) that reexports C's symbols tagged with verA. Similarly, package B should reexport C's symbols tagged with verB. To make this work we need a reexport directive. Is this really necessary? Also, do we want a mechanism for hiding package C from general view so that users can only use it through A or B?
We can export a symbol in a package by giving it public
visibility.
To import symbols from a package we use the import statement:
import ImportList ;
import ImportList Block
import ImportList Block else CodeStatement
ImportList: ImportItem , ... , ImportItem
ImportItem: [protected] [Identifier =] NonAssignmentExpression [: Version]

The first form of the import statement (without a Block) imports symbols into
the current lexical scope. The second and third forms import symbols into the lexical scope of the Block.
If the imports are unsuccessful, the first two forms of the import statement throw an exception, while the last
form executes the CodeStatement after the else keyword.
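For instance (the URI, version, and symbols below are hypothetical), the three forms might be used as follows:

import "http://www.example.com/Widgets" : "2.0";   // throws if the package cannot be imported
import w = "http://www.example.com/Widgets" {
  var b = w::makeButton();   // the imported symbols are visible only inside this Block
} else
  useBuiltInWidgets();       // run instead of the Block if the import fails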
An import statement can import one or more packages separated by commas. Each ImportItem
specifies one package to be imported. The NonAssignmentExpression should evaluate to a string
that contains a URI where the package may be found. If present, Version indicates the version
of the package's exports to be imported; if not present, Version defaults to version 1.
An ImportItem can introduce a name for the imported package if the NonAssignmentExpression
is preceded by Identifier =. Identifier
becomes bound (either in the current lexical scope or in the Block's scope) to the imported package
as a whole. Individual symbols can be extracted from the package by using Identifier with the
:: operator. For example, if package at URI P has public symbols
a and b, then after the statement
import x=P;
P's symbols can be referenced as either a, b, x::a, or x::b.
If an ImportItem contains the keyword protected, then
the imported symbols can only be accessed using the :: operator. If we were to import
package P using
import protected x=P;
then we'd have to access P's symbols using either x::a or x::b.
If two imports in the same scope import packages with clashing symbols, then neither symbol is accessible unless qualified
using the :: operator. If an imported symbol clashes with a symbol declared in the same
scope, then the declared symbol shadows the imported symbol. Scope rules 3 and
4 apply here as well, so the following code is illegal because a is referenced and then redefined:
import x=P;
var y=a;     // References P's a
const a=17;  // Redefines a in same scope
Version names cannot be imported.
Do we want to use URIs to locate packages, or do we want to invent our own, separate mechanism to do this?
Should we make private illegal outside a class rather than making it equivalent to
package?
Should we introduce a local Visibility prefix that explicitly
means that the declaration is visible locally? This wouldn't provide any additional functionality but it would provide a
convenient name for talking about the four kinds of visibility prefixes.
What should the default visibilities be? The current defaults are loosely modeled after Java:
| Definition Location | Default visibility |
|---|---|
| Package top level | package (equivalent to local in this case) |
| Inside a statement outside a function or class | local |
| Function or method code's top level | local |
| Inside a statement inside a function or method | local |
| Class declaration block's top level | package |
| Inside a statement inside a class declaration block | local |
|
JavaScript 2.0
Core Language
Language Declarations
|
Friday, February 11, 2000
Language declarations allow a script writer to select the language to use for a script or a particular section of a script. A language denotes either a major language such as JavaScript 2.0 or a variation such as strict mode.
Developers often find it desirable to be able to write a single script that takes advantage of the latest features in a host environment such as a browser while at the same time working in older host environments that do not support these features. JavaScript 2.0's language declarations enable one to easily write such scripts. One may still need to use techniques such as the LANGUAGE HTML attribute to support pre-JavaScript 2.0 environments, but at least the number of such environments that will need to be special-cased will not increase in the future.
Language declarations are a dual of versioning: language declarations let a script run under a variety of historical hosts, while versioning lets a host run a variety of historical scripts.
language LanguageAlternative | ... | LanguageAlternative ;

A language declaration uses the syntax above. The keyword language is followed by one
or more language alternatives separated by vertical bars. Each language alternative consists of one or more identifiers or
numbers (language identifiers), except that, if there is more than one language alternative, the last one may be empty. The
semicolon at the end of the LanguageDeclaration cannot be
inserted by line-break semicolon insertion.
When a JavaScript environment is lexing and parsing a JavaScript program and it encounters a language
declaration, it checks whether any of the language alternatives can be satisfied. If at least one can, the environment picks
the first language alternative that can be satisfied and processes the rest of the containing block (until the closing }
or until the end of the program if at the top level) using that language. A subsequent language
declaration in the same block can further change the language.
If no language alternatives can be satisfied, then the JavaScript environment skips to the end of the containing block
(until the closing matching } or until the end of the program if at the top level). Further
language declarations in the same block are ignored. No error occurs unless the failing
language declaration is executed as a statement, in which case it throws a syntax error.
[See rationale for a discussion of some of the issues here.]
The following language identifiers are currently defined:
| Language Identifier | Language |
|---|---|
| 1.0 | JavaScript 1.0 |
| 1.1 | JavaScript 1.1 |
| 1.2 | JavaScript 1.2 |
| 1.3 | JavaScript 1.3 |
| 1.4 | JavaScript 1.4 |
| 1.5 | JavaScript 1.5 (ECMAScript Edition 3) |
| 2.0 | JavaScript 2.0 |
| strict | Strict mode |
| traditional | Traditional mode (default) |
It is meaningless to combine two or more numeric language identifiers in the same alternative:
language 1.0 2.0;
will always fail. On the other hand, it is meaningful and useful to separate them with vertical bars. For example, one can indicate that one prefers JavaScript 2.1 but is willing to accept JavaScript 2.0 if 2.1 is not available:
language 2.1 | 2.0;
An empty alternative will always succeed. One can use it to indicate a preference for strict mode but willingness to work without it:
language strict |;
Language declarations are always lexically scoped and never extend past the end of the enclosing block.
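As a sketch of that scoping, a script can opt into strict mode for just one block:

{
  language strict | ;   // use strict mode here if available; the empty alternative always succeeds
  // ... code checked under strict mode when possible ...
}
// the language in effect before the block resumes here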
This document specifies the 2.0 language and its strict and traditional modes. The consequences
of mixing in other languages are implementation-defined, but implementations are encouraged to do something reasonable.
Many parts of JavaScript 2.0 are relaxed or unduly convoluted due to compatibility requirements with JavaScript 1.5. Strict mode sacrifices some of this compatibility for simplicity and additional error checking. Strict mode is intended to be used in newly written JavaScript 2.0 programs, although existing JavaScript 1.5 programs may be retrofitted.
The opposite of strict mode is traditional mode, which is the default. A program can readily mix strict and traditional portions.
Strict mode has the following effects:
- The arguments object is defined only in traditional functions and in functions that explicitly allow a variable number of arguments. (The mode of the call site does not matter.)

See also rationale.
|
JavaScript 2.0
Libraries
|
Thursday, November 11, 1999
This chapter presents the libraries that accompany the core language.
For the time being, only the libraries new to JavaScript 2.0 are described. The basic libraries such as String,
Array, etc. carry over from JavaScript 1.5.
|
JavaScript 2.0
Libraries
Types
|
Wednesday, February 16, 2000
The following types are predefined in JavaScript 2.0:
| Type | Set of Values | Coercions |
|---|---|---|
| none | No values | None |
| void | undefined | Any value → undefined |
| Null | null | undefined → null |
| Boolean | true and false | undefined → false |
| Integer | Double-precision IEEE floating-point numbers that are mathematical integers, including positive and negative zeroes, infinities, and NaN | undefined → NaN |
| Number | Double-precision IEEE floating-point numbers, including positive and negative zeroes, infinities, and NaN | undefined → NaN |
| Character | Single 16-bit unicode characters | None |
| String | Immutable strings of unicode characters | undefined → "" |
| Function | All functions | None |
| Array | All arrays | undefined → [] |
| Type | All types | undefined → any |
| any | All values | None |
Unlike in JavaScript 1.5, there is no distinction between objects and primitive values. All values can have methods. Some values can be sealed, which disallows addition of ad-hoc properties. User-defined classes can be made to behave like primitives.
The above type names are not reserved words. They are considered to be defined in a scope that encloses a package's global scope, so a package could use these type names as identifiers. However, defining these identifiers for other uses might be confusing because it would shadow the corresponding type names (the types themselves would continue to exist, but they could not be accessed by name).
any is the supertype of all types. none is the subtype of all types. none is useful
to describe the return type of a function that cannot return normally because it either falls into an infinite loop or always
throws an exception. void is useful to describe the return type of a function that can return but that does not
produce a useful value.
A literal number is a member of the type Number; if that literal has an integral value, then it is also a
member of type Integer. A literal string is a member of the type String; if that literal has exactly
one 16-bit unicode character, then it is also a member of type Character.
We can use the following operators to construct more complex types. t is a type expression and u is a value expression in the table below.
| Type | Values | Coercions |
|---|---|---|
| + t | null or any value belonging to type t | null → null; undefined → null (if undefined is not a member of t); any other coercions already defined for t |
| ~ t | undefined or any value belonging to type t | undefined → undefined; any other coercions already defined for t |
| singleton(u) | Only the value u | None |
The language cannot syntactically distinguish type expressions from value expressions, so a type expression can also use
any other value operators such as !, |, and . (member access). Except for parentheses,
most of them are not very useful, though. See also the type
expression syntax rationale for other possible type constructors.
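A brief sketch of these constructors in declarations (the variable names are hypothetical):

var title: +String = null;     // holds any String or null
var cached: ~Number;           // holds any Number or undefined; initially undefined
const zero: singleton(0) = 0;  // the only member of this type is the value 0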
Any class defined using the class declaration is also a type that denotes the set of all of its and its descendants'
instances. These include the predefined classes, so Object, Date, etc. are all types. null
is not an instance of a user-defined class.
Types are generally used to restrict the set of objects that can be held in a variable or passed as a function argument. For example, the declaration
var x:Integer;
restricts the values that can be held in variable x to be integers.
Type declarations use the Pascal-style colon syntax. See the type declaration syntax rationale for an alternative.
A type declaration does not affect the semantics of reading the variable or accessing one of its members. Thus, as long
as expression new MyType() returns a value of type MyType, the following two code snippets are equivalent:
var x:MyType = new MyType(); x.foo();
var x = new MyType(); x.foo();
This equivalence always holds, even if these snippets are inside the declaration of class MyType and foo
is a private field of that class. As a corollary, adding true type annotations does not change the meaning of a program.
A type is also a value (whose type is type) and can be used in expressions, assigned to variables, passed
to functions, etc. For example, the code
const Z:Type = Integer;
function abs_val(i:Z):Z {
return i<0 ? -i : i;
}
is equivalent to:
function abs_val(i:Integer):Integer {
return i<0 ? -i : i;
}
As another example, the following method takes a type and returns an instance of that type:
method QueryInterface(T:Type):T { ... }
Coercions can take place in the following situations:
- when a value v is stored into a variable, field, or function parameter that was declared with a type t;
- when a value v is returned from a function whose declared result type is t;
- when a value v is explicitly coerced to a type t using the @t operator.

In any of these cases, if v ∈ t, then v is passed unchanged. If v ∉ t, then if t defines a mapping for value v then that mapped v is used; otherwise an error occurs.
@ OperatorOne can explicitly request a coercion in an expression by using the @ operator. This operator has the same
precedence as . and coerces its left operand to the right operand, which must be a type. ... v@t ...
can be used in an expression and has the same effect as:
function coerce_to_t(a:t):t {return a}   // Declared at the top level
... coerce_to_t(v) ...
assuming that coerce_to_t is an identifier not used anywhere else. The @
operator is useful as a type assertion as in w@Window. It's a postfix operator to simplify cascading expressions:
w@Window.child@Window.pos
is equivalent to:
(((w@Window).child)@Window).pos
A type cast performs more aggressive transformations than a type coercion. To cast a value to a given type, we use the type as a function, passing it the value as an argument:
type(value)
For example, Integer(258.1) returns the integer 258, and String(2+2==4) returns
the string "true".
If value is already a member of type, the type cast returns value unchanged. If value can be coerced to type, the type cast returns the result of the coercion. Otherwise, the effect of a type cast depends on type.
Need to specify the semantics of type casts. They are intended to mimic the current ToNumber, ToString, etc. methods.
|
JavaScript 2.0
Libraries
Versions
|
Tuesday, February 15, 2000
As a package evolves over time it often becomes necessary to change its exported interface. Most of these changes involve adding symbols (global and class members), although occasionally a symbol may be deleted or renamed. In a monolithic environment where all JavaScript source code comes preassembled from the same source, this is not a problem. On the other hand, if packages are dynamically linked from several sources then versioning problems are likely to arise.
One of the most common avoidable problems is collision of symbols. Unless we solve this problem, an author of a library will not be able to add even one symbol in a future version of his library because that symbol could already be in use by some client or some other library that a client also links with. This problem occurs both in the global namespace and in the namespaces within classes from which clients are allowed to inherit.
Here's an example of how such a collision can arise. Suppose that a library provider creates a library called BitTracker
that exports a class Data. This library becomes so successful that it is bundled with all web browsers produced
by the BrowsersRUs company:
package BitTracker;
public class Data {
public field author;
public field contents;
function save() {...}
};
function store(d) {
...
storeOnFastDisk(d);
}
Now someone else writes a web page W that takes advantage of BitTracker. The class Picture
derives from Data and adds, among other things, a method called size that returns the dimensions
of the picture:
import BitTracker;
class Picture extends Data {
public method size() {...}
field palette;
};
function orientation(d) {
if (d.size().h >= d.size().v)
return "Landscape";
else
return "Portrait";
}
The author of the BitTracker library, who hasn't seen W, decides in response to customer requests
to add a method called size that returns the number of bytes of data in a Data object. He then releases
the new and improved BitTracker library. BrowsersRUs includes this library with its latest NavigatorForInternetComputing
17.0 browser:
package BitTracker;
public class Data {
public field author;
public field contents;
public method size() {...}
function save() {...}
};
function store(d) {
...
if (d.size() > limit)
storeOnSlowDisk(d);
else
storeOnFastDisk(d);
}
An unsuspecting user U upgrades his old BrowsersRUs browser to the latest NavigatorForInternetComputing 17.0
browser and a week later is dismayed to find that page W doesn't work anymore. U's granddaughter Alyssa
P. Hacker tries to explain to U that he's experiencing a name conflict on the size methods, but U
has no idea what she is talking about. U attempts to contact the author of W, but she has moved on to
other pursuits and is on a self-discovery mission to sub-Saharan Africa. Now U is steaming at BrowsersRUs, which
in turn is pointing its finger at the author of BitTracker.
How could the author of BitTracker have avoided this problem? Simply choosing a name other than size
wouldn't work, because there could be some other page W2 that conflicts with the new name. There are several possible
approaches:
- Require each package to qualify its exported names with the name of its organization so that, for example, Netscape's objects used a com_netscape_length method while MIT's objects used the edu_mit_length method.
- Use the versioning facilities described below, so that importers state which version of a package's interface they expect.

The last approach appears to be the most desirable because it places the smallest burden on casual users of the language, who merely have to import the packages they use and supply the current version numbers in the import statements. A package author has to be careful not to disturb the set of visible prior-version symbols when releasing an updated package, but authors of dynamically linkable packages are assumed to be more sophisticated users of the language and could be supplied with tools to automatically check updated packages' consistency.
The versioning system in JavaScript 2.0 only affects exports of symbols. The concept of a version does not apply to a package's internal code; it is up to package developers to ensure that newer releases of their packages continue to behave compatibly with older ones.
A version describes the API of a package. A release refers to the entirety of a package, including its code. One release can export many versions of its API. A package developer should make sure that multiple releases of a package that export version V export exactly the same set of symbols in version V.
As an example, suppose that a developer wrote a sorting package P with functions sort and merge
that called bubble sort in version "1.0". In the next release the developer adds a function called
stablesort and includes it in version "2.0". In a subsequent release the developer changes
the sort algorithm to a quicksort that calls stablesort as a subroutine. That last release of the
package might look like:
compile {
const V1_0 = Version("1.0",""); // The "" makes version "1.0" be the default
const V2_0 = Version("2.0","1.0");
}
public var serialNumber;
public function sort(compare: Function, array: Array):Array {...}
public function merge(compare: Function, array1: Array, array2: Array):Array {...}
V2_0 function stablesort(compare: Function, array: Array):Array {...}
Suppose, further, that client package C1 imports version "1.0" of P, client
package C2 simultaneously imports version "2.0" of P, and a search for P
yields the latest release described above. There would be only one instance of P running -- the latest release.
Both clients would get the same sort and merge functions, and both would see the same serialNumber
variable (in particular, if client C1 wrote to serialNumber, then client C2 would see the
updated value), but only client package C2 would see the stablesort function. Both clients would get
the quicksort release of sort. If client package C1 defined its own stablesort function,
then that function would not conflict with P's stablesort; furthermore, P's sort
would still refer to P's stablesort in its internal subroutine call.
Had only the first release of P been available, client package C2 would obtain an error because version
2 of P's API would not be available. Client C1 could run normally, although the sort function
it calls would use bubble sort instead of the quicksort.
Note that the last release of P did not change the API so it did not need a new version. Of course, it could define a new version if for some reason it wanted clients to be able to demand the last release of P even though its API is the same as the second release.
A version name Version is a quoted string literal such as "1.2" or
"Private Interface 2.0". Two version names are equal if their strings are equal. A special version
whose name is the empty string "" is called the default version.
A package must declare every version it uses except "", which is declared by default if not explicitly
declared. A version must be declared before its first use. A given version name may be declared only once per package. A package
declares a version name Version using the version declaration:
version Version [> VersionList] ;
version Version [= Version] ;
VersionList: Version , ... , Version

A version declaration cannot be nested inside a ClassDefinition's Block.
If Visibility is present, it must be either private, package,
or public (without VersionsAndRenames). Unlike in other declarations,
the default is public, which makes Version accessible by
other packages. A private or package Visibility
hides its Version from other packages; such a Version can be used
only by being included in the VersionList of another Version. Also
unlike other declarations, all Version declarations are global.
If the Version being declared is followed by a > and
a VersionList, then the Version is said to be greater than
all of the Versions in the VersionList. We write v1 :> v2 to indicate that v1 is greater than v2 and v1 :≥ v2 to indicate that either v1 and v2 are the same version or v1 :> v2.
Order is transitive, which means that if v1 :> v2 and v2 :> v3, then v1
:> v3. This order induces a partial order on the set of all versions. It is possible for two versions to be
unordered with respect to each other, in which case they are not equal and neither is greater than the other.
If the Version v1 being declared is followed by a =
and another Version v2, then v1 becomes an alias for v2, and
they may be used interchangeably.
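A sketch of version declarations using the syntax above (the version names are hypothetical):

version "1.0";
version "2.0" > "1.0";    // "2.0" is greater than "1.0"
version "3.0" > "2.0";    // and, by transitivity, greater than "1.0"
version "Beta" = "3.0";   // "Beta" is an alias for "3.0"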
A VersionRange specifies a subset of all versions. This subset contains all versions that are both greater than or equal to a given Version1 and less than or equal to a given Version2. A VersionRange can have either of the following forms:
Version
[Version1] .. [Version2]

The first form specifies the one-element set {Version}. The second form specifies the set of all Versions v such that v :≥ Version1 and Version2 :≥ v. If Version1 is omitted, the condition v :≥ Version1 is dropped. If Version2 is omitted, the condition Version2 :≥ v is dropped.
The original version of this specification allowed both strings and numbers as Version names.
Two version names were equal if their toString representations were identical, so version names 2.0
and "2" were identical but 2.0 and "2.0" were not. In addition, numbered versions
had an implicit order: For any two versions v1 and v2 whose names could be represented as numbers,
v1 :> v2 if and only if v1 was numerically greater than v2. Additionally,
every version except 0 was greater than version 0. It was an error to define explicit version
containment relations that would violate this default order, directly or indirectly.
Numbered Version names were dropped for simplicity and to avoid confusion with versions
such as 1.2.3 (which would be a syntax error unless quoted).
Another, simpler, approach is to require all Version names to be nonnegative integers (without quotes). Versions would not need to be declared, and all versions would be totally ordered in numerical order. A disadvantage of this approach is that the total order keeps versions from being branched.
Currently version definitions are fixed. These could be turned into function calls that define versions and list their
relationships. If we can get a variable or constant to hold a set of version names, then we could use these variables rather
than specific version names in the VersionsAndRenames lists after public keywords.
This would provide another level of abstraction and flexibility.
Yet another approach is to consolidate all of the information in VersionsAndRenames into
a set of export statements, say, at the top of the file rather than being interspersed throughout a package
along with public declarations. This would make it easier to see all of the identifiers exported by a particular
version of the package, but it would also likely lead to inconsistencies when someone forgets to update an export
statement after inserting another variable, function, field, or method definition. Such errors would likely be caught after
a package has been released.
|
JavaScript 2.0
Libraries
Machine Types
|
Wednesday, February 16, 2000
The machine types library is an optional library that provides additional low-level types for use in JavaScript 2.0 programs.
On implementations that support this library, these types provide faster, Java-style integer operations that are useful for
communicating between JavaScript 2.0 and other programming languages and for performance-critical code. These types are not
intended to replace Number and Integer for general-purpose scripting.
When the machine types library is imported via an import of MachineTypes version 1, the following types become
available:
| Type | Unit | Values |
|---|---|---|
| byte | B | Machine integers between -128 and 127 inclusive |
| ubyte | UB | Machine integers between 0 and 255 inclusive |
| short | S | Machine integers between -32768 and 32767 inclusive |
| ushort | US | Machine integers between 0 and 65535 inclusive |
| int | I | Machine integers between -2147483648 and 2147483647 inclusive |
| uint | UI | Machine integers between 0 and 4294967295 inclusive |
| long | L | Machine integers between -9223372036854775808 and 9223372036854775807 inclusive |
| ulong | UL | Machine integers between 0 and 18446744073709551615 inclusive |
| float | F | Single-precision IEEE floating-point numbers, including positive and negative zeroes, infinities, and NaN |
Values belonging to the nine machine types above are distinct from each other and from values of type integer.
A literal may be written by using one of the units provided: 7B is the same as byte(7), which is
distinct from 7I, which in turn is distinct from the plain integer 7. A float NaN is
distinct from the regular Number NaN. However, the coercions listed below often hide these distinctions.
No subtype relations hold between the machine types.
The above type names are not reserved words.
The units are defined using the standard unit facility. They may be overridden by the user.
For a machine integer type M, |M| denotes the number of distinct values of that type: |byte| = |ubyte| = 256, |short| = |ushort| = 65536, |int| = |uint| = 2^32, and |long| = |ulong| = 2^64.
The following coercions take place:
- A value m of a machine integer type M can be coerced to type Integer or Number. The result is the closest IEEE double-precision floating-point value using the IEEE round-to-nearest mode. 0 always becomes +0. Due to the possibility of an inexact result, a warning is generated if type M is long or ulong unless this coercion is done as a cast.
- A value m of a machine integer type M can be coerced to type float. The result is the closest IEEE single-precision floating-point value using the IEEE round-to-nearest mode. 0 always becomes +0. Due to the possibility of an inexact result, a warning is generated if type M is int, uint, long, or ulong unless this coercion is done as a cast.
- A float value m can be coerced to type Number. The result is always exact.

The following casts can be used:
- A float or Number value v can be cast to one of the machine integer types M. First v is truncated to an integer i, truncating towards zero. Then, if i is not within range of the target type M, it is treated modulo |M|. The result is i with the machine type M. +0, -0, Infinity, -Infinity, and NaN all cast to the machine integer 0.
- A Number value v can be cast to type float. If inexact, the cast is done using the IEEE round-to-nearest mode. +0, -0, Infinity, -Infinity, and NaN all cast to their float equivalents.

Of course, any coercion can also be used as a cast.
When applied to a value with machine type M, the unary negation operator - always returns a value
of the same type M. If the result is not within range of type M, it is treated modulo |M|.
Machine integers support the binary arithmetic operators +, -, *, /,
% and bitwise logical operations ~, &, |, ^. If supplied
two operands of different machine integer types M1 and M2,
all of these binary operators first coerce both operands to the same type M. If M1
appears before M2 in the list byte, ubyte, short,
ushort, int, uint, long, ulong, then M is M2;
otherwise M is M1. Then these operators perform the operation and finally
return the result as a value of type M. If the result is not within range of the target type M, it is
treated modulo |M|.
If one of the operands of +, -, *, /, % is a machine integer
m of type M and the other is a Number or float value, then m is first
coerced to type Number or float. Next, if both operands are floats, then the result
is a float; otherwise the result is a Number.
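A short sketch of these rules (variable names hypothetical):

var a:int = 2000000000I;
var b:int = 2000000000I;
var c = a + b;    // int arithmetic wraps modulo 2^32: c is the int -294967296
var s:short = 3S;
var d = c * s;    // s is coerced to int, the later of the two types in the list; d is an int
var e = c + 1.5;  // mixing with a Number: c is coerced to Number and e is a Number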
Machine integers also support bitwise shifts <<, >>, and >>>.
The result has the same type as the first operand. The second operand's type can be Number or any machine type and
does not affect the type of the result. Right shifts using >> are signed if the first operand has type
byte, short, int, or long, and unsigned if it has type ubyte,
ushort, uint, or ulong. Right shifts using >>> are always unsigned.
If passed a float argument, the bitwise logical operations ~, &, |,
^ first coerce the float to a Number. If passed a float as the first argument,
the bitwise shifts <<, >>, >>> first coerce the float
to a Number.
The comparison operators ==, !=, <, >, <=, >=
allow any combination of machine type or Number operands. They always compare the exact mathematical values without
first converting one operand's type to the other's. Comparisons involving NaNs are always false, and positive and negative
zeros compare equal.
The identity comparisons === and !== treat all nine machine type values as disjoint from each
other and from regular Number values. Thus, 7B !== 7.
The unary operator !v behaves the same as v!=0 when v has any
machine type.
These rules are designed to permit machine integer operations to be implemented as single instructions on most processor
architectures yet give predictable results. Overflows wrap around instead of signaling errors because such behavior is useful
for many bit-manipulation algorithms and permits much better optimization of performance-critical code. Code that is concerned
about overflows should be using regular Integer instead of the machine integer types.
Why are values of the eight machine integer types distinct? This was done because of a desire to allow arithmetic operators
to only support 32 bits when operating on int values. Let's take a look at the alternative:
Suppose we unify the values of all eight machine types so that 2000000000I is indistinguishable from 2000000000L.
To what precision should an operator like + calculate its results? Clearly, if we're adding two long
values and the result is within the range of long values, then we'd expect to get the right result. In particular,
2000000000L + 2000000000L should yield 4000000000L. However, we assumed
that 2000000000L is indistinguishable from 2000000000I, so 2000000000I +
2000000000I should also yield 4000000000L, which is not representable as an int
value. Thus, even if both operands are known to be int values, the + operator has to use 64-bit
arithmetic.
If a has type int and we compute a+1I, then we have to use 64-bit arithmetic
because the result could be 2147483648. However, if we compute var r:int = a+1I instead, then a smart compiler
could make do with 32-bit arithmetic because the result is treated modulo 2^32. However, this trick would not
work with an expression such as if (a+1I > 0).
The alternative is viable, but it leads to more demand for 64-bit arithmetic. It does have the advantage that one does not need to worry about intermediate overflows as long as the values don't approach 2^64.
|
JavaScript 2.0
Libraries
Operator Overloading
|
Wednesday, February 16, 2000
Operator overloading is useful to implement Spice-style units without having to add units to the core of the JavaScript 2.0 language. Operator overloading is done via an optional library that, when imported, exposes several additional functions and methods. This library is analogous to the internationalization library in that it does not have to be present on all implementations of JavaScript 2.0; implementations without this library do not support operator overloading.
To override operators, import package Operators, version 1.
After importing package Operators, the following methods become available on all objects. Override these to
override the behavior of unary operators.
| Method | Operator |
|---|---|
| Operator::plus() | +expr |
| Operator::minus() | -expr |
| Operator::bitwiseNot() | ~expr |
| Operator::preIncrement() | ++expr |
| Operator::postIncrement() | expr++ |
| Operator::preDecrement() | --expr |
| Operator::postDecrement() | expr-- |
| Operator::call(a1, ..., an) | expr(a1, ..., an) |
| Operator::construct(a1, ..., an) | new expr(a1, ..., an) |
| Operator::lookup(a1, ..., an) | expr[a1, ..., an] |
| Operator::toBoolean():Boolean | if (expr) ..., etc. |
The preIncrement, postIncrement, preDecrement, and postDecrement operators
should return a two-element array; the first element should be the result of the operator, while the second should be a new
value to be stored as the new value of the incremented or decremented variable. The other operators should return a result
of the expression.
The call, construct, and lookup operators also take argument lists. If desired,
these argument lists can include optional or ... arguments.
The !, ||,
^^, &&, and ? :
operators cannot be overridden directly, but they are affected by any redefinition of toBoolean.
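A sketch of overriding a unary operator on a user-defined class (class Vec is hypothetical, and the exact syntax for defining a method under the Operator:: qualified name is an assumption here):

class Vec {
  field x:Number = 0, y:Number = 0;
  override method Operator::minus() {   // overrides -expr for Vec instances
    var v = new Vec;
    v.x = -x;
    v.y = -y;
    return v;
  }
}
var a = new Vec;
a.x = 1; a.y = 2;
var b = -a;   // invokes a's Operator::minus method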
After importing package Operators, the following global functions become available to override binary operators:
| Function | Operator |
|---|---|
| defineAdd(T1:Type, T2:Type, F:Function) | + |
| defineSubtract(T1:Type, T2:Type, F:Function) | - |
| defineMultiply(T1:Type, T2:Type, F:Function) | * |
| defineDivide(T1:Type, T2:Type, F:Function) | / |
| defineRemainder(T1:Type, T2:Type, F:Function) | % |
| defineLeftShift(T1:Type, T2:Type, F:Function) | << |
| defineRightShift(T1:Type, T2:Type, F:Function) | >> |
| defineLogicalRightShift(T1:Type, T2:Type, F:Function) | >>> |
| defineBitwiseOr(T1:Type, T2:Type, F:Function) | \| |
| defineBitwiseXor(T1:Type, T2:Type, F:Function) | ^ |
| defineBitwiseAnd(T1:Type, T2:Type, F:Function) | & |
| defineLess(T1:Type, T2:Type, F:Function) | < |
| defineLessOrEqual(T1:Type, T2:Type, F:Function) | <= |
| defineEqual(T1:Type, T2:Type, F:Function) | == |
| defineIdentical(T1:Type, T2:Type, F:Function) | === |
Each of these functions defines the meaning of an operator for the case where its first operand has type T1
and the second operand has type T2. At least one of these types must be a class defined in the current package.
F is a function that takes two arguments (of type T1 and T2) and produces the operator's
result. The function F used to override the <, <=,
==, and === operators should return a Boolean;
the results of the other operators are unrestricted.
When one of the operators op above is invoked in an expression a op b, the most specific definition of op that matches a and b is invoked. A definition of op for types t1 and t2 matches if the value of a is a member of t1 and the value of b is a member of t2. A definition of op for types t1 and t2 is most specific if it matches and if every other matching definition of op for types s1 and s2 satisfies t1 ⊆ s1 and t2 ⊆ s2. If there is no most specific matching definition of op then an error occurs.
After an operator is defined for a particular pair of types T1 and T2 it cannot be changed. A
static implementation may restrict calls to the above define... functions to occur only in compiler
blocks.
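A sketch of defining a binary operator (class Complex and function addComplex are hypothetical):

class Complex {
  field re:Number = 0, im:Number = 0;
}
function addComplex(a:Complex, b:Complex):Complex {
  var c = new Complex;
  c.re = a.re + b.re;
  c.im = a.im + b.im;
  return c;
}
defineAdd(Complex, Complex, addComplex);   // x + y now calls addComplex when both operands are Complex

Because Complex is a class defined in the current package, this definition satisfies the restriction above; once evaluated, it cannot be changed for this pair of types.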
The >, >=,
!=, and !== operators cannot be overridden directly;
instead, they are defined in terms of <, <=,
==, and ===:
| Expression | Definition |
|---|---|
| a > b | b < a |
| a >= b | b <= a |
| a != b | !(a == b) |
| a !== b | !(a === b) |
|
JavaScript 2.0
Formal Description
|
Thursday, November 11, 1999
This chapter presents the formal syntax and semantics of JavaScript 2.0. The syntax notation and semantic notation sections explain the notation used for this description. A simple metalanguage based on a typed lambda calculus is used to specify the semantics.
The syntax and semantic sections are available in both HTML 4.0 and Microsoft Word 98 RTF formats. In the HTML versions each use of a grammar nonterminal or metalanguage value, type, or field is hyperlinked to its definition, making the HTML version preferred for browsing. On the other hand, the RTF version looks much better when printed. The fonts, colors, and other formatting of the various grammar and semantic elements are all encoded as CSS (in HTML) or Word (in RTF) styles and can be altered if desired.
The syntax and semantics sections are machine-generated from code supplied to a small engine that can type-check and execute the semantics directly. This engine is in the CVS tree at mozilla/js/semantics; the input files are at mozilla/js/semantics/JS20.
|
JavaScript 2.0
Formal Description
Semantic Notation
|
Thursday, November 11, 1999
To precisely specify the semantics of JavaScript 2.0, we use the notation described below to define the behavior of all JavaScript 2.0 constructs and their interactions.
The semantics describe the meaning of a JavaScript 2.0 program in terms of operations on simpler objects borrowed from mathematics collectively called semantic values. Semantic values can be held in semantic variables and passed to semantic functions. The kinds of semantic values used in this specification are summarized in the table below and explained in the next few sections:
| Semantic Value Examples | Description |
|---|---|
| ⊥ | The result of a nonterminating computation |
| syntaxError | The result of a computation that returns by throwing a semantic exception |
|  | The result of a semantic function that does not return a useful value |
| true, false | Booleans |
| -3, 0, 1, 2, 93 | Mathematical integers |
| 1/2, -12/7 | Mathematical rational numbers |
| 1.0, 3.5, 2.0e-10, -0.0, -∞, NaN | Double-precision IEEE floating-point numbers |
| ‘A’, ‘b’, ‘«LF»’, ‘«uFFFF»’ | Characters (Unicode 16-bit code points) |
| [value0, ... , valuen-1] | Vectors (indexed lists of semantic values) |
| “”, “abc”, “1«TAB»5” | Strings |
| {value1, value2, ... , valuen} | Mathematical sets of semantic values |
| ⟨name1 value1, name2 value2, ... , namen valuen⟩ | Tuples with named member semantic values |
| name or name value | Tagged semantic values |
| function(n: Integer) n*n | Semantic functions |
There is a special semantic value ⊥ (pronounced as "bottom") that represents the result of an inconsistent or nonterminating computation. Unless specified otherwise, applying any semantic operator (such as +, *, etc.) to ⊥ or calling a semantic function with ⊥ as any argument also yields ⊥ without evaluating any remaining operands or arguments (in technical terms, semantic functions and operators are strict in all of their arguments unless specified otherwise).
If interpreting a JavaScript program according to the semantics here gives a ⊥ result, an actual implementation executing that JavaScript program will either fail to terminate or throw an exception because it runs out of memory or stack space.
Semantic values of the form value represents the result of a computation that throws a semantic exception. value is the exception's value (which must be a member of the SemanticException semantic type). Unless specified otherwise, applying any semantic operator (such as +, *, etc.) to value or calling a semantic function with value as any argument also yields value (with the same value) without evaluating any remaining operands or arguments.
The throw statement takes a value v and returns v. The catch statement converts v back to v.
Semantic functions that do not return a useful value return the semantic value . There are no operations defined on .
The semantic values true and false are booleans. The not, and, or, and xor operators operate on booleans. Like most other operators, and, or, and xor evaluate both operands before returning a result; these operators do not short-circuit.
Unless specified otherwise, numbers in the semantics written without a slash or decimal point are mathematical integers: ..., -3, -2, -1, 0, 1, 2, 3, .... The usual mathematical operators +, -, *, and unary - can be used on integers. Integers can be compared using =, ≠, <, ≤, >, and ≥.
Numbers in the semantics written with a slash are mathematical rational numbers. Every integer is also a rational. Rational numbers include, for example, 0, 1, 2, -1, 1/2, -12/7, and -24/14; the last two are different ways of writing the same rational number. The usual mathematical operators +, -, *, /, and unary - can be used on rationals. Rationals can be compared using =, ≠, <, ≤, >, and ≥.
Numbers in the semantics written with a decimal point are double-precision IEEE floating-point numbers (often abbreviated as doubles), including distinct +0.0, -0.0, +∞, -∞, and NaN. Doubles are distinct from integers and rationals; when writing doubles in the semantics, we always include a decimal point to distinguish them from integers and rationals.
Doubles other than +∞, -∞, and NaN are called finite. We define the significand of a finite double d as follows:
Characters are single Unicode 16-bit code points. We write them enclosed in single quotes ‘
and ’. There are exactly 65536 characters: ‘«u0000»’,
‘«u0001»’,
..., ‘A’,
‘B’,
‘C’,
..., ‘«uFFFF»’
(see also notation for non-ASCII characters). Unicode surrogates are considered
to be pairs of characters for the purpose of this specification.
The characterToCode and codeToCharacter semantic functions convert between characters and their integer Unicode values.
A semantic vector contains zero or more elements indexed by integers starting from zero. We write a vector value by enclosing a comma-separated list of values inside bold brackets:
[element0, element1, ... , elementn-1]
For example, the following semantic value is a vector whose elements are four strings:
[“parsley”, “sage”, “rosemary”, “thyme”]
The empty vector is written as [].
Let u = [e0, e1, ... , en-1] and v = [f0, f1, ... , fm-1] be vectors, i and j be integers, and x be a value. The following notations describe common operations on vectors:
| Notation | Result Value |
|---|---|
| u v | The concatenated vector [e0, e1, ... , en-1, f0, f1, ... , fm-1] |
| |u| | The length n of the vector |
| u[i] | The ith element ei, or ⊥ if i<0 or i≥n |
| u[i ... j] | The vector slice [ei, ei+1, ... , ej] consisting of all elements of u between the ith and the jth, inclusive, or ⊥ if i<0, j≥n, or j<i-1. The result is the empty vector [] if j=i-1. |
| u[i ...] | The vector slice [ei, ei+1, ... , en-1] consisting of all elements of u between the ith and the end, or ⊥ if i<0 or i>n. The result is the empty vector [] if i=n. |
| u[i x] | The vector [e0, ... , ei-1, x, ei+1, ... , en-1] with the ith element replaced by the value x and the other elements unchanged, or ⊥ if i<0 or i≥n |
Semantic vectors are functional; there is no notation for modifying a semantic vector in place.
A semantic string is merely a vector of characters. For notational convenience we can write a string literal as zero or more characters enclosed in double quotes. Thus,
“Wonder«LF»”
is equivalent to:
[‘W’, ‘o’, ‘n’, ‘d’, ‘e’, ‘r’, ‘«LF»’]
In addition to all of the other vector operations, we can use =, ≠, <, ≤, >, and ≥ to compare two strings.
A semantic set is an unordered collection of values. Each value may occur at most once in a set. There must be a well-defined = semantic operator defined on all pairs of values in the set, and that operator must induce an equivalence relation.
A semantic set is denoted by enclosing a comma-separated list of values inside braces:
{element1, element2, ... , elementn}
The empty set is written as {}.
For example, the following set contains seven integers:
{3, 0, 10, 11, 12, 13, -5}
When using elements such as integers and characters that have an obvious total order, we can also write sets by using the ... range operator. For example, we can rewrite the above set as:
{0, -5, 3 ... 3, 10 ... 13}
If the beginning of the range is equal to the end of the range, then the range consists of only one element: {7 ... 7} is the same as {7}. If the end of the range is one "less" than the beginning, then the range contains no elements: {7 ... 6} is the same as {}. If the end of the range is more than one "less" than the beginning, then the set is ⊥.
Let A and B be sets and x be a value. The following notations describe common operations on sets:
| Notation | Result Value |
|---|---|
| |A| | The number of elements in the set A; ∞ if A has infinitely many elements |
| min A | If there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≥ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite descending sequence of elements with no lower bound in A) |
| max A | If there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≤ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite ascending sequence of elements with no upper bound in A) |
| A ∩ B | The intersection of sets A and B (the set of all values that are present both in A and in B) |
| A ∪ B | The union of sets A and B (the set of all values that are present in at least one of A or B) |
| A - B | The difference of sets A and B (the set of all values that are present in A but not B) |
| x ∈ A | Return true if x is an element of set A and false if not |
| A = B | Return true if the two sets A and B are equal and false otherwise. Sets A and B are equal if every element of A is also in B and every element of B is also in A. |
min and max are only defined for sets whose elements can be compared with <.
A semantic tuple is an aggregate of several named semantic values. Tuples are sometimes called records or structures in other languages. A tuple is denoted by a comma-separated list of names and values between bold triangular brackets:
⟨name1 value1, name2 value2, ... , namen valuen⟩
Each namei valuei pair is called a field. The order of fields in a tuple is irrelevant, so ⟨x 3, y 4⟩ is the same as ⟨y 4, x 3⟩. A tuple's names must all be distinct.
Let w be an expression that evaluates to a tuple ⟨name1 value1, name2 value2, ... , namen valuen⟩. We can extract the value of the field named namei from w by using the notation w.namei. w is required to have this field. For example, ⟨x 3, y 4⟩.x is 3.
In the HTML versions of the semantics, each use of namei is linked back to its tuple type's definition.
A semantic oneof is a pair consisting of a name (called the tag) and a value. Oneofs are sometimes called variants or tagged unions in other languages. A oneof is denoted by writing the tag followed by the value:
name value
For brevity, when value is , we can omit it altogether, so red is the same as red .
Let o be an expression that evaluates to some oneof n v. We can perform the following operations on o:
| Notation | Result Value |
|---|---|
| o.name | The value v if n is name; otherwise ⊥ |
| o is name | true if n is name; false otherwise |
For example, (red 5) is blue evaluates to false, while (red 5) is red evaluates to true. (red 5).red evaluates to 5.
In addition to the operators above, the case statement evaluates one of several expressions based on a oneof tag.
In the HTML versions of the semantics, each use of name is linked back to its oneof type's definition.
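As an informal aid only (not part of the semantic notation), a oneof can be thought of as a discriminated union. The following TypeScript sketch uses invented names to illustrate the `is` and `.name` operations; returning `undefined` for a mismatched tag merely approximates the semantics.

```typescript
// Illustrative only: modelling a semantic oneof as a TypeScript discriminated union.
type Color =
  | { tag: "red"; value: number }   // corresponds to the oneof "red value"
  | { tag: "blue" };                // a tag whose value is omitted

// "o is name" corresponds to checking the tag.
function isRed(o: Color): boolean {
  return o.tag === "red";
}

// "o.red" corresponds to extracting the value when the tag matches.
function redValue(o: Color): number | undefined {
  return o.tag === "red" ? o.value : undefined;
}

const r: Color = { tag: "red", value: 5 };
console.log(isRed(r));    // true — like (red 5) is red
console.log(redValue(r)); // 5    — like (red 5).red
```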
A semantic function receives zero or more arguments, performs computations, and returns a result. We write a semantic function as follows:
function(param1: type1, ... , paramn: typen) body
Here param1 through paramn are the function's parameters, type1 through typen are the parameters' respective semantic types, and body is an expression that computes the function's result. When the function is called with argument values v1 through vn, the function's body is evaluated and the resulting value returned to the caller. body can refer to the parameters param1 through paramn; each reference to a parameter parami evaluates to the corresponding argument value vi. Arguments are passed by value (which in this language is equivalent to passing them by reference because there is no way to write to a parameter).
Function parameters are statically scoped. When functions are nested and an inner function f defines a parameter with the same name as a parameter of an outer function g, then f's parameter shadows g's parameter inside f.
The only operation allowed on a semantic function f is calling it, which we do using the f(arg1, ..., argn) syntax. In the presence of side effects, f is evaluated first, followed by the argument expressions arg1 through argn, in left-to-right order. If the result of evaluating f or any of the argument expressions is ⊥, then the call immediately returns ⊥ without evaluating the following argument expressions, if any. If the result of evaluating f or any of the argument expressions is throw v for some value v, then the call immediately returns that throw v without evaluating the following argument expressions, if any. Otherwise, f's body is evaluated and the resulting value returned to the caller.
A semantic type is a possibly infinite set of semantic values. Names of semantic types are shown in Capitalized Red Small Caps, and compound semantic type expressions are in red.
We use semantic types to make the semantics more readable by declaring the semantic type of each semantic variable (including function argument variables). Each such declaration states that the only values that will be stored in a semantic variable will be members of that variable's semantic type. These declarations can be proven statically. The JavaScript semantics have been machine type-checked to ensure that every type declaration holds, so, for example, if the semantics state that variable x has type Integer then there does not exist any place that could assign the value true to x.
Semantic type annotations allow us to restrict the description of each semantic operator and function to only describe its behavior on arguments that are members of the arguments' semantic types. Thus, for example, we need not describe the behavior of the + semantic operator when passed the semantic values true and as operands because we can prove that this case cannot arise.
Every semantic type includes the values ⊥ and throw v for all values v whose semantic type is SemanticException. For brevity we do not list ⊥ and throw v in the tables below.
The following are the basic semantic types:
The type Rational includes Integer as a subtype because every integer is also a rational number. Except for ⊥ and throw v, the types Rational and Double are disjoint.
We can construct compound semantic types using the notation below. Here t, t1, t2, ..., tn represent some existing semantic types.
| Type | Set of Values |
|---|---|
| t[] | All vectors [v0, ... , vn-1] all of whose elements v0, ... , vn-1 have type t. Note that the empty vector [] is a member of every vector type t[]. |
| {t} | All sets {v1, v2, ... , vn} all of whose elements v1, ... , vn have type t. Note that the empty set {} is a member of every set type {t}. |
| tuple {name1: t1; ... ; namen: tn} | All tuples name1 v1, ... , namen vn for which each vi has type ti for 1 ≤ i ≤ n. The namei's must be distinct; the order in which the namei: ti fields are listed does not matter. |
| oneof {name1: t1; ... ; namen: tn} | All oneofs of the form namei v, where 1 ≤ i ≤ n and v has type ti. If tk is Void, then namek: tk can be abbreviated as simply namek in the oneof semantic type syntax. The namei's must be distinct; the order in which the namei: ti alternatives are listed does not matter. |
| t1 × t2 × ... × tn → t | Some* functions that take n arguments of types t1
through tn respectively and produce a result of type t.
If n is zero (the function takes no arguments), we write this type as () → t. * Technically speaking, this semantic type includes only functions that are continuous in the domain-theoretical sense; this avoids set-theoretical paradoxes. |
The type constructors earlier in the table bind tighter than ones later in the table, so, for example, Integer[] → Rational[] is equivalent to (Integer[]) → (Rational[]) (a function that takes a vector of Integers and returns a vector of Rationals) rather than ((Integer[]) → Rational)[] (a vector of functions, each of which takes a vector of Integers and returns a Rational). In the rare cases where this is needed, parentheses are used to override precedence.
The table below lists the semantic operators in order from the highest precedence (tightest-binding) to the lowest precedence (loosest-binding). Operators under the same heading of the table have the same precedence and associate left-to-right, so, for example, 7-3+2-1 is interpreted as ((7-3)+2)-1 instead of 7-(3+(2-1)) or (7-(3+2))-1. When needed, parentheses can be used to group expressions.
The type signatures of the operators are also listed. Some operators are polymorphic; t, t1, t2, ..., and tn can represent any semantic types. The types of some operators are underdetermined; for example, [] can have type t[] for any type t. In these cases the particular choice of type is inferred from the context.
Each operator in the table below is strict: it evaluates all of its operands left-to-right, and if any operand evaluates to ⊥, then the operator immediately returns ⊥ without evaluating the following operands, if any. If any operand evaluates to throw v for some value v, then the operator immediately returns that throw v without evaluating the following operands, if any.
| Operator | Signatures | Description |
|---|---|---|
| Nonassociative Operators | ||
| (x) | t → t | Return x. Parentheses are used to override operator precedence. |
| |u| | t[] → Integer | u is a vector [e0, e1, ... , en-1]. Return the length n of that vector. |
| {t} → Integer | The number of elements in the set u; ∞ if u has infinitely many elements | |
| [x0, x1, ... , xn-1] | t × ... × t → t[] | Return a vector with the elements x0, x1, ... , xn-1. |
| {x1, x2, ... , xn} | t × ... × t → {t} | Return a set with the elements x1, x2, ... , xn. Any duplicate elements are included only once in the set. When t is Integer or Character, we can also replace any of the xi's by a range xi ... yi that contains all integers or characters greater than or equal to xi and less than or equal to yi. yi must not be less than xi "minus" one. |
| name1 x1, ... , namen xn | t1 × ... × tn → tuple {name1: t1; ... ; namen: tn} | Return a tuple with the fields name1 x1, ... , namen xn. |
| name | oneof {name; name2: t2; ... ; namen: tn} | Return a oneof value with tag name and value . |
| Action[nonterminali] | Determined by Action's declaration | This notation can only be used inside an action definition for a grammar production that has nonterminal nonterminal on the production's right side. Return the value of action Action invoked on the ith instance of nonterminal nonterminal on the right side of . The subscript i can be omitted if there is only one instance of nonterminal nonterminal in . |
| nonterminali | Character | This notation can only be used inside an action definition for a grammar production that has
nonterminal nonterminal on
the production's left or right side. Furthermore, every complete expansion of grammar nonterminal nonterminal must
expand it into a single character. Return the character to which the ith instance of nonterminal nonterminal on the right side of expands. The subscript i can be omitted if there is only one instance of nonterminal nonterminal in . If the subscript is omitted and nonterminal nonterminal appears on the left side of , then this expression returns the single character to which this whole production expands. |
| Suffix Operators | ||
| u[i] | t[] × Integer → t | u is a vector [e0, e1, ... , en-1]. Return the ith element ei, or ⊥ if i<0 or i≥n. |
| u[i ... j] | t[] × Integer × Integer → t[] | u is a vector [e0, e1, ... , en-1]. Return the vector slice [ei, ei+1, ... , ej] consisting of all elements of u between the ith and the jth, inclusive, or ⊥ if i<0, j≥n, or j<i-1. The result is the empty vector [] if j=i-1. |
| u[i ...] | t[] × Integer → t[] | u is a vector [e0, e1, ... , en-1]. Return the vector slice [ei, ei+1, ... , en-1] consisting of all elements of u between the ith and the end, or ⊥ if i<0 or i>n. The result is the empty vector [] if i=n. |
| u[i x] | t[] × Integer × t → t[] | u is a vector [e0, e1, ... , en-1]. Return the vector [e0, ... , ei-1, x, ei+1, ... , en-1] with the ith element replaced by the value x and the other elements unchanged, or ⊥ if i<0 or i≥n. |
| w.namei | tuple {name1: t1; ... ; namen: tn} → ti | w is a tuple name1 v1, ... , namen vn. Return the value vi of w's field named namei. |
| oneof {name1: t1; ... ; namen: tn} → ti | w is a oneof namek v for some k between 1 and n inclusive. Return the value v if namei is namek, or ⊥ if not. | |
| f(x1, ..., xn) | (t1 × ... × tn → t) × t1 × ... × tn → t | Call the function f with the arguments x1 through xn and return the result. |
| Prefix Operators | ||
| -x | Integer → Integer
or Rational → Rational |
The mathematical negation of x |
| min A | {t} → t | Return the minimal element of set A. Specifically, if there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≥ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite descending sequence of elements with no lower bound in A). The type t must have = and < operations that define a total order. |
| max A | {t} → t | Return the maximal element of set A. Specifically, if there exists a value m that satisfies both m ∈ A and for all elements x ∈ A, x ≤ m, then return m; otherwise return ⊥ (this could happen either if A is empty or if A has an infinite ascending sequence of elements with no upper bound in A). The type t must have = and < operations that define a total order. |
| name x | t → oneof {name: t; name2: t2; ... ; namen: tn} | Return a oneof value with tag name and value x. |
| Multiplicative Operators | ||
| x * y | Integer × Integer → Integer
or Rational × Rational → Rational |
The mathematical product of x and y |
| x / y | Rational × Rational → Rational | The mathematical quotient of x and y; ⊥ if y=0 |
| A ∩ B | {t} × {t} → {t} | The intersection of sets A and B (the set of all values that are present both in A and in B) |
| Additive Operators | ||
| x + y | Integer × Integer → Integer
or Rational × Rational → Rational |
The mathematical sum of x and y |
| x - y | The mathematical difference of x and y | |
| u v | t[] × t[] → t[] | u is a vector [e0, e1, ... , en-1] and v is a vector [f0, f1, ... , fm-1]. Return the concatenated vector [e0, e1, ... , en-1, f0, f1, ... , fm-1]. |
| A ∪ B | {t} × {t} → {t} | The union of sets A and B (the set of all values that are present in at least one of A or B) |
| A - B | {t} × {t} → {t} | The difference of sets A and B (the set of all values that are present in A but not B) |
| Comparison Operators | ||
| x = y | Rational × Rational → Boolean
or Character × Character → Boolean or String × String → Boolean or {t} × {t} → Boolean |
Comparisons return true if the relation holds or false
if not. Rationals are compared mathematically. Characters are compared according to their code points. Two strings are equal when they have the same lengths and contain exactly the same sequences of characters. A string x is less than string y when either x is the empty string and y is not empty, the first character of x is less than the first character of y, or the first character of x is equal to the first character of y and the rest of string x is less than the rest of string y. Two sets x and y are equal if every element of x is also in y and every element of y is also in x. Only = and ≠ can be used to compare sets. |
| x ≠ y |
| x < y |
| x ≤ y |
| x > y |
| x ≥ y |
| x ∈ A | t × {t} → Boolean | Return true if x is an element of set A and false if not |
| o is namei | oneof {name1: t1; ... ; namen: tn} → Boolean | o is a oneof namek v for some k between 1 and n inclusive. Return true if namei is namek, or false otherwise. |
| Logical Negation | ||
| not a | Boolean → Boolean | true if a is false; false if a is true |
| Logical Conjunction | ||
| a and b | Boolean × Boolean → Boolean | true if both a and b are true; false if at least one of a and b is false |
| Logical Disjunction | ||
| a or b | Boolean × Boolean → Boolean | true if at least one of a and b is true; false if both a and b are false |
| a xor b | true if a is true and b is false or a is false and b is true; false if both a and b are true or both a and b are false |
Semantic statements are similar to the semantic operators above in that they are also used to construct expressions, take zero or more operands, and return a value. Unlike other semantic operators, semantic statements are usually non-strict: they do not always evaluate all of their operands. Semantic statements have lower precedence than any of the semantic operators above.
Some semantic statements are syntactic sugars, which means that they are defined as macros that expand into other, simpler statements and operators.
function(param1: type1, ... , paramn: typen) body
See the description of function values.let var1: type1 = expr1; ... ; varn: typen = exprn in body
Evaluate expr1 through exprn in order and save the results. If any expri evaluates to ⊥, then immediately return ⊥ without evaluating the following expr's. If any expri evaluates to throw v for some value v, then immediately return that throw v without evaluating the following expr's. Otherwise evaluate body with new local variable bindings of var1 through varn bound to the saved results of evaluating expr1 through exprn, respectively. Return the result of evaluating body.
type1 through typen are the local variables' respective semantic types. The type of the entire let expression is the type of its body.
The let expression above is syntactic sugar for:
(function(var1: type1, ... , varn: typen) body)(expr1, ... , exprn)
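As a rough analogy only (not part of the notation), the desugaring corresponds to immediately applying an anonymous function to the bound expressions, as in this TypeScript sketch:

```typescript
// Illustrative only: "let x: Integer = 3; y: Integer = 4 in x*y"
// desugars to applying an anonymous function to the bound values.
const product = ((x: number, y: number) => x * y)(3, 4);
console.log(product); // 12
```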
if expr then bodytrue else bodyfalse
Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If expr evaluates to throw v for some value v, then immediately return that throw v. Otherwise expr must evaluate to either true or false. If it evaluated to true, then evaluate bodytrue and return its result. If expr evaluated to false, then evaluate bodyfalse and return its result.
expr must have type Boolean. The entire if expression has any type t such that both bodytrue has type t and bodyfalse has type t.
case expr of
name1(var1: type1): body1;
...
namen(varn: typen): bodyn;
end
Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If expr evaluates to throw v for some value v, then immediately return that throw v. Otherwise expr must evaluate to a oneof name v where name matches namei for some i between 1 and n inclusive. Evaluate the corresponding bodyi with a new local variable vari bound to v. Return bodyi's result.
If we are not interested in using the oneof's value for a particular bodyi, we can shorten that bodyi's clause from:
namei(vari: typei): bodyi
to:
namei: bodyi
In this case no local variable is bound while evaluating bodyi.
expr must have type oneof {name1: type1; ... ; namen: typen}. The entire case expression has any type t such that all of its bodyi's have type t. The namei's must be distinct. The order in which the case clauses are listed does not matter.
throw expr
Evaluate expr. If it evaluates to ⊥, then immediately return ⊥. If expr evaluates to throw v for some value v, then immediately return that throw v. Otherwise expr must evaluate to some value v, in which case return throw v.
expr must have type SemanticException. The entire throw expression has any type whatsoever (because every semantic type includes throw v).
try
bodytry
catch (var: SemanticException)
bodyhandler
Evaluate bodytry to obtain a value w. If w does not have the form throw v for some value v, then return w. Otherwise w is throw v for some value v. In this case evaluate bodyhandler with a new local variable var bound to v and return bodyhandler's result.
The type of var is always SemanticException. The entire try-catch expression has any type t such that both bodytry has type t and bodyhandler has type t.
The sections below list the predefined semantic functions, their type signatures, and short descriptions. All functions are strict and evaluate their arguments left-to-right.
These functions perform bitwise operations on integers. The integers are treated as though they were written in binary notation, with each 1 bit representing true and 0 bit representing false. The integers must be nonnegative.
| Function | Signature | Description |
|---|---|---|
| rationalToDouble(r) | Rational → Double | The rational number r rounded to the nearest IEEE double-precision floating-point value as follows: Consider the set of all doubles, with -0.0, +∞, -∞, and NaN removed and with two additional values added to it that are not representable as doubles, namely 2^1024 and -2^1024. Choose the member of this set that is closest in value to r. If two values of the set are equally close, choose the one with an even significand; for this purpose, the two extra values 2^1024 and -2^1024 are considered to have even significands. Finally, if 2^1024 was chosen, replace it with +∞; if -2^1024 was chosen, replace it with -∞; if +0.0 was chosen, replace it with -0.0 if and only if r < 0; any other chosen value is used unchanged. The result is the value of rationalToDouble(r). This procedure corresponds exactly to the behavior of the IEEE 754 "round to nearest" mode. |
| Function | Signature | Description |
|---|---|---|
| characterToCode(c) | Character → Integer | The number of the Unicode code point c |
| codeToCharacter(i) | Integer → Character | The Unicode code point number i, or ⊥ if i<0 or i>65535 |
The function digitValue is defined as follows:
digitValue(c: Character) : Integer
= if c ∈ {‘0’ ... ‘9’}
then characterToCode(c) - characterToCode(‘0’)
else if c ∈ {‘A’ ... ‘Z’}
then characterToCode(c) - characterToCode(‘A’) + 10
else if c ∈ {‘a’ ... ‘z’}
then characterToCode(c) - characterToCode(‘a’) + 10
else ⊥
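For illustration only, digitValue transcribes directly into ordinary code. In this TypeScript sketch, `undefined` stands in for an undefined result on a non-alphanumeric character; it is not part of the specification.

```typescript
// Illustrative transcription of digitValue (not normative).
function digitValue(c: string): number | undefined {
  const code = c.codePointAt(0)!;
  if (c >= "0" && c <= "9") return code - "0".codePointAt(0)!;
  if (c >= "A" && c <= "Z") return code - "A".codePointAt(0)! + 10;
  if (c >= "a" && c <= "z") return code - "a".codePointAt(0)! + 10;
  return undefined; // stands in for an undefined result
}

console.log(digitValue("7")); // 7
console.log(digitValue("f")); // 15
```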
| Function | Signature | Description |
|---|---|---|
| isOrdinaryInitialIdentifierCharacter(c) | Character Boolean | Return true if the nonterminal OrdinaryInitialIdentifierCharacter can expand into c and false otherwise |
| isOrdinaryContinuingIdentifierCharacter(c) | Character Boolean | Return true if the nonterminal OrdinaryContinuingIdentifierCharacter can expand into c and false otherwise |
We can define a global semantic constant named var as follows:
var : type = expr
expr should evaluate to a value of type type. expr should not have side effects, and it should not evaluate to ⊥.
In the HTML versions of the semantics, each reference to the global semantic constant var is linked to var's definition.
We can define a global semantic function named f as follows:
f(param1: type1, ... , paramn: typen) : type = body
param1 through paramn are the function's parameters, type1 through typen are the parameters' respective semantic types, type is the function result's semantic type, and body is an expression that computes the function's result.
The above definition is syntactic sugar for the global constant definition:
f : type1 × type2 × ... × typen → type = function(param1: type1, ... , paramn: typen) body
In the HTML versions of the semantics, each reference to the global semantic function f is linked to f's definition.
For example, the function definition
square(x: Integer) : Integer = x*x
defines a function named square that takes an Integer parameter x and returns an Integer that is the square of x. This is equivalent to the following global definition:
square : Integer → Integer = function(x: Integer) x*x
We can give a new name to a semantic type t by using the type definition, which has the form:
type name = t
For example, the following notation defines RegExp as a shorthand for tuple {reBody: String; reFlags: String}:
type RegExp = tuple {reBody: String; reFlags: String}
In the HTML versions of the semantics, each reference to the semantic type name name is linked to name's definition.
Semantic actions tie together the grammar and the semantics. A semantic action ascribes semantic meaning to a grammar production.
To illustrate the use of semantic actions, we shall look at an example, followed by a detailed description of the notation for specifying semantic actions.
Consider the following grammar, with the start nonterminal Numeral:
This grammar defines the syntax of an acceptable input: “37”,
“33#4”
and “30#2”
are acceptable syntactically, while “1a”
is not. However, the grammar does not indicate what these various inputs mean. That is the job of the semantics, which are
defined in terms of actions on the parse tree of grammar rule expansions. Consider the following sample set of actions defined
on this grammar, with a starting Numeral action called (in this example)
Value:
type SemanticException = oneof {syntaxError}
action Value[Digit] : Integer = digitValue(Digit)
action DecimalValue[Digits] : Integer
DecimalValue[Digits Digit] = Value[Digit]
DecimalValue[Digits Digits1 Digit] = 10*DecimalValue[Digits1] + Value[Digit]
action BaseValue[Digits] : Integer → Integer
BaseValue[Digits Digit](base: Integer)
= let d: Integer = Value[Digit]
in if d < base
then d
else throw syntaxError
BaseValue[Digits Digits1 Digit](base: Integer)
= let d: Integer = Value[Digit]
in if d < base
then base*BaseValue[Digits1](base) + d
else throw syntaxError
action Value[Numeral] : Integer
Value[Numeral Digits] = DecimalValue[Digits]
Value[Numeral Digits1 # Digits2]
= let base: Integer = DecimalValue[Digits2]
in if base ≥ 2 and base ≤ 10
then BaseValue[Digits1](base)
else throw syntaxError
Action names are written in violet cursive type. The last action definition in the example above states that the action Value can be applied to any expansion of the nonterminal Numeral, and that the result is an Integer. This action maps every acceptable input either to an integer or to throw syntaxError. If the result is throw syntaxError, then the input satisfies the grammar but contains an error detected by the semantics; this is the case for the input “30#2”. A result of ⊥ would indicate a nonterminating computation; this cannot happen in this example.
There are two definitions of the Value action on Numeral,
one for each grammar production that expands Numeral. Each definition
of an action is allowed to call actions on the terminals and nonterminals on the right side of the expansion. For example,
Value applied to the first Numeral production
(the one that expands Numeral into Digits)
simply applies the DecimalValue action to the expansion of the nonterminal Digits
and returns the result. On the other hand, Value applied to the second Numeral
production (the one that expands Numeral into Digits # Digits)
performs a computation using the results of the DecimalValue and BaseValue
applied to the two expansions of the Digits nonterminals. In this case
there are two identical nonterminals Digits on the right side of the
expansion, so we use subscripts to indicate on which one we're calling the actions DecimalValue
and BaseValue.
The BaseValue action illustrates a syntactic sugar for defining an action that is a function; this syntactic sugar is analogous to that for defining global functions.
The Value action on Digit illustrates the direct use of a nonterminal in a semantic expression: digitValue(Digit). Here the Digit semantic expression evaluates to the character into which the Digit grammar rule expands.
We can fully evaluate the semantics on our sample inputs to get the following results:
| Input | Semantic Result |
|---|---|
| 37 | 37 |
| 33#4 | 15 |
| 30#2 | throw syntaxError |
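For illustration only, the example's actions can be transcribed into ordinary code. The following TypeScript sketch is not part of the specification: it works directly on the input string rather than on a parse tree, and uses a thrown Error where the semantics produce throw syntaxError, but it reproduces the results above.

```typescript
// Illustrative transcription of the Numeral example (not normative).
function digitValue(c: string): number {
  if (c < "0" || c > "9") throw new Error("syntaxError");
  return c.charCodeAt(0) - "0".charCodeAt(0);
}

// DecimalValue: interpret a digit string in base 10.
function decimalValue(digits: string): number {
  return [...digits].reduce((acc, c) => 10 * acc + digitValue(c), 0);
}

// BaseValue: interpret a digit string in the given base, rejecting digits too large for it.
function baseValue(digits: string, base: number): number {
  return [...digits].reduce((acc, c) => {
    const d = digitValue(c);
    if (d >= base) throw new Error("syntaxError");
    return base * acc + d;
  }, 0);
}

// Value on Numeral: either "Digits" or "Digits#Digits".
function numeralValue(input: string): number {
  const parts = input.split("#");
  if (parts.length === 1) return decimalValue(parts[0]);
  const base = decimalValue(parts[1]);
  if (base < 2 || base > 10) throw new Error("syntaxError");
  return baseValue(parts[0], base);
}

console.log(numeralValue("37"));   // 37
console.log(numeralValue("33#4")); // 15
// numeralValue("30#2") throws Error("syntaxError")
```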
action Action[nonterminal] : type
This declaration states that action Action is defined on nonterminal nonterminal. Any reference to action Action[nonterminal] in a semantic expression returns a value of type type. The values of action Action must be defined using action definitions for each grammar production that has nonterminal on the left side.
Action[nonterminal expansion] = expr
This notation defines the value of action Action on nonterminal nonterminal in the case where nonterminal nonterminal expands to the given expansion. expansion can contain zero or more terminals and nonterminals (as well as other notations allowed on the right side of a grammar production). Furthermore, the terminals and nonterminals of expansion can be subscripted to allow them to be unambiguously referenced by action references or nonterminal references inside expr.
The type of action Action on nonterminal nonterminal must be declared using an action declaration. expr must have the type given by that action declaration.
nonterminal expansion must be one of the productions in the grammar.
Action[nonterminal expansion](param1: type1, ... , paramn: typen) = body
This notation is a syntactic sugar for defining an action whose value is a function. This notation is equivalent to:
Action[nonterminal expansion] =
function(param1: type1, ... , paramn: typen) body
action Action[nonterminal] : type = expr
This declaration is sometimes used when all expansions of nonterminal nonterminal share the same action semantics. This declaration states both the type type of action Action on nonterminal nonterminal as well as that action's value expr. Note that the expansions are not given between the square brackets, and expr can refer only to the nonterminal nonterminal on the left side of grammar productions. No additional action definitions are needed for nonterminal nonterminal.
See the Value action on Digit in the example above for an example of this declaration.
|
JavaScript 2.0
Formal Description
Stages
|
Thursday, November 11, 1999
The source code is processed in the following stages:
Processing stage 2 is done as follows:
If an implementation encounters an error while lexing, it is permitted to either report the error immediately or defer it until the affected token would actually be used by the parser. This flexibility allows an implementation to do lexing at the same time it parses the source program.
Provide language prohibiting an identifier from immediately following a number. This will fall out of the revised definition of QuantityLiteral.
Show mapping from Token structures to parser grammar terminals (obvious, but needs to be written).
To be provided
|
JavaScript 2.0
Formal Description
Lexer Grammar
|
Monday, December 6, 1999
This LALR(1) grammar describes the lexer syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.
This document is also available as a Word 98 rtf file.
The start symbols are:
NextTokenunit
if the previous token was a number;
NextTokenre
if the previous token was not a number and a / should be interpreted as a regular
expression; and
NextTokendiv
if the previous token was not a number and a / should be interpreted as a division or
division-assignment operator.
|
JavaScript 2.0
Formal Description
Lexer Semantics
|
Monday, December 6, 1999
The lexer semantics describe the actions the lexer takes in order to transform an input stream of Unicode characters into a stream of tokens. For convenience, the lexer grammar is repeated here. See also the description of the semantic notation.
This document is also available as a Word 98 rtf file.
The start symbols are:
NextTokenunit
if the previous token was a number;
NextTokenre
if the previous token was not a number and a / should be interpreted as a regular
expression; and
NextTokendiv
if the previous token was not a number and a / should be interpreted as a division or
division-assignment operator.
type SemanticException = oneof {syntaxError}
action DecimalValue[ASCIIDigit] : Integer = digitValue(ASCIIDigit)
action Token[NextTokent] : Token
Token[NextTokenre WhiteSpace Tokenre] = Token[Tokenre]
Token[NextTokendiv WhiteSpace Tokendiv] = Token[Tokendiv]
Token[NextTokenunit [lookahead{OrdinaryContinuingIdentifierCharacter, \}] WhiteSpace Tokendiv]
= Token[Tokendiv]
Token[NextTokenunit [lookahead{_}] IdentifierName] = string Name[IdentifierName]
Token[NextTokenunit _ IdentifierName] = string Name[IdentifierName]
type RegExp = tuple {reBody: String; reFlags: String}
type Quantity = tuple {amount: Double; unit: String}
type Token
= oneof {
lineBreak;
identifier: String;
keyword: String;
punctuator: String;
number: Double;
string: String;
regularExpression: RegExp;
end}
Token[Tokent LineBreaks] = lineBreak
Token[Tokent IdentifierOrReservedWord] = Token[IdentifierOrReservedWord]
Token[Tokent Punctuator] = punctuator Punctuator[Punctuator]
Token[Tokendiv DivisionPunctuator] = punctuator Punctuator[DivisionPunctuator]
Token[Tokent NumericLiteral] = number DoubleValue[NumericLiteral]
Token[Tokent StringLiteral] = string StringValue[StringLiteral]
Token[Tokenre RegExpLiteral] = regularExpression REValue[RegExpLiteral]
Token[Tokent EndOfInput] = end
action Name[IdentifierName] : String
Name[IdentifierName InitialIdentifierCharacter]
= [CharacterValue[InitialIdentifierCharacter]]
Name[IdentifierName IdentifierName1 ContinuingIdentifierCharacter]
= Name[IdentifierName1] [CharacterValue[ContinuingIdentifierCharacter]]
action ContainsEscapes[IdentifierName] : Boolean
ContainsEscapes[IdentifierName InitialIdentifierCharacter]
= ContainsEscapes[InitialIdentifierCharacter]
ContainsEscapes[IdentifierName IdentifierName1 ContinuingIdentifierCharacter]
= ContainsEscapes[IdentifierName1] or ContainsEscapes[ContinuingIdentifierCharacter]
action CharacterValue[InitialIdentifierCharacter] : Character
CharacterValue[InitialIdentifierCharacter OrdinaryInitialIdentifierCharacter]
= OrdinaryInitialIdentifierCharacter
CharacterValue[InitialIdentifierCharacter \ HexEscape]
= if isOrdinaryInitialIdentifierCharacter(CharacterValue[HexEscape])
then CharacterValue[HexEscape]
else throw syntaxError
action ContainsEscapes[InitialIdentifierCharacter] : Boolean
ContainsEscapes[InitialIdentifierCharacter OrdinaryInitialIdentifierCharacter] = false
ContainsEscapes[InitialIdentifierCharacter \ HexEscape] = true
action CharacterValue[ContinuingIdentifierCharacter] : Character
CharacterValue[ContinuingIdentifierCharacter OrdinaryContinuingIdentifierCharacter]
= OrdinaryContinuingIdentifierCharacter
CharacterValue[ContinuingIdentifierCharacter \ HexEscape]
= if isOrdinaryContinuingIdentifierCharacter(CharacterValue[HexEscape])
then CharacterValue[HexEscape]
else throw syntaxError
action ContainsEscapes[ContinuingIdentifierCharacter] : Boolean
ContainsEscapes[ContinuingIdentifierCharacter OrdinaryContinuingIdentifierCharacter]
= false
ContainsEscapes[ContinuingIdentifierCharacter \ HexEscape] = true
reservedWords : String[]
= [“abstract”,
“break”,
“case”,
“catch”,
“class”,
“const”,
“continue”,
“debugger”,
“default”,
“delete”,
“do”,
“else”,
“enum”,
“eval”,
“export”,
“extends”,
“false”,
“final”,
“finally”,
“for”,
“function”,
“goto”,
“if”,
“implements”,
“import”,
“in”,
“instanceof”,
“native”,
“new”,
“null”,
“package”,
“private”,
“protected”,
“public”,
“return”,
“static”,
“super”,
“switch”,
“synchronized”,
“this”,
“throw”,
“throws”,
“transient”,
“true”,
“try”,
“typeof”,
“var”,
“volatile”,
“while”,
“with”]
nonReservedWords : String[]
= [“box”,
“constructor”,
“field”,
“get”,
“language”,
“local”,
“method”,
“override”,
“set”,
“version”]
keywords : String[] = reservedWords nonReservedWords
member(id: String, list: String[]) : Boolean
= if |list| = 0
then false
else if id = list[0]
then true
else member(id, list[1 ...])
action Token[IdentifierOrReservedWord] : Token
Token[IdentifierOrReservedWord IdentifierName]
= let id: String = Name[IdentifierName]
in if member(id, keywords) and not ContainsEscapes[IdentifierName]
then keyword id
else identifier id
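A compact way to read member and the Token action above: an identifier name is reported as a keyword token only when its spelling is in the keyword list and it was written without any escape sequences. The following TypeScript sketch is illustrative only; it abbreviates the word lists and replaces the recursive member helper with a built-in search.

```typescript
// Illustrative sketch of identifier-vs-keyword classification (abbreviated lists, not normative).
const reservedWords = ["break", "case", "catch", "class", "const", "if", "var"];
const nonReservedWords = ["get", "set", "language", "version"];
const keywords = [...reservedWords, ...nonReservedWords];

type LexToken =
  | { kind: "keyword"; name: string }
  | { kind: "identifier"; name: string };

// Corresponds to Token[IdentifierOrReservedWord]: escapes force an identifier.
function classify(name: string, containsEscapes: boolean): LexToken {
  return keywords.includes(name) && !containsEscapes
    ? { kind: "keyword", name }
    : { kind: "identifier", name };
}

console.log(classify("if", false));      // { kind: "keyword", name: "if" }
console.log(classify("if", true));       // { kind: "identifier", name: "if" } (e.g. written with \u0069)
console.log(classify("counter", false)); // { kind: "identifier", name: "counter" }
```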
action Punctuator[Punctuator] : String
Punctuator[Punctuator !] = “!”
Punctuator[Punctuator ! =] = “!=”
Punctuator[Punctuator ! = =] = “!==”
Punctuator[Punctuator #] = “#”
Punctuator[Punctuator %] = “%”
Punctuator[Punctuator % =] = “%=”
Punctuator[Punctuator &] = “&”
Punctuator[Punctuator & &] = “&&”
Punctuator[Punctuator & & =] = “&&=”
Punctuator[Punctuator & =] = “&=”
Punctuator[Punctuator (] = “(”
Punctuator[Punctuator )] = “)”
Punctuator[Punctuator *] = “*”
Punctuator[Punctuator * =] = “*=”
Punctuator[Punctuator +] = “+”
Punctuator[Punctuator + +] = “++”
Punctuator[Punctuator + =] = “+=”
Punctuator[Punctuator ,] = “,”
Punctuator[Punctuator -] = “-”
Punctuator[Punctuator - -] = “--”
Punctuator[Punctuator - =] = “-=”
Punctuator[Punctuator - >] = “->”
Punctuator[Punctuator .] = “.”
Punctuator[Punctuator . .] = “..”
Punctuator[Punctuator . . .] = “...”
Punctuator[Punctuator :] = “:”
Punctuator[Punctuator : :] = “::”
Punctuator[Punctuator ;] = “;”
Punctuator[Punctuator <] = “<”
Punctuator[Punctuator < <] = “<<”
Punctuator[Punctuator < < =] = “<<=”
Punctuator[Punctuator < =] = “<=”
Punctuator[Punctuator =] = “=”
Punctuator[Punctuator = =] = “==”
Punctuator[Punctuator = = =] = “===”
Punctuator[Punctuator >] = “>”
Punctuator[Punctuator > =] = “>=”
Punctuator[Punctuator > >] = “>>”
Punctuator[Punctuator > > =] = “>>=”
Punctuator[Punctuator > > >] = “>>>”
Punctuator[Punctuator > > > =] = “>>>=”
Punctuator[Punctuator ?] = “?”
Punctuator[Punctuator @] = “@”
Punctuator[Punctuator [] = “[”
Punctuator[Punctuator ]] = “]”
Punctuator[Punctuator ^] = “^”
Punctuator[Punctuator ^ =] = “^=”
Punctuator[Punctuator ^ ^] = “^^”
Punctuator[Punctuator ^ ^ =] = “^^=”
Punctuator[Punctuator {] = “{”
Punctuator[Punctuator |] = “|”
Punctuator[Punctuator | =] = “|=”
Punctuator[Punctuator | |] = “||”
Punctuator[Punctuator | | =] = “||=”
Punctuator[Punctuator }] = “}”
Punctuator[Punctuator ~] = “~”
action Punctuator[DivisionPunctuator] : String
Punctuator[DivisionPunctuator / [lookahead{/, *}]] = “/”
Punctuator[DivisionPunctuator / =] = “/=”
action DoubleValue[NumericLiteral] : Double
DoubleValue[NumericLiteral DecimalLiteral]
= rationalToDouble(RationalValue[DecimalLiteral])
DoubleValue[NumericLiteral HexIntegerLiteral [lookahead{HexDigit}]]
= rationalToDouble(IntegerValue[HexIntegerLiteral])
expt(base: Rational, exponent: Integer) : Rational
= if exponent = 0
then 1
else if exponent < 0
then 1/expt(base, -exponent)
else base*expt(base, exponent - 1)
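The expt helper is exact exponentiation with a negative-exponent case; it is used below to scale a mantissa by a power of ten. For illustration only, the corresponding computation over ordinary numbers looks like the following TypeScript sketch (the semantics work with exact rationals, which floating-point numbers do not preserve).

```typescript
// Illustrative only: expt over ordinary numbers; the semantic version uses exact rationals.
function expt(base: number, exponent: number): number {
  if (exponent === 0) return 1;
  if (exponent < 0) return 1 / expt(base, -exponent);
  return base * expt(base, exponent - 1);
}

// Mantissa scaled by a power of ten, as in a decimal literal with an exponent part.
const value = 1.25 * expt(10, 3);
console.log(value); // 1250
```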
action RationalValue[DecimalLiteral] : Rational
RationalValue[DecimalLiteral Mantissa] = RationalValue[Mantissa]
RationalValue[DecimalLiteral Mantissa LetterE SignedInteger]
= RationalValue[Mantissa]*expt(10, IntegerValue[SignedInteger])
action RationalValue[Mantissa] : Rational
RationalValue[Mantissa DecimalIntegerLiteral] = IntegerValue[DecimalIntegerLiteral]
RationalValue[Mantissa DecimalIntegerLiteral .] = IntegerValue[DecimalIntegerLiteral]
RationalValue[Mantissa DecimalIntegerLiteral . Fraction]
= IntegerValue[DecimalIntegerLiteral] + RationalValue[Fraction]
RationalValue[Mantissa . Fraction] = RationalValue[Fraction]
action IntegerValue[DecimalIntegerLiteral] : Integer
IntegerValue[DecimalIntegerLiteral 0] = 0
IntegerValue[DecimalIntegerLiteral NonZeroDecimalDigits]
= IntegerValue[NonZeroDecimalDigits]
action IntegerValue[NonZeroDecimalDigits] : Integer
IntegerValue[NonZeroDecimalDigits NonZeroDigit] = DecimalValue[NonZeroDigit]
IntegerValue[NonZeroDecimalDigits NonZeroDecimalDigits1 ASCIIDigit]
= 10*IntegerValue[NonZeroDecimalDigits1] + DecimalValue[ASCIIDigit]
action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)
action RationalValue[Fraction] : Rational
RationalValue[Fraction DecimalDigits]
= IntegerValue[DecimalDigits]/expt(10, NDigits[DecimalDigits])
action IntegerValue[SignedInteger] : Integer
IntegerValue[SignedInteger DecimalDigits] = IntegerValue[DecimalDigits]
IntegerValue[SignedInteger + DecimalDigits] = IntegerValue[DecimalDigits]
IntegerValue[SignedInteger - DecimalDigits] = -IntegerValue[DecimalDigits]
action IntegerValue[DecimalDigits] : Integer
IntegerValue[DecimalDigits ASCIIDigit] = DecimalValue[ASCIIDigit]
IntegerValue[DecimalDigits DecimalDigits1 ASCIIDigit]
= 10*IntegerValue[DecimalDigits1] + DecimalValue[ASCIIDigit]
action NDigits[DecimalDigits] : Integer
NDigits[DecimalDigits ASCIIDigit] = 1
NDigits[DecimalDigits DecimalDigits1 ASCIIDigit] = NDigits[DecimalDigits1] + 1
action IntegerValue[HexIntegerLiteral] : Integer
IntegerValue[HexIntegerLiteral 0 LetterX HexDigit] = HexValue[HexDigit]
IntegerValue[HexIntegerLiteral HexIntegerLiteral1 HexDigit]
= 16*IntegerValue[HexIntegerLiteral1] + HexValue[HexDigit]
action HexValue[HexDigit] : Integer = digitValue(HexDigit)
action StringValue[StringLiteral] : String
StringValue[StringLiteral ' StringCharssingle '] = StringValue[StringCharssingle]
StringValue[StringLiteral " StringCharsdouble "] = StringValue[StringCharsdouble]
action StringValue[StringCharsq] : String
StringValue[StringCharsq «empty»] = “”
StringValue[StringCharsq StringCharsq1 StringCharq]
= StringValue[StringCharsq1] [CharacterValue[StringCharq]]
action CharacterValue[StringCharq] : Character
CharacterValue[StringCharq LiteralStringCharq] = LiteralStringCharq
CharacterValue[StringCharq \ StringEscape] = CharacterValue[StringEscape]
action CharacterValue[StringEscape] : Character
CharacterValue[StringEscape ControlEscape] = CharacterValue[ControlEscape]
CharacterValue[StringEscape ZeroEscape] = CharacterValue[ZeroEscape]
CharacterValue[StringEscape HexEscape] = CharacterValue[HexEscape]
CharacterValue[StringEscape IdentityEscape] = IdentityEscape
action CharacterValue[ControlEscape] : Character
CharacterValue[ControlEscape b] = ‘«BS»’
CharacterValue[ControlEscape f] = ‘«FF»’
CharacterValue[ControlEscape n] = ‘«LF»’
CharacterValue[ControlEscape r] = ‘«CR»’
CharacterValue[ControlEscape t] = ‘«TAB»’
CharacterValue[ControlEscape v] = ‘«VT»’
action CharacterValue[ZeroEscape] : Character
CharacterValue[ZeroEscape 0 [lookahead{ASCIIDigit}]] = ‘«NUL»’
action CharacterValue[HexEscape] : Character
CharacterValue[HexEscape x HexDigit1 HexDigit2]
= codeToCharacter(16*HexValue[HexDigit1] + HexValue[HexDigit2])
CharacterValue[HexEscape u HexDigit1 HexDigit2 HexDigit3 HexDigit4]
= codeToCharacter(
4096*HexValue[HexDigit1] + 256*HexValue[HexDigit2] + 16*HexValue[HexDigit3] +
HexValue[HexDigit4])
action REValue[RegExpLiteral] : RegExp
REValue[RegExpLiteral RegExpBody RegExpFlags]
= reBody REBody[RegExpBody], reFlags REFlags[RegExpFlags]
action REFlags[RegExpFlags] : String
REFlags[RegExpFlags «empty»] = “”
REFlags[RegExpFlags RegExpFlags1 ContinuingIdentifierCharacter]
= REFlags[RegExpFlags1] [CharacterValue[ContinuingIdentifierCharacter]]
action REBody[RegExpBody] : String
REBody[RegExpBody / [lookahead{*}] RegExpChars /] = REBody[RegExpChars]
action REBody[RegExpChars] : String
REBody[RegExpChars RegExpChar] = REBody[RegExpChar]
REBody[RegExpChars RegExpChars1 RegExpChar]
= REBody[RegExpChars1] REBody[RegExpChar]
action REBody[RegExpChar] : String
REBody[RegExpChar OrdinaryRegExpChar] = [OrdinaryRegExpChar]
REBody[RegExpChar \ NonTerminator] = [‘\’, NonTerminator]
|
JavaScript 2.0
Formal Description
Regular Expression Grammar
|
Thursday, November 11, 1999
This LR(1) grammar describes the regular expression syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.
This document is also available as a Word 98 rtf file.
|
JavaScript 2.0
Formal Description
Regular Expression Semantics
|
Thursday, November 11, 1999
The regular expression semantics describe the actions the regular expression engine takes in order to transform a regular expression pattern into a function for matching against input strings. For convenience, the regular expression grammar is repeated here. See also the description of the semantic notation.
This document is also available as a Word 98 rtf file.
The regular expression semantics below are working (except for case-insensitive matches) and have been tried on sample cases, but they could be formatted better.
type SemanticException = oneof {syntaxError}
lineTerminators : {Character} = {‘«LF»’, ‘«CR»’, ‘«u2028»’, ‘«u2029»’}
reWhitespaces : {Character} = {‘«FF»’, ‘«LF»’, ‘«CR»’, ‘«TAB»’, ‘«VT»’, ‘ ’}
reDigits : {Character} = {‘0’ ... ‘9’}
reWordCharacters : {Character} = {‘0’ ... ‘9’, ‘A’ ... ‘Z’, ‘a’ ... ‘z’, ‘_’}
type REInput = tuple {str: String; ignoreCase: Boolean; multiline: Boolean}
Field str is the input string. ignoreCase and multiline are the corresponding regular expression flags.
type REResult = oneof {success: REMatch; failure}
type REMatch = tuple {endIndex: Integer; captures: Capture[]}
A REMatch holds an intermediate state during the pattern-matching process. endIndex is the index of the next input character to be matched by the next component in a regular expression pattern. If we are at the end of the pattern, endIndex is one plus the index of the last matched input character. captures is a zero-based array of the strings captured so far by capturing parentheses.
type Capture = oneof {present: String; absent}
type Continuation = REMatch → REResult
A Continuation is a function that attempts to match the remaining portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. If a match is possible, it returns a success result that contains the final REMatch state; if no match is possible, it returns a failure result.
type Matcher = REInput × REMatch × Continuation → REResult
A Matcher is a function that attempts to match a middle portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. Since the remainder of the pattern heavily influences whether (and how) a middle portion will match, we must pass in a Continuation function that checks whether the rest of the pattern matched. If the continuation returns failure, the matcher function may call it repeatedly, trying various alternatives at pattern choice points.
The REInput parameter contains the input string and is merely passed down to subroutines.
type MatcherGenerator = Integer → Matcher
A MatcherGenerator is a function executed at the time the regular expression is compiled that returns a Matcher for a part of the pattern. The Integer parameter contains the number of capturing left parentheses seen so far in the pattern and is used to assign static, consecutive numbers to capturing parentheses.
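For readers more comfortable with ordinary type declarations, the matcher machinery can be sketched in TypeScript roughly as follows. This is illustrative only: ⊥ and thrown semantic values are not modeled, and the names follow the semantic types above.

```typescript
// Illustrative TypeScript rendering of the matcher types (not normative).
type Capture = string | null;                 // present: String | absent

interface REInput { str: string; ignoreCase: boolean; multiline: boolean; }
interface REMatch { endIndex: number; captures: Capture[]; }

type REResult = REMatch | "failure";          // success: REMatch | failure

// A Continuation finishes matching the rest of the pattern from a given state.
type Continuation = (x: REMatch) => REResult;

// A Matcher matches a middle portion of the pattern, consulting the
// continuation to find out whether the remainder of the pattern matches.
type Matcher = (t: REInput, x: REMatch, c: Continuation) => REResult;

// A MatcherGenerator runs at pattern-compile time; its argument is the number
// of capturing left parentheses seen so far.
type MatcherGenerator = (parenIndex: number) => Matcher;
```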
characterSetMatcher(acceptanceSet: {Character}, invert: Boolean) : Matcher
= function(t: REInput, x: REMatch, c: Continuation)
let i: Integer = x.endIndex;
s: String = t.str
in if i = |s|
then failure
else if s[i] ∈ acceptanceSet xor invert
then c(endIndex (i + 1), captures x.captures)
else failure
characterSetMatcher returns a Matcher that matches a single input string character. If invert is false, the match succeeds if the character is a member of the acceptanceSet set of characters (possibly ignoring case). If invert is true, the match succeeds if the character is not a member of the acceptanceSet set of characters (possibly ignoring case).
characterMatcher(ch: Character) : Matcher = characterSetMatcher({ch}, false)
characterMatcher returns a Matcher that matches a single input string character. The match succeeds if the character is the same as ch (possibly ignoring case).
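Using the types sketched earlier, characterSetMatcher can be rendered as follows. This is illustrative only; the ignoreCase handling mentioned in the prose is omitted here, as it currently is in the semantics themselves.

```typescript
// Illustrative rendering of characterSetMatcher/characterMatcher.
// Assumes the Matcher/Continuation/REInput/REMatch/REResult types sketched above.
function characterSetMatcher(acceptanceSet: Set<string>, invert: boolean): Matcher {
  return (t, x, c) => {
    const i = x.endIndex;
    const s = t.str;
    if (i === s.length) return "failure";
    const inSet = acceptanceSet.has(s[i]);
    if (inSet !== invert) {                    // "s[i] ∈ acceptanceSet xor invert"
      return c({ endIndex: i + 1, captures: x.captures });
    }
    return "failure";
  };
}

function characterMatcher(ch: string): Matcher {
  return characterSetMatcher(new Set([ch]), false);
}
```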
action Exec[RegularExpressionPattern] : REInput × Integer → REResult
Exec[RegularExpressionPattern Disjunction]
= let match: Matcher = GenMatcher[Disjunction](0)
in function(t: REInput, index: Integer)
match(
t,
endIndex index, captures fillCapture(CountParens[Disjunction]),
successContinuation)
successContinuation(x: REMatch) : REResult = success x
fillCapture(i: Integer) : Capture[]
= if i = 0
then []Capture
else fillCapture(i - 1) [absent]
action GenMatcher[Disjunction] : MatcherGenerator
GenMatcher[Disjunction Alternative] = GenMatcher[Alternative]
GenMatcher[Disjunction Alternative | Disjunction1](parenIndex: Integer)
= let match1: Matcher = GenMatcher[Alternative](parenIndex);
match2: Matcher = GenMatcher[Disjunction1](parenIndex + CountParens[Alternative])
in function(t: REInput, x: REMatch, c: Continuation)
case match1(t, x, c) of
success(y: REMatch): success y;
failure: match2(t, x, c)
end
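The alternation case is the heart of the backtracking scheme: try the left alternative with the same continuation, and only if it fails try the right one. An illustrative TypeScript sketch, using the types sketched earlier (the function name is invented for the example):

```typescript
// Illustrative rendering of the Disjunction (alternation) matcher (not normative).
function alternationMatcher(match1: Matcher, match2: Matcher): Matcher {
  return (t, x, c) => {
    const result = match1(t, x, c);            // try the left alternative first
    return result !== "failure" ? result : match2(t, x, c);
  };
}
```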
action CountParens[Disjunction] : Integer
CountParens[Disjunction Alternative] = CountParens[Alternative]
CountParens[Disjunction Alternative | Disjunction1]
= CountParens[Alternative] + CountParens[Disjunction1]
action GenMatcher[Alternative] : MatcherGenerator
GenMatcher[Alternative «empty»](parenIndex: Integer)
= function(t: REInput, x: REMatch, c: Continuation)
c(x)
GenMatcher[Alternative Alternative1 Term](parenIndex: Integer)
= let match1: Matcher = GenMatcher[Alternative1](parenIndex);
match2: Matcher = GenMatcher[Term](parenIndex + CountParens[Alternative1])
in function(t: REInput, x: REMatch, c: Continuation)
let d: Continuation
= function(y: REMatch)
match2(t, y, c)
in match1(t, x, d)
action CountParens[Alternative] : Integer
CountParens[Alternative «empty»] = 0
CountParens[Alternative Alternative1 Term]
= CountParens[Alternative1] + CountParens[Term]
action GenMatcher[Term] : MatcherGenerator
GenMatcher[Term Assertion](parenIndex: Integer)
= function(t: REInput, x: REMatch, c: Continuation)
if TestAssertion[Assertion](t, x)
then c(x)
else failure
GenMatcher[Term Atom] = GenMatcher[Atom]
GenMatcher[Term Atom Quantifier](parenIndex: Integer)
= let match: Matcher = GenMatcher[Atom](parenIndex);
min: Integer = Minimum[Quantifier];
max: Limit = Maximum[Quantifier];
greedy: Boolean = Greedy[Quantifier]
in if
(case max of
finite(m: Integer): m < min;
infinite: false
end)
then throw syntaxError
else repeatMatcher(match, min, max, greedy, parenIndex, CountParens[Atom])
action CountParens[Term] : Integer
CountParens[Term Assertion] = 0
CountParens[Term Atom] = CountParens[Atom]
CountParens[Term Atom Quantifier] = CountParens[Atom]
type Limit = oneof {finite: Integer; infinite}
resetParens(x: REMatch, p: Integer, nParens: Integer) : REMatch
= if nParens = 0
then x
else let y: REMatch = endIndex x.endIndex, captures x.captures[p absent]
in resetParens(y, p + 1, nParens - 1)
repeatMatcher(body: Matcher, min: Integer, max: Limit, greedy: Boolean, parenIndex: Integer, nBodyParens: Integer)
: Matcher
= function(t: REInput, x: REMatch, c: Continuation)
if
(case max of
finite(m: Integer): m = 0;
infinite: false
end)
then c(x)
else let d: Continuation
= function(y: REMatch)
if min = 0 and y.endIndex = x.endIndex
then failure
else let newMin: Integer
= if min = 0
then 0
else min - 1;
newMax: Limit
= case max of
finite(m: Integer): finite (m - 1);
infinite: infinite
end
in repeatMatcher(
body,
newMin,
newMax,
greedy,
parenIndex,
nBodyParens)(t, y, c);
xr: REMatch = resetParens(x, parenIndex, nBodyParens)
in if min ≠ 0
then body(t, xr, d)
else if greedy
then case body(t, xr, d) of
success(z: REMatch): success z;
failure: c(x)
end
else case c(x) of
success(z: REMatch): success z;
failure: body(t, xr, d)
end
action Minimum[Quantifier] : Integer
Minimum[Quantifier QuantifierPrefix] = Minimum[QuantifierPrefix]
Minimum[Quantifier QuantifierPrefix ?] = Minimum[QuantifierPrefix]
action Maximum[Quantifier] : Limit
Maximum[Quantifier QuantifierPrefix] = Maximum[QuantifierPrefix]
Maximum[Quantifier QuantifierPrefix ?] = Maximum[QuantifierPrefix]
action Greedy[Quantifier] : Boolean
Greedy[Quantifier QuantifierPrefix] = true
Greedy[Quantifier QuantifierPrefix ?] = false
action Minimum[QuantifierPrefix] : Integer
Minimum[QuantifierPrefix *] = 0
Minimum[QuantifierPrefix +] = 1
Minimum[QuantifierPrefix ?] = 0
Minimum[QuantifierPrefix { DecimalDigits }] = IntegerValue[DecimalDigits]
Minimum[QuantifierPrefix { DecimalDigits , }] = IntegerValue[DecimalDigits]
Minimum[QuantifierPrefix { DecimalDigits1 , DecimalDigits2 }]
= IntegerValue[DecimalDigits1]
action Maximum[QuantifierPrefix] : Limit
Maximum[QuantifierPrefix *] = infinite
Maximum[QuantifierPrefix +] = infinite
Maximum[QuantifierPrefix ?] = finite 1
Maximum[QuantifierPrefix { DecimalDigits }] = finite IntegerValue[DecimalDigits]
Maximum[QuantifierPrefix { DecimalDigits , }] = infinite
Maximum[QuantifierPrefix { DecimalDigits1 , DecimalDigits2 }]
= finite IntegerValue[DecimalDigits2]
action IntegerValue[DecimalDigits] : Integer
IntegerValue[DecimalDigits DecimalDigit] = DecimalValue[DecimalDigit]
IntegerValue[DecimalDigits DecimalDigits1 DecimalDigit]
= 10*IntegerValue[DecimalDigits1] + DecimalValue[DecimalDigit]
action DecimalValue[DecimalDigit] : Integer = digitValue(DecimalDigit)
action TestAssertion[Assertion] : REInput REMatch Boolean
TestAssertion[Assertion ^](t: REInput, x: REMatch)
= if x.endIndex = 0
then true
else t.multiline and t.str[x.endIndex - 1] ∈ lineTerminators
TestAssertion[Assertion $](t: REInput, x: REMatch)
= if x.endIndex = |t.str|
then true
else t.multiline and t.str[x.endIndex] ∈ lineTerminators
TestAssertion[Assertion \ b](t: REInput, x: REMatch)
= atWordBoundary(x.endIndex, t.str)
TestAssertion[Assertion \ B](t: REInput, x: REMatch)
= not atWordBoundary(x.endIndex, t.str)
atWordBoundary(i: Integer, s: String) : Boolean = inWord(i - 1, s) xor inWord(i, s)
inWord(i: Integer, s: String) : Boolean
= if i = -1 or i = |s|
then false
else s[i] ∈ reWordCharacters
action GenMatcher[Atom] : MatcherGenerator
GenMatcher[Atom PatternCharacter](parenIndex: Integer)
= characterMatcher(PatternCharacter)
GenMatcher[Atom .](parenIndex: Integer) = characterSetMatcher(lineTerminators, true)
GenMatcher[Atom \ AtomEscape] = GenMatcher[AtomEscape]
GenMatcher[Atom CharacterClass](parenIndex: Integer)
= let a: {Character} = AcceptanceSet[CharacterClass]
in characterSetMatcher(a, Invert[CharacterClass])
GenMatcher[Atom ( Disjunction )](parenIndex: Integer)
= let match: Matcher = GenMatcher[Disjunction](parenIndex + 1)
in function(t: REInput, x: REMatch, c: Continuation)
let d: Continuation
= function(y: REMatch)
let updatedCaptures: Capture[]
= y.captures[parenIndex
present t.str[x.endIndex ... y.endIndex - 1]]
in c(endIndex y.endIndex, captures updatedCaptures)
in match(t, x, d)
GenMatcher[Atom ( ? : Disjunction )] = GenMatcher[Disjunction]
GenMatcher[Atom ( ? = Disjunction )](parenIndex: Integer)
= let match: Matcher = GenMatcher[Disjunction](parenIndex)
in function(t: REInput, x: REMatch, c: Continuation)
case match(t, x, successContinuation) of
success(y: REMatch): c(endIndex x.endIndex, captures y.captures);
failure: failure
end
GenMatcher[Atom ( ? ! Disjunction )](parenIndex: Integer)
= let match: Matcher = GenMatcher[Disjunction](parenIndex)
in function(t: REInput, x: REMatch, c: Continuation)
case match(t, x, successContinuation) of
success(y: REMatch): failure;
failure: c(x)
end
action CountParens[Atom] : Integer
CountParens[Atom PatternCharacter] = 0
CountParens[Atom .] = 0
CountParens[Atom \ AtomEscape] = 0
CountParens[Atom CharacterClass] = 0
CountParens[Atom ( Disjunction )] = CountParens[Disjunction] + 1
CountParens[Atom ( ? : Disjunction )] = CountParens[Disjunction]
CountParens[Atom ( ? = Disjunction )] = CountParens[Disjunction]
CountParens[Atom ( ? ! Disjunction )] = CountParens[Disjunction]
action GenMatcher[AtomEscape] : MatcherGenerator
GenMatcher[AtomEscape DecimalEscape](parenIndex: Integer)
= let n: Integer = EscapeValue[DecimalEscape]
in if n = 0
then characterMatcher(‘«NUL»’)
else if n > parenIndex
then throw syntaxError
else backreferenceMatcher(n)
GenMatcher[AtomEscape CharacterEscape](parenIndex: Integer)
= characterMatcher(CharacterValue[CharacterEscape])
GenMatcher[AtomEscape CharacterClassEscape](parenIndex: Integer)
= characterSetMatcher(AcceptanceSet[CharacterClassEscape], false)
backreferenceMatcher(n: Integer) : Matcher
= function(t: REInput, x: REMatch, c: Continuation)
case nthBackreference(x, n) of
present(ref: String):
let i: Integer = x.endIndex;
s: String = t.str
in let j: Integer = i + |ref|
in if j > |s|
then failure
else if s[i ... j - 1] = ref
then c(endIndex j, captures x.captures)
else failure;
absent: c(x)
end
nthBackreference(x: REMatch, n: Integer) : Capture = x.captures[n - 1]
action CharacterValue[CharacterEscape] : Character
CharacterValue[CharacterEscape ControlEscape] = CharacterValue[ControlEscape]
CharacterValue[CharacterEscape c ControlLetter]
= codeToCharacter(bitwiseAnd(characterToCode(ControlLetter), 31))
CharacterValue[CharacterEscape HexEscape] = CharacterValue[HexEscape]
CharacterValue[CharacterEscape IdentityEscape] = IdentityEscape
action CharacterValue[ControlEscape] : Character
CharacterValue[ControlEscape f] = ‘«FF»’
CharacterValue[ControlEscape n] = ‘«LF»’
CharacterValue[ControlEscape r] = ‘«CR»’
CharacterValue[ControlEscape t] = ‘«TAB»’
CharacterValue[ControlEscape v] = ‘«VT»’
action EscapeValue[DecimalEscape] : Integer
EscapeValue[DecimalEscape DecimalIntegerLiteral [lookahead{DecimalDigit}]]
= IntegerValue[DecimalIntegerLiteral]
action IntegerValue[DecimalIntegerLiteral] : Integer
IntegerValue[DecimalIntegerLiteral 0] = 0
IntegerValue[DecimalIntegerLiteral NonZeroDecimalDigits]
= IntegerValue[NonZeroDecimalDigits]
action IntegerValue[NonZeroDecimalDigits] : Integer
IntegerValue[NonZeroDecimalDigits NonZeroDigit] = DecimalValue[NonZeroDigit]
IntegerValue[NonZeroDecimalDigits NonZeroDecimalDigits1 DecimalDigit]
= 10*IntegerValue[NonZeroDecimalDigits1] + DecimalValue[DecimalDigit]
action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)
action CharacterValue[HexEscape] : Character
CharacterValue[HexEscape x HexDigit1 HexDigit2]
= codeToCharacter(16*HexValue[HexDigit1] + HexValue[HexDigit2])
CharacterValue[HexEscape u HexDigit1 HexDigit2 HexDigit3 HexDigit4]
= codeToCharacter(
4096*HexValue[HexDigit1] + 256*HexValue[HexDigit2] + 16*HexValue[HexDigit3] +
HexValue[HexDigit4])
action HexValue[HexDigit] : Integer = digitValue(HexDigit)
action AcceptanceSet[CharacterClassEscape] : {Character}
AcceptanceSet[CharacterClassEscape s] = reWhitespaces
AcceptanceSet[CharacterClassEscape S] = {‘«NUL»’ ... ‘«uFFFF»’} - reWhitespaces
AcceptanceSet[CharacterClassEscape d] = reDigits
AcceptanceSet[CharacterClassEscape D] = {‘«NUL»’ ... ‘«uFFFF»’} - reDigits
AcceptanceSet[CharacterClassEscape w] = reWordCharacters
AcceptanceSet[CharacterClassEscape W] = {‘«NUL»’ ... ‘«uFFFF»’} - reWordCharacters
action AcceptanceSet[CharacterClass] : {Character}
AcceptanceSet[CharacterClass [ [lookahead{^}] ClassRanges ]]
= AcceptanceSet[ClassRanges]
AcceptanceSet[CharacterClass [ ^ ClassRanges ]] = AcceptanceSet[ClassRanges]
action Invert[CharacterClass] : Boolean
Invert[CharacterClass [ [lookahead{^}] ClassRanges ]] = false
Invert[CharacterClass [ ^ ClassRanges ]] = true
action AcceptanceSet[ClassRanges] : {Character}
AcceptanceSet[ClassRanges «empty»] = {}Character
AcceptanceSet[ClassRanges NonemptyClassRangesdash]
= AcceptanceSet[NonemptyClassRangesdash]
action AcceptanceSet[NonemptyClassRangesd] : {Character}
AcceptanceSet[NonemptyClassRangesd ClassAtomdash] = AcceptanceSet[ClassAtomdash]
AcceptanceSet[NonemptyClassRangesd ClassAtomd NonemptyClassRangesnoDash1]
= AcceptanceSet[ClassAtomd] ∪ AcceptanceSet[NonemptyClassRangesnoDash1]
AcceptanceSet[NonemptyClassRangesd ClassAtomd1 - ClassAtomdash2 ClassRanges]
= let range: {Character}
= characterRange(AcceptanceSet[ClassAtomd1], AcceptanceSet[ClassAtomdash2])
in range ∪ AcceptanceSet[ClassRanges]
characterRange(low: {Character}, high: {Character}) : {Character}
= if |low| ≠ 1 or |high| ≠ 1
then throw syntaxError
else let l: Character = min low;
h: Character = min high
in if l ≤ h
then {l ... h}
else throw syntaxError
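For example, the character class [a-f] evaluates characterRange({‘a’}, {‘f’}), yielding the set {‘a’ ... ‘f’}. A reversed range such as [f-a] raises syntaxError because l > h, as does a range whose endpoint denotes more than one character, such as \d in [\d-x].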
action AcceptanceSet[ClassAtomd] : {Character}
AcceptanceSet[ClassAtomd ClassCharacterd] = {ClassCharacterd}
AcceptanceSet[ClassAtomd \ ClassEscape] = AcceptanceSet[ClassEscape]
action AcceptanceSet[ClassEscape] : {Character}
AcceptanceSet[ClassEscape DecimalEscape]
= if EscapeValue[DecimalEscape] = 0
then {‘«NUL»’}
else throw syntaxError
AcceptanceSet[ClassEscape b] = {‘«BS»’}
AcceptanceSet[ClassEscape CharacterEscape] = {CharacterValue[CharacterEscape]}
AcceptanceSet[ClassEscape CharacterClassEscape] = AcceptanceSet[CharacterClassEscape]
|
JavaScript 2.0
Formal Description
Parser Grammar
|
Tuesday, February 15, 2000
This LALR(1) grammar describes the syntax of the JavaScript 2.0 proposal. The starting nonterminal is Program. See also the description of the grammar notation.
This document is also available as a Word 98 rtf file.
General tokens: Identifier Number RegularExpression String VirtualSemicolon
Punctuation tokens: ! != !== % %= & && &&= &= ( ) * *= + ++ += , - -- -= . ... / /= : :: ; < << <<= <= = == === > >= >> >>= >>> >>>= ? @ [ ] ^ ^= ^^ ^^= { | |= || ||= } ~
Future punctuation tokens: # ->
Reserved words: break case catch class const continue default delete do else eval extends false final finally for function if in instanceof new null package private public return super switch this throw true try typeof var while with
Future reserved words: abstract debugger enum export goto implements import interface native protected synchronized throws transient volatile
Non-reserved words: get language set
The third through sixth Attributes productions are merely the result of manually inlining the Identifier rule inside Attributes ⇒ Identifier [no line break] Attributes. Without manually inlining the Identifier rule here the grammar would not be LR(1).
The first through fourth LanguageIds productions are merely the result of manually inlining the Identifier rule inside LanguageIds ⇒ Identifier LanguageIdsRest. Without manually inlining the Identifier rule here the grammar would not be LR(1).
|
JavaScript 2.0
Rationale
|
Thursday, November 11, 1999
This chapter discusses the decisions made in designing JavaScript 2.0. Rationales are presented together with descriptions of other alternatives that were or are still being considered. Currently outstanding issues are in red.
|
JavaScript 2.0
Rationale
Syntax
|
Tuesday, February 15, 2000
The term semicolon insertion informally refers to the ability to write programs while omitting semicolons between statements. In both JavaScript 1.5 and JavaScript 2.0 there are two kinds of semicolon insertion:
- Grammatical semicolon insertion: semicolons immediately preceding a } and at the end of the program are optional in both JavaScript 1.5 and 2.0. In addition, the JavaScript 2.0 parser allows semicolons to be omitted before the else of an if-else statement and before the while of a do-while statement.
- Line-break semicolon insertion: a semicolon is inserted at certain line breaks to turn an otherwise syntactically incorrect program into a correct one.
Grammatical semicolon insertion is implemented directly by the parser grammar's productions, which simply do not require a semicolon in the aforementioned cases. Line breaks in the source code are not relevant to grammatical semicolon insertion.
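For example (an informal illustration of the rules above):
if (a > b) {max = a} else {max = b}   // no semicolon is needed before } (JavaScript 1.5 and 2.0)
if (a > b) max = a else max = b       // JavaScript 2.0 also lets the semicolon before else be omitted
do count++ while (count < limit);     // ... and the one before the while of a do-while statement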
Line-break semicolon insertion cannot be easily implemented in the parser's grammar. This kind of semicolon insertion turns a syntactically incorrect program into a correct program and relies on line breaks in the source code.
Grammatical semicolon insertion is harmless. On the other hand, line-break semicolon insertion suffers from the following problems:
- The meaning of a program can depend on where its line breaks fall, so transformations that add or remove line breaks can change its behavior.
- A semicolon is inserted only where leaving it out would make the program fail to parse, so a statement can silently merge into the previous one whenever the combined form happens to be syntactically legal.
- Extending the language's syntax in a future version can remove insertion points that existing programs rely on, silently changing their meaning.
The first problem presents difficulty for some preprocessors such as the one for XML attributes which turn line breaks into spaces. The second and third ones are more serious. Users are confused when they discover that the program
a = b + c (d + e).print()
doesn't do what they expect:
a = b + c; (d + e).print();
Instead, that program is parsed as:
a = b + c(d + e).print();
The third problem is the most serious. New features added to the language turn illegal syntax into legal syntax. If
an existing program relies on the illegal syntax to trigger line-break semicolon insertion, then the program will silently
change behavior once the feature is added. For example, the juxtaposition of a numeric literal followed by a string literal
(such as 4 "in") is illegal in JavaScript 1.5. JavaScript 2.0 makes this legal syntax for expressions with
units. This syntax extension has the unfortunate consequence of silently changing the meaning of the following JavaScript
1.5 program:
a = b + 4 "in".print()
from:
a = b + 4; "in".print();
to:
a = b + 4"in".print();
JavaScript 2.0 gets around this incompatibility by adding a [no line break] restriction in the grammar that requires the numeric and string literals to be on the same line. Unfortunately, this compatibility is a double-edged sword. Due to JavaScript 1.5 compatibility, JavaScript 2.0 has to have a large number of these [no line break] restrictions. It is hard to remember all of them, and forgetting one of them often silently causes a JavaScript 2.0 program to be reinterpreted. Users will be dismayed to find that:
local
function f(x) {return x*x}
turns into:
local;
function f(x) {return x*x}
(where local; is an expression statement) instead of:
local function f(x) {return x*x}
An earlier version of JavaScript 2.0 disallowed line-break semicolon insertion. The current version allows it but only in non-strict mode. Strict mode removes all [no line break] restrictions, simplifying the language again. As a side effect, it is possible to write a program that does different things in strict and non-strict modes (the last example above is one such program), but this is the price to pay to achieve simplicity.
JavaScript 2.0 retains compatibility with JavaScript 1.5 by adopting the same rules for detecting regular expression literals. This complicates the design of programs such as syntax-directed text editors and machine scanners because it makes it impossible to find all of the tokens in a JavaScript program without parsing the program.
Making JavaScript 2.0's lexical grammar independent of its syntactic grammar would have significantly helped here: tools could then easily process a JavaScript program and escape all instances of, say, </ to properly embed a JavaScript 2.0 or later program in an HTML page, without carrying the full parser, which changes for each version of JavaScript. To illustrate the difficulties,
compare such JavaScript 1.5 gems as:
for (var x = a in foo && "</x>" || mot ? z:/x:3;x<5;y</g/i) {xyz(x++);}
for (var x = a in foo && "</x>" || mot ? z/x:3;x<5;y</g/i) {xyz(x++);}
One idea explored early in the design of JavaScript 2.0 was providing an alternate, unambiguous syntax for regular expressions
and encouraging the use of the new syntax. A RegularExpression could have been specified unambiguously
using « and » as its opening and closing delimiters instead of / and /.
For example, «3*» would be a regular expression that matches zero or more 3's. Such
a regular expression could be empty: «» is a regular expression that matches only the empty string,
while // starts a comment. To write such a regular expression using the slash syntax one needs to write /(?:)/.
Syntactic resynchronization occurs when the lexer needs to find the end of a block (the matching })
in order to skip a portion of a program written in a future version of JavaScript. Ordinarily this would not be a problem,
but regular expressions complicate matters because they make lexing dependent on parsing. The rules for recognizing regular
expression literals must be changed for those portions of the program. The rule below might work, or a simplified parse might
be performed on the input to determine the locations of regular expressions. This is an area that needs
further work.
During syntax resynchronization JavaScript 2.0 determines whether a / starts a regular expression or is a
division (or /=) operator solely based on the previous token:
| Previous token | / interpretation |
|---|---|
| Identifier Number RegularExpression String ) ++ -- ] } false null super this true constructor getter method override setter traditional version | / or /= |
| Any other punctuation token or reserved word | RegularExpression |
Regardless of the previous token, // is interpreted as the beginning of a comment.
The only controversial choices are ) and }. A /
after either a ) or } token can be either a division
symbol (if the ) or } closes a subexpression or an
object literal) or a regular expression token (if the ) or }
closes a preceding statement or an if, while, or for expression). Having /
be interpreted as a RegularExpression in expressions such as (x+y)/2 would be problematic,
so it is interpreted as a division operator after ) or }.
If one wants to place a regular expression literal at the very beginning of an expression statement, it's best to put the
regular expression in parentheses. Fortunately, this is not common since one usually assigns the result of the regular expression
operation to a variable.
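For example (a hypothetical illustration of this advice):
function setup() { /* ... */ }
(/^\d+/).exec(text);         // parenthesized: during resynchronization the / after } would otherwise be read as division
var m = /^\d+/.exec(text);   // the more common style: the literal does not begin the statement, so no ambiguity arises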
The current JavaScript 2.0 proposal uses Pascal-style colons to introduce types in declarations. For example:
var x:integer = 7;
function square(a:number):number {return a*a}
This is due to a consensus decision of the ECMA working group, with Waldemar the only dissenter.
We could allow modified C-style type declarations as long as a function's return type is listed after its parameters:
var integer x = 7;
function square(number a) number {return a*a}
A function's return type cannot be listed before the parameters because this would make the grammar ambiguous.
In fact, an implementation could unambiguously admit both the Pascal-style and the modified C-style declarations by replacing the TypedIdentifierb and ResultSignature grammar rules with versions that accept either notation; the resulting grammar is still LALR(1). A few advantages of using the modified C-style syntax:
- Type annotations would no longer share the colon syntax with object literals such as {a:17, b:33}. The latter would present a conundrum if we ever wanted to declare field types in an object literal. Some users have been using these as a convenient facility for passing named arguments to functions.
We could define other useful type operators such as union, intersection, and difference as listed in the table below (a usage sketch follows the tables). s and t are type expressions.
| Type | Values | Coercion of value v |
|---|---|---|
| s + t | All values belonging to either type s or type t or both | If v ∈ s + t, then use v; otherwise, if v@s is defined then use v@s; otherwise, if v@t is defined then use v@t. |
| s * t | All values simultaneously belonging to both type s and type t | If v@s@t is defined and is a member of s*t, then use v@s@t. |
| s / t | All values belonging to type s but not type t | If v@s is defined and is a member of s/t, then use v@s. |
The following subtype and type equivalence relations hold. r, s, and t represent arbitrary types.
| s ⊆ s + t | s * t ⊆ s |
| t + t = t | t * t = t |
| (r + s) + t = r + (s + t) | (r * s) * t = r * (s * t) |
| none ⊆ t | t ⊆ any |
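A usage sketch of these operators (hypothetical; they are not part of the current proposal, and the class names Named and Serializable are made up):
var x: integer + string = "seven";   // union: x may hold either an integer or a string
var y: Named * Serializable;         // intersection: only values belonging to both types
var z: number / integer = 2.5;       // difference: numbers that are not integers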
JavaScript 2.0 uses the same syntax for type expressions as for value expressions. One reason is that when the parser encounters an expression of the form (expr1)(expr2), it cannot readily tell whether expr1 is a type or a value expression; if the two have the same syntax, it doesn't matter.
An alternative to language declarations that was considered early was to report syntax errors at the time the relevant
statement was executed rather than at the time it was parsed. This way a single program could include parts written in a future
version of JavaScript without getting an error unless it tries to execute those portions on a system that does not understand
that version of JavaScript. If a program part that contains an error is never executed, the error never breaks the script.
For example, the following function finishes successfully if whizBangFeature is false:
function move(integer x, integer y, integer d) {
x += 10;
y += 3;
if (whizBangFeature) {
simulate{@x and #y} along path
} else {
x += d; y += d;
}
return [x,y];
}
The code simulate{@x and #y} along path is a syntax error, but this error does not break the script unless
the script attempts to execute that piece of code.
One problem with this approach is that it frustrates debugging; a script author benefits from knowing about syntax errors at compile time rather than at run time.
|
JavaScript 2.0
Rationale
Execution Model
|
Thursday, November 11, 1999
When does a declaration (of a value, function, type, class, method, pragma, etc.) take effect? When are expressions evaluated? The answers to these questions distinguish among major kinds of programming languages. Let's consider the following function definition in a language with C++ or Java-like syntax:
gadget f(widget x) {
if ((gizmo)(x) != null)
return (gizmo)(x);
return x.owner;
}
In a static language such as Java or C++, all type expressions are evaluated at compile time. Thus, in this example widget
and gadget would be evaluated at compile time. If gizmo were a type, then it too would be evaluated
at compile time ((gizmo)(x) would become a type cast). Note that we must be able to statically distinguish identifiers
used for variables from identifiers used for types so we can decide whether (gizmo)(x) is a one-argument function
call (in which case gizmo would be evaluated at run time) or a type cast (in which case gizmo would
be evaluated at compile time). In most cases, in a static language a declaration is visible throughout its enclosing scope,
although there are exceptions that have been deemed too complicated for a compiler to handle such as the following C++:
typedef int *x;
class foo {
typedef x *y;
typedef char *x;
}
Many dynamic languages can construct, evaluate, and manipulate type expressions at run time. Some dynamic languages (such
as Common Lisp) distinguish between compile time and run time and provide constructs (eval-when) to evaluate
expressions early. The simplest dynamic languages (such as Scheme) process input in a single pass and do not distinguish between
compile time and run time. If we evaluated the above function in such a simple language, widget and gadget
would be evaluated at the time the function is called.
JavaScript is a scripting language. Many programmers wish to write JavaScript scripts embedded in web pages that work in a variety of environments. Some of these environments may provide libraries that a script would like to use, while on other environments the script may have to emulate those libraries. Let's take a look at an example of something one would expect to be able to easily do in a scripting language:
Bob is writing a script for a web page that wants to take advantage of an optional package MacPack that is
present on some environments (Macintoshes) but not on others. MacPack provides a class HyperWindoid
from which Bob wants to subclass his own class BobWindoid. On other platforms Bob has to define an emulation
class BobWindoid' that is implemented differently from BobWindoid -- it has a different set of private
methods and fields. There also is a class WindoidGuide in Bob's package; the code and method signatures of classes
BobWindoid and BobWindoid' refer to objects of type WindoidGuide, and class WindoidGuide's
code refers to objects of type BobWindoid (or BobWindoid' as appropriate).
Were JavaScript to use a dynamic execution model (described below), declarations would take effect only when executed, and Bob could implement his package as shown below. The package keyword in front of both definitions of class BobWindoid
lifts these definitions from the local if scope to the top level of Bob's package.
class WindoidGuide; // forward declaration
if (onMac()) {
import "MacPack";
package class BobWindoid extends HyperWindoid {
private field x;
field g:WindoidGuide;
private method speck() {...};
public method zoom(a:WindoidGuide, uncle:HyperWindoid = null):WindoidGuide {...};
}
} else {
// emulation class BobWindoid'
package class BobWindoid {
private field i:integer, j:integer;
field g:WindoidGuide;
private method advertise(h:WindoidGuide):WindoidGuide {...};
private method subscribe(h:WindoidGuide):WindoidGuide {...};
public method zoom(a:WindoidGuide):WindoidGuide {...};
}
}
class WindoidGuide {
field currentWindoid:BobWindoid;
method introduce(arg:BobWindoid):BobWindoid {...};
}
On the other hand, if the language were static (meaning that types are compile-time expressions), Bob would run into problems.
How could he declare the two alternatives for the class BobWindoid?
Bob's first thought was to split his package into three HTML SCRIPT tags (containing BobWindoid,
BobWindoid', and WindoidGuide) and turn one of the first two off depending on the platform. Unfortunately
this doesn't work because he gets type errors if he separates the definition of class BobWindoid (or BobWindoid')
from the definition of WindoidGuide because these classes mutually refer to each other. Furthermore, Bob would
like to share the script among many pages, so he'd like to have the entire script in a single BobUtilities.js file.
Note that this problem would be newly introduced by JavaScript 2.0 if it were to evaluate type expressions at compile time. JavaScript 1.5 does not suffer from this problem because it does not have a concept of evaluating an expression at compile time, and it is relatively easy to conditionally define a class (which is merely a function) by declaring a single global variable g and conditionally assigning either one or another anonymous function to it.
There exist other alternatives in between the dynamic execution model and the static model that also solve Bob's problem. One of them is described at the end of this chapter.
In a pure dynamic execution model the entire program is processed in one pass. Declarations take effect only when they are executed. A declaration that is never executed is ignored. Scheme follows this model, as did early versions of Visual Basic.
The dynamic execution model considerably simplifies the language and allows an interpreter to treat programs read from a file identically to programs typed in via an interactive console. Also, a dynamic execution model interpreter or just-in-time compiler may start to execute a script even before it has finished downloading all of it.
One of the most significant advantages of the dynamic execution model is that it allows JavaScript 2.0 scripts to turn parts of themselves on and off based on dynamically obtained information. For example, a script or library could define additional functions and classes if it runs on an environment that supports CSS unit arithmetic while still working on environments that do not.
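For instance, a script might guard a definition with a feature test, along the lines of the hedged sketch below (the feature-test function and the class are hypothetical):
if (supportsCSSUnitArithmetic()) {
  package class CSSLength {
    field value:number;
    field unit:string;
  }
}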
The dynamic execution model requires identifiers naming functions and variables to be defined before they are used. A
use occurs when an identifier is read, written, or called, at which point that identifier is resolved to a variable or a function
according to the scoping rules. A reference from within a control statement such as if and while
located outside a function is resolved only when execution reaches the reference. References from within the body of a function
are resolved only after the function is called; for efficiency, an implementation is allowed to resolve all references within
a function or method that does not contain eval at the first time the function is called.
According to these rules, the following program is correct and would print 7:
function f(a:integer):integer {
return a+b;
}
var b:integer = 4;
print(f(3));
Assuming that variable b is predefined by the host if featurePresent is true, this program would
also work:
function f(a:integer):integer {
return a+b;
}
if (!featurePresent) {
package var b:integer = 4;
}
print(f(3));
On the other hand, the following program would produce an error because f is referenced before it is defined:
print(f(3));
function f(a:integer):integer {
return a*2;
}
Defining mutually recursive functions is not a problem as long as one defines all of them before calling them.
JavaScript 1.5 does not follow the pure dynamic execution model, and, for reasons of compatibility, JavaScript 2.0 strays from that model as well, adopting a hybrid execution model instead. Specifically, JavaScript 2.0 inherits the following static execution model aspects from JavaScript 1.5:
- Unless they have the local prefix, variable declarations of variables at the global scope cause the variables to be created at the time the program is entered rather than at the time the declarations are evaluated.
- Unless they have the local prefix, variable declarations of local variables inside a function cause the variables to be created at the time the function is entered rather than at the time the declarations are evaluated.
- Function declarations cause the functions to be defined at the time the enclosing program or function is entered rather than at the time the declarations are evaluated.
In addition to the above, the evaluation of class declarations has special provisions for delayed evaluation to allow mutually-referencing classes.
The second condition above allows the following program to work in JavaScript 2.0:
const b:string = "Bee";
function square(a:integer):integer {
b = a; // Refers to local b defined below, not global b
return b*a;
var b:integer;
}
While allowed, using variables ahead of declaring them, such as in the above example, is considered bad style and may generate a warning.
The third condition above makes the last example from the pure execution model section work:
print(f(3));
function f(a:integer):integer {
return a*2;
}
Again, actually calling a function at the top level before declaring it is considered bad style and may generate a warning. It also will not work with classes.
Perhaps the easiest way to compile a script under the dynamic execution model is to accumulate function definitions unprocessed and compile them only when they are first called. Many JITs do this anyway because this lets them avoid the overhead of compiling functions that are never called. This process does not impose any more of an overhead than the static model would because under the static model the compiler would need to either scan the source code twice or save all of it unprocessed during the first pass for processing in the second pass.
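A minimal sketch of this strategy, written as ordinary JavaScript pseudo-implementation code (globalScope and compileFunctionBody are hypothetical names, not part of the proposal):
function defineFunctionLazily(name, sourceText) {
  var compiled = null;                             // the body is kept unprocessed until needed
  globalScope[name] = function () {
    if (compiled == null)
      compiled = compileFunctionBody(sourceText);  // compile on the first call only
    return compiled.apply(this, arguments);
  };
}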
Compiling a dynamic execution model script off-line also does not present special difficulties as long as eval is
restricted to not introduce additional declarations that shadow existing ones (if eval is allowed to do this,
it would present problems for any execution model, including the static one). Under the dynamic execution model, once
the compiler has reached the end of a scope it can assume that that scope is complete; at that point all identifiers inside
that scope can be resolved to the same extent that they would be in the static model.
Bob's problem could also be solved by using conditional compilation similar in spirit to C's preprocessor. If we do this, we have to ask about how expressive the conditional compilation meta-language should be. C's preprocessor is too weak. In JavaScript applications we'd often find that we need the full power of JavaScript so that we can inspect the DOM, the environment, etc. when deciding how to control compilation. Besides, using JavaScript as the meta-language would reduce the number of languages that a programmer would have to learn.
Here's one sketch of how this could be done:
- The compiler must be able to determine at compile time whether an expression such as (x)(y) is a function call of function x or a cast of y to type x.
- Compile-time definitions and statements are marked with a # symbol. For example, #{var x:int = 3} defines a compile-time constant x and initializes it to 3. One can also lift a var, const, or function declaration directly by preceding it with a # symbol, so #var x:int = 3; would accomplish the same thing.
- TypeExpressions in declarations are evaluated at compile time; int in the preceding example is such a TypeExpression.
- Compile-time code may itself contain compile-time code (for example, #{#var x:int = 3}), which is evaluated at compile-compile time, and so forth.
- Conditional compilation is written as # if ( Expression ) Statements [# else if ( Expression ) Statements] ... [# else Statements] # end if, where the #'s can appear anywhere on a line; one can also use #if to conditionally exclude compile-time code, etc. (A usage sketch appears below.)
Note that because variable initializers are not evaluated at compile time, one has to use #var a = int rather than var a = int to define an alias a for a type name int.
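As a hedged sketch of how Bob might use such a directive (onMac and the elided class bodies are placeholders):
# if (onMac())
    import "MacPack";
    class BobWindoid extends HyperWindoid {...}
# else
    class BobWindoid {...}
# end if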
This sketch does not address many issues that would have to be resolved, such as how typed variables are handled after they are declared but before they are initialized (this problem doesn't arise in the dynamic execution model), how the lexical scopes of the run time pass would interact with scoping of the compile time pass, etc.
Both approaches solve Bob's problem, but they differ in other areas. In the sequel "conditional compilation" refers to the conditional compilation alternative described above.
|
JavaScript 2.0
Rationale
Member Lookup
|
Wednesday, February 16, 2000
There has been much discussion in the TC39 subgroup about the meaning of a member lookup operation. Numerous considerations intersect here.
We will express a general unqualified member lookup operation as a.b, where a
is an expression and b is an identifier. We will also consider qualified member lookup operations and write them
as a.n::b, where n is an expression that evaluates to
some namespace. In almost all cases we will be interested in the dynamic type Td of a. In one scheme
we will also consider the static type Ts of the expression a. If the language is sound, we will always
have Td ⊆ Ts.
In the simplest approach, we treat an object as merely an association table of member names and member values. In this
interpretation we simply look inside object a and check if there is a member named b. If there is, we return the
member's value; if not, we return undefined or signal an error.
There are a number of difficulties with this simple approach, and most object-oriented languages have not adopted it. Chief among them is that a class may wish to restrict access to some of its members, declaring them private or package-protected.
Once we allow private or package-protected members, we must allow for the possibility that object
a will have more than one member named b -- abstraction considerations require that users of a class
C not be exposed to C's private members, so, in particular, a user should be able to create a subclass
D of C and add members to D without knowing the names of C's private members.
Both C++ and Java allow this. We must also allow for the possibility that object a will have a member named b
but we are not allowed to access it. We will assume that access control is specified by lexical scoping, as is traditional
in modern languages.
Some of the criteria we would like the member lookup model to satisfy are:
- Safety: A member lookup does not allow access to a private member outside the class where the member is defined, nor does it allow access to a package member outside the package where the member is defined. Furthermore, if a class C accesses its private member m, a hostile subclass D of C cannot silently substitute a member m' that would masquerade as m inside C's code.
- Abstraction: Members declared private and package are invisible outside their respective classes or packages. For programming in the large, a class can provide several public versions to its importers, and public members of more recent versions are invisible to importers of older versions. This is needed to provide robust libraries.
- Robustness: A class's author can rename a member or change its visibility among private, package, or public without breaking unrelated code, assuming, of course, that that member is not used outside its new visibility.
- Namespace independence: Members with the same name in unrelated classes do not conflict with each other.
- Compatibility: Existing JavaScript 1.5 programs and idioms continue to work.
There are three main competing models for performing a general unqualified member lookup operation a.b.
Let S be the set of members named b of the object obtained by evaluating expression a (hereafter
shortened to just "object a") that are accessible via the visibility
rules applied in the lexical scope where a.b is evaluated. All three models pick some
member s ∈ S. Clearly, if the
set S is empty, then the member lookup fails. In addition, the Spice and pure Static models may sometimes deliberately
fail even when set S is not empty. Except for such deliberate failures, if the set S contains only one
member s, all three models return that element s. If the set S contains multiple members,
the three models will likely choose different members.
Another interesting (and useful) tidbit is that the Static and Dynamic models always agree on the interpretation of member
lookup operations of the form this.b. All three models agree on the interpretation of member lookup
operations of the form this.b in the case where b is a member defined in the current class.
A note about overriding: When a subclass D overrides a member m of its superclass C, then the definition of the member m is conceptually replaced in all instances of D. However, the three models are only concerned with the topmost class in which member m is declared. All three models handle overriding the way one would expect of an object-oriented language. They differ in the cases where class C has a member named m, subclass D of C has a member with the same name m, but D's m does not override C's m because C's m is not visible inside D (it's not well known, but such non-overriding does and must happen in C++ and Java as well).
In the Static model we look at the static type Ts of expression a. Let S1 be the subset of S whose class is either Ts or one of Ts's ancestors. We pick the member in S1 with the most derived class.
The pure static model above is implemented by Java and C++. It would not work well in that form in JavaScript because many,
if not most, expressions have type Any. Because type Any has no members, users would have to cast
expression a to a given type T before they could access members of type T. Because of this
we must extend the static model to handle the case where the subset S1 is empty, or, in other words, the static
lookup fails. (Rather than doing this, we could extend the static model in the case where the static type Ts is
some special type, but then we would have to decide which types are special and which ones are not. Any is clearly
special. What about Object? What about Array? It's hard to draw the line consistently.)
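For example (hypothetical code; @ here denotes coercion to a type, as in the type operator table on the Syntax page):
var a = readValue();    // readValue is made up; the static type of a is Any
a.length;               // fails under the pure static model because type Any has no members
(a@String).length;      // a coercion to a specific type would be needed before each member access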
In whichever way we extend the static model, we also have a choice of which member we choose. We could back off to the dynamic model, choose the most derived member in S, or perhaps take some other approach.
Constraints:
| Safety | Good within the pure static model. Problems in the extended static model (a subclass could silently shadow a member) that could perhaps be addressed by warnings. |
| Abstraction | Good. |
| Robustness | Very bad. Updating a function's or global variable's return type silently changes the meaning of all code that uses that function or global variable; in a large project such a change would be quite difficult to make safely. It is also difficult to correctly split expressions into subexpressions. |
| Namespace independence | Good. |
| Compatibility | Bad within the pure static model (type casts needed everywhere). May be good in the extended static model, depending on the choice of how we extend it. |
| Other |
This model may be difficult to compile well because the compiler may have difficulty in determining the intermediate types in compound expressions. Languages based on the static model have traditionally been compiled off-line, and such compilers tend to be difficult to write for on-line compilation without requiring the programmer to predeclare all of his data structures (if there are any forward-referenced ones, then the compiler doesn't know whether they should have a type or not). A more dynamic execution model may actually help because it defers compilation until more information is known. |
In the Spice model we think of each member m defined in a class C as though it were a function definition for a (possibly overloaded) function whose first argument has type C. Definitions in an inner lexical scope shadow definitions in outer scopes. The Spice model does not consider the static type Ts of expression a.
Let L be the innermost lexical scope enclosing the member lookup expression a.b
such that some member named b is defined in L. Let Lb be the set of all members named b
defined in lexical scope L, and let S1 = S ∩ Lb
(the intersection of S and Lb). If S1 is empty, we fail. If S1 contains exactly
one member s, we use s. If S1 contains several members, we fail (this would only happen for
import conflicts).
Constraints:
| Safety | Good. |
| Abstraction | Good. |
| Robustness | Poor. Renaming a package-visible member may break code outside the class that defines that
member even if that code does not access that member. Converting a member from private to one of the other
two visibilities also can introduce conflicts in other, unrelated classes in the same package that just happen to have
an unrelated member with the same name. Fortunately these conflicts usually (but not always) result in errors rather
than silent changes to the meaning of the program, so one can often find them by exhaustively testing the program after
making a change. |
| Namespace independence | Bad. Members with the same name in unrelated classes often conflict. |
| Compatibility | Poor? Many existing programs rely on namespace independence and would have to be restructured. |
| Other |
Most object-oriented programmers would be confused by a violation of namespace independence. Programming without this assumption requires a different point of view than most programmers are used to. (I am not talking about Lisp and Self programmers, who are familiar with that way of thinking.) |
[There are numerous other variants of the Spice model as well.]
In the Dynamic model we pick the member s in S defined in the innermost lexical scope L
enclosing the member lookup expression a.b. We fail if the innermost such lexical
scope L contains more than one member in S (this would only happen for import conflicts).
Constraints:
| Safety | Good at the language level, but see "other" below. |
| Abstraction | Good. |
| Robustness | Good. All of these changes are easy to do. |
| Namespace independence | Good. |
| Compatibility | Good. |
| Other |
Packages using the dynamic model may be vulnerable to hijacking (coerced into doing something other than what the author intended) by a determined intruder. It is possible for a compiler to detect such vulnerabilities and warn about them. |
The various models make it possible to get into situations where either there is no way to access a visible member of an
object or it is not safe to do so (see member hijacking). In these cases we'd like to be able to
explicitly choose one of several potential members with the same name. The :: namespace syntax allows this. The
left operand of :: is an expression that evaluates to a package or class; we may also allow special keywords
such as public, package, or private instead of an expression here, or omit the expression
altogether. The right operand of :: is a name. The result is the name qualified by the namespace.
As we have seen, the name b in a member access expression a.b does not necessarily
refer to a unique accessible member of object a. In a qualified member access expression a.n::b,
the namespace n narrows the set of members considered, although it's possible that the set may still contain more
than one member, in which case the lookup model again disambiguates. Let S be the set of members named b
of object a that are accessible. The following table shows how a.n::b
subsets set S depending on n:
| n | Subset |
|---|---|
| None | Only the ad-hoc member named b, if any exists |
| A class C | The fixed member of C named b, if it exists; if not, try C's superclass instead, and so on up the chain |
| A package P | The subset of S containing all accessible members of P |
| private | The fixed member named b of the current class |
| package | The subset of S containing all accessible members that have package visibility |
| public | The subset of S containing all accessible members that have public visibility |
The :: operator serves a different role from the . operator. The :: operator produces
a qualified name, while the . operator produces a value. A qualified name can be used as
the right operand of .; a value cannot. If a qualified name is used in a place where a value is expected, the
qualified name is looked up using the lexical scoping rules to obtain the value (most likely a global variable).
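For example (hypothetical code, reusing names from the execution-model example):
w.zoom(g);                 // unqualified: the lookup model chooses among the accessible members named zoom
w.HyperWindoid::zoom(g);   // qualified by a class: only HyperWindoid's fixed zoom or, failing that, its superclasses'
w.public::zoom(g);         // qualified by a visibility: only members named zoom that have public visibility
w.::zoom;                  // no namespace: only an ad-hoc member named zoom, if one exists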
All of the models above address only access to fixed members of a class. JavaScript also allows one to dynamically add
members to individual instances of a class. For simplicity we do not provide access control or versioning on these ad-hoc
members -- all of them are public and open to everyone. Because of the safety criterion, a member lookup
of a private or package-protected member must choose the private or package-protected
member even if there is an ad-hoc member of the same name. To satisfy the robustness criterion,
we should treat public members as similarly as possible to private or package-protected
members, so we always give preference to a fixed member when there is an ad-hoc member of the same name.
To access an ad-hoc member that is shadowed by a fixed member, we can either prefix the member's name with ::
or use an indirect member access.
How should we define the behavior of the expression a[b] (assuming the [] operator is not overridden by a's class)? There are a couple of possibilities:
- Convert b to a string "s" and treat a[b] as though it were a.s. This is essentially what JavaScript 1.5 does. Unfortunately it's hard to keep this behavior consistent with JavaScript 1.5 programs' expectations (they expect no more than one member with the same name, etc.), and this kind of indirection is also vulnerable to hijacking. It may be possible to solve the hijacking problem by devising restricted variants of the [] operator such as a.n::[b] that follow the rules given in the namespaces section above.
- Convert b to a string "s" and treat a[b] as though it were a.::s, thus limiting our selection to ad-hoc members. Ad-hoc members are well-behaved, but this kind of behavior would violate the compatibility criterion when JavaScript 1.5 scripts try to reflect a JavaScript 2.0 object using the [] operator.
In general it seems like it would be a bad idea to extend the syntax of the string "s" to allow :: operators inside the string. Such strings are too easily forged to play the role of pointers to members.
[explain security attacks]
|
JavaScript 2.0
Compatibility
|
Thursday, November 11, 1999
JavaScript 2.0 is intended to be upwards compatible with JavaScript 1.5 and earlier scripts. The following are the current compatibility issues:
- Scripts must replace void expr by void(expr), since void is no longer a reserved word.
- Scripts must replace expr[expr, expr] by expr[(expr, expr)] because commas are now significant inside brackets.
- Scripts may no longer use eval for identifiers.
- Some existing uses of the predefined classes Object and String may not work.
expr)] because commas are now significant inside brackets.eval for identifiers.Object and String may not work.JavaScript 2.0 is still evolving, and some of these compatibility issues may be addressed as the language matures. They are not expected to be a problem in practice because a browser could distinguish JavaScript 1.5 and earlier scripts from JavaScript 2.0 scripts and behave compatibly on the earlier ones.
|
Waldemar Horwat Last modified Wednesday, February 16, 2000 |