JavaScript 2.0 Lexer Grammar

JavaScript 2.0

Formal Description

Lexer Grammar

Monday, June 7, 1999

This LALR(1) grammar describes the lexer syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.

This document is also available as a Word 98 RTF file.

The start symbols are NextToken^re and NextToken^div. Which one applies depends on whether a / at that point in the input should be interpreted as the start of a regular expression or as a division operator.

Unicode Character Classes

UnicodeCharacter Any Unicode character

UnicodeInitialAlphabetic Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)

UnicodeAlphanumeric Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)

WhiteSpaceCharacter

«TAB» | «VT» | «FF» | «SP» | «u00A0»

| «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»

| «u2008» | «u2009» | «u200A» | «u200B»

| «u3000»

LineTerminator «LF» | «CR» | «u2028» | «u2029»

ASCIIDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
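As an illustration only, the WhiteSpaceCharacter, LineTerminator, and ASCIIDigit classes above can be transcribed into lookup sets. This Python sketch is not part of the proposal; the constant and function names are hypothetical.

```python
# Character classes transcribed from the productions above.
WHITE_SPACE_CHARACTERS = frozenset(
    "\t\v\f \u00a0"                                    # «TAB» «VT» «FF» «SP» «u00A0»
    + "".join(chr(c) for c in range(0x2000, 0x200C))  # «u2000» through «u200B»
    + "\u3000"                                         # «u3000»
)
LINE_TERMINATORS = frozenset("\n\r\u2028\u2029")       # «LF» «CR» «u2028» «u2029»
ASCII_DIGITS = frozenset("0123456789")


def is_white_space_character(ch: str) -> bool:
    """True if ch is a WhiteSpaceCharacter (line terminators are a separate class)."""
    return ch in WHITE_SPACE_CHARACTERS


def is_line_terminator(ch: str) -> bool:
    """True if ch is a LineTerminator."""
    return ch in LINE_TERMINATORS
```

Note that «LF» and «CR» belong to LineTerminator only, so is_white_space_character rejects them.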

Comments

LineComment / / LineCommentCharacters

LineCommentCharacters

«empty»

| LineCommentCharacters NonTerminator

NonTerminator UnicodeCharacter except LineTerminator

BlockComment / * BlockCommentCharacters * /

BlockCommentCharacters

«empty»

| BlockCommentCharacters NonSlash

| PreSlashCharacters /

PreSlashCharacters

«empty»

| BlockCommentCharacters NonAsteriskOrSlash

| PreSlashCharacters /

NonSlash UnicodeCharacter except /

NonAsteriskOrSlash UnicodeCharacter except * | /

White space

WhiteSpace

«empty»

| WhiteSpace WhiteSpaceCharacter

| WhiteSpace LineTerminator

| WhiteSpace LineComment LineTerminator

| WhiteSpace BlockComment
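The Comments and WhiteSpace productions above can be read as a skipping loop. The Python sketch below is one possible reading, not the proposal's algorithm. The function name skip_white_space is hypothetical, and treating an unterminated block comment as an error is a simplifying assumption of the sketch. It relies on the observation that BlockCommentCharacters can never contain the sequence * /, so a block comment ends at the first * / after its opening / *.

```python
LINE_TERMINATORS = "\n\r\u2028\u2029"
WHITE_SPACE_CHARACTERS = (
    "\t\v\f \u00a0" + "".join(map(chr, range(0x2000, 0x200C))) + "\u3000"
)


def skip_white_space(src: str, pos: int = 0) -> int:
    """Return the index just past the longest WhiteSpace match starting at pos."""
    n = len(src)
    while pos < n:
        ch = src[pos]
        if ch in WHITE_SPACE_CHARACTERS or ch in LINE_TERMINATORS:
            pos += 1
        elif src.startswith("//", pos):
            end = pos + 2
            while end < n and src[end] not in LINE_TERMINATORS:
                end += 1
            if end == n:
                # "LineComment End" belongs to EndOfInput, not to WhiteSpace.
                break
            pos = end + 1  # consume the comment and its LineTerminator
        elif src.startswith("/*", pos):
            # Search from pos + 2 so the opening "/*" cannot supply the closing "*".
            close = src.find("*/", pos + 2)
            if close < 0:
                raise SyntaxError("unterminated block comment (sketch assumption)")
            pos = close + 2
        else:
            break
    return pos
```

A line comment that runs to the end of the input is deliberately left unconsumed, because the grammar assigns that case to EndOfInput rather than to WhiteSpace.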

Tokens

t ∈ {re, div}

NextToken^t WhiteSpace Token^t

Token^re

IdentifierOrReservedWord

| Punctuator

| NumericLiteral

| QuantityLiteral

| StringLiteral

| RegExpLiteral^slash

| RegExpLiteral^guillemet

| EndOfInput

Token^div

IdentifierOrReservedWord

| Punctuator

| DivisionPunctuator

| NumericLiteral

| QuantityLiteral

| StringLiteral

| RegExpLiteral^guillemet

| EndOfInput

EndOfInput

End

| LineComment End
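The only difference between Token^re and Token^div is what a leading / may begin: RegExpLiteral^slash in the first case, DivisionPunctuator in the second. The Python sketch below illustrates that dispatch under the assumption that the caller (for example, a parser) supplies the goal; the grammar above does not say how the goal is chosen. The function name classify_slash is hypothetical.

```python
def classify_slash(src: str, pos: int, goal: str) -> str:
    """Name the production that the '/' at src[pos] could begin under the given goal."""
    if goal not in ("re", "div"):
        raise ValueError("goal must be 're' or 'div'")
    if src[pos] != "/":
        raise ValueError("expected '/' at pos")
    following = src[pos + 1 : pos + 2]  # empty string at end of input
    if following == "/":
        return "LineComment"   # comments are handled by WhiteSpace under either goal
    if following == "*":
        return "BlockComment"
    if goal == "re":
        return "RegExpLiteral^slash"
    return "DivisionPunctuator"  # covers both / and / =
```

For example, the input /=/ is a regular expression with body = under the re goal, but begins with the punctuator / = under the div goal.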

Keywords and identifiers

IdentifierName

InitialIdentifierCharacter

| IdentifierName ContinuingIdentifierCharacter

InitialIdentifierCharacter

OrdinaryInitialIdentifierCharacter

| \ HexEscape

OrdinaryInitialIdentifierCharacter UnicodeInitialAlphabetic | $ | _

ContinuingIdentifierCharacter

OrdinaryContinuingIdentifierCharacter

| \ HexEscape

OrdinaryContinuingIdentifierCharacter UnicodeAlphanumeric | $ | _

IdentifierOrReservedWord IdentifierName
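A minimal Python sketch of scanning an IdentifierName follows. It assumes that str.isalpha() and str.isdecimal() are acceptable approximations of UnicodeInitialAlphabetic and of the decimal digits in UnicodeAlphanumeric; the proposal does not define those classes in terms of Python's Unicode tables. The function name scan_identifier_name is hypothetical.

```python
import re

# \ HexEscape: a backslash followed by x and two hex digits, or u and four hex digits.
HEX_ESCAPE = re.compile(r"\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4})")


def _is_initial(ch: str) -> bool:
    # OrdinaryInitialIdentifierCharacter (approximation, see lead-in)
    return ch.isalpha() or ch in "$_"


def _is_continuing(ch: str) -> bool:
    # OrdinaryContinuingIdentifierCharacter (approximation, see lead-in)
    return ch.isalpha() or ch.isdecimal() or ch in "$_"


def scan_identifier_name(src: str, pos: int = 0):
    """Return the end index of the longest IdentifierName at pos, or None."""
    end = pos
    first = True
    while end < len(src):
        escape = HEX_ESCAPE.match(src, end)
        if escape:
            end = escape.end()
        elif (_is_initial if first else _is_continuing)(src[end]):
            end += 1
        else:
            break
        first = False
    return end if end > pos else None
```

The sketch only finds the extent of the name. Whether the result is a reserved word is left to a later step, as the single IdentifierOrReservedWord production suggests.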

Punctuators

Punctuator

PunctuatorRE

| PunctuatorDiv

PunctuatorRE

!

| ! =

| ! = =

| #

| %

| % =

| &

| & &

| & & =

| & =

| (

| *

| * =

| +

| + =

| ,

| -

| - =

| - >

| .

| . .

| . . .

| :

| : :

| ;

| <

| < <

| < < =

| < =

| =

| = =

| = = =

| >

| > =

| > >

| > > =

| > > >

| > > > =

| ?

| @

| [

| ^

| ^ =

| ^ ^

| ^ ^ =

| {

| |

| | =

| | |

| | | =

| ~

PunctuatorDiv

)

| + +

| - -

| ]

| }

DivisionPunctuator

/

| / =
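The punctuator tables above lend themselves to a table-driven scanner. The Python sketch below assumes the usual longest-match rule, which this grammar does not itself spell out, and a caller-supplied goal. The names PUNCTUATOR_RE, PUNCTUATOR_DIV, DIVISION_PUNCTUATOR, and scan_punctuator are hypothetical.

```python
# Terminals transcribed from PunctuatorRE, PunctuatorDiv, and DivisionPunctuator.
PUNCTUATOR_RE = (
    "! != !== # % %= & && &&= &= ( * *= + += , - -= -> . .. ... : :: ; "
    "< << <<= <= = == === > >= >> >>= >>> >>>= ? @ [ ^ ^= ^^ ^^= { | |= || ||= ~"
).split()
PUNCTUATOR_DIV = ") ++ -- ] }".split()
DIVISION_PUNCTUATOR = "/ /=".split()


def scan_punctuator(src: str, pos: int = 0, goal: str = "re"):
    """Return the longest punctuator starting at pos, or None if there is none.

    DivisionPunctuator is only a candidate under the 'div' goal, mirroring
    the difference between Token^re and Token^div.
    """
    candidates = PUNCTUATOR_RE + PUNCTUATOR_DIV
    if goal == "div":
        candidates = candidates + DIVISION_PUNCTUATOR
    best = None
    for punctuator in candidates:
        if src.startswith(punctuator, pos):
            if best is None or len(punctuator) > len(best):
                best = punctuator
    return best
```

A production lexer would more likely use a trie or a switch on the first character; the linear search here is chosen for readability.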

Numeric literals

NumericLiteral

DecimalLiteral

| HexIntegerLiteral [lookahead∉{HexDigit}]

| OctalIntegerLiteral

DecimalLiteral

Mantissa

| Mantissa LetterE SignedInteger

LetterE E | e

Mantissa

DecimalIntegerLiteral

| DecimalIntegerLiteral .

| DecimalIntegerLiteral . Fraction

| . Fraction

DecimalIntegerLiteral

0

| NonZeroDecimalDigits

NonZeroDecimalDigits

NonZeroDigit

| NonZeroDecimalDigits ASCIIDigit

NonZeroDigit 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Fraction DecimalDigits

SignedInteger

DecimalDigits

| + DecimalDigits

| - DecimalDigits

DecimalDigits

ASCIIDigit

| DecimalDigits ASCIIDigit

HexIntegerLiteral

0 LetterX HexDigit

| HexIntegerLiteral HexDigit

LetterX X | x

HexDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

OctalIntegerLiteral

0 OctalDigit

| OctalIntegerLiteral OctalDigit

OctalDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
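The NumericLiteral productions above are regular, so they can be collapsed into a single pattern. The following Python sketch is one such transcription, offered as an illustration rather than as the proposal's definition. The names NUMERIC_LITERAL and scan_numeric_literal are hypothetical. The alternatives are ordered so that the first one that matches is also the longest.

```python
import re

NUMERIC_LITERAL = re.compile(
    r"""
      0[xX][0-9A-Fa-f]+(?![0-9A-Fa-f])      # HexIntegerLiteral, not followed by a HexDigit
    | 0[0-7]+                               # OctalIntegerLiteral
    | (?: (?:0|[1-9][0-9]*) (?:\.[0-9]*)?   # DecimalIntegerLiteral, optional point and Fraction
        | \.[0-9]+                          # point followed by Fraction
      )
      (?: [eE][+-]?[0-9]+ )?                # optional LetterE SignedInteger
    """,
    re.VERBOSE,
)


def scan_numeric_literal(src: str, pos: int = 0):
    """Return the text of the NumericLiteral starting at pos, or None."""
    match = NUMERIC_LITERAL.match(src, pos)
    return match.group(0) if match else None
```

Because DecimalIntegerLiteral is either 0 or starts with a NonZeroDigit, an input such as 08 yields only the literal 0; the 8 is left for the next token.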

Quantity literals

QuantityLiteral NumericLiteral QuantityName

QuantityName [lookahead∉{LetterE, LetterX}] IdentifierName

String literals

q ∈ {single, double}

StringLiteral

' StringChars^single '

| " StringChars^double "

StringChars^q

«empty»

| StringChars^q StringChar^q

StringChar^q

LiteralStringChar^q

| \ StringEscape

LiteralStringChar^single UnicodeCharacter except ' | \ | LineTerminator

LiteralStringChar^double UnicodeCharacter except " | \ | LineTerminator

StringEscape

ControlEscape

| OctalEscape

| HexEscape

| IdentityEscape

IdentityEscape NonTerminator except UnicodeAlphanumeric

ControlEscape

b

| f

| n

| r

| t

| v

OctalEscape

OctalDigit [lookahead∉{OctalDigit}]

| ZeroToThree OctalDigit [lookahead∉{OctalDigit}]

| FourToSeven OctalDigit

| ZeroToThree OctalDigit OctalDigit

ZeroToThree 0 | 1 | 2 | 3

FourToSeven 4 | 5 | 6 | 7

HexEscape

x HexDigit HexDigit

| u HexDigit HexDigit HexDigit HexDigit
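The StringLiteral and StringEscape productions can be exercised with a small decoder. This lexer grammar defines only the syntax of escapes, so the character values assigned below (for example, n meaning line feed, and octal or hex digits meaning a code point) are conventional assumptions, not something stated above. All names in this Python sketch are hypothetical.

```python
import re

LINE_TERMINATORS = "\n\r\u2028\u2029"
CONTROL_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
# OctalEscape: up to three digits when the first is 0-3, otherwise up to two.
OCTAL_ESCAPE = re.compile(r"[0-3][0-7]{0,2}|[4-7][0-7]?")
HEX_ESCAPE = re.compile(r"x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})")


def decode_string_escape(src: str, pos: int):
    """Decode the StringEscape at src[pos], just after the backslash.

    Returns (decoded character, index after the escape).
    """
    if pos >= len(src):
        raise ValueError("backslash at end of input")
    ch = src[pos]
    if ch in CONTROL_ESCAPES:
        return CONTROL_ESCAPES[ch], pos + 1
    match = OCTAL_ESCAPE.match(src, pos)
    if match:
        return chr(int(match.group(0), 8)), match.end()
    match = HEX_ESCAPE.match(src, pos)
    if match:
        return chr(int(match.group(1) or match.group(2), 16)), match.end()
    # IdentityEscape: NonTerminator except UnicodeAlphanumeric (approximated).
    if ch not in LINE_TERMINATORS and not (ch.isalpha() or ch.isdecimal()):
        return ch, pos + 1
    raise ValueError(f"invalid string escape: \\{ch}")


def decode_string_literal(literal: str) -> str:
    """Decode a complete StringLiteral, including its surrounding quotes."""
    if len(literal) < 2 or literal[0] not in ("'", '"') or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal")
    quote, end = literal[0], len(literal) - 1
    out, pos = [], 1
    while pos < end:
        ch = literal[pos]
        if ch == "\\":
            decoded, pos = decode_string_escape(literal, pos + 1)
            out.append(decoded)
        elif ch == quote or ch in LINE_TERMINATORS:
            raise ValueError("unescaped quote or line terminator in string")
        else:
            out.append(ch)
            pos += 1
    if pos != end:
        raise ValueError("closing quote is escaped")
    return "".join(out)
```

The greedy octal pattern reproduces the lookahead restrictions in OctalEscape: a one- or two-digit form is only chosen when the next character is not a further octal digit that the longer form could absorb.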

Regular expression literals

r ∈ {slash, guillemet}

RegExpLiteral^r RegExpBody^r RegExpFlags

RegExpFlags

«empty»

| RegExpFlags ContinuingIdentifierCharacter

RegExpBody^slash / RegExpFirstChar RegExpChars^slash /

RegExpBody^guillemet «u00AB» RegExpChars^guillemet «u00BB»

RegExpFirstChar

OrdinaryRegExpFirstChar

| \ NonTerminator

OrdinaryRegExpFirstChar NonTerminator except \ | / | *

RegExpChars^r

«empty»

| RegExpChars^r RegExpChar^r

RegExpChar^r

OrdinaryRegExpChar^r

| \ NonTerminator

OrdinaryRegExpChar^slash NonTerminator except \ | /

OrdinaryRegExpChar^guillemet NonTerminator except \ | «u00BB»
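The slash form of RegExpLiteral can be scanned with a single pass, as in the Python sketch below. It is an illustration under stated simplifications: flags are limited to ordinary alphanumerics, $ and _ (the \ HexEscape alternative of ContinuingIdentifierCharacter is omitted), and Python's isalpha() and isdecimal() stand in for UnicodeAlphanumeric. The function name scan_regexp_slash is hypothetical.

```python
LINE_TERMINATORS = "\n\r\u2028\u2029"


def scan_regexp_slash(src: str, pos: int = 0):
    """Return (body, flags, end index) for a RegExpLiteral^slash at pos, or None."""
    n = len(src)
    if pos >= n or src[pos] != "/":
        return None
    i = pos + 1
    # RegExpFirstChar: the body is non-empty and cannot start with / or *,
    # which keeps regular expressions distinct from comments.
    if i >= n or src[i] in "/*" or src[i] in LINE_TERMINATORS:
        return None
    while i < n and src[i] != "/":
        if src[i] in LINE_TERMINATORS:
            return None
        if src[i] == "\\":
            i += 1  # a backslash takes the following NonTerminator with it
            if i >= n or src[i] in LINE_TERMINATORS:
                return None
        i += 1
    if i >= n:
        return None  # no closing /
    body_end = i
    i += 1
    flags_start = i
    while i < n and (src[i].isalpha() or src[i].isdecimal() or src[i] in "$_"):
        i += 1
    return src[pos + 1 : body_end], src[flags_start:i], i
```

The guillemet form differs only in its delimiters («u00AB» and «u00BB»), in allowing an empty body, and in excluding «u00BB» rather than / from the ordinary characters. As written, the grammar gives no special treatment to a / inside a bracketed character class, and the sketch follows it in that respect.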

Waldemar Horwat
Last modified Monday, June 7, 1999