JavaScript 2.0
Formal Description
Lexer Grammar

Monday, June 7, 1999

This LALR(1) grammar describes the lexer syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.

This document is also available as a Word 98 rtf file.

The start symbols are NextTokenre and NextTokendiv. Which one applies depends on whether a / at that point should be interpreted as the start of a regular expression (re) or as a division operator (div).

Unicode Character Classes

UnicodeCharacter  Any Unicode character
UnicodeInitialAlphabetic  Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)
UnicodeAlphanumeric  Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
WhiteSpaceCharacter 
   «TAB» | «VT» | «FF» | «SP» | «u00A0»
|  «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»
|  «u2008» | «u2009» | «u200A» | «u200B»
|  «u3000»
LineTerminator  «LF» | «CR» | «u2028» | «u2029»
ASCIIDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
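
The enumerated classes above can be written down directly as sets. The following is a minimal Python sketch; the constant and function names are illustrative and not part of the proposal. Explicit sets are used instead of Python's str.isspace(), whose notion of white space need not match this list.

```python
# Character classes from the table above, written as explicit sets.
# All names here are illustrative; the grammar only defines the classes.
WHITE_SPACE_CHARACTERS = frozenset(
    "\t\v\f \u00A0"                                    # TAB, VT, FF, SP, u00A0
    + "".join(chr(c) for c in range(0x2000, 0x200C))   # u2000 through u200B
    + "\u3000"
)
LINE_TERMINATORS = frozenset("\n\r\u2028\u2029")       # LF, CR, u2028, u2029
ASCII_DIGITS = frozenset("0123456789")


def is_white_space_character(ch: str) -> bool:
    """True if ch is a WhiteSpaceCharacter."""
    return ch in WHITE_SPACE_CHARACTERS


def is_line_terminator(ch: str) -> bool:
    """True if ch is a LineTerminator."""
    return ch in LINE_TERMINATORS
```

Note that the line terminators are deliberately not members of WhiteSpaceCharacter; the grammar treats them as a separate class.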

Comments

LineComment  / / LineCommentCharacters
LineCommentCharacters 
   «empty»
|  LineCommentCharacters NonTerminator
NonTerminator  UnicodeCharacter except LineTerminator
BlockComment  / * BlockCommentCharacters * /
BlockCommentCharacters 
   «empty»
|  BlockCommentCharacters NonSlash
|  PreSlashCharacters /
PreSlashCharacters 
   «empty»
|  BlockCommentCharacters NonAsteriskOrSlash
|  PreSlashCharacters /
NonSlash  UnicodeCharacter except /
NonAsteriskOrSlash  UnicodeCharacter except * | /
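
Read together, BlockCommentCharacters and PreSlashCharacters describe bodies that never contain the sequence */, so a block comment ends at the first */ after its opening /* and comments do not nest. A minimal Python sketch of both comment forms follows; the function names and the -1 return convention are assumptions made for illustration.

```python
LINE_TERMINATORS = "\n\r\u2028\u2029"


def scan_line_comment(src: str, pos: int) -> int:
    """Return the index just past the LineComment at pos, or -1 if none.

    The terminating LineTerminator, if any, is not part of the comment.
    """
    if not src.startswith("//", pos):
        return -1
    i = pos + 2
    while i < len(src) and src[i] not in LINE_TERMINATORS:
        i += 1
    return i


def scan_block_comment(src: str, pos: int) -> int:
    """Return the index just past the BlockComment at pos, or -1 if none."""
    if not src.startswith("/*", pos):
        return -1
    prev_asterisk = False  # True when the previous body character was '*'
    i = pos + 2
    while i < len(src):
        ch = src[i]
        if prev_asterisk and ch == "/":
            return i + 1   # first "*/" closes the comment
        prev_asterisk = ch == "*"
        i += 1
    return -1              # unterminated comment
```

The single flag prev_asterisk plays the role of the distinction between the two nonterminals: a / may be appended only when the text so far does not end in an asterisk.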

White space

WhiteSpace 
   «empty»
|  WhiteSpace WhiteSpaceCharacter
|  WhiteSpace LineTerminator
|  WhiteSpace LineComment LineTerminator
|  WhiteSpace BlockComment
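
One detail of WhiteSpace is easy to miss: a LineComment counts as white space only when a LineTerminator follows it. A line comment that runs to the end of the input is instead covered by EndOfInput. The Python sketch below follows that reading; the function name and the decision to stop silently at an unterminated block comment are assumptions for illustration.

```python
WHITE_SPACE = frozenset("\t\v\f \u00A0\u3000") | {chr(c) for c in range(0x2000, 0x200C)}
LINE_TERMINATORS = frozenset("\n\r\u2028\u2029")


def skip_white_space(src: str, pos: int = 0) -> int:
    """Return the index just past the longest WhiteSpace starting at pos."""
    n = len(src)
    while pos < n:
        ch = src[pos]
        if ch in WHITE_SPACE or ch in LINE_TERMINATORS:
            pos += 1
        elif src.startswith("//", pos):
            end = pos + 2
            while end < n and src[end] not in LINE_TERMINATORS:
                end += 1
            if end == n:
                break          # LineComment followed by End belongs to EndOfInput
            pos = end + 1      # consume the comment and its LineTerminator
        elif src.startswith("/*", pos):
            close = src.find("*/", pos + 2)
            if close < 0:
                break          # unterminated comment: left for the caller to report
            pos = close + 2
        else:
            break
    return pos
```

For example, skip_white_space("  // c\n/* d */ x") returns 15, the index of the x.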

Tokens

t ∈ {re, div}
NextTokent  WhiteSpace Tokent
Tokenre 
   IdentifierOrReservedWord
|  Punctuator
|  NumericLiteral
|  QuantityLiteral
|  StringLiteral
|  RegExpLiteralslash
|  RegExpLiteralguillemet
|  EndOfInput
Tokendiv 
   IdentifierOrReservedWord
|  Punctuator
|  DivisionPunctuator
|  NumericLiteral
|  QuantityLiteral
|  StringLiteral
|  RegExpLiteralguillemet
|  EndOfInput
EndOfInput 
   End
|  LineComment End
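
The only difference between the two Token goals is how a leading / is read: Tokenre admits RegExpLiteralslash, while Tokendiv admits DivisionPunctuator. The Python sketch below illustrates just that choice. The function name, the returned tuples, and the ASCII-only flag characters are simplifications assumed for illustration; how the caller picks the goal is outside this grammar.

```python
import re

_NT = r"\n\r\u2028\u2029"  # LineTerminator characters, in re escape syntax
_REGEXP_SLASH = re.compile(
    rf"/(?:[^\\/*{_NT}]|\\[^{_NT}])"   # opening slash and RegExpFirstChar
    rf"(?:[^\\/{_NT}]|\\[^{_NT}])*"    # RegExpChars
    r"/[A-Za-z0-9_$]*"                 # closing slash, then flags (ASCII only here)
)


def scan_slash(src: str, pos: int, goal: str):
    """Classify the token that starts with '/' at pos under goal 're' or 'div'.

    Returns (kind, text), or None if no token of that goal matches.
    """
    if goal == "div":
        text = "/=" if src.startswith("/=", pos) else "/"
        return ("DivisionPunctuator", text)
    if goal == "re":
        m = _REGEXP_SLASH.match(src, pos)
        return ("RegExpLiteral", m.group()) if m else None
    raise ValueError("goal must be 're' or 'div'")
```

The same input therefore yields different tokens: under re, "/ab+c/g" is one regular expression literal; under div, only its first character is consumed, as a division operator.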

Keywords and identifiers

IdentifierName 
   InitialIdentifierCharacter
|  IdentifierName ContinuingIdentifierCharacter
InitialIdentifierCharacter 
   OrdinaryInitialIdentifierCharacter
|  \ HexEscape
OrdinaryInitialIdentifierCharacter  UnicodeInitialAlphabetic | $ | _
ContinuingIdentifierCharacter 
   OrdinaryContinuingIdentifierCharacter
|  \ HexEscape
OrdinaryContinuingIdentifierCharacter  UnicodeAlphanumeric | $ | _
IdentifierOrReservedWord  IdentifierName
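
A recognizer for IdentifierName can be sketched in a few lines of Python. Two approximations are assumed here and are not taken from the proposal: str.isalpha() stands in for UnicodeInitialAlphabetic, and str.isalpha() or str.isdecimal() stands in for UnicodeAlphanumeric. The function name is likewise illustrative.

```python
import re

# \ HexEscape: backslash, then x with two hex digits or u with four.
_HEX_ESCAPE = re.compile(r"\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4})")


def _is_initial(ch: str) -> bool:
    # Approximation of OrdinaryInitialIdentifierCharacter.
    return ch.isalpha() or ch in "$_"


def _is_continuing(ch: str) -> bool:
    # Approximation of OrdinaryContinuingIdentifierCharacter.
    return ch.isalpha() or ch.isdecimal() or ch in "$_"


def scan_identifier_name(src: str, pos: int = 0) -> int:
    """Return the index just past the longest IdentifierName at pos, or -1."""
    i = pos
    first = True
    while i < len(src):
        m = _HEX_ESCAPE.match(src, i)
        if m:
            i = m.end()                      # escape allowed in any position
        elif (_is_initial if first else _is_continuing)(src[i]):
            i += 1
        else:
            break
        first = False
    return i if i > pos else -1
```

Since IdentifierOrReservedWord is simply an IdentifierName, telling reserved words apart would be a separate lookup on the scanned text.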

Punctuators

Punctuator 
   PunctuatorRE
|  PunctuatorDiv
PunctuatorRE 
   !
|  ! =
|  ! = =
|  #
|  %
|  % =
|  &
|  & &
|  & & =
|  & =
|  (
|  *
|  * =
|  +
|  + =
|  ,
|  -
|  - =
|  - >
|  .
|  . .
|  . . .
|  :
|  : :
|  ;
|  <
|  < <
|  < < =
|  < =
|  =
|  = =
|  = = =
|  >
|  > =
|  > >
|  > > =
|  > > >
|  > > > =
|  ?
|  @
|  [
|  ^
|  ^ =
|  ^ ^
|  ^ ^ =
|  {
|  |
|  | =
|  | |
|  | | =
|  ~
PunctuatorDiv 
   )
|  + +
|  - -
|  ]
|  }
DivisionPunctuator 
   /
|  / =
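
Many punctuators are prefixes of longer ones (for example >, >>, >>>, >>>=). The Python sketch below assumes the usual longest-match convention, which this page does not itself state, and tries longer candidates first. The names are illustrative; DivisionPunctuator is left out because it is only available under the div goal.

```python
PUNCTUATOR_RE = (
    "! != !== # % %= & && &&= &= ( * *= + += , - -= -> . .. ... : :: ; "
    "< << <<= <= = == === > >= >> >>= >>> >>>= ? @ [ ^ ^= ^^ ^^= { | |= || ||= ~"
).split()
PUNCTUATOR_DIV = ") ++ -- ] }".split()

# Longest candidates first, so ">>>=" is tried before ">>>", ">>" and ">".
_BY_LENGTH = sorted(PUNCTUATOR_RE + PUNCTUATOR_DIV, key=len, reverse=True)


def scan_punctuator(src: str, pos: int = 0):
    """Return the longest Punctuator starting at pos, or None."""
    for p in _BY_LENGTH:
        if src.startswith(p, pos):
            return p
    return None
```

A table-driven trie would avoid the linear scan, but for some 56 short strings the simple loop is enough to show the idea.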

Numeric literals

NumericLiteral 
   DecimalLiteral
|  HexIntegerLiteral [lookahead∉{HexDigit}]
|  OctalIntegerLiteral
DecimalLiteral 
   Mantissa
|  Mantissa LetterE SignedInteger
LetterE  E | e
Mantissa 
   DecimalIntegerLiteral
|  DecimalIntegerLiteral .
|  DecimalIntegerLiteral . Fraction
|  . Fraction
DecimalIntegerLiteral 
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits 
   NonZeroDigit
|  NonZeroDecimalDigits ASCIIDigit
NonZeroDigit  1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Fraction  DecimalDigits
SignedInteger 
   DecimalDigits
|  + DecimalDigits
|  - DecimalDigits
DecimalDigits 
   ASCIIDigit
|  DecimalDigits ASCIIDigit
HexIntegerLiteral 
   0 LetterX HexDigit
|  HexIntegerLiteral HexDigit
LetterX  X | x
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f
OctalIntegerLiteral 
   0 OctalDigit
|  OctalIntegerLiteral OctalDigit
OctalDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
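
The three NumericLiteral alternatives translate fairly directly into regular expressions. The Python sketch below tries all three and keeps the longest match, which is an assumed disambiguation rule (it makes 017 octal rather than the decimal 0 followed by 17). The function name and the kind labels are illustrative.

```python
import re

# DecimalLiteral: Mantissa with an optional exponent part.
_DECIMAL = re.compile(
    r"(?:(?:0|[1-9][0-9]*)(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"
)
# HexIntegerLiteral: greedy matching takes every following hex digit.
_HEX = re.compile(r"0[Xx][0-9A-Fa-f]+")
# OctalIntegerLiteral: a zero followed by at least one octal digit.
_OCTAL = re.compile(r"0[0-7]+")


def scan_numeric_literal(src: str, pos: int = 0):
    """Return (kind, text) for the longest NumericLiteral at pos, or None."""
    best = None
    for kind, pattern in (("decimal", _DECIMAL), ("hex", _HEX), ("octal", _OCTAL)):
        m = pattern.match(src, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best
```

Note that a DecimalIntegerLiteral other than 0 cannot start with a zero, so a mantissa such as 3. or .5 is accepted while 00.5 is not a single decimal literal.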

Quantity literals

QuantityLiteral  NumericLiteral QuantityName
QuantityName  [lookahead∉{LetterE, LetterX}] IdentifierName
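
A QuantityLiteral is a number immediately followed by a unit name, as in 5cm. The Python sketch below reads the lookahead on QuantityName as a negative one, meaning the name may not begin with E, e, X or x; that reading of the notation is an assumption. Identifier characters are restricted to ASCII for brevity, and the function name is illustrative.

```python
import re

# NumericLiteral: hex, then octal, then decimal (see the numeric grammar above).
_NUMERIC = re.compile(
    r"0[Xx][0-9A-Fa-f]+"
    r"|0[0-7]+"
    r"|(?:(?:0|[1-9][0-9]*)(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"
)
# QuantityName: an identifier whose first character is not E, e, X or x.
# ASCII-only simplification of IdentifierName, without escapes.
_QUANTITY_NAME = re.compile(r"(?![EeXx])[A-Za-z_$][A-Za-z0-9_$]*")


def scan_quantity_literal(src: str, pos: int = 0):
    """Return (number_text, unit_text) if a QuantityLiteral starts at pos, else None."""
    num = _NUMERIC.match(src, pos)
    if not num:
        return None
    unit = _QUANTITY_NAME.match(src, num.end())   # no white space allowed between
    if not unit:
        return None
    return (num.group(), unit.group())
```

Under this reading 5cm and 1.5e3m are quantity literals, while 3em is not, because its unit name would begin with e.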

String literals

q ∈ {single, double}
StringLiteral 
   ' StringCharssingle '
|  " StringCharsdouble "
StringCharsq 
   «empty»
|  StringCharsq StringCharq
StringCharq 
   LiteralStringCharq
|  \ StringEscape
LiteralStringCharsingle  UnicodeCharacter except ' | \ | LineTerminator
LiteralStringChardouble  UnicodeCharacter except " | \ | LineTerminator
StringEscape 
   ControlEscape
|  OctalEscape
|  HexEscape
|  IdentityEscape
IdentityEscape  NonTerminator except UnicodeAlphanumeric
ControlEscape 
   b
|  f
|  n
|  r
|  t
|  v
OctalEscape 
   OctalDigit [lookahead∉{OctalDigit}]
|  ZeroToThree OctalDigit [lookahead∉{OctalDigit}]
|  FourToSeven OctalDigit
|  ZeroToThree OctalDigit OctalDigit
ZeroToThree  0 | 1 | 2 | 3
FourToSeven  4 | 5 | 6 | 7
HexEscape 
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit
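
The four StringEscape forms can be recognized with a small decoder. The grammar on this page defines only the syntax; the values returned by the Python sketch below (conventional control characters, numeric code points for octal and hex escapes) are assumptions, as are the function name and the alphanumeric test based on str.isalpha() and str.isdecimal(). The octal lookaheads are read as "not followed by another octal digit", so greedy matching reproduces them.

```python
import re

_CONTROL = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
# OctalEscape: up to three digits starting with 0-3, up to two starting with 4-7.
_OCTAL = re.compile(r"[0-3][0-7]{0,2}|[4-7][0-7]?")
# HexEscape: x plus two hex digits, or u plus four.
_HEX = re.compile(r"x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})")
_LINE_TERMINATORS = "\n\r\u2028\u2029"


def decode_string_escape(src: str, pos: int):
    """Decode the StringEscape at pos (just after the backslash).

    Returns (character, index after the escape); raises ValueError otherwise.
    """
    m = _OCTAL.match(src, pos)
    if m:
        return chr(int(m.group(), 8)), m.end()
    m = _HEX.match(src, pos)
    if m:
        return chr(int(m.group(1) or m.group(2), 16)), m.end()
    if pos < len(src):
        ch = src[pos]
        if ch in _CONTROL:
            return _CONTROL[ch], pos + 1
        # IdentityEscape: any non-terminator that is not alphanumeric.
        if ch not in _LINE_TERMINATORS and not (ch.isalpha() or ch.isdecimal()):
            return ch, pos + 1
    raise ValueError(f"invalid string escape at index {pos}")
```

Because IdentityEscape excludes alphanumerics, a backslash before a letter that is not a ControlEscape, such as \q, matches none of the alternatives and is rejected here.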

Regular expression literals

r ∈ {slash, guillemet}
RegExpLiteralr  RegExpBodyr RegExpFlags
RegExpFlags 
   «empty»
|  RegExpFlags ContinuingIdentifierCharacter
RegExpBodyslash  / RegExpFirstChar RegExpCharsslash /
RegExpBodyguillemet  «u00AB» RegExpCharsguillemet «u00BB»
RegExpFirstChar 
   OrdinaryRegExpFirstChar
|  \ NonTerminator
OrdinaryRegExpFirstChar  NonTerminator except \ | / | *
RegExpCharsr 
   «empty»
|  RegExpCharsr RegExpCharr
RegExpCharr 
   OrdinaryRegExpCharr
|  \ NonTerminator
OrdinaryRegExpCharslash  NonTerminator except \ | /
OrdinaryRegExpCharguillemet  NonTerminator except \ | «u00BB»
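
The two body forms differ in their delimiters and in one restriction: the slash form needs a first character that is not an unescaped \, / or *, whereas the guillemet form may be empty and may contain unescaped slashes. The Python sketch below scans either body; RegExpFlags are not handled, and the function name and -1 return convention are assumptions for illustration.

```python
LINE_TERMINATORS = "\n\r\u2028\u2029"


def scan_regexp_body(src: str, pos: int, kind: str) -> int:
    """Return the index just past the RegExpBody of the given kind at pos, or -1."""
    if kind == "slash":
        opener, closer = "/", "/"
    elif kind == "guillemet":
        opener, closer = "\u00AB", "\u00BB"
    else:
        raise ValueError("kind must be 'slash' or 'guillemet'")
    if not src.startswith(opener, pos):
        return -1
    i = pos + 1
    first = True  # True while no body character has been consumed
    while i < len(src):
        ch = src[i]
        if ch in LINE_TERMINATORS:
            return -1
        if ch == "\\":
            # Backslash plus any NonTerminator counts as one body character.
            if i + 1 >= len(src) or src[i + 1] in LINE_TERMINATORS:
                return -1
            i += 2
        elif ch == closer:
            # The slash form requires at least RegExpFirstChar before closing.
            return -1 if (first and kind == "slash") else i + 1
        elif first and kind == "slash" and ch == "*":
            return -1  # "/*" would start a block comment instead
        else:
            i += 1
        first = False
    return -1
```

For instance, scan_regexp_body("/a\\/b/g", 0, "slash") returns 6, leaving the flag g for a separate RegExpFlags step.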

Waldemar Horwat
Last modified Monday, June 7, 1999