JavaScript 2.0
Formal Description
Regular Expression Semantics
previousupnext

Sunday, May 16, 1999

The regular expression semantics describe the actions the regular expression engine takes in order to transform a regular expression pattern into a function for matching against input strings. For convenience, the regular expression grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word 98 rtf file.

Please excuse the mess. The regular expression semantics below are working (except for multiline and case-insensitive matches) and have been tried on sample cases, but they could be cleaned up somewhat and formatted better.

Unicode Character Classes

Syntax

UnicodeCharacter  Any Unicode character
UnicodeAlphanumeric  Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
LineTerminator  «LF» | «CR» | «u2028» | «u2029»
NonTerminator  UnicodeCharacter except LineTerminator

Semantics

lineTerminators : {Character} = {‘«LF»’, ‘«CR»’, ‘«u2028»’, ‘«u2029»’}

nonTerminators : {Character} = {‘«NUL»’ ... ‘«uFFFF»’} - lineTerminators

reWhitespaces : {Character} = {‘«FF»’, ‘«LF»’, ‘«CR»’, ‘«TAB»’, ‘«VT»’, ‘ ’}

reDigits : {Character} = {‘0’ ... ‘9’}

reWordCharacters : {Character} = {‘0’ ... ‘9’, ‘A’ ... ‘Z’, ‘a’ ... ‘z’, ‘_’}

Regular Expression Definitions

Semantics

type REInput = tuple {strStringignoreCaseBooleanmultilineBoolean}

Field str is the input string. ignoreCase and multiline are the corresponding regular expression flags.

type REResult = oneof {successREMatchfailure}

type REMatch = tuple {endIndexIntegercapturedCapture[]}

A REMatch holds an intermediate state during the pattern-matching process. endIndex is the index of the next input character to be matched by the next component in a regular expression pattern. If we are at the end of the pattern, endIndex is one plus the index of the last matched input character. captured is a zero-based array of the strings captured so far by capturing parentheses.

type Capture = oneof {presentStringabsent}

type Continuation = REMatch  REResult

A Continuation is a function that attempts to match the remaining portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. If a match is possible, it returns a success result that contains the final REMatch state; if no match is possible, it returns a failure result.

type Matcher = REInput  REMatch  Integer  Continuation  REResult

A Matcher is a function that attempts to match a middle portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. Since the remainder of the pattern heavily influences whether (and how) a middle portion will match, we must pass in a Continuation function that checks whether the rest of the pattern matched. If the continuation returns failure, the matcher function may call it repeatedly, trying various alternatives at pattern choice points.

The REInput parameter contains the input string and is merely passed down to subroutines. The Integer parameter contains the number of capturing left parentheses seen so far in the pattern and is used to assign static, consecutive numbers to capturing parentheses.

sequenceMatcher(matcher1MatcherparenCount1Integermatcher2Matcher) : Matcher
  = function(tREInputxREMatchpIntegercContinuation)
         let dContinuation
                 = function(yREMatch)
                        matcher2(typ + parenCount1c)
         in matcher1(txpd)

sequenceMatcher returns a Matcher that matches the concatenation of the patterns matched by matcher1 and matcher2. parenCount1 is the number of capturing left parentheses inside matcher2's pattern.

characterSetMatcher(acceptanceSet{Character}) : Matcher
  = function(tREInputxREMatchpIntegercContinuation)
         let iInteger = x.endIndex;
             sString = t.str
         in if i = length(s)
             then failure
             else if s[i acceptanceSet
             then c(endIndex (i + 1), captured x.captured)
             else failure

characterSetMatcher returns a Matcher that matches a single input string character. The match succeeds if the character is a member of the acceptanceSet set of characters (possibly ignoring case).

characterMatcher(chCharacter) : Matcher = characterSetMatcher({ch})

characterMatcher returns a Matcher that matches a single input string character. The match succeeds if the character is the same as ch (possibly ignoring case).

Regular Expression Patterns

Syntax

RegularExpressionPattern  Disjunction

Semantics

action Exec[RegularExpressionPattern] : REInput  Integer  REResult

Exec[RegularExpressionPattern  Disjunction](tREInputindexInteger)
  = Match[Disjunction](
         t,
         endIndex indexcaptured fillCapture(CountParens[Disjunction]),
         0,
         
         (function(xREMatch)
             success x))

fillCapture(iInteger) : Capture[]
  = if i = 0
     then []Capture
     else fillCapture(i - 1)  [absent]

Disjunctions

Syntax

Disjunction 
   Alternativenormal
|  Alternativenormal | Disjunction

Semantics

action Match[Disjunction] : Matcher

Match[Disjunction  Alternativenormal] = Match[Alternativenormal]

Match[Disjunction  Alternativenormal | Disjunction1]
            (tREInputxREMatchpIntegercContinuation)
  = case Match[Alternativenormal](txpcof
        success(yREMatch): success y;
        failureMatch[Disjunction1](txp + CountParens[Alternativenormal], c)
        end

action CountParens[Disjunction] : Integer

CountParens[Disjunction  Alternativenormal] = CountParens[Alternativenormal]

CountParens[Disjunction  Alternativenormal | Disjunction1]
  = CountParens[Alternativenormal] + CountParens[Disjunction1]

Quantifiers

Syntax

l  {normalnonDecimalDigitnonOctalDigit}
Alternativel 
   «empty»
|  Assertion Alternativenormal
|  OrdinaryAtoml Alternativenormal
|  OneDigitEscape AlternativenonDecimalDigit
|  ShortOctalEscape AlternativenonOctalDigit
|  Atoml Quantifier Alternativenormal
Quantifier 
   QuantifierPrefix
|  QuantifierPrefix ?
QuantifierPrefix 
   *
|  +
|  ?
|  { DecimalDigits }
|  { DecimalDigits , }
|  { DecimalDigits , DecimalDigits }
DecimalDigits 
   DecimalDigit
|  DecimalDigits DecimalDigit
DecimalDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

type Limit = oneof {finiteIntegerinfinite}

resetParens(xREMatchpIntegernParensInteger) : REMatch
  = if nParens = 0
     then x
     else let yREMatch = endIndex x.endIndexcaptured x.captured[p  absent]
           in resetParens(yp + 1, nParens - 1)

repeatMatcher(bodyMatcherminIntegermaxLimitlazyBooleannBodyParensInteger)
  : Matcher
  = function(tREInputxREMatchpIntegercContinuation)
         if 
             (case max of
                finite(mInteger): m = 0;
                infinitefalse
                end)
         then c(x)
         else let dContinuation
                       = function(yREMatch)
                              if min = 0 and (max is infinite and y.endIndex = x.endIndex)
                              then failure
                              else let newMinInteger
                                            = if min = 0
                                               then 0
                                               else min - 1;
                                        newMaxLimit
                                            = case max of
                                                  finite(mInteger): finite (m - 1);
                                                  infiniteinfinite
                                                  end
                                    in repeatMatcher(bodynewMinnewMaxlazynBodyParens)(
                                            t,
                                            y,
                                            p,
                                            c);
                   xrREMatch = resetParens(xpnBodyParens)
               in if min  0
                   then body(txrpd)
                   else if lazy
                   then case c(xof
                             success(zREMatch): success z;
                             failurebody(txrpd)
                             end
                   else case body(txrpdof
                            success(zREMatch): success z;
                            failurec(x)
                            end

action Match[Alternativel] : Matcher

Match[Alternativel  «empty»](tREInputxREMatchpIntegercContinuation) = c(x)

Match[Alternativel  Assertion Alternativenormal1]
            (tREInputxREMatchpIntegercContinuation)
  = if TestAssertion[Assertion](tx)
     then Match[Alternativenormal1](txpc)
     else failure

Match[Alternativel  OrdinaryAtoml Alternativenormal1]
  = sequenceMatcher(
         Match[OrdinaryAtoml],
         CountParens[OrdinaryAtoml],
         Match[Alternativenormal1])

Match[Alternativel  OneDigitEscape AlternativenonDecimalDigit1]
  = sequenceMatcher(Match[OneDigitEscape], 0, Match[AlternativenonDecimalDigit1])

Match[Alternativel  ShortOctalEscape AlternativenonOctalDigit1]
  = sequenceMatcher(Match[ShortOctalEscape], 0, Match[AlternativenonOctalDigit1])

Match[Alternativel  Atoml Quantifier Alternativenormal1]
  = let minInteger = Minimum[Quantifier];
         maxLimit = Maximum[Quantifier];
         lazyBoolean = Lazy[Quantifier]
     in if 
             (case max of
                finite(mInteger): m < min;
                infinitefalse
                end)
         then 
         else let looperMatcher
                       = repeatMatcher(Match[Atoml], minmaxlazyCountParens[Atoml])
               in sequenceMatcher(looperCountParens[Atoml], Match[Alternativenormal1])

action CountParens[Alternativel] : Integer

CountParens[Alternativel  «empty»] = 0

CountParens[Alternativel  Assertion Alternativenormal1]
  = CountParens[Alternativenormal1]

CountParens[Alternativel  OrdinaryAtoml Alternativenormal1]
  = CountParens[OrdinaryAtoml] + CountParens[Alternativenormal1]

CountParens[Alternativel  OneDigitEscape AlternativenonDecimalDigit1]
  = CountParens[AlternativenonDecimalDigit1]

CountParens[Alternativel  ShortOctalEscape AlternativenonOctalDigit1]
  = CountParens[AlternativenonOctalDigit1]

CountParens[Alternativel  Atoml Quantifier Alternativenormal1]
  = CountParens[Atoml] + CountParens[Alternativenormal1]

action Minimum[Quantifier] : Integer

Minimum[Quantifier  QuantifierPrefix] = Minimum[QuantifierPrefix]

Minimum[Quantifier  QuantifierPrefix ?] = Minimum[QuantifierPrefix]

action Maximum[Quantifier] : Limit

Maximum[Quantifier  QuantifierPrefix] = Maximum[QuantifierPrefix]

Maximum[Quantifier  QuantifierPrefix ?] = Maximum[QuantifierPrefix]

action Lazy[Quantifier] : Boolean

Lazy[Quantifier  QuantifierPrefix] = false

Lazy[Quantifier  QuantifierPrefix ?] = true

action Minimum[QuantifierPrefix] : Integer

Minimum[QuantifierPrefix  *] = 0

Minimum[QuantifierPrefix  +] = 1

Minimum[QuantifierPrefix  ?] = 0

Minimum[QuantifierPrefix  { DecimalDigits }] = IntegerValue[DecimalDigits]

Minimum[QuantifierPrefix  { DecimalDigits , }] = IntegerValue[DecimalDigits]

Minimum[QuantifierPrefix  { DecimalDigits1 , DecimalDigits2 }]
  = IntegerValue[DecimalDigits1]

action Maximum[QuantifierPrefix] : Limit

Maximum[QuantifierPrefix  *] = infinite

Maximum[QuantifierPrefix  +] = infinite

Maximum[QuantifierPrefix  ?] = finite 1

Maximum[QuantifierPrefix  { DecimalDigits }] = finite IntegerValue[DecimalDigits]

Maximum[QuantifierPrefix  { DecimalDigits , }] = infinite

Maximum[QuantifierPrefix  { DecimalDigits1 , DecimalDigits2 }]
  = finite IntegerValue[DecimalDigits2]

action IntegerValue[DecimalDigits] : Integer

IntegerValue[DecimalDigits  DecimalDigit] = DigitValue[DecimalDigit]

IntegerValue[DecimalDigits  DecimalDigits1 DecimalDigit]
  = 10*IntegerValue[DecimalDigits1] + DigitValue[DecimalDigit]

action DigitValue[DecimalDigit] : Integer = digitValue(DecimalDigit)

Assertions

Syntax

Assertion 
   ^
|  $
|  \ b
|  \ B

Semantics

action TestAssertion[Assertion] : REInput  REMatch  Boolean

TestAssertion[Assertion  ^](tREInputxREMatch) = x.endIndex = 0

TestAssertion[Assertion  $](tREInputxREMatch) = x.endIndex = length(t.str)

TestAssertion[Assertion  \ b](tREInputxREMatch)
  = atWordBoundary(x.endIndext.str)

TestAssertion[Assertion  \ B](tREInputxREMatch)
  = not atWordBoundary(x.endIndext.str)

atWordBoundary(iIntegersString) : Boolean
  = i = 0 or (i = length(sor (s[i - 1]  reWordCharacters xor s[i reWordCharacters))

Atoms

Syntax

Atoml 
   OrdinaryAtoml
|  OneDigitEscape
|  ShortOctalEscape
OrdinaryAtoml 
   CompoundAtom
|  PatternCharacterl
PatternCharacternormal  NonTerminator except ^ | $ | \ | . | * | + | ? | ( | ) | [ | ] | { | } | |
PatternCharacternonDecimalDigit  PatternCharacternormal except DecimalDigit
PatternCharacternonOctalDigit  PatternCharacternormal except OctalDigit
CompoundAtom 
   ( Disjunction )
|  ( ? : Disjunction )
|  .
|  CharacterClass
|  CharacterEscape
|  TwoDigitEscape
|  CharacterClassEscape

Semantics

action Match[Atoml] : Matcher

Match[Atoml  OrdinaryAtoml] = Match[OrdinaryAtoml]

Match[Atoml  OneDigitEscape] = Match[OneDigitEscape]

Match[Atoml  ShortOctalEscape] = Match[ShortOctalEscape]

action CountParens[Atoml] : Integer

CountParens[Atoml  OrdinaryAtoml] = CountParens[OrdinaryAtoml]

CountParens[Atoml  OneDigitEscape] = 0

CountParens[Atoml  ShortOctalEscape] = 0

action Match[OrdinaryAtoml] : Matcher

Match[OrdinaryAtoml  CompoundAtom] = Match[CompoundAtom]

Match[OrdinaryAtoml  PatternCharacterl] = characterMatcher(PatternCharacterl)

action CountParens[OrdinaryAtoml] : Integer

CountParens[OrdinaryAtoml  CompoundAtom] = CountParens[CompoundAtom]

CountParens[OrdinaryAtoml  PatternCharacterl] = 0

action Match[CompoundAtom] : Matcher

Match[CompoundAtom  ( Disjunction )]
            (tREInputxREMatchpIntegercContinuation)
  = let dContinuation
             = function(yREMatch)
                    let updatedCapturedCapture[]
                            = y.captured[p  present t.str[x.endIndex ... y.endIndex - 1]]
                    in c(endIndex y.endIndexcaptured updatedCaptured)
     in Match[Disjunction](txp + 1, d)

Match[CompoundAtom  ( ? : Disjunction )] = Match[Disjunction]

Match[CompoundAtom  .] = characterSetMatcher(nonTerminators)

Match[CompoundAtom  CharacterClass](tREInputxREMatchpIntegercContinuation)
  = let a{Character} = AcceptanceSet[CharacterClass](length(x.captured))
     in characterSetMatcher(a)(txpc)

Match[CompoundAtom  CharacterEscape] = characterMatcher(CharacterValue[CharacterEscape])

Match[CompoundAtom  TwoDigitEscape] = Match[TwoDigitEscape]

Match[CompoundAtom  CharacterClassEscape]
  = characterSetMatcher(AcceptanceSet[CharacterClassEscape])

action CountParens[CompoundAtom] : Integer

CountParens[CompoundAtom  ( Disjunction )] = CountParens[Disjunction] + 1

CountParens[CompoundAtom  ( ? : Disjunction )] = CountParens[Disjunction]

CountParens[CompoundAtom  .] = 0

CountParens[CompoundAtom  CharacterClass] = 0

CountParens[CompoundAtom  CharacterEscape] = 0

CountParens[CompoundAtom  TwoDigitEscape] = 0

CountParens[CompoundAtom  CharacterClassEscape] = 0

Escapes

Syntax

CharacterEscape 
   ControlEscape
|  \ c ControlLetter
|  ThreeDigitEscape
|  HexEscape
|  \ IdentityEscape
ControlLetter 
   A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
|  a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
IdentityEscape  NonTerminator except UnicodeAlphanumeric
ControlEscape 
   \ f
|  \ n
|  \ r
|  \ t
|  \ v

Semantics

action CharacterValue[CharacterEscape] : Character

CharacterValue[CharacterEscape  ControlEscape] = CharacterValue[ControlEscape]

CharacterValue[CharacterEscape  \ c ControlLetter]
  = codeToCharacter(bitwiseAnd(characterToCode(ControlLetter), 31))

CharacterValue[CharacterEscape  ThreeDigitEscape] = CharacterValue[ThreeDigitEscape]

CharacterValue[CharacterEscape  HexEscape] = CharacterValue[HexEscape]

CharacterValue[CharacterEscape  \ IdentityEscape] = IdentityEscape

action CharacterValue[ControlEscape] : Character

CharacterValue[ControlEscape  \ f] = ‘«FF»

CharacterValue[ControlEscape  \ n] = ‘«LF»

CharacterValue[ControlEscape  \ r] = ‘«CR»

CharacterValue[ControlEscape  \ t] = ‘«TAB»

CharacterValue[ControlEscape  \ v] = ‘«VT»

Numeric Escapes

Syntax

OneDigitEscape  \ DecimalDigit
ShortOctalEscape  \ ZeroToThree OctalDigit
TwoDigitEscape 
   \ ZeroToThree EightOrNine
|  \ FourToNine DecimalDigit
ThreeDigitEscape  \ ZeroToThree OctalDigit OctalDigit
ZeroToThree  0 | 1 | 2 | 3
FourToNine  4 | 5 | 6 | 7 | 8 | 9
OctalDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
EightOrNine  8 | 9

Semantics

backreferenceMatcher(nInteger) : Matcher
  = function(tREInputxREMatchpIntegercContinuation)
         case nthBackreference(xnof
            present(refString):
                  let iInteger = x.endIndex;
                      sString = t.str
                  in let jInteger = i + length(ref)
                  in if j > length(s)
                      then failure
                      else if s[i ... j - 1] = ref
                      then c(endIndex jcaptured x.captured)
                      else failure;
            absentfailure
            end

nthBackreference(xREMatchnInteger) : Capture
  = if n > 0 and n  length(x.captured)
     then x.captured[n - 1]
     else 

action Match[OneDigitEscape] : Matcher

Match[OneDigitEscape  \ DecimalDigit]
  = let nInteger = DigitValue[DecimalDigit]
     in if n = 0
         then characterMatcher(‘«NUL»’)
         else backreferenceMatcher(n)

action CharacterValue[ShortOctalEscape] : Integer  Character

CharacterValue[ShortOctalEscape  \ ZeroToThree OctalDigit](nCapturingParensInteger)
  = let nInteger = 10*DigitValue[ZeroToThree] + DigitValue[OctalDigit]
     in if n  10 and n  nCapturingParens
         then 
         else codeToCharacter(8*DigitValue[ZeroToThree] + DigitValue[OctalDigit])

action Match[ShortOctalEscape] : Matcher

Match[ShortOctalEscape  \ ZeroToThree OctalDigit]
            (tREInputxREMatchpIntegercContinuation)
  = let nInteger = 10*DigitValue[ZeroToThree] + DigitValue[OctalDigit]
     in if n  10 and n  length(x.captured)
         then backreferenceMatcher(n)(txpc)
         else characterMatcher(
                   codeToCharacter(8*DigitValue[ZeroToThree] + DigitValue[OctalDigit]))(
                   t,
                   x,
                   p,
                   c)

action Match[TwoDigitEscape] : Matcher

Match[TwoDigitEscape  \ ZeroToThree EightOrNine]
  = backreferenceMatcher(10*DigitValue[ZeroToThree] + DigitValue[EightOrNine])

Match[TwoDigitEscape  \ FourToNine DecimalDigit]
  = backreferenceMatcher(10*DigitValue[FourToNine] + DigitValue[DecimalDigit])

action CharacterValue[ThreeDigitEscape] : Character

CharacterValue[ThreeDigitEscape  \ ZeroToThree OctalDigit1 OctalDigit2]
  = codeToCharacter(
         64*DigitValue[ZeroToThree] + 8*DigitValue[OctalDigit1] + DigitValue[OctalDigit2])

action DigitValue[ZeroToThree] : Integer = digitValue(ZeroToThree)

action DigitValue[FourToNine] : Integer = digitValue(FourToNine)

action DigitValue[OctalDigit] : Integer = digitValue(OctalDigit)

action DigitValue[EightOrNine] : Integer = digitValue(EightOrNine)

Syntax

HexEscape 
   \ x HexDigit HexDigit
|  \ u HexDigit HexDigit HexDigit HexDigit
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

action CharacterValue[HexEscape] : Character

CharacterValue[HexEscape  \ x HexDigit1 HexDigit2]
  = codeToCharacter(16*HexValue[HexDigit1] + HexValue[HexDigit2])

CharacterValue[HexEscape  \ u HexDigit1 HexDigit2 HexDigit3 HexDigit4]
  = codeToCharacter(
         4096*HexValue[HexDigit1] + 256*HexValue[HexDigit2] + 16*HexValue[HexDigit3] +
         HexValue[HexDigit4])

action DigitValue[HexDigit] : Integer = digitValue(HexDigit)

Character Class Escapes

Syntax

CharacterClassEscape 
   \ s
|  \ S
|  \ d
|  \ D
|  \ w
|  \ W

Semantics

action AcceptanceSet[CharacterClassEscape] : {Character}

AcceptanceSet[CharacterClassEscape  \ s] = reWhitespaces

AcceptanceSet[CharacterClassEscape  \ S] = {‘«NUL»’ ... ‘«uFFFF»’} - reWhitespaces

AcceptanceSet[CharacterClassEscape  \ d] = reDigits

AcceptanceSet[CharacterClassEscape  \ D] = {‘«NUL»’ ... ‘«uFFFF»’} - reDigits

AcceptanceSet[CharacterClassEscape  \ w] = reWordCharacters

AcceptanceSet[CharacterClassEscape  \ W] = {‘«NUL»’ ... ‘«uFFFF»’} - reWordCharacters

User-Specified Character Classes

Syntax

s  {anynoCaretnoDash}
CharacterClass 
   [ ClassRangesnoCaret ]
|  [ ^ ClassRangesany ]
ClassRangess 
   «empty»
|  RangeLists,normal
RangeLists,l 
   FinalRangeAtoms,l
|  OrdinaryRangeAtoms,l RangeListSuffixnormal
|  ZeroEscape RangeListSuffixnonDecimalDigit
|  ShortOctalEscape RangeListSuffixnonOctalDigit
RangeListSuffixl 
   RangeListnoDash,l
|  - RangeAtomany,normal
|  - OrdinaryRangeAtomany,normal RangeListany,normal
|  - ZeroEscape RangeListany,nonDecimalDigit
|  - ShortOctalEscape RangeListany,nonOctalDigit

Semantics

action AcceptanceSet[CharacterClass] : Integer  {Character}

AcceptanceSet[CharacterClass  [ ClassRangesnoCaret ]](nCapturingParensInteger)
  = AcceptanceSet[ClassRangesnoCaret](nCapturingParens)

AcceptanceSet[CharacterClass  [ ^ ClassRangesany ]](nCapturingParensInteger)
  = {‘«NUL»’ ... ‘«uFFFF»’} - AcceptanceSet[ClassRangesany](nCapturingParens)

action AcceptanceSet[ClassRangess] : Integer  {Character}

AcceptanceSet[ClassRangess  «empty»](nCapturingParensInteger) = {}Character

AcceptanceSet[ClassRangess  RangeLists,normal](nCapturingParensInteger)
  = AcceptanceSet[RangeLists,normal](nCapturingParens)

action AcceptanceSet[RangeLists,l] : Integer  {Character}

AcceptanceSet[RangeLists,l  FinalRangeAtoms,l](nCapturingParensInteger)
  = AcceptanceSet[FinalRangeAtoms,l](nCapturingParens)

AcceptanceSet[RangeLists,l  OrdinaryRangeAtoms,l RangeListSuffixnormal]
            (nCapturingParensInteger)
  = let a{Character} = AcceptanceSet[OrdinaryRangeAtoms,l]
     in AcceptanceSet[RangeListSuffixnormal](nCapturingParensa)

AcceptanceSet[RangeLists,l  ZeroEscape RangeListSuffixnonDecimalDigit]
            (nCapturingParensInteger)
  = let a{Character} = {CharacterValue[ZeroEscape]}
     in AcceptanceSet[RangeListSuffixnonDecimalDigit](nCapturingParensa)

AcceptanceSet[RangeLists,l  ShortOctalEscape RangeListSuffixnonOctalDigit]
            (nCapturingParensInteger)
  = let a{Character} = {CharacterValue[ShortOctalEscape](nCapturingParens)}
     in AcceptanceSet[RangeListSuffixnonOctalDigit](nCapturingParensa)

action AcceptanceSet[RangeListSuffixl] : Integer  {Character {Character}

AcceptanceSet[RangeListSuffixl  RangeListnoDash,l]
            (nCapturingParensIntegerlow{Character})
  = low  AcceptanceSet[RangeListnoDash,l](nCapturingParens)

AcceptanceSet[RangeListSuffixl  - RangeAtomany,normal]
            (nCapturingParensIntegerlow{Character})
  = characterRange(lowAcceptanceSet[RangeAtomany,normal](nCapturingParens))

AcceptanceSet[RangeListSuffixl  - OrdinaryRangeAtomany,normal RangeListany,normal]
            (nCapturingParensIntegerlow{Character})
  = let range{Character}
             = characterRange(lowAcceptanceSet[OrdinaryRangeAtomany,normal])
     in range  AcceptanceSet[RangeListany,normal](nCapturingParens)

AcceptanceSet[RangeListSuffixl  - ZeroEscape RangeListany,nonDecimalDigit]
            (nCapturingParensIntegerlow{Character})
  = let range{Character} = characterRange(low, {CharacterValue[ZeroEscape]})
     in range  AcceptanceSet[RangeListany,nonDecimalDigit](nCapturingParens)

AcceptanceSet[RangeListSuffixl  - ShortOctalEscape RangeListany,nonOctalDigit]
            (nCapturingParensIntegerlow{Character})
  = let range{Character}
             = characterRange(low, {CharacterValue[ShortOctalEscape](nCapturingParens)})
     in range  AcceptanceSet[RangeListany,nonOctalDigit](nCapturingParens)

characterRange(low{Character}high{Character}) : {Character}
  = if (|low|)  1 or (|high|)  1
     then 
     else let lCharacter = min low;
               hCharacter = min high
           in if l  h
               then {l ... h}
               else 

Character Class Range Atoms

Syntax

FinalRangeAtomany,l  RangeAtomany,l
FinalRangeAtomnoCaret,l  RangeAtomnoCaret,l
FinalRangeAtomnoDash,l  RangeAtomany,l
RangeAtoms,l 
   OrdinaryRangeAtoms,l
|  ZeroEscape
|  ShortOctalEscape
OrdinaryRangeAtoms,l 
   RangeCharacters,l
|  RangeEscape
RangeCharacterany,normal  NonTerminator except \ | ]
RangeCharacterany,nonDecimalDigit  RangeCharacterany,normal except DecimalDigit
RangeCharacterany,nonOctalDigit  RangeCharacterany,normal except OctalDigit
RangeCharacternoCaret,normal  RangeCharacterany,normal except ^
RangeCharacternoDash,normal  RangeCharacterany,normal except -
RangeCharacternoDash,nonDecimalDigit  RangeCharacternoDash,normal except DecimalDigit
RangeCharacternoDash,nonOctalDigit  RangeCharacternoDash,normal except OctalDigit
RangeEscape 
   \ b
|  CharacterEscape
|  CharacterClassEscape
ZeroEscape  \ 0

Semantics

action AcceptanceSet[FinalRangeAtoms,l] : Integer  {Character}

AcceptanceSet[FinalRangeAtomany,l  RangeAtomany,l](nCapturingParensInteger)
  = AcceptanceSet[RangeAtomany,l](nCapturingParens)

AcceptanceSet[FinalRangeAtomnoCaret,l  RangeAtomnoCaret,l](nCapturingParensInteger)
  = AcceptanceSet[RangeAtomnoCaret,l](nCapturingParens)

AcceptanceSet[FinalRangeAtomnoDash,l  RangeAtomany,l](nCapturingParensInteger)
  = AcceptanceSet[RangeAtomany,l](nCapturingParens)

action AcceptanceSet[RangeAtoms,l] : Integer  {Character}

AcceptanceSet[RangeAtoms,l  OrdinaryRangeAtoms,l](nCapturingParensInteger)
  = AcceptanceSet[OrdinaryRangeAtoms,l]

AcceptanceSet[RangeAtoms,l  ZeroEscape](nCapturingParensInteger)
  = {CharacterValue[ZeroEscape]}

AcceptanceSet[RangeAtoms,l  ShortOctalEscape](nCapturingParensInteger)
  = {CharacterValue[ShortOctalEscape](nCapturingParens)}

action AcceptanceSet[OrdinaryRangeAtoms,l] : {Character}

AcceptanceSet[OrdinaryRangeAtoms,l  RangeCharacters,l] = {RangeCharacters,l}

AcceptanceSet[OrdinaryRangeAtoms,l  RangeEscape] = AcceptanceSet[RangeEscape]

action AcceptanceSet[RangeEscape] : {Character}

AcceptanceSet[RangeEscape  \ b] = {‘«BS»’}

AcceptanceSet[RangeEscape  CharacterEscape] = {CharacterValue[CharacterEscape]}

AcceptanceSet[RangeEscape  CharacterClassEscape] = AcceptanceSet[CharacterClassEscape]

action CharacterValue[ZeroEscape] : Character

CharacterValue[ZeroEscape  \ 0] = ‘«NUL»


Waldemar Horwat
Last modified Sunday, May 16, 1999