| Regular Grammars: Operators |
| AB |
Concatenation operator: A followed by B (no symbol; implicit between adjacent operands). |
| A|B |
Union operator: matches A or B; corresponds to logical OR. |
| A* |
Quantifying: repetition Operator (Kleene star); repeat A n times where n=0 is allowed (matches the empty string). |
| A+ |
Quantifying: repetition Operator (non-empty); repeat A n times where n>0 is required (non-empty expression). |
| A{n} |
Quantifying: repetition Operator (exact); repeat A exactly n times. |
| A{n,} |
Quantifying: repetition Operator (minimum); repeat A at least n times (more allowed). |
| A{n,m} |
Quantifying: repetition Operator (minimum and maximum); repeat A at least n times but not more than m times. |
| (A) |
Parentheses: group a composite expression so that higher-precedence operators apply to it as a whole. |
|
Operator precedence (lowest to highest): Union < Concatenation < Quantifying Operators < Parentheses
| Regular Grammars: Notation |
| a |
The single character a |
| asdf |
The constant character sequence asdf |
| asdf|qwertz |
The constant character sequence asdf OR qwertz |
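| (asdf)* |
The sequence asdf zero or more times. Without the parentheses, asdf* repeats only the f: asd, asdf, asdff, ... |
| (asdf)+ |
The sequence asdf one or more times. Without the parentheses, asdf+ repeats only the f: asdf, asdff, ... |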
| (abcd|xyz)* |
Any sequence (including the empty one) that consists only of the subsequences abcd and xyz |
| (cat){3} |
The sequence cat exactly three times: catcatcat. Without the parentheses, cat{3} repeats only the t: cattt |
| (cat){3,5} |
The sequence cat three to five times: catcatcat, catcatcatcat or catcatcatcatcat |
| (cat){3,} |
The sequence cat at least three times: catcatcat, catcatcatcat, catcatcatcatcat, ... |
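A minimal Java sketch of the precedence rule (quantifiers bind more tightly than concatenation), assuming the notation in this table follows java.util.regex. The class name QuantifierPrecedenceDemo and the helper fullMatch are illustrative only.

```java
import java.util.regex.Pattern;

final class QuantifierPrecedenceDemo {
    // Returns true only if the ENTIRE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The quantifier applies to the single preceding operand (here: f or t).
        System.out.println(fullMatch("asdf*", "asdfff"));            // true
        System.out.println(fullMatch("asdf*", "asdfasdf"));          // false
        System.out.println(fullMatch("cat{3}", "cattt"));            // true

        // Parentheses make the whole sequence the operand.
        System.out.println(fullMatch("(asdf)*", "asdfasdf"));        // true
        System.out.println(fullMatch("(asdf)*", ""));                // true
        System.out.println(fullMatch("(cat){3,5}", "catcatcatcat")); // true
        System.out.println(fullMatch("(cat){3,5}", "catcat"));       // false
    }
}
```

Grouping with parentheses is what turns a multi-character sequence into a single operand for the quantifier.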
| Escaping Characters |
| \\ |
The backslash character |
| \. |
Period (to distinguish it from the wildcard token) |
| \^ |
Circumflex (to distinguish it from the begin-of-input token) |
| \$ |
Dollar sign (to distinguish it from the end-of-input token) |
| \0n |
The character with octal value 0n (0 <= n <= 7) |
| \0nn |
The character with octal value 0nn (0 <= n <= 7) |
| \0mnn |
The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) |
| \xhh |
The character with hexadecimal value 0xhh |
| \uhhhh |
The character with hexadecimal value 0xhhhh |
| \t |
The tab character ('\u0009') |
| \n |
The newline (line feed) character ('\u000A') |
| \r |
The carriage-return character ('\u000D') |
| \f |
The form-feed character ('\u000C') |
| \a |
The alert (bell) character ('\u0007') |
| \e |
The escape character ('\u001B') |
| \<operator> |
Any escaped operator symbol (matched as a literal character, not as an operator) |
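A short Java sketch of how these escapes behave, assuming the table describes java.util.regex. The class name EscapeDemo and the helper fullMatch are illustrative only. Note that inside a Java string literal every regex backslash must itself be written as a double backslash.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Returns true only if the ENTIRE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An escaped dot is a literal period; an unescaped dot is the wildcard.
        System.out.println(fullMatch("a\\.b", "a.b"));  // true
        System.out.println(fullMatch("a\\.b", "axb"));  // false
        System.out.println(fullMatch("a.b", "axb"));    // true

        // Hexadecimal and octal escapes for the character A (code 65).
        System.out.println(fullMatch("\\x41", "A"));    // true
        System.out.println(fullMatch("\\0101", "A"));   // true

        // The regex escape for tab matches an actual tab character.
        System.out.println(fullMatch("\\t", "\t"));     // true

        // Pattern.quote escapes a whole string so operators lose their meaning.
        System.out.println(fullMatch(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```

Pattern.quote is a convenient alternative to escaping each operator by hand when the entire text should be matched literally.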
| Special tokens |
| ^ |
Start of input. |
| $ |
End of input. |
| . |
Any single character (wildcard). |
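A minimal Java sketch of the three special tokens, assuming java.util.regex semantics with default flags. The class name AnchorDemo and the helper found are illustrative only. It uses a substring search (Matcher.find), since the anchors only make a visible difference when the pattern is not already forced to cover the whole input.

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    // Returns true if the regex matches anywhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("cat", "concatenate"));   // true: matches in the middle
        System.out.println(found("^cat", "concatenate"));  // false: input does not start with cat
        System.out.println(found("^cat", "catalog"));      // true
        System.out.println(found("cat$", "bobcat"));       // true: input ends with cat
        System.out.println(found("^c.t$", "cut"));         // true: the dot matches u
        System.out.println(found("^c.t$", "cart"));        // false: the dot matches exactly one character
    }
}
```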
| Character Sets and Ranges |
| [abc] |
a, b, or c (simple class) |
| [^abc] |
Any character except a, b, or c (negation) |
| [a-z] |
a through z, inclusive (range) |
| [a-zA-Z] |
a through z or A through Z, inclusive (range) |
| [^a-zA-Z] |
Any character except a through z and A through Z (range and negation) |
| [a-z&&[def]] |
d, e, or f (intersection) |
| [a-z&&[^bc]] |
a through z, except for b and c: [ad-z] (subtraction) |
| [a-z&&[^m-p]] |
a through z, and not m through p: [a-lq-z] (subtraction) |
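The intersection and subtraction forms can be checked with a small Java sketch, assuming the && class syntax is the one from java.util.regex (many other regex dialects do not support it). The class name CharClassDemo and the helper fullMatch are illustrative only.

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    // Returns true only if the ENTIRE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Intersection: only d, e, or f remain.
        System.out.println(fullMatch("[a-z&&[def]]", "e"));   // true
        System.out.println(fullMatch("[a-z&&[def]]", "a"));   // false

        // Subtraction: a through z without b and c.
        System.out.println(fullMatch("[a-z&&[^bc]]", "a"));   // true
        System.out.println(fullMatch("[a-z&&[^bc]]", "b"));   // false

        // Subtraction of a range: a through z without m through p.
        System.out.println(fullMatch("[a-z&&[^m-p]]", "q"));  // true
        System.out.println(fullMatch("[a-z&&[^m-p]]", "n"));  // false

        // Negation of a simple class.
        System.out.println(fullMatch("[^abc]", "d"));         // true
    }
}
```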
| Predefined Character Classes |
| \d |
A digit: [0-9] |
| \D |
A non-digit: [^0-9] |
| \s |
A whitespace character: [ \t\n\x0B\f\r] |
| \S |
A non-whitespace character: [^\s] |
| \w |
A word character: [a-zA-Z_0-9] |
| \W |
A non-word character: [^\w] |
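A brief Java sketch of the predefined classes in use, assuming java.util.regex with default flags (so the classes cover the ASCII sets listed above). The class name PredefinedClassDemo and its helpers fullMatch and digitsOnly are illustrative only.

```java
import java.util.regex.Pattern;

final class PredefinedClassDemo {
    // Returns true only if the ENTIRE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Removes every non-digit character from the input.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d+", "2024"));                 // true
        System.out.println(fullMatch("\\d+", "20a4"));                 // false
        System.out.println(fullMatch("\\w+", "snake_case_42"));        // true
        System.out.println(fullMatch("\\w+", "two words"));            // false: space is not a word character
        System.out.println(fullMatch("\\w+\\s\\w+", "two words"));     // true
        System.out.println(fullMatch("\\S+", "no-space"));             // true
        System.out.println(digitsOnly("a1b22c333"));                   // 122333
    }
}
```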