|
1 | 1 | r[lex.whitespace]
|
2 | 2 | # Whitespace
|
3 | 3 |
|
| 4 | +r[whitespace.syntax] |
| 5 | +```grammar,lexer |
| 6 | +@root WHITESPACE -> |
| 7 | + // end of line |
| 8 | + LF |
| 9 | + | U+000B // vertical tabulation |
| 10 | + | U+000C // form feed |
| 11 | + | CR |
| 12 | + | U+0085 // Unicode next line |
| 13 | + | U+2028 // Unicode LINE SEPARATOR |
| 14 | + | U+2029 // Unicode PARAGRAPH SEPARATOR |
| 15 | + // Ignorable Code Point |
| 16 | + | U+200E // Unicode LEFT-TO-RIGHT MARK |
| 17 | + | U+200F // Unicode RIGHT-TO-LEFT MARK |
| 18 | + // horizontal whitespace |
| 19 | + | TAB |
| 20 | + | U+0020 // space ' ' |
| 21 | +
|
| 22 | +TAB -> U+0009 // horizontal tab ('\t') |
| 23 | +
|
| 24 | +LF -> U+000A // line feed ('\n') |
| 25 | +
|
| 26 | +CR -> U+000D // carriage return ('\r') |
| 27 | +``` |
| 28 | + |
4 | 29 | r[lex.whitespace.intro]
|
5 | 30 | Whitespace is any non-empty string containing only characters that have the
|
6 |
| -[`Pattern_White_Space`] Unicode property, namely: |
7 |
| - |
8 |
| -- `U+0009` (horizontal tab, `'\t'`) |
9 |
| -- `U+000A` (line feed, `'\n'`) |
10 |
| -- `U+000B` (vertical tab) |
11 |
| -- `U+000C` (form feed) |
12 |
| -- `U+000D` (carriage return, `'\r'`) |
13 |
| -- `U+0020` (space, `' '`) |
14 |
| -- `U+0085` (next line) |
15 |
| -- `U+200E` (left-to-right mark) |
16 |
| -- `U+200F` (right-to-left mark) |
17 |
| -- `U+2028` (line separator) |
18 |
| -- `U+2029` (paragraph separator) |
| 31 | +[`Pattern_White_Space`] Unicode property. |
19 | 32 |
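As an illustration of the definition above, here is a minimal sketch of a membership check for the eleven code points in the `WHITESPACE` rule. The function names `is_pattern_white_space` and `is_whitespace_string` are hypothetical and chosen for this example only; this is not how the compiler's lexer is necessarily written.

```rust
/// Returns true if `c` is one of the code points listed in the
/// `WHITESPACE` grammar rule (the `Pattern_White_Space` set).
/// Hypothetical helper, written for illustration only.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        // end of line
        '\u{000A}' // LF, line feed ('\n')
        | '\u{000B}' // vertical tabulation
        | '\u{000C}' // form feed
        | '\u{000D}' // CR, carriage return ('\r')
        | '\u{0085}' // next line
        | '\u{2028}' // LINE SEPARATOR
        | '\u{2029}' // PARAGRAPH SEPARATOR
        // ignorable code points
        | '\u{200E}' // LEFT-TO-RIGHT MARK
        | '\u{200F}' // RIGHT-TO-LEFT MARK
        // horizontal whitespace
        | '\u{0009}' // TAB, horizontal tab ('\t')
        | '\u{0020}' // space (' ')
    )
}

/// Returns true if `s` is whitespace in the sense of the intro rule:
/// a non-empty string made up only of `Pattern_White_Space` characters.
/// Hypothetical helper, written for illustration only.
fn is_whitespace_string(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_pattern_white_space)
}
```

Note that this set is not the same as the one tested by the standard library's `char::is_whitespace`, which follows the Unicode `White_Space` property: for example, `U+200E` is in `Pattern_White_Space` but not in `White_Space`, while `U+00A0` (no-break space) is in `White_Space` but not in `Pattern_White_Space`.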
|
20 | 33 | r[lex.whitespace.token-sep]
|
21 | 34 | Rust is a "free-form" language, meaning that all forms of whitespace serve only
|