Commit 5d2ca02

Create Whitespace grammar productions
Apart from WHITESPACE itself, this does not create any new productions (TAB, LF, and CR are moved here from input-format.md), instead grouping the characters with comments. #1974 will involve pulling the horizontal whitespace out into a separate production. Comment wording (and casing) is modeled on https://www.unicode.org/reports/tr31/#R3a. I left off the "Unicode" prefix for the ASCII items, since they are likely common enough in this context that labeling them "Unicode" could cause more confusion.
1 parent f82156b commit 5d2ca02

2 files changed, +26 -19 lines changed

src/input-format.md

Lines changed: 0 additions & 6 deletions
@@ -6,12 +6,6 @@ r[input.syntax]
 @root CHAR -> <a Unicode scalar value>
 
 NUL -> U+0000
-
-TAB -> U+0009
-
-LF -> U+000A
-
-CR -> U+000D
 ```
 
 r[input.intro]
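
The CHAR production kept in input-format.md ranges over Unicode scalar values: every code point from U+0000 (so NUL is included) through U+10FFFF, except the surrogates U+D800 through U+DFFF. A minimal Rust sketch of that check follows; is_scalar_value is a hypothetical helper, not something from the Reference or rustc.

```rust
/// Hypothetical helper: returns true if `n` is a Unicode scalar value,
/// i.e. a code point that the CHAR production accepts.
pub fn is_scalar_value(n: u32) -> bool {
    // In range U+0000..=U+10FFFF and not a UTF-16 surrogate.
    n <= 0x10FFFF && !(0xD800..=0xDFFF).contains(&n)
}
```

Rust's own char type covers exactly this set, so `char::from_u32(n).is_some()` gives the same answer for every u32.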

src/whitespace.md

Lines changed: 26 additions & 13 deletions
@@ -1,21 +1,34 @@
 r[lex.whitespace]
 # Whitespace
 
+r[whitespace.syntax]
+```grammar,lexer
+@root WHITESPACE ->
+// end of line
+LF
+| U+000B // vertical tabulation
+| U+000C // form feed
+| CR
+| U+0085 // Unicode next line
+| U+2028 // Unicode LINE SEPARATOR
+| U+2029 // Unicode PARAGRAPH SEPARATOR
+// Ignorable Code Point
+| U+200E // Unicode LEFT-TO-RIGHT MARK
+| U+200F // Unicode RIGHT-TO-LEFT MARK
+// horizontal whitespace
+| TAB
+| U+0020 // space ' '
+
+TAB -> U+0009 // horizontal tab ('\t')
+
+LF -> U+000A // line feed ('\n')
+
+CR -> U+000D // carriage return ('\r')
+```
+
 r[lex.whitespace.intro]
 Whitespace is any non-empty string containing only characters that have the
-[`Pattern_White_Space`] Unicode property, namely:
-
-- `U+0009` (horizontal tab, `'\t'`)
-- `U+000A` (line feed, `'\n'`)
-- `U+000B` (vertical tab)
-- `U+000C` (form feed)
-- `U+000D` (carriage return, `'\r'`)
-- `U+0020` (space, `' '`)
-- `U+0085` (next line)
-- `U+200E` (left-to-right mark)
-- `U+200F` (right-to-left mark)
-- `U+2028` (line separator)
-- `U+2029` (paragraph separator)
+[`Pattern_White_Space`] Unicode property.
 
 r[lex.whitespace.token-sep]
 Rust is a "free-form" language, meaning that all forms of whitespace serve only
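
The new WHITESPACE production admits exactly the eleven Pattern_White_Space code points that the removed bullet list spelled out. As a rough illustration, the Rust sketch below encodes that set as a predicate, with the three groups taken from the comments in the production. It is not rustc's lexer: WhitespaceKind, whitespace_kind, is_whitespace, and skip_whitespace are hypothetical names chosen for this example.

```rust
/// Hypothetical grouping that mirrors the comments in the WHITESPACE production.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum WhitespaceKind {
    EndOfLine,
    IgnorableCodePoint,
    Horizontal,
}

/// Classifies `c` per the WHITESPACE production, or returns None if `c`
/// is not one of its eleven code points.
pub fn whitespace_kind(c: char) -> Option<WhitespaceKind> {
    match c {
        // end of line: LF, U+000B, U+000C, CR, U+0085, U+2028, U+2029
        '\n' | '\u{000B}' | '\u{000C}' | '\r' | '\u{0085}' | '\u{2028}' | '\u{2029}' => {
            Some(WhitespaceKind::EndOfLine)
        }
        // Ignorable Code Point: LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK
        '\u{200E}' | '\u{200F}' => Some(WhitespaceKind::IgnorableCodePoint),
        // horizontal whitespace: TAB, space
        '\t' | ' ' => Some(WhitespaceKind::Horizontal),
        _ => None,
    }
}

/// True if `c` matches WHITESPACE (equivalently, has Pattern_White_Space).
pub fn is_whitespace(c: char) -> bool {
    whitespace_kind(c).is_some()
}

/// Returns what is left of `input` after skipping a leading run of whitespace.
pub fn skip_whitespace(input: &str) -> &str {
    input.trim_start_matches(is_whitespace)
}
```

A design note on the sketch: the standard library's `char::is_whitespace` tests the Unicode White_Space property, which is a different set (it includes U+00A0 NO-BREAK SPACE and excludes U+200E and U+200F), so it cannot stand in for this production.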
