@@ -50,23 +50,24 @@ SourceCharacter ::
50
50
- "U+0009"
51
51
- "U+000A"
52
52
- "U+000D"
53
- - "U+0020–U+FFFF"
53
+ - "U+0020–U+10FFFF"
54
54
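The following Python sketch tests whether a code point is a {SourceCharacter} under the revised grammar, to show the effect of the widened range. The function name `is_source_character` is hypothetical and does not come from the specification or any library.

```python
# Hypothetical helper: True if `code_point` is a SourceCharacter under the
# revised grammar (U+0009, U+000A, U+000D, or U+0020 through U+10FFFF).
def is_source_character(code_point: int) -> bool:
    if code_point in (0x0009, 0x000A, 0x000D):
        return True
    return 0x0020 <= code_point <= 0x10FFFF


print(is_source_character(ord("A")))  # True
print(is_source_character(0x1F4A9))   # True (outside the BMP)
print(is_source_character(0x0007))    # False (ASCII control character)
```

With the previous upper bound of U+FFFF, the second call would return `False`, because U+1F4A9 lies outside the Basic Multilingual Plane.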
55
55
GraphQL documents are expressed as a sequence of
56
- [Unicode](https://unicode.org/standard/standard.html) characters. However, with
56
+ [Unicode](https://unicode.org/standard/standard.html) code points (informally
57
+ referred to as *"characters"* throughout most of this specification). However, with
57
58
few exceptions, most of GraphQL is expressed only in the original non-control
58
59
ASCII range so as to be as widely compatible with as many existing tools,
59
60
languages, and serialization formats as possible and avoid display issues in
60
61
text editors and source control.
61
62
63
+ Note: Non-ASCII Unicode code points may freely appear within {StringValue} and
64
+ {Comment} tokens.
65
+
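The switch from "characters" to "code points" matters because many host languages, including JavaScript and Java, expose strings as UTF-16 code units. The Python illustration below relies on Python's `str` being indexed by code point. It shows that U+1F4A9 is one code point but two UTF-16 code units, which together form a surrogate pair.

```python
# U+1F4A9 lies outside the Basic Multilingual Plane.
text = "\U0001F4A9"

# Python strings are sequences of code points.
code_points = [ord(ch) for ch in text]

# Encode as UTF-16 (little-endian, no BOM) and split into 16-bit code units.
encoded = text.encode("utf-16-le")
utf16_units = [int.from_bytes(encoded[i:i + 2], "little") for i in range(0, len(encoded), 2)]

print([hex(cp) for cp in code_points])  # ['0x1f4a9']
print([hex(u) for u in utf16_units])    # ['0xd83d', '0xdca9']
```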
62
66
63
67
### Unicode
64
68
65
69
UnicodeBOM :: "Byte Order Mark (U+FEFF)"
66
70
67
- Non-ASCII Unicode characters may freely appear within {StringValue} and
68
- {Comment} portions of GraphQL.
69
-
70
71
The "Byte Order Mark" is a special Unicode character which
71
72
may appear at the beginning of a file containing Unicode which programs may use
72
73
to determine the fact that the text stream is Unicode, what endianness the text
@@ -804,13 +805,20 @@ StringValue ::
804
805
- ` """ ` BlockStringCharacter* ` """ `
805
806
806
807
StringCharacter ::
807
- - SourceCharacter but not ` " ` or \ or LineTerminator
808
- - \u EscapedUnicode
809
- - \ EscapedCharacter
808
+ - SourceCharacter but not ` " ` or ` \ ` or LineTerminator
809
+ - ` \u ` EscapedUnicode
810
+ - ` \ ` EscapedCharacter
811
+
812
+ EscapedUnicode ::
813
+ - HexDigit HexDigit HexDigit HexDigit
814
+ - ` { ` HexDigit+ ` } ` "but only if <= 0x10FFFF"
810
815
811
- EscapedUnicode :: /[0-9A-Fa-f]{4}/
816
+ HexDigit :: one of
817
+ - ` 0 ` ` 1 ` ` 2 ` ` 3 ` ` 4 ` ` 5 ` ` 6 ` ` 7 ` ` 8 ` ` 9 `
818
+ - ` A ` ` B ` ` C ` ` D ` ` E ` ` F `
819
+ - ` a ` ` b ` ` c ` ` d ` ` e ` ` f `
812
820
813
- EscapedCharacter :: one of ` " ` \ ` / ` b f n r t
821
+ EscapedCharacter :: one of ` " ` ` \ ` ` / ` ` b ` ` f ` ` n ` ` r ` ` t `
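The Python sketch below shows how the two {EscapedUnicode} alternatives differ. It is a lexer step that reads a single escape once `\u` has already been consumed. It is only an illustration and not a reference implementation. `read_escaped_unicode` and `HEX_DIGITS` are hypothetical names, and `ValueError` stands in for a lexical error. The braced form accepts any number of {HexDigit}, so the 0x10FFFF bound has to be checked on the resulting value, not on the digit count.

```python
HEX_DIGITS = frozenset("0123456789ABCDEFabcdef")


def read_escaped_unicode(source: str, pos: int) -> tuple[int, int]:
    # `pos` is the index just after the backslash and "u".
    # Returns (code point value, index just past the escape).
    if source.startswith("{", pos):
        # Braced form: one or more HexDigit between "{" and "}".
        end = source.find("}", pos + 1)
        digits = source[pos + 1:end] if end != -1 else ""
        if not digits or any(d not in HEX_DIGITS for d in digits):
            raise ValueError("expected HexDigit+ inside braces")
        value = int(digits, 16)
        if value > 0x10FFFF:
            raise ValueError("escaped value exceeds 0x10FFFF")
        return value, end + 1
    # Fixed-width form: exactly four HexDigit.
    digits = source[pos:pos + 4]
    if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
        raise ValueError("expected exactly four HexDigit")
    return int(digits, 16), pos + 4


print(read_escaped_unicode("00e9", 0))     # (233, 4)
print(read_escaped_unicode("{1F4A9}", 0))  # (128169, 7)
```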
814
822
815
823
BlockStringCharacter ::
816
824
- SourceCharacter but not ` """ ` or ` \""" `
@@ -825,9 +833,9 @@ be interpreted as the beginning of a block string. As an example, the source
825
833
{`""""""`} can only be interpreted as a single empty block string and not three
826
834
empty strings.
827
835
828
- Non-ASCII Unicode characters are allowed within single-quoted strings.
829
- Since {SourceCharacter} must not contain some ASCII control characters, escape
830
- sequences must be used to represent these characters. The {`\`}, {`"`}
836
+ Non-ASCII Unicode characters are allowed within single-quoted strings.
837
+ Since {SourceCharacter} must not contain some ASCII control characters, escape
838
+ sequences must be used to represent these characters. The {`\`}, {`"`}
831
839
characters also must be escaped. All other escape sequences are optional.
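A serializer must follow these rules when it produces single-quoted strings. The Python sketch below shows one possible approach, not a reference implementation. It uses a short escape from the {EscapedCharacter} production wherever one exists. It falls back to `\uXXXX` for the remaining control characters below U+0020. Every other code point, including non-ASCII ones, is left unescaped. The name `print_string_value` is hypothetical. Tab is escaped here only for readability, since the grammar also allows it unescaped.

```python
# Short escapes drawn from the EscapedCharacter production ("/" is never required).
SHORT_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def print_string_value(value: str) -> str:
    out = ['"']
    for ch in value:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Other ASCII control characters are not SourceCharacters,
            # so they must use a fixed-width Unicode escape.
            out.append("\\u%04X" % ord(ch))
        else:
            # Everything else, including non-ASCII code points, is left as-is.
            out.append(ch)
    out.append('"')
    return "".join(out)


print(print_string_value('line1\nline2 "quoted"'))  # "line1\nline2 \"quoted\""
```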
832
840
833
841
**Block Strings**
@@ -892,32 +900,49 @@ StringValue :: `""`
892
900
893
901
StringValue :: ` " ` StringCharacter+ ` " `
894
902
895
- * Return the Unicode character sequence of all {StringCharacter}
896
- Unicode character values.
897
-
898
- StringCharacter :: SourceCharacter but not ` " ` or \ or LineTerminator
899
-
900
- * Return the character value of {SourceCharacter}.
901
-
902
- StringCharacter :: \u EscapedUnicode
903
-
904
- * Return the character whose code unit value in the Unicode Basic Multilingual
905
- Plane is the 16-bit hexadecimal value {EscapedUnicode}.
906
-
907
- StringCharacter :: \ EscapedCharacter
908
-
909
- * Return the character value of {EscapedCharacter} according to the table below.
910
-
911
- | Escaped Character | Code Unit Value | Character Name |
912
- | ----------------- | --------------- | ---------------------------- |
913
- | ` " ` | U+0022 | double quote |
914
- | ` \ ` | U+005C | reverse solidus (back slash) |
915
- | ` / ` | U+002F | solidus (forward slash) |
916
- | ` b ` | U+0008 | backspace |
917
- | ` f ` | U+000C | form feed |
918
- | ` n ` | U+000A | line feed (new line) |
919
- | ` r ` | U+000D | carriage return |
920
- | ` t ` | U+0009 | horizontal tab |
903
+ * Let {string} be the sequence of all {StringCharacter} code points.
904
+ * For each {codePoint} at {index} in {string}:
905
+   * If {codePoint} is >= 0xD800 and <= 0xDBFF (a [*High Surrogate*](https://unicodebook.readthedocs.io/unicode_encodings.html#utf-16-surrogate-pairs)):
906
+     * Let {lowCodePoint} be the code point at {index} + {1} in {string}.
907
+     * If {lowCodePoint} is not >= 0xDC00 and <= 0xDFFF (a [*Low Surrogate*](https://unicodebook.readthedocs.io/unicode_encodings.html#utf-16-surrogate-pairs)):
908
+       * Raise a parse error (a *High Surrogate* must be followed by a *Low Surrogate*).
909
+     * Let {decodedPoint} = ({codePoint} - 0xD800) × 0x400 + ({lowCodePoint} - 0xDC00) + 0x10000.
910
+     * Within {string}, replace {codePoint} and {lowCodePoint} with {decodedPoint}.
911
+   * If {codePoint} is >= 0xDC00 and <= 0xDFFF (a [*Low Surrogate*](https://unicodebook.readthedocs.io/unicode_encodings.html#utf-16-surrogate-pairs)):
912
+     * Raise a parse error (a *Low Surrogate* must follow a *High Surrogate*).
913
+ * Return {string}.
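The surrogate-combining steps can be sketched in Python as a single pass over a list of integer code points. This sketch is not normative. `combine_surrogates` is a hypothetical name, and Python's `ValueError` stands in for the specification's parse error.

```python
def combine_surrogates(code_points: list[int]) -> list[int]:
    result = []
    index = 0
    while index < len(code_points):
        code_point = code_points[index]
        if 0xD800 <= code_point <= 0xDBFF:  # High Surrogate
            low = code_points[index + 1] if index + 1 < len(code_points) else None
            if low is None or not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("a High Surrogate must be followed by a Low Surrogate")
            # Same arithmetic as {decodedPoint} in the algorithm.
            result.append((code_point - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000)
            index += 2  # both halves of the pair are consumed
        elif 0xDC00 <= code_point <= 0xDFFF:  # unpaired Low Surrogate
            raise ValueError("a Low Surrogate must follow a High Surrogate")
        else:
            result.append(code_point)
            index += 1
    return result


print([hex(cp) for cp in combine_surrogates([0x61, 0xD83D, 0xDCA9])])  # ['0x61', '0x1f4a9']
```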
914
+
915
+ Note: {StringValue} should avoid encoding code points as surrogate pairs.
916
+ While services must interpret them accordingly, a braced escape (for example
917
+ `"\u{1F4A9}"`) is a clearer way to encode code points outside of the
918
+ [Basic Multilingual Plane](https://unicodebook.readthedocs.io/unicode.html#bmp).
919
+
920
+ StringCharacter :: SourceCharacter but not ` " ` or ` \ ` or LineTerminator
921
+
922
+ * Return the code point {SourceCharacter}.
923
+
924
+ StringCharacter :: ` \u ` EscapedUnicode
925
+
926
+ * Let {value} be the 21-bit hexadecimal value represented by the sequence of
927
+ {HexDigit} within {EscapedUnicode}.
928
+ * Assert {value} <= 0x10FFFF.
929
+ * Return the code point {value}.
930
+
931
+ StringCharacter :: ` \ ` EscapedCharacter
932
+
933
+ * Return the code point represented by {EscapedCharacter} according to the
934
+ table below.
935
+
936
+ | Escaped Character | Code Point | Character Name |
937
+ | ----------------- | ---------- | ---------------------------- |
938
+ | ` " ` | U+0022 | double quote |
939
+ | ` \ ` | U+005C | reverse solidus (back slash) |
940
+ | ` / ` | U+002F | solidus (forward slash) |
941
+ | ` b ` | U+0008 | backspace |
942
+ | ` f ` | U+000C | form feed |
943
+ | ` n ` | U+000A | line feed (new line) |
944
+ | ` r ` | U+000D | carriage return |
945
+ | ` t ` | U+0009 | horizontal tab |
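The Python sketch below combines the rules of this section into a decoder for the contents of a single-quoted {StringValue}. It assumes the input is the text between the opening and closing `"`, and it uses `ValueError` in place of a parse error. `string_value` and `_escaped_unicode` are hypothetical names. Surrogate pairs are combined only after every {StringCharacter} has been decoded, as in the algorithm above. As a result, an escaped pair and the equivalent braced escape yield the same string.

```python
# Code point for each EscapedCharacter, per the table above.
ESCAPED_CHARACTER = {
    '"': 0x0022, "\\": 0x005C, "/": 0x002F, "b": 0x0008,
    "f": 0x000C, "n": 0x000A, "r": 0x000D, "t": 0x0009,
}
HEX_DIGITS = frozenset("0123456789ABCDEFabcdef")


def _escaped_unicode(body: str, pos: int) -> tuple[int, int]:
    # `pos` is just after backslash-u; accepts XXXX or {X...}.
    if body.startswith("{", pos):
        end = body.find("}", pos + 1)
        digits = body[pos + 1:end] if end != -1 else ""
        if not digits or any(d not in HEX_DIGITS for d in digits):
            raise ValueError("invalid braced Unicode escape")
        value = int(digits, 16)
        if value > 0x10FFFF:
            raise ValueError("Unicode escape exceeds 0x10FFFF")
        return value, end + 1
    digits = body[pos:pos + 4]
    if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
        raise ValueError("invalid fixed-width Unicode escape")
    return int(digits, 16), pos + 4


def string_value(body: str) -> str:
    # Step 1: map each StringCharacter to a code point.
    points = []
    pos = 0
    while pos < len(body):
        ch = body[pos]
        if ch == '"' or (ord(ch) < 0x20 and ch != "\t"):
            # Unescaped quotes, LineTerminators and other control characters.
            raise ValueError("character not allowed unescaped in a string")
        if ch != "\\":
            points.append(ord(ch))
            pos += 1
        elif body.startswith("u", pos + 1):
            value, pos = _escaped_unicode(body, pos + 2)
            points.append(value)
        elif body[pos + 1:pos + 2] in ESCAPED_CHARACTER:
            points.append(ESCAPED_CHARACTER[body[pos + 1]])
            pos += 2
        else:
            raise ValueError("invalid escape sequence")

    # Step 2: combine surrogate pairs and reject unpaired surrogates.
    result = []
    i = 0
    while i < len(points):
        cp = points[i]
        if 0xD800 <= cp <= 0xDBFF:
            low = points[i + 1] if i + 1 < len(points) else -1
            if not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("High Surrogate not followed by Low Surrogate")
            result.append((cp - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000)
            i += 2
        elif 0xDC00 <= cp <= 0xDFFF:
            raise ValueError("Low Surrogate not preceded by High Surrogate")
        else:
            result.append(cp)
            i += 1
    return "".join(chr(cp) for cp in result)


print(string_value(r"\uD83D\uDCA9") == string_value(r"\u{1F4A9}"))  # True
print(string_value(r"say \"hi\" \/ bye"))  # say "hi" / bye
```

Design note: decoding the escapes first and pairing surrogates afterward keeps each {StringCharacter} rule independent. The cost is a second pass over the code points.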
921
946
922
947
StringValue :: ` """ ` BlockStringCharacter* ` """ `
923
948