Description
What version of regex are you using?
Latest
If it isn't the latest version, then please upgrade and check whether the bug
is still present.
Describe the bug at a high level.
Because regex_syntax relies on char::from_u32 when parsing escapes,
not all valid Unicode code points are accepted, and this prevents otherwise valid regexes from compiling.
Give a brief description of the actual problem you're observing.
Rust defines char as a "Unicode scalar value" and explicitly states that it is similar to, but not the same as, a Unicode code point: scalar values exclude the surrogate code points U+D800 through U+DFFF.
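A small standard-library-only sketch of the distinction: `char::from_u32` succeeds for scalar values, returns `None` for surrogate code points, and also returns `None` for values above U+10FFFF. The `classify` helper is purely illustrative and not part of any crate.

```rust
/// Illustrative helper: classify a raw u32 the way Unicode and Rust's `char` see it.
fn classify(cp: u32) -> &'static str {
    match char::from_u32(cp) {
        // U+0000..=U+D7FF and U+E000..=U+10FFFF: representable as `char`.
        Some(_) => "scalar value (fits in char)",
        // U+D800..=U+DFFF: valid code points, but not scalar values.
        None if (0xD800..=0xDFFF).contains(&cp) => "surrogate code point (rejected by char)",
        // Everything above U+10FFFF is outside the code point range entirely.
        None => "not a code point",
    }
}

fn main() {
    for cp in [0x41, 0xD800, 0xDFFF, 0x10FFFF, 0x110000] {
        println!("U+{:04X}: {}", cp, classify(cp));
    }
}
```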
According to the comment above the function, the parser is supposed to accept any code point:
https://github.com/rust-lang/regex/blob/master/regex-syntax/src/ast/parse.rs#L1611
What is the expected behavior?
I expect this crate to include its own logic for validating code points, instead of relying on char::from_u32,
which rejects valid code points (the surrogates) because they are not scalar values.
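One way the requested validation could look, as a hypothetical sketch only: parse the hex digits of an escape such as `\x{D800}`, check them against the code point range U+0000..=U+10FFFF, and keep surrogates in a separate variant instead of discarding them. `CodePoint` and `parse_code_point` are invented names, not regex-syntax's API, and whether the matcher could do anything useful with a surrogate afterwards is a separate question, since a Rust `&str` can never contain one.

```rust
/// Hypothetical parsed value of a `\x{...}` escape (not regex-syntax's API).
#[derive(Debug, PartialEq)]
enum CodePoint {
    Scalar(char),
    Surrogate(u16),
}

/// Hypothetical helper: validate against the code point range
/// U+0000..=U+10FFFF rather than the scalar-value range of `char::from_u32`.
fn parse_code_point(hex: &str) -> Result<CodePoint, String> {
    let cp = u32::from_str_radix(hex, 16)
        .map_err(|e| format!("invalid hex {:?}: {}", hex, e))?;
    if cp > 0x10FFFF {
        return Err(format!("U+{:X} is beyond U+10FFFF", cp));
    }
    match char::from_u32(cp) {
        Some(c) => Ok(CodePoint::Scalar(c)),
        // Only surrogates (U+D800..=U+DFFF) can reach this arm, so they fit in u16.
        None => Ok(CodePoint::Surrogate(cp as u16)),
    }
}

fn main() {
    for hex in ["41", "D800", "10FFFF", "110000"] {
        println!("{}: {:?}", hex, parse_code_point(hex));
    }
}
```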
JavaScript and several other regex engines handle these fine.
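For context, a likely reason JavaScript copes is that its strings are sequences of UTF-16 code units, so a lone surrogate is a legal string element there. The sketch below shows the same data from the Rust side using only the standard library: `std::char::decode_utf16` reports an unpaired surrogate as an error, and `String::from_utf16_lossy` replaces it with U+FFFD. The `unpaired_surrogates` helper is illustrative only.

```rust
/// Illustrative helper: collect the unpaired surrogates in a UTF-16 sequence.
fn unpaired_surrogates(units: &[u16]) -> Vec<u16> {
    std::char::decode_utf16(units.iter().copied())
        // Each Err carries the offending code unit; valid chars are dropped.
        .filter_map(|r| r.err().map(|e| e.unpaired_surrogate()))
        .collect()
}

fn main() {
    // "A", a lone high surrogate, then "B": fine in a JS string, not in a Rust &str.
    let units = [0x0041, 0xD800, 0x0042];
    println!("{:?}", unpaired_surrogates(&units));
    println!("{:?}", String::from_utf16_lossy(&units));
}
```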