@liigo
Contributor

@liigo liigo commented Jun 23, 2016

Closes #34318

@rust-highfive
Contributor

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

@tbu-
Contributor

tbu- commented Jun 23, 2016

Is there precedent for other programming languages doing this? Off the top of my head, the only one I know about is Python, and it doesn't.

I don't think this is a good idea; we shouldn't impose English upon the user. If you have, say, a German installation of Windows, then it is expected that all strings from the operating system are actually in German.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jun 23, 2016
@liigo
Contributor Author

liigo commented Jun 24, 2016

@tbu- Yes, I'm Chinese, and I would prefer reading OS error strings in Chinese. But Rust doesn't really print OS error strings in Chinese; it prints them as \u{xxxx}\u{yyyy}..., which I can't read. English is at least better than \u{xxxx}\u{yyyy}.... See #34318.
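
A minimal sketch of where the two renderings of an OS error come from, assuming the escaped output arises when an `io::Error` goes through `Debug` (for example, what `unwrap()` prints) rather than `Display`. The helper `render` is a hypothetical name used only for illustration; the message text itself comes from the operating system and depends on its locale.

```rust
use std::io;

// Hypothetical helper: returns the (Display, Debug) renderings of the
// OS error with the given raw error code.
fn render(code: i32) -> (String, String) {
    let err = io::Error::from_raw_os_error(code);
    (format!("{}", err), format!("{:?}", err))
}

fn main() {
    // Code 2 means "file not found" on Linux, macOS, and Windows.
    // The human-readable message is supplied (and localized) by the OS.
    let (display, debug) = render(2);
    // Display writes the message verbatim, followed by "(os error 2)".
    println!("Display: {}", display);
    // Debug formats the message as a string literal, so whatever escaping
    // `Debug for str` applies shows up here.
    println!("Debug:   {}", debug);
}
```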

@tbu-
Contributor

tbu- commented Jun 25, 2016

Maybe we should instead escape fewer characters in the Debug implementation of &str.

@retep998
Contributor

I'd rather disable escaping only for OS error strings specifically, not for all strings.

@tbu-
Contributor

tbu- commented Jun 25, 2016

@retep998 I believe Python has a good strategy here: it doesn't escape most Unicode characters, only unusual ones such as the zero-width space (U+200B):

Python 3.5.1 (default, <timestamp>) 
[GCC 5.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "ä", "\u00e4" , "\u200b"
('ä', 'ä', '\u200b')

Equivalent Rust:

fn main() {
    println!("{:?} {:?}", "ä", "\u{200b}");
}

Outputs:

"\u{e4}" "\u{200b}"
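
For comparison, a small sketch that puts the two styles side by side on a recent toolchain, where `Debug` for `str` already escapes fewer code points. The helper names `escape_everything` and `debug_repr` are made up for this illustration; `str::escape_default` is used here only as a stand-in that reproduces the escape-all-non-ASCII output shown above.

```rust
// Hypothetical helper: escapes every non-ASCII character, which matches
// the style of the output quoted above.
fn escape_everything(s: &str) -> String {
    format!("\"{}\"", s.escape_default())
}

// Hypothetical helper: whatever the toolchain's `Debug for str` produces.
fn debug_repr(s: &str) -> String {
    format!("{:?}", s)
}

fn main() {
    for s in ["ä", "\u{200b}"] {
        // Left: everything non-ASCII escaped. Right: only non-printable
        // code points escaped.
        println!("{} vs {}", escape_everything(s), debug_repr(s));
    }
}
```

On such a toolchain the letter `ä` is printed as-is by `{:?}`, while the zero-width space is still escaped, which mirrors the Python session above.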

@alexcrichton
Member

Thanks for the PR @liigo! The libs team got a chance to talk about this today, and the conclusion was to go with a solution like #34485 instead: keep localized error strings, but try to escape fewer characters.

bors added a commit that referenced this pull request Jul 28, 2016
Escape fewer Unicode codepoints in `Debug` impl of `str`

Use the same procedure as Python to determine whether a character is
printable, described in [PEP 3138]. In particular, this means that the
following character classes are escaped:

- Cc (Other, Control)
- Cf (Other, Format)
- Cs (Other, Surrogate), even though they can't appear in Rust strings
- Co (Other, Private Use)
- Cn (Other, Not Assigned)
- Zl (Separator, Line)
- Zp (Separator, Paragraph)
- Zs (Separator, Space), except for the ASCII space `' '` `0x20`

This allows for user-friendly inspection of strings that are not
English (e.g. compare `"\u{e9}\u{e8}\u{ea}"` to `"éèê"`).

Fixes #34318.
CC #34422.

[PEP 3138]: https://www.python.org/dev/peps/pep-3138/
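
The character classes listed in the commit message can be spot-checked with the standard library's `char::escape_debug`. This is only a sketch: the helper `shown` and the choice of sample code points are assumptions made for illustration, and surrogates (Cs) are omitted because they cannot be stored in a Rust `char`. Results for code points assigned in newer Unicode versions may differ between toolchains.

```rust
// Hypothetical helper: Debug-style escaping of a single character.
fn shown(c: char) -> String {
    c.escape_debug().to_string()
}

fn main() {
    // One sample code point per class named in the commit message,
    // plus two characters that are expected to stay unescaped.
    let samples = [
        ("Cc control", '\u{7}'),
        ("Cf format", '\u{200b}'),
        ("Co private use", '\u{e000}'),
        ("Cn unassigned", '\u{10ffff}'),
        ("Zl line sep", '\u{2028}'),
        ("Zp para sep", '\u{2029}'),
        ("Zs no-break", '\u{a0}'),
        ("ASCII space", ' '),
        ("letter", 'é'),
    ];
    for (class, c) in samples {
        println!("{:<15} U+{:04X} -> {}", class, c as u32, shown(c));
    }
}
```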
