| 1 | +- Start Date: (fill me in with today's date, YYYY-MM-DD) |
| 2 | +- RFC PR: (leave this empty) |
| 3 | +- Rust Issue: (leave this empty) |
| 4 | + |
| 5 | +# Summary |
| 6 | + |
| 7 | +Statically enforce that the `std::fmt` module can only create valid UTF-8 data |
| 8 | +by removing the arbitrary `write` method in favor of a `write_str` method. |
| 9 | + |
| 10 | +# Motivation |
| 11 | + |
| 12 | +Today it is conventionally true that macros like `format!`, as well as |
| 13 | +implementations of `Show`, only create valid UTF-8 data. This is not |
| 14 | +statically enforced, however. As a consequence the `.to_string()` method must |
| 15 | +perform a `str::is_utf8` check before returning a `String`. |
| 16 | + |
| 17 | +This `str::is_utf8` check is currently [one of the most costly parts][bench1] |
| 18 | +of the formatting subsystem while normally just being a redundant check. |
| 19 | + |
| 20 | +[bench1]: https://gist.github.com/alexcrichton/162a5f8f93062800c914 |
| 21 | + |
| 22 | +Additionally, it is possible to statically enforce the convention that `Show` |
| 23 | +only deals with valid Unicode, so the possibility of doing so should be |
| 24 | +explored. |
| 25 | + |
| 26 | +# Detailed design |
| 27 | + |
| 28 | +The `std::fmt::FormatWriter` trait will be redefined as: |
| 29 | + |
| 30 | +```rust |
| 31 | +pub trait Writer { |
| 32 | + fn write_str(&mut self, data: &str) -> Result; |
| 33 | + fn write_char(&mut self, ch: char) -> Result { |
| 34 | + // default method calling write_str |
| 35 | + } |
| 36 | + fn write_fmt(&mut self, f: &Arguments) -> Result { |
| 37 | + // default method calling fmt::write |
| 38 | + } |
| 39 | +} |
| 40 | +``` |
| 41 | + |
| 42 | +There are a few major differences with today's trait: |
| 43 | + |
| 44 | +* The name has changed to `Writer` in accordance with [RFC 356][rfc356] |
| 45 | +* The `write` method, which took `&[u8]`, has been replaced by `write_str`, which takes `&str`. |
| 46 | +* A `write_char` method has been added. |
| 47 | + |
| 48 | +[rfc356]: https://github.com/rust-lang/rfcs/blob/master/text/0356-no-module-prefixes.md |
| 49 | + |
| 50 | +The corresponding methods on the `Formatter` structure will also be altered to |
| 51 | +respect these signatures. |
| 52 | + |
| 53 | +The key idea behind this API is that the `Writer` trait only operates on Unicode |
| 54 | +data. The `write_str` method is a static enforcement of UTF-8-ness, and using |
| 55 | +`write_char` follows suit, as a `char` can only be a valid Unicode scalar value. |
| 56 | + |
| 57 | +With this trait definition, the implementation of `Writer` for `Vec<u8>` will be |
| 58 | +removed (note this is *not* the `io::Writer` implementation) in favor of an |
| 59 | +implementation directly on `String`. The `.to_string()` method will change |
| 60 | +accordingly (as well as `format!`) to write directly into a `String`, bypassing |
| 61 | +all UTF-8 validity checks afterwards. |
| 62 | + |
| 63 | +This change [has been implemented][branch] in a branch of mine, and as expected |
| 64 | +the [benchmark numbers have improved][bench2] for the much larger texts. |
| 65 | + |
| 66 | +[branch]: https://github.com/alexcrichton/rust/tree/fmt-text |
| 67 | +[bench2]: https://gist.github.com/alexcrichton/182ccef5d8c2583a2423 |
| 68 | + |
| 69 | +Note that a key point of the changes implemented is that a call to `write!` into |
| 70 | +an arbitrary `io::Writer` is *still valid* as it's still just a sink for bytes. |
| 71 | +The changes outlined in this RFC will only affect `Show` and other formatting |
| 72 | +trait implementations. As can be seen from the sample implementation, the |
| 73 | +fallout is quite minimal with respect to the rest of the standard library. |
| 74 | + |
| 75 | +# Drawbacks |
| 76 | + |
| 77 | +A version of this RFC has been [previously postponed][rfc57], but this variant |
| 78 | +is much less ambitious in terms of generic `TextWriter` support. At this time |
| 79 | +the design of `fmt::Writer` is purposely conservative. |
| 80 | + |
| 81 | +[rfc57]: https://github.com/rust-lang/rfcs/pull/57 |
| 82 | + |
| 83 | +There are some use cases today where a `&mut Formatter` is interpreted |
| 84 | +as a `&mut Writer`, e.g. for the `Show` impl of `Json`. This pattern is undoubtedly used |
| 85 | +outside this repository, and the change would break all users relying on the |
| 86 | +binary functionality of the old `FormatWriter`. |
| 87 | + |
| 88 | +# Alternatives |
| 89 | + |
| 90 | +Another possible solution, specifically to the performance problem, is to have an |
| 91 | +`unsafe` flag on a `Formatter` indicating that only valid UTF-8 data was |
| 92 | +written, and if all sub-parts of formatting set this flag then the data can be |
| 93 | +assumed to be UTF-8. In general, relying on `unsafe` APIs is less "pure" than relying |
| 94 | +on the type system instead. |
| 95 | + |
| 96 | +The `fmt::Writer` trait could also be located at `io::TextWriter` instead to |
| 97 | +emphasize its possible future connection with I/O, although there are no |
| 98 | +concrete plans today to develop these connections. |
| 99 | + |
| 100 | +# Unresolved questions |
| 101 | + |
| 102 | +* It is unclear to what degree a `fmt::Writer` needs to interact with |
| 103 | + `io::Writer` and the various adaptors/buffers. For example, one would have to |
| 104 | + implement one's own `BufferedWriter` for a `fmt::Writer`. |
| 105 | + |
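
As a rough illustration of the `Writer` design described under "Detailed design", the sketch below uses the `std::fmt::Write` trait from current Rust, which has the same three methods (`write_str`, `write_char`, `write_fmt`). It is not part of the RFC text. Treating `fmt::Write` as equivalent to the proposed `Writer` is an assumption, and `CountingSink` and `render` are hypothetical names introduced only for this example.

```rust
use std::fmt::{self, Write};

/// Hypothetical text sink: collects output and counts characters.
/// It only ever receives `&str` or `char`, so its buffer is always valid UTF-8.
struct CountingSink {
    buf: String,
    chars: usize,
}

impl Write for CountingSink {
    // The only required method; `write_char` and `write_fmt` have defaults
    // that are built on top of it.
    fn write_str(&mut self, data: &str) -> fmt::Result {
        self.chars += data.chars().count();
        self.buf.push_str(data);
        Ok(())
    }
}

/// Formats a small message into the sink and returns the text and char count.
fn render() -> (String, usize) {
    let mut sink = CountingSink { buf: String::new(), chars: 0 };
    // `write!` expands to a call to `write_fmt`, which feeds `write_str`.
    write!(sink, "{}-{}", 1, "é").unwrap();
    // Default `write_char` encodes the char as UTF-8 and calls `write_str`.
    sink.write_char('!').unwrap();
    (sink.buf, sink.chars)
}

fn main() {
    let (text, chars) = render();
    println!("{} ({} chars)", text, chars);
}
```

Because the sink accepts only `&str` and `char`, the collected `String` never needs a separate UTF-8 validity check, which is the property the RFC relies on for `.to_string()` and `format!`.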