Dangerously vague meaning of .len() and .truncate() on strings

A number of operations on the built-in string slices and `String` are specified in terms of length, or of a count of unspecified collection items.
Notable examples are the implementation of the `Collection` method `len` and `String`'s method `truncate`. The documentation of these methods does not say whether the length is in bytes or Unicode code points; in practice it's bytes. This is hinted at, but not stated explicitly, in the general description of `str`.
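
To make the difference concrete, here is a minimal sketch against current stable Rust (the API at the time this issue was filed may have differed). The helper names `byte_and_char_len` and `try_truncate_bytes` are made up for illustration; only `str::len`, `str::chars`, `str::is_char_boundary`, and `String::truncate` are real standard library methods.

```rust
/// Returns (length in bytes, number of Unicode code points) for a string.
/// Hypothetical helper, for illustration only.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Truncates `s` to `new_len` bytes, but only if that offset does not fall
/// inside a multi-byte character. Returns `false` and leaves `s` untouched
/// otherwise. Calling `String::truncate` directly on such an offset panics
/// in current Rust.
fn try_truncate_bytes(s: &mut String, new_len: usize) -> bool {
    if new_len <= s.len() && !s.is_char_boundary(new_len) {
        return false;
    }
    // For new_len >= s.len(), `truncate` is a no-op.
    s.truncate(new_len);
    true
}
```

For `"h\u{e9}llo"` (five characters, with U+00E9 taking two bytes in UTF-8), `len()` reports 6, and byte offset 2 lands in the middle of a character.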

In contrast, many popular programming environments, such as Java, C#/CLI, and Qt, define similar string operations in terms of UTF-16 code units. (UTF-16 is not free of the same issues, but it still generally works as wide-char Unicode for the masses, unless you are into dead or obscure scripts or emoji chat software.) These operations are so familiar that many people will use them without looking up their precise definition, and in Rust they may get them wrong even if the documentation gives every possible warning. Their code will compile and work until it meets its first non-ASCII string, which in some sad cases might not happen until after shipping. Double grief if a misinterpreted value passes into unsafe code and causes hard-to-debug trouble, staining the image of Rust as a safe language for the developers concerned.

I think this problem could best be mitigated by careful API design. I've got the following suggestions:
- Deprecate `truncate` in favor of `truncate_bytes`, and add `truncate_chars` alongside.
- Move `len` out of the `Collection` trait into a new subtrait `SizedCollection`, which standard strings will **not** implement. The byte length of a string will always be one method call away behind `as_bytes()`, which makes the intent explicit. This trait split will also allow implementations of linked lists without an explicitly maintained size counter, and may enable some clever lock-free concurrent collections in the future.
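
A rough sketch of what these two suggestions could look like, written against current stable Rust. Everything here is hypothetical: `TruncateExt` is an extension trait invented for the example, and `Collection` and `SizedCollection` are local stand-ins for the traits discussed above, not standard library items. The sketch also cannot remove the inherent `String::len`, so it only shows the shape of the proposed API.

```rust
/// Hypothetical extension trait carrying the two proposed method names.
trait TruncateExt {
    /// Same as today's `truncate`: `new_len` is a byte offset.
    fn truncate_bytes(&mut self, new_len: usize);
    /// Keeps at most `max_chars` Unicode code points.
    fn truncate_chars(&mut self, max_chars: usize);
}

impl TruncateExt for String {
    fn truncate_bytes(&mut self, new_len: usize) {
        self.truncate(new_len);
    }

    fn truncate_chars(&mut self, max_chars: usize) {
        // Find the byte offset at which character number `max_chars` starts;
        // if the string is shorter than that, there is nothing to cut.
        let cut = self.char_indices().nth(max_chars).map(|(idx, _)| idx);
        if let Some(idx) = cut {
            self.truncate(idx);
        }
    }
}

/// Stand-in for the base collection trait, without `len`.
trait Collection {
    fn is_empty(&self) -> bool;
}

/// Stand-in for the proposed subtrait that carries `len`.
trait SizedCollection: Collection {
    fn len(&self) -> usize;
}

impl Collection for String {
    fn is_empty(&self) -> bool {
        String::is_empty(self)
    }
}

impl<T> Collection for Vec<T> {
    fn is_empty(&self) -> bool {
        Vec::is_empty(self)
    }
}

// Only `Vec<T>` gets `SizedCollection`; `String` deliberately does not.
impl<T> SizedCollection for Vec<T> {
    fn len(&self) -> usize {
        Vec::len(self)
    }
}

/// Generic code asking for an item count only accepts sized collections.
fn item_count<C: SizedCollection>(c: &C) -> usize {
    c.len()
}
```

With these definitions, `item_count(&vec![1, 2, 3])` compiles, while `item_count(&String::new())` is rejected at compile time, and callers who want a string's byte length would write `s.as_bytes().len()`.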

