Description
Required prerequisites
- Make sure you've read the documentation. Your issue may be addressed there.
- Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
- Consider asking first in the Gitter chat room or in a Discussion.
Problem description
Currently, string_caster always creates a temporary PyBytes object by calling PyUnicode_AsEncodedString. However, in the common case of UTF_N == 8, on Python >= 3.3 we can instead call PyUnicode_AsUTF8AndSize, which manages and caches the UTF-8 encoding inside the PyUnicode object itself. If the UTF-8 representation is already cached, no encoding has to be done at all, which avoids an extra copy of the string.
This is particularly advantageous in the IsView == true case: users likely expect casting from PyUnicode to std::string_view to be low cost, but currently it always involves a copy of the string. The IsView == true case also relies on loader_life_support::add_patient, which adds further cost and memory allocations (e.g. with the change in #3237, an allocation of the unordered_set bucket array on first use of loader_life_support, plus an allocation for the node).
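The caching behaviour described above can be observed from Python itself. The following is only a rough sketch, assuming CPython and using ctypes.pythonapi to call PyUnicode_AsUTF8AndSize directly; the helper name utf8_view is made up for illustration and is not part of pybind11. It shows that repeated calls on the same str return the same buffer address, whereas encoding to bytes produces a new object each time.

```python
import ctypes

# const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# Use c_void_p so we get the raw address back; c_char_p would copy into bytes.
_as_utf8.restype = ctypes.c_void_p


def utf8_view(s):
    """Hypothetical helper: (address, size) of the UTF-8 buffer owned by `s`."""
    size = ctypes.c_ssize_t()
    addr = _as_utf8(s, ctypes.byref(size))
    return addr, size.value


text = "héllo wörld"  # non-ASCII, so a separate UTF-8 buffer must be built

addr1, n1 = utf8_view(text)  # first call encodes and caches inside the str
addr2, n2 = utf8_view(text)  # second call reuses the cached buffer

# Same address both times: no re-encoding and no temporary bytes object.
same_buffer = addr1 == addr2

# By contrast, encoding to bytes allocates a fresh object on every call.
b1 = text.encode("utf-8")
b2 = text.encode("utf-8")
fresh_bytes_each_time = b1 is not b2
```

The buffer returned by PyUnicode_AsUTF8AndSize is owned by the str object, so a std::string_view pointing at it would stay valid for as long as the str is alive, without a separate patient object holding a temporary bytes.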
Reproducible example code
No response