[BUG]: string_caster should not use temporary on Python >= 3.3 #3252

@jbms

Problem description

Currently, string_caster always creates a temporary PyBytes object by calling PyUnicode_AsEncodedString. However, in the common case of UTF_N == 8 on Python >= 3.3, we can instead use PyUnicode_AsUTF8AndSize, which manages and caches the UTF-8 encoding within the PyUnicode object itself. If the UTF-8 representation is already cached, no encoding needs to be done at all, which avoids an extra copy of the string.

This is particularly advantageous in the IsView == true case: users likely expect casting from PyUnicode to std::string_view to be low cost, but currently it always involves a copy of the string. Additionally, the IsView == true case relies on loader_life_support::add_patient, which adds further cost and memory allocations (e.g. with the change in #3237, an allocation of the unordered_set bucket array on first use of loader_life_support, plus an allocation for the node).
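
To make the difference concrete, here is a rough, standard-library-only toy model of the two strategies. It is not pybind11 or CPython code: ToyUnicode, encode_utf8, as_encoded_temporary and as_utf8_view are hypothetical names invented for this sketch, and the lazily filled cache is only an assumption-level stand-in for the caching that PyUnicode_AsUTF8AndSize is described as doing above.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a Python str object (NOT the real PyUnicode layout).
// It stores code points and lazily caches its UTF-8 encoding.
struct ToyUnicode {
    std::u32string code_points;
    mutable std::optional<std::string> utf8_cache;  // filled on first request
    mutable int encode_count = 0;                   // number of real encoding passes
};

// Encode code points as UTF-8. Assumes valid scalar values (no surrogate checks).
inline std::string encode_utf8(const std::u32string &cps) {
    std::string out;
    for (char32_t c : cps) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Models the current approach: each conversion encodes again and returns a
// fresh temporary, which the caller has to keep alive while a view is in use.
inline std::string as_encoded_temporary(const ToyUnicode &s) {
    ++s.encode_count;
    return encode_utf8(s.code_points);
}

// Models the proposed approach: the object owns the cached encoding, so the
// returned view borrows from the object and no separate temporary is needed.
inline std::string_view as_utf8_view(const ToyUnicode &s) {
    if (!s.utf8_cache) {
        ++s.encode_count;
        s.utf8_cache = encode_utf8(s.code_points);
    }
    return *s.utf8_cache;
}
```

In this model, repeated calls to as_utf8_view on the same object return views over the same buffer and encode only once, while every call to as_encoded_temporary pays for a new encoding pass and a new allocation. The view stays valid only as long as the ToyUnicode object itself does, which is the same lifetime a caster would already need to guarantee for the source object.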

Reproducible example code

No response
