This repository was archived by the owner on Sep 10, 2025. It is now read-only.

@Nayef211 Nayef211 commented Mar 2, 2023

Summary:
- pybind11 throws an error when decoding a C++ `std::string` that contains incomplete UTF-8 byte sequences, because its default UTF-8 conversion uses `"strict"` error handling ([ref](https://docs.python.org/3/library/codecs.html#error-handlers)).
- To resolve user issues (see [post](https://fb.workplace.com/groups/pytorchtext/permalink/899318121386487/)), we set the error handling to `"ignore"`, which skips the malformed bytes and continues decoding the rest of the string.

Differential Revision: D43361716

fbshipit-source-id: 4ac488e4b4b894c8049728941a2ee36b1799258a
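
For readers unfamiliar with the two error handlers named in the summary, here is a minimal Python sketch of how they differ on an incomplete UTF-8 sequence. It is not the code changed in this PR (the actual change is in the C++ binding layer, which is not shown here). The sample bytes and the helper names `decode_strict` and `decode_ignore` are hypothetical and used only for illustration.

```python
# Build an incomplete UTF-8 sequence: "é" encodes to b"\xc3\xa9",
# so dropping the final byte leaves a dangling lead byte.
raw = "café".encode("utf-8")[:-1]  # b"caf\xc3"


def decode_strict(data: bytes) -> str:
    """Decode with the default handler; raises on malformed input."""
    return data.decode("utf-8", errors="strict")


def decode_ignore(data: bytes) -> str:
    """Decode while silently dropping bytes that cannot be decoded."""
    return data.decode("utf-8", errors="ignore")


try:
    decode_strict(raw)
    strict_failed = False
except UnicodeDecodeError:
    # "strict" refuses the truncated sequence.
    strict_failed = True

# "ignore" drops the dangling byte and keeps the valid prefix.
lenient = decode_ignore(raw)

print(strict_failed, repr(lenient))
```

Note that `"ignore"` discards the malformed bytes without any signal, so the decoded string can be shorter than the input suggests.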

@joecummings joecummings left a comment

Looks great! Thanks for doing this @Nayef211

@joecummings joecummings merged commit 7968ef0 into pytorch:release/0.15 Mar 2, 2023
@Nayef211 Nayef211 deleted the cp_3b30889 branch March 2, 2023 15:49