-
-
Notifications
You must be signed in to change notification settings - Fork 33.5k
bpo-30708: Check for null characters in PyUnicode_AsWideCharString(). #2285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-30708: Check for null characters in PyUnicode_AsWideCharString(). #2285
Conversation
Raise a ValueError if the second argument is NULL and the wchar_t\* string contains null characters.
|
@serhiy-storchaka, thanks for your PR! By analyzing the history of the files in this pull request, we identified @birkenfeld, @zooba and @bitdancer to be potential reviewers. |
| } | ||
| if (size == NULL && wcslen(wstr) != (size_t)buflen) { | ||
| PyErr_SetString(PyExc_ValueError, | ||
| "embedded null character"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message doesn't really have a lot of information and could be a bit confusing. I'd suggest changing it to something like String has an embedded null character but size parameter was not provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a standard error message raised in a lot of places. Adding "but size parameter was not provided" only will confuse the end user/Python programmer, because at Python level there is no any size parameter. From Python view this exception just means that a string containing '\0' is passed to a function which rejects strings containing '\0'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a standard error message but PyUnicode_AsWideCharString is part of the public C API, from the perspective of someone writing an extension this error would be pretty confusing, I guess someone calling it from interpreter code itself could quickly realize why this error was thrown digging into the code but I don't see how this would be more confusing if enough detail was added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception is for the users of extensions, not for the authors of extensions. PyArg_ParseTuple() raises the same exception, as well as some other functions. Using PyArg_ParseTuple() or PyUnicode_AsWideCharString() or other way for converting Python objects to C values is an implementation detail insignificant for the users of extensions. Words about a size parameter don't make sense at Python level.
The direct users of C API can read the PyUnicode_AsWideCharString() documentation when they get this exception (as well as they can read the PyArg_ParseTuple() documentation when they get the similar exception in PyArg_ParseTuple()). It is explicitly documented.
Objects/unicodeobject.c
Outdated
| memcpy(buffer, wstr, buflen * sizeof(wchar_t)); | ||
| if (size != NULL) | ||
| *size = buflen; | ||
| *size = buflen - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, this way of incrementing buflen to make room for the terminator and then setting this to buflen - 1 requires a bit of thinking when reading this at a glance. Maybe move setting the size to before the buflen is incremented but this isn't a big issue.
…ing(). (pythonGH-2285) Raise a ValueError if the second argument is NULL and the wchar_t\* string contains null characters.. (cherry picked from commit e613e6a)
…harString(). (pythonGH-2285) (pythonGH-2443) Raise a ValueError if the second argument is NULL and the wchar_t\* string contains null characters.. (cherry picked from commit e613e6a). (cherry picked from commit 0edffa3)
…ng(). (GH-2285) (GH-2443) (#2448) And use it instead of PyUnicode_AsWideCharString() if appropriate. _PyUnicode_AsWideCharString(unicode) is like PyUnicode_AsWideCharString(unicode, NULL), but raises a ValueError if the wchar_t* string contains null characters. (cherry picked from commit e613e6a). (cherry picked from commit 0edffa3)
Raise a
ValueErrorif the second argument isNULLand thewchar_t*string contains null characters.