-
-
Notifications
You must be signed in to change notification settings - Fork 33.5k
bpo-31170: Write unit test for Expat 2.2.4 UTF-8 bug #3570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The XML file comes from the libexpat bug: libexpat/libexpat#115 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know Chinese. I used other example for reproducing this bug: https://bugs.python.org/issue31303#msg300997.
Or simpler:
xml.etree.ElementTree.XML(b'<a b="' + b'x'*1023 + b'\xc3\xa0"/>')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xml.etree.ElementTree.XML(b'')
Ok, I also added this test. I prefer 2 tests rather than a single one :-) I like to reuse the same XML data which was used to reproduce the initial Expat bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have proposed different test not because I don't know Chinese, but because it is simpler, don't need an external file, and easier extensible. For example you can test with two strings
b'<a b="' + b'\xc3\xa0'*1024 + b'"/>'
b'<a b="x' + b'\xc3\xa0'*1024 + b'"/>'
and be sure that it covers any buffer boundaries in the range of two kilobytes.
The original Chinese example works only for specific buffer size (1 KiB) and specific buffering strategy (attribute value is written from the start of the buffer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example you can test with two strings : (...)
Ok, I added these two examples as well.
FYI I tested and b'' (2 KB) worked well on Expat <= 2.2.3, but b'' failed (2 KB + 1).
I added 4 tests. I should now be enough to test for non-regression, no? If you want more tests, IMHO they should be written in Expat, not CPython ;-)
I only wanted to make sure that the fix was applied, not try all corner cases.
tiran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with using the reproducer from the expat bug.
Non-regression tests for the Expat 2.2.3 UTF-8 decoder bug.
Lib/test/test_xml_etree.py
Outdated
| # Check that Expat 2.2.4 fixed the bug. | ||
|
|
||
| # 1 KB buffer | ||
| text = b'x' * 1023 + b'\xc3\xa0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is redundant. The third test supersedes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Lib/test/test_xml_etree.py
Outdated
| text = b'x' * 1023 + b'\xc3\xa0' | ||
| self.check_expat224_utf8_bug(text) | ||
|
|
||
| # 2 KB buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to document these two tests as testing buffer bounds at odd and even positions.
And please use binary prefixes: KiB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
vstinner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New update to take latest @serhiy-storchaka comments in account.
Lib/test/test_xml_etree.py
Outdated
| text = b'x' * 1023 + b'\xc3\xa0' | ||
| self.check_expat224_utf8_bug(text) | ||
|
|
||
| # 2 KB buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Lib/test/test_xml_etree.py
Outdated
| # Check that Expat 2.2.4 fixed the bug. | ||
|
|
||
| # 1 KB buffer | ||
| text = b'x' * 1023 + b'\xc3\xa0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Lib/test/test_xml_etree.py
Outdated
| # | ||
| # Test buffer bounds at odd and even positions. | ||
|
|
||
| # 2 KiB buffer (even) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vice verse. The first test covers the case of buffer bound at odd position, and the second test covers the case of buffer bound at even position.
"2 KiB" is not related. The tests cover all buffer sizes (up to 2 KiB).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the comments. "# Test buffer bounds at odd and even positions." is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my comments were confusing.
|
Thanks for the review and advices @serhiy-storchaka! |
|
Thanks @Haypo for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6. |
1 similar comment
|
Thanks @Haypo for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6. |
|
Sorry, @Haypo, I could not cleanly backport this to |
|
GH-3745 is a backport of this pull request to the 2.7 branch. |
|
GH-3746 is a backport of this pull request to the 3.6 branch. |
https://bugs.python.org/issue31170