Skip to content

Conversation

@vstinner
Copy link
Member

@vstinner vstinner commented Sep 14, 2017

@vstinner
Copy link
Member Author

The XML file comes from the libexpat bug: libexpat/libexpat#115

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know Chinese. I used other example for reproducing this bug: https://bugs.python.org/issue31303#msg300997.

Or simpler:

xml.etree.ElementTree.XML(b'<a b="' + b'x'*1023 + b'\xc3\xa0"/>')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xml.etree.ElementTree.XML(b'')

Ok, I also added this test. I prefer 2 tests rather than a single one :-) I like to reuse the same XML data which was used to reproduce the initial Expat bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have proposed different test not because I don't know Chinese, but because it is simpler, don't need an external file, and easier extensible. For example you can test with two strings

b'<a b="' + b'\xc3\xa0'*1024 + b'"/>'
b'<a b="x' + b'\xc3\xa0'*1024 + b'"/>'

and be sure that it covers any buffer boundaries in the range of two kilobytes.

The original Chinese example works only for specific buffer size (1 KiB) and specific buffering strategy (attribute value is written from the start of the buffer).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example you can test with two strings : (...)

Ok, I added these two examples as well.

FYI I tested and b'' (2 KB) worked well on Expat <= 2.2.3, but b'' failed (2 KB + 1).

I added 4 tests. I should now be enough to test for non-regression, no? If you want more tests, IMHO they should be written in Expat, not CPython ;-)

I only wanted to make sure that the fix was applied, not try all corner cases.

Copy link
Member

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with using the reproducer from the expat bug.

Non-regression tests for the Expat 2.2.3 UTF-8 decoder bug.
# Check that Expat 2.2.4 fixed the bug.

# 1 KB buffer
text = b'x' * 1023 + b'\xc3\xa0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is redundant. The third test supersedes it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

text = b'x' * 1023 + b'\xc3\xa0'
self.check_expat224_utf8_bug(text)

# 2 KB buffer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to document these two tests as testing buffer bounds at odd and even positions.

And please use binary prefixes: KiB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member Author

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New update to take latest @serhiy-storchaka comments in account.

text = b'x' * 1023 + b'\xc3\xa0'
self.check_expat224_utf8_bug(text)

# 2 KB buffer
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# Check that Expat 2.2.4 fixed the bug.

# 1 KB buffer
text = b'x' * 1023 + b'\xc3\xa0'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

#
# Test buffer bounds at odd and even positions.

# 2 KiB buffer (even)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vice verse. The first test covers the case of buffer bound at odd position, and the second test covers the case of buffer bound at even position.

"2 KiB" is not related. The tests cover all buffer sizes (up to 2 KiB).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the comments. "# Test buffer bounds at odd and even positions." is enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my comments were confusing.

@vstinner vstinner merged commit e6d9fcb into python:master Sep 25, 2017
@vstinner vstinner deleted the expat224_utf8_bug branch September 25, 2017 08:27
@vstinner
Copy link
Member Author

Thanks for the review and advices @serhiy-storchaka!

@miss-islington
Copy link
Contributor

Thanks @Haypo for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6.
🐍🍒⛏🤖

1 similar comment
@miss-islington
Copy link
Contributor

Thanks @Haypo for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @Haypo, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker e6d9fcbb8d0c325e57df08ae8781aafedb71eca2 2.7

@bedevere-bot
Copy link

GH-3745 is a backport of this pull request to the 2.7 branch.

@bedevere-bot
Copy link

GH-3746 is a backport of this pull request to the 3.6 branch.

vstinner added a commit that referenced this pull request Sep 25, 2017
Non-regression tests for the Expat 2.2.3 UTF-8 decoder bug.

(cherry picked from commit e6d9fcb)
vstinner added a commit that referenced this pull request Sep 25, 2017
Non-regression tests for the Expat 2.2.3 UTF-8 decoder bug.

(cherry picked from commit e6d9fcb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants