Skip to content

Conversation

Davda-James
Copy link

@Davda-James Davda-James commented Aug 31, 2025

urllib.parse.parse_qsl earlier it was accepting the illegal characters as well.

Proof (that I reproduce) :
Before_Fix

Closes issue : #138284

Proof (after fixing error):
After_fix

I added the test for it as well.
Test for urlparse only :
test

All tests:
all_tests

  • Passes all tests

@python-cla-bot
Copy link

python-cla-bot bot commented Aug 31, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Aug 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@Davda-James Davda-James changed the title urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 gh-138284 : urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 Aug 31, 2025
@bedevere-app
Copy link

bedevere-app bot commented Aug 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@@ -91,6 +91,9 @@
# Unsafe bytes to be removed per WHATWG spec
_UNSAFE_URL_BYTES_TO_REMOVE = ['\t', '\r', '\n']

# Allowed valid characters in parse_qsl
_VALID_QUERY_CHARS = re.compile(r"^[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*$")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be replaced with str.isascii, str.isdecimal and a strings with the others, this should be faster.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I will do it and add new commit.

@StanFromIreland
Copy link
Member

Please add a NEWS entry, and this does break existing code.

@bedevere-app
Copy link

bedevere-app bot commented Aug 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@Davda-James
Copy link
Author

@StanFromIreland I have added your suggestion. Can you please review it again.
Thank You

Copy link

@shloktech shloktech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am the reporter of the issue which is being solved here. The changes look good to me, and I think they solve the issue very well. Approving from my end.

Note: @Davda-James do get it reviewed by Stan.

Suggestion: Squash your commits below to have a single commit. It is a good practice to have :)

Image

cc: @StanFromIreland

@StanFromIreland
Copy link
Member

I have requested the expert for this module.

Suggestion: Squash your commits below to have a single commit.

Please do not, it confuses gh making it difficult to review. They will be squashed when merged anyway.

@picnixz picnixz self-requested a review August 31, 2025 17:14
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that this change is expected, the following must be done:

  • The documentation of urllib.parse.parse_qsl must be updated accordingly.
  • Test coverage must be increased.

I'm not entirely sure that we necessarily need to consider this as a bug fix. The rationale is as follows:

The urlsplit() and urlparse() APIs do not perform validation of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity.

I do not know whether we should consider this is a pitfall or not.

@@ -854,6 +866,11 @@ def _unquote(s):
name, has_eq, value = name_value.partition(eq)
if not has_eq and strict_parsing:
raise ValueError("bad query field: %r" % (name_value,))
if strict_parsing:
# Validate RFC3986 characters
to_check = (name_value.decode() if isinstance(name_value, bytes) else name_value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use _unquote as this handles the %-encoded values and takes care of the encoding parameter as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if strict_parsing:
# Validate RFC3986 characters
to_check = _unquote(name_value)
if isinstance(to_check, (bytes, bytearray)):
to_check = to_check.decode(encoding, errors)
if not _is_valid_rfc3986_query(to_check): using like this is it good as we need to decode back as _unquote returns bytes and _is_valid_rfc3986_query accepts the string ?

Davda-James and others added 2 commits August 31, 2025 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants