gh-138284 : urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 #138291

Davda-James · 2025-08-31T12:30:55Z

Previously, urllib.parse.parse_qsl accepted illegal characters as well.

Proof (reproduction):

Closes issue: #138284

Proof (after the fix):

I added tests for it as well.
Tests for urlparse only:

All tests:

All tests pass.

Issue: urllib.parse.parse_qsl is accepting illegal characters #138284

python-cla-bot · 2025-08-31T12:30:59Z

All commit authors signed the Contributor License Agreement.

bedevere-app · 2025-08-31T12:31:00Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

StanFromIreland · 2025-08-31T12:44:34Z

Lib/urllib/parse.py

@@ -91,6 +91,9 @@
 # Unsafe bytes to be removed per WHATWG spec
 _UNSAFE_URL_BYTES_TO_REMOVE = ['\t', '\r', '\n']

+# Allowed valid characters in parse_qsl
+_VALID_QUERY_CHARS = re.compile(r"^[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*$") 


This could be replaced with str.isascii, str.isdecimal and a string containing the other characters; this should be faster.

Okay, I will do it and add a new commit.
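
For illustration, a minimal sketch of what such a regex-free check could look like. This is not the code from the PR: the names `_EXTRA_QUERY_CHARS` and `is_valid_query` are hypothetical, and the allowed punctuation is copied from the regex in the diff above.

```python
# Hypothetical sketch, not the PR's implementation.
# Punctuation accepted by the regex in the diff, besides ASCII letters and digits.
_EXTRA_QUERY_CHARS = frozenset("-._~!$&'()*+,;=:@/?%")


def is_valid_query(s):
    """Return True if every character of s is in the allowed set."""
    return all(
        # isascii() guards against non-ASCII letters/digits, for which
        # isalnum() alone would also return True.
        (ch.isascii() and ch.isalnum()) or ch in _EXTRA_QUERY_CHARS
        for ch in s
    )
```

Like the regex, this accepts the empty string and rejects spaces, angle brackets and any non-ASCII character. Whether it is actually faster than the compiled regex would need to be measured.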

StanFromIreland · 2025-08-31T12:45:13Z

Please add a NEWS entry. Note that this does break existing code.

Davda-James · 2025-08-31T13:41:02Z

@StanFromIreland I have applied your suggestion. Could you please review it again?
Thank you.

shloktech

I am the reporter of the issue which is being solved here. The changes look good to me, and I think they solve the issue very well. Approving from my end.

Note: @Davda-James do get it reviewed by Stan.

Suggestion: Squash your commits below to have a single commit. It is a good practice to have :)

cc: @StanFromIreland

StanFromIreland · 2025-08-31T16:57:47Z

I have requested a review from the expert for this module.

"Suggestion: Squash your commits below to have a single commit."

Please do not; it confuses GitHub, making the PR difficult to review. The commits will be squashed when merged anyway.

picnixz

Assuming that this change is expected, the following must be done:

- The documentation of urllib.parse.parse_qsl must be updated accordingly.
- Test coverage must be increased.

I'm not entirely sure that we necessarily need to consider this a bug fix. The rationale is the following statement from the documentation:

"The urlsplit() and urlparse() APIs do not perform validation of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity."

I do not know whether we should consider this a pitfall or not.
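
To make the leniency described above concrete, here is a small sketch using only the public urllib.parse API with its default (non-strict) settings. The URL and the helper name `strict_rejects` are made up for illustration.

```python
from urllib.parse import parse_qsl, urlsplit

# "<" and ">" are not allowed unescaped in an RFC 3986 query, yet the
# default (non-strict) parsers pass them through unchanged.
query = urlsplit("http://example.com/?a=<b>").query
pairs = parse_qsl(query)


def strict_rejects(qs):
    """Return True if parse_qsl(qs, strict_parsing=True) raises ValueError."""
    try:
        parse_qsl(qs, strict_parsing=True)
    except ValueError:
        return True
    return False
```

Here `query` is `'a=<b>'` and `pairs` is `[('a', '<b>')]`. With strict_parsing=True, a field without "=" such as `"novalue"` is already rejected with ValueError; the PR proposes extending that strict mode to illegal characters.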

Misc/NEWS.d/next/Library/2025-08-31-13-00-22.gh-issue-138284.6MOp4k.rst

Lib/test/test_urlparse.py

Lib/urllib/parse.py

picnixz · 2025-08-31T17:39:18Z

Lib/urllib/parse.py

@@ -854,6 +866,11 @@ def _unquote(s):
            name, has_eq, value = name_value.partition(eq)
            if not has_eq and strict_parsing:
                raise ValueError("bad query field: %r" % (name_value,))
+            if strict_parsing:
+                # Validate RFC3986 characters
+                to_check = (name_value.decode() if isinstance(name_value, bytes) else name_value)


Use _unquote as this handles the %-encoded values and takes care of the encoding parameter as well.

    if strict_parsing:
        # Validate RFC3986 characters
        to_check = _unquote(name_value)
        if isinstance(to_check, (bytes, bytearray)):
            to_check = to_check.decode(encoding, errors)
        if not _is_valid_rfc3986_query(to_check):

Is it okay to use it like this? We need to decode the result because _unquote returns bytes, while _is_valid_rfc3986_query accepts a string.
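
The str-versus-bytes question can be illustrated with the public unquote functions. This is only a sketch: the private `_unquote` helper inside parse_qsl is not reproduced here, and the sample value and the utf-8/replace arguments are assumptions chosen for the example.

```python
from urllib.parse import unquote, unquote_to_bytes

raw = "caf%C3%A9"

# unquote() decodes the %-escapes and returns str, honouring the
# encoding and errors arguments.
as_text = unquote(raw, encoding="utf-8", errors="replace")

# unquote_to_bytes() returns bytes, so a str-based validator needs an
# explicit decode step first.
as_bytes = unquote_to_bytes(raw)
decoded = as_bytes.decode("utf-8", "replace")
```

Both paths end with the same text (`'café'`), so the extra decode is only needed when the unquoting step yields bytes.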

urllib.parse.parse_qsl now raises ValueError if illegal characters is…

79f25b9

… passed

bedevere-app bot added the awaiting review label Aug 31, 2025

Davda-James changed the title ~~urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986~~ gh-138284 : urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 Aug 31, 2025

bedevere-app bot mentioned this pull request Aug 31, 2025

urllib.parse.parse_qsl is accepting illegal characters #138284

Open

fixed the linting

345e86b

StanFromIreland reviewed Aug 31, 2025

replaced regex with char.isascii() and char.isalnum() and manual chec…

4934ff2

…k for performance

📜🤖 Added by blurb_it.

cf763db

shloktech approved these changes Aug 31, 2025

bedevere-app bot added awaiting core review and removed awaiting review labels Aug 31, 2025

StanFromIreland requested a review from orsenthil August 31, 2025 16:57

picnixz self-requested a review August 31, 2025 17:14

picnixz reviewed Aug 31, 2025

Davda-James and others added 2 commits August 31, 2025 23:43

Update Misc/NEWS.d/next/Library/2025-08-31-13-00-22.gh-issue-138284.6…

5cea514

…MOp4k.rst Updated it according to suggestion Co-authored-by: Bénédikt Tran <[email protected]>

changes made as per reviews

f42744b
