
encukou (Member) commented Oct 22, 2025

This simplifies the Lexical Analysis section on Names (but keeps it technically correct) by putting all the info about non-ASCII characters in a separate (and very technical) section.

It uses a mental model where the parser doesn't handle Unicode complexity “immediately”, but:

  1. parses any non-ASCII character (outside strings and comments) as part of a name, since such characters can't (yet) appear in e.g. operators
  2. normalizes the name
  3. validates the name, using the id_start/id_continue sets (referred to in previous sections as “letter-like” and “number-like” characters, with a link to the details)

This also means we don't need xid_start/xid_continue to define the behaviour :)
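The three-step model above can be sketched in Python itself. This is only an illustration, not CPython's implementation (which does this in C during tokenization); the function names are invented here, and `str.isidentifier` is used as a stand-in for the validation step since it checks the Unicode-derived identifier character properties:

```python
import unicodedata

def normalize_name(raw: str) -> str:
    # Step 2: identifiers are normalized to NFKC form (PEP 3131)
    return unicodedata.normalize("NFKC", raw)

def is_valid_name(raw: str) -> bool:
    # Steps 2-3: normalize first, then validate the result against
    # the identifier character sets; str.isidentifier checks the
    # XID_Start/XID_Continue properties of each character
    return normalize_name(raw).isidentifier()
```

For example, `is_valid_name("nᵘₘᵇₑʳ")` is true because the normalized form is `number`, while a character like `€` has no compatibility decomposition and is rejected.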


📚 Documentation preview 📚: https://cpython-previews--140464.org.readthedocs.build/

willingc (Contributor) left a comment:

Outstanding document @encukou. I had one small suggestion to be a bit more explicit on the normalization example with number.

This means that some typographic variants of characters are
converted to their "basic" form, for example::

>>> nᵘₘᵇₑʳ = 3
Contributor:

It would be helpful to add an explicit comment that the normalized form of nᵘₘᵇₑʳ is number.
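For instance, the fact the comment would state can be checked directly with the stdlib `unicodedata` module (a sketch, not part of the PR's doctest):

```python
import unicodedata

# NFKC normalization turns the superscript/subscript variants into
# plain letters, so nᵘₘᵇₑʳ and number name the same identifier
normalized = unicodedata.normalize("NFKC", "nᵘₘᵇₑʳ")
assert normalized == "number"
```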


Labels: awaiting merge, docs (Documentation in the Doc dir), skip news

Projects: Status: Todo

Development

Successfully merging this pull request may close these issues.

Docs: note requirement to normalise unicode identifiers passed to globals() and locals()
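The linked issue concerns behavior like the following (a sketch): because the compiler NFKC-normalizes identifiers at parse time, the keys that end up in a namespace dict are the normalized spellings, so lookups with un-normalized strings miss:

```python
ns = {}
exec("nᵘₘᵇₑʳ = 3", ns)  # the identifier is normalized while parsing

assert "number" in ns       # stored under the normalized spelling
assert "nᵘₘᵇₑʳ" not in ns   # the raw spelling is not a key
```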
