@@ -10,8 +10,8 @@ Abstract
1010
1111In CPython, the compilation from source code to bytecode involves several steps:
1212
13- 1. Parse source code into a parse tree (:file: `Parser/pgen .c `)
14- 2. Transform parse tree into an Abstract Syntax Tree (:file: `Python/ast .c `)
13+ 1. Tokenize the source code (:file: `Parser/tokenizer .c `)
14+ 2. Parse the stream of tokens into an Abstract Syntax Tree (:file: `Parser/parser .c `)
15153. Transform AST into a Control Flow Graph (:file: `Python/compile.c `)
16164. Emit bytecode based on the Control Flow Graph (:file: `Python/compile.c `)
1717
@@ -23,49 +23,18 @@ in terms of the how the entire system works. You will most likely need
2323to read some source to have an exact understanding of all details.
2424
2525
26- Parse Trees
27- -----------
26+ Parsing
27+ -------
2828
29- Python's parser is an LL(1) parser mostly based off of the
30- implementation laid out in the Dragon Book [Aho86 ]_.
29+ As of Python 3.9, Python's parser is a PEG parser of a somewhat
30+ unusual design (since its input is a stream of tokens rather than a
31+ stream of characters as is more common with PEG parsers).
3132
32- The grammar file for Python can be found in :file: `Grammar/Grammar ` with the
33- numeric value of grammar rules stored in :file: `Include/graminit.h `. The
34- list of types of tokens (literal tokens, such as ``: ``, numbers, etc.) can
35- be found in :file: `Grammar/Tokens ` with the numeric value stored in
36- :file: `Include/token.h `. The parse tree is made up
37- of ``node * `` structs (as defined in :file: `Include/node.h `).
38-
39- Querying data from the node structs can be done with the following
40- macros (which are all defined in :file: `Include/node.h `):
41-
42- ``CHILD(node *, int) ``
43- Returns the nth child of the node using zero-offset indexing
44- ``RCHILD(node *, int) ``
45- Returns the nth child of the node from the right side; use
46- negative numbers!
47- ``NCH(node *) ``
48- Number of children the node has
49- ``STR(node *) ``
50- String representation of the node; e.g., will return ``: `` for a
51- ``COLON `` token
52- ``TYPE(node *) ``
53- The type of node as specified in :file: `Include/graminit.h `
54- ``REQ(node *, TYPE) ``
55- Assert that the node is the type that is expected
56- ``LINENO(node *) ``
57- Retrieve the line number of the source code that led to the
58- creation of the parse rule; defined in :file: `Python/ast.c `
59-
60- For example, consider the rule for 'while':
61-
62- .. productionlist ::
63- while_stmt: "while" `expression ` ":" `suite ` : ["else" ":" `suite `]
64-
65- The node representing this will have ``TYPE(node) == while_stmt `` and
66- the number of children can be 4 or 7 depending on whether there is an
67- 'else' statement. ``REQ(CHILD(node, 2), COLON) `` can be used to access
68- what should be the first ``: `` and require it be an actual ``: `` token.
33+ The grammar file for Python can be found in
34+ :file: `Grammar/python.gram `. The numeric values for literal tokens
35+ (such as ``: ``, numbers, etc.) can be found in :file: `Grammar/Tokens `.
36+ Various C files, including :file: `Parser/parser.c ` are generated from
37+ these (see :doc: `grammar `).
6938
7039
7140Abstract Syntax Trees (AST)
@@ -569,10 +538,6 @@ thanks to having to support both classic and new-style classes.
569538References
570539----------
571540
572- .. [Aho86 ] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman.
573- `Compilers: Principles, Techniques, and Tools `,
574- https://www.amazon.com/exec/obidos/tg/detail/-/0201100886/104-0162389-6419108
575-
576541.. [Wang97 ] Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Chris
577542 S. Serra. `The Zephyr Abstract Syntax Description Language. `_
578543 In Proceedings of the Conference on Domain-Specific Languages, pp.
0 commit comments