GH-132554: Specialize GET_ITER and FOR_ITER for range
#135063
Python/bytecodes.c
Outdated
    }
    else {
        PyObject *iter_o = PyStackRef_AsPyObjectBorrow(iter);
        next = _PyForIter_NextWithIndex(iter_o, null_or_index);
_PyForIter_NextWithIndex handles exact lists and exact tuples.
This PR handles range iteration by pushing the index and limit to the stack. Could we simplify _PyForIter_NextWithIndex to deal only with lists, and for tuples push the index and the length of the tuple to the stack (i.e. use the same approach as for range)?
(If this is possible, maybe in a follow-up PR.)
Are you proposing pushing a third value to the stack during iteration?
I doubt that would be worth it.
Yes. It might indeed not be worth it, but I will check once this PR has settled.
Micro benchmarks look good.
Performance is varied: Linux is unchanged, Windows shows an insignificant 0.3% speedup, and Mac shows a suspiciously high 8% speedup. The stats show that …
    uint16_t tp_versions_used;
    /* Returns the object at the index given, or NULL if out-of-bounds
     * Never raises an exception. */
    iterindexfunc tp_iterindex;
A comment above states:
"If this structure is modified, Doc/includes/typestruct.h should be updated"
Should that document be updated?
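For readers following the tp_iterindex discussion, here is a minimal standalone sketch of the contract stated in the struct comment (return the item at an index, or NULL when out of bounds, never raising). It does not use the CPython API: `str_array`, `iterindex_fn`, `str_array_iterindex` and `count_items` are hypothetical names, and the real `iterindexfunc` signature may well differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an indexable container (not a CPython type). */
typedef struct {
    const char **items;
    ptrdiff_t len;
} str_array;

/* Hypothetical slot signature: item at `index`, or NULL if out of bounds.
 * The actual iterindexfunc typedef in the PR may differ. */
typedef const char *(*iterindex_fn)(const void *self, ptrdiff_t index);

static const char *str_array_iterindex(const void *self, ptrdiff_t index)
{
    const str_array *a = self;
    if (index < 0 || index >= a->len) {
        return NULL;  /* out of bounds: signal exhaustion, no error state */
    }
    return a->items[index];
}

/* Iterate by index alone: no iterator object is allocated, and the loop
 * ends on the first NULL returned by the slot. */
static ptrdiff_t count_items(const void *container, iterindex_fn iterindex)
{
    assert(iterindex != NULL);
    ptrdiff_t index = 0;
    while (iterindex(container, index) != NULL) {
        index++;
    }
    return index;
}
```

Because NULL doubles as the end-of-iteration signal, this model only works if the slot really never raises, which is what the comment in the diff promises.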
Extends the idea of "virtual iterators" to ranges as well. Most ranges have a step of one. For these ranges we can treat them much like a C for loop, using tagged integers for the current value and the limit.
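The "C for loop" analogy can be sketched in standalone C as two slots, a current value and a limit, both held as tagged integers. This is an illustrative model only: the tagging scheme (`v * 2 + 1`) and the names `tagged_t`, `range_slots`, `range_get_iter`, `range_for_iter` and `sum_range` are assumptions for the example, not the representation or functions used in the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged integer: an odd value marks "int, not a pointer".
 * Encoded arithmetically to stay well-defined for negative values. */
typedef intptr_t tagged_t;

static tagged_t tag_int(intptr_t v)
{
    return v * 2 + 1;
}

static intptr_t untag_int(tagged_t t)
{
    assert(t % 2 != 0);   /* must carry the int tag */
    return (t - 1) / 2;
}

/* The two values kept on the stack while iterating a step-one range. */
typedef struct {
    tagged_t current;
    tagged_t limit;
} range_slots;

/* Model of GET_ITER for range(start, stop): no iterator object is made. */
static range_slots range_get_iter(intptr_t start, intptr_t stop)
{
    range_slots s = { tag_int(start), tag_int(stop) };
    return s;
}

/* Model of FOR_ITER: yields the next value, or returns false when done. */
static bool range_for_iter(range_slots *s, intptr_t *out)
{
    intptr_t cur = untag_int(s->current);
    if (cur >= untag_int(s->limit)) {
        return false;
    }
    *out = cur;
    s->current = tag_int(cur + 1);
    return true;
}

/* Equivalent of: total = 0; for v in range(start, stop): total += v */
static intptr_t sum_range(intptr_t start, intptr_t stop)
{
    range_slots s = range_get_iter(start, stop);
    intptr_t total = 0, v;
    while (range_for_iter(&s, &v)) {
        total += v;
    }
    return total;
}
```

Each step is a compare, a copy and an increment on plain machine integers, which is why a step of one is the convenient case to specialize.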
The stack during iteration now looks like this:
GET_ITER is specialized for the above cases, plus any iterable with Py_TYPE(self)->tp_iter == PyObject_SelfIter, which avoids the call to PyObject_GetIter by simply pushing NULL instead.
An "indexable item" is one that has fast indexing operations that can be used instead of creating a new iterator object. This includes list, tuple, bytes, bytearray and str objects.
Also fixes stats for FOR_ITER and GET_ITER.