Conversation

@sergey-miryanov
Contributor

@sergey-miryanov sergey-miryanov commented Oct 14, 2025

As requested by @vstinner, I have opened a separate PR with two functions: PyTuple_MakeSingle and PyTuple_MakePair.


📚 Documentation preview 📚: https://cpython-previews--140132.org.readthedocs.build/

@sergey-miryanov sergey-miryanov marked this pull request as draft October 14, 2025 20:51
@sergey-miryanov
Contributor Author

@vstinner I have made the requested changes. Please take a look.

@sergey-miryanov sergey-miryanov marked this pull request as ready for review October 14, 2025 21:01
@sergey-miryanov
Contributor Author

Done, please take a look.

@sergey-miryanov
Contributor Author

sergey-miryanov commented Oct 15, 2025

Microbenchmarks:
Windows 11, i5-11600K @ 3.90GHz

+----------------+---------+-----------------+-----------------------+-----------------------+-----------------------+
| Benchmark      | t       | s               | p                     | a                     | m                     |
+================+=========+=================+=======================+=======================+=======================+
| tuple-1        | 12.8 ns | not significant | 12.2 ns: 1.06x faster | 12.4 ns: 1.04x faster | 11.8 ns: 1.09x faster |
+----------------+---------+-----------------+-----------------------+-----------------------+-----------------------+
| tuple-2        | 14.5 ns | not significant | 13.3 ns: 1.09x faster | 13.2 ns: 1.10x faster | 12.3 ns: 1.18x faster |
+----------------+---------+-----------------+-----------------------+-----------------------+-----------------------+
| Geometric mean | (ref)   | 1.01x slower    | 1.07x faster          | 1.07x faster          | 1.13x faster          |
+----------------+---------+-----------------+-----------------------+-----------------------+-----------------------+

Ubuntu 24.04, gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, same cpu, built with lto enabled

+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| Benchmark      | t       | s                     | p                     | a                     | m                     |
+================+=========+=======================+=======================+=======================+=======================+
| tuple-1        | 15.7 ns | 14.5 ns: 1.08x faster | 12.7 ns: 1.24x faster | 11.5 ns: 1.36x faster | 11.2 ns: 1.40x faster |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| tuple-2        | 22.6 ns | 20.1 ns: 1.12x faster | 16.0 ns: 1.41x faster | 14.2 ns: 1.59x faster | 15.0 ns: 1.51x faster |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| Geometric mean | (ref)   | 1.10x faster          | 1.32x faster          | 1.47x faster          | 1.45x faster          |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+

t - PyTuple_New + PyTuple_SetItem
s - PyTuple_New + PyTuple_SET_ITEM
p - PyTuple_Pack
a - PyTuple_FromArray
m - PyTuple_Make[Single,Pair]
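For readers checking the "Geometric mean" row, here is a minimal sketch of how such a figure can be reproduced from the per-benchmark means. It assumes the row is the geometric mean of the reference/changed timing ratios, and it uses the rounded values from the Windows table above, so other columns may differ in the last digit from what the tool printed. The function name is illustrative only.

```python
import math

# Mean timings in nanoseconds, copied from the Windows table above.
# "t" is the reference column; "m" is PyTuple_Make[Single,Pair].
reference_ns = {"tuple-1": 12.8, "tuple-2": 14.5}
make_ns = {"tuple-1": 11.8, "tuple-2": 12.3}


def geometric_mean_speedup(ref, changed):
    """Geometric mean of the ref/changed ratios over the shared benchmarks."""
    ratios = [ref[name] / changed[name] for name in ref]
    return math.prod(ratios) ** (1.0 / len(ratios))


speedup = geometric_mean_speedup(reference_ns, make_ns)
print(f"{speedup:.2f}x faster")  # prints "1.13x faster"
```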

Microbenchmarks - sergey-miryanov@07f7c6a

run scripts

bench_tuple.py

import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 2):
    func = functools.partial(_testcapi.bench_tuple, size)
    runner.bench_time_func(f'tuple-{size}', func)

bench_steal.py

import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 2):
    func = functools.partial(_testcapi.bench_tuple_steal, size)
    runner.bench_time_func(f'tuple-{size}', func)

bench_pack.py

import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 2):
    func = functools.partial(_testcapi.bench_tuple_pack, size)
    runner.bench_time_func(f'tuple-{size}', func)

bench_from_array.py

import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 2):
    func = functools.partial(_testcapi.bench_tuple_from_array, size)
    runner.bench_time_func(f'tuple-{size}', func)

bench_make.py

import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 2):
    func = functools.partial(_testcapi.bench_tuple_make, size)
    runner.bench_time_func(f'tuple-{size}', func)
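The scripts above rely on the calling convention of pyperf's bench_time_func: the function it receives is called with a loop count and must return the elapsed time in seconds. The `_testcapi.bench_tuple*` helpers exist only on the benchmark branch, so the following is a pure-Python stand-in with the same `(size, loops)` signature, not the C code that was measured. The name `bench_tuple_py` is hypothetical.

```python
import functools
import time


def bench_tuple_py(size, loops):
    """Hypothetical stand-in: build `loops` tuples of `size` items.

    Returns the elapsed time in seconds, as a pyperf time function must.
    """
    item = 0
    t0 = time.perf_counter()
    for _ in range(loops):
        if size == 1:
            result = (item,)
        else:
            result = (item, item)
    return time.perf_counter() - t0


# Same shape as in the scripts: bind the size, leave `loops` to the caller.
time_func = functools.partial(bench_tuple_py, 2)
elapsed = time_func(100_000)
print(f"{elapsed:.6f} s for 100000 loops")

# With pyperf installed, this would be registered as:
#     runner.bench_time_func('tuple-2', time_func)
```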

@eendebakpt
Contributor

In the microbenchmarks it seems odd that PyTuple_Pack is faster than PyTuple_New + PyTuple_SET_ITEM. I can think of no reasons for this. Which optimization settings did you use? Looking at the implementation of the microbenchmarks: maybe for PyTuple_Pack the code PyObject *one = PyLong_FromLong(0); is optimized away. What happens if you move the PyLong_FromLong outside the loop?
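A rough Python analogue of the experiment suggested here, under the assumption that the question is whether item creation is included in the timed region. It does not reproduce the C compiler behaviour under discussion; it only shows the two placements of the setup code. Both function names are hypothetical.

```python
import time


def time_item_inside(loops):
    """Create the item on every iteration, so its cost is part of the timing."""
    t0 = time.perf_counter()
    for _ in range(loops):
        one = object()
        result = (one,)
    return time.perf_counter() - t0


def time_item_hoisted(loops):
    """Create the item once before the clock starts; only tuple creation is timed."""
    one = object()
    t0 = time.perf_counter()
    for _ in range(loops):
        result = (one,)
    return time.perf_counter() - t0


inside = time_item_inside(200_000)
hoisted = time_item_hoisted(200_000)
print(f"inside: {inside:.4f} s, hoisted: {hoisted:.4f} s")
```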

@sergey-miryanov sergey-miryanov marked this pull request as draft October 22, 2025 10:04
@sergey-miryanov
Contributor Author

sergey-miryanov commented Oct 23, 2025

Microbenchmark results from Windows machine (Windows 11, i5-11600K @ 3.90GHz)
Results for Tuple(Long) and Tuple(Long, Long) - tuple will not be tracked.

+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| Benchmark      | n       | s                     | p                     | a                     | m                     |
+================+=========+=======================+=======================+=======================+=======================+
| tuple-1        | 14.1 ns | not significant       | 9.79 ns: 1.44x faster | 9.39 ns: 1.50x faster | 8.62 ns: 1.64x faster |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| tuple-2        | 15.7 ns | 16.1 ns: 1.02x slower | 12.3 ns: 1.28x faster | 13.1 ns: 1.20x faster | 12.1 ns: 1.30x faster |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| Geometric mean | (ref)   | 1.01x slower          | 1.36x faster          | 1.34x faster          | 1.46x faster          |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+

Results for Tuple(EmptyTuple) and Tuple(EmptyTuple, EmptyTuple) - tuple will be tracked (EmptyTuple is a special case: it is not allocated).

+----------------+---------+-----------------------+-----------------------+-----------------+-----------------------+
| Benchmark      | tn      | ts                    | tp                    | ta              | tm                    |
+================+=========+=======================+=======================+=================+=======================+
| tuple-1        | 14.4 ns | not significant       | 14.1 ns: 1.02x faster | not significant | 13.9 ns: 1.04x faster |
+----------------+---------+-----------------------+-----------------------+-----------------+-----------------------+
| tuple-2        | 15.1 ns | 15.7 ns: 1.04x slower | 16.0 ns: 1.06x slower | not significant | 14.5 ns: 1.04x faster |
+----------------+---------+-----------------------+-----------------------+-----------------+-----------------------+
| Geometric mean | (ref)   | 1.02x slower          | 1.02x slower          | 1.00x slower    | 1.04x faster          |
+----------------+---------+-----------------------+-----------------------+-----------------+-----------------------+

Notes:

  1. Whether the tuple is tracked has a significant effect on the results.
  2. The PyTuple_Make[Single,Pair] version is much faster for untracked tuples, but gains little for tracked ones.
  3. The SET_ITEM version is slower than the SetItem one; I don't understand why.
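"Tracked" here refers to whether the cyclic garbage collector follows an object. From Python this can be inspected with gc.is_tracked, as in the sketch below. The tracking status of tuples is an implementation detail that has differed between CPython versions, so no expected output is given for the tuple cases. The helper name is illustrative.

```python
import gc


def tracking_report():
    """Return gc.is_tracked() for a few objects built at runtime."""
    number = 1
    empty = tuple([])  # the empty tuple
    return {
        "int": gc.is_tracked(number),
        "list": gc.is_tracked([]),
        "tuple of int": gc.is_tracked((number,)),
        "tuple of empty tuple": gc.is_tracked((empty,)),
    }


report = tracking_report()
for label, tracked in report.items():
    print(f"{label}: {tracked}")
```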

Linux benchmarks will be a bit later.

Benchmarks here - https://github.com/sergey-miryanov/cpython/tree/140052-pytuple-make-pair-bench

@vstinner
Member

I don't know how to read your benchmark. What are the "a", "p", "s", etc. columns?

@vstinner
Member

Oh, I suppose the letters are the same as in the previous benchmark: #140132 (comment)

@sergey-miryanov
Contributor Author

Yes, they are the same (except that n in the new benchmarks means PyTuple_New + PyTuple_SetItem, which was t before).
Sorry.

n - PyTuple_New + PyTuple_SetItem
s - PyTuple_New + PyTuple_SET_ITEM
p - PyTuple_Pack
a - PyTuple_FromArray
m - PyTuple_Make[Single,Pair]

tn, ts, tp, ta, tm - the same variants, where each item of the created tuple is an EmptyTuple.

@sergey-miryanov
Contributor Author

Microbenchmark results from Linux (Ubuntu 24.04, gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, i5-11600K @ 3.90GHz)
Results for Tuple(Long) and Tuple(Long, Long) - tuple will not be tracked.

+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| Benchmark      | n       | s                     | p                     | a                     | m                     |
+================+=========+=======================+=======================+=======================+=======================+
| tuple-1        | 11.9 ns | 11.8 ns: 1.01x faster | 9.26 ns: 1.29x faster | 8.37 ns: 1.43x faster | 8.70 ns: 1.37x faster |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| tuple-2        | 17.2 ns | 16.5 ns: 1.04x faster | 12.6 ns: 1.37x faster | 11.1 ns: 1.56x faster | 10.7 ns: 1.60x faster |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| Geometric mean | (ref)   | 1.03x faster          | 1.33x faster          | 1.49x faster          | 1.48x faster          |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+

Results for Tuple(EmptyTuple) and Tuple(EmptyTuple, EmptyTuple):

+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| Benchmark      | tn      | ts                    | tp                    | ta                    | tm                    |
+================+=========+=======================+=======================+=======================+=======================+
| tuple-1        | 12.7 ns | 12.2 ns: 1.04x faster | 10.9 ns: 1.17x faster | 9.70 ns: 1.31x faster | 9.94 ns: 1.28x faster |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| tuple-2        | 18.1 ns | 17.4 ns: 1.04x faster | 13.8 ns: 1.31x faster | 11.6 ns: 1.56x faster | 12.0 ns: 1.51x faster |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+
| Geometric mean | (ref)   | 1.04x faster          | 1.24x faster          | 1.43x faster          | 1.39x faster          |
+----------------+---------+-----------------------+-----------------------+-----------------------+-----------------------+

Notes:

  1. On Linux, whether the tuple is tracked makes little difference.
  2. The PyTuple_Make[Single,Pair] version is a bit slower than the PyTuple_FromArray one. I suspect this is because PyTuple_Make[Single,Pair] is not covered by PGO + LTO.
  3. Build params:
./configure --enable-optimizations --with-lto=full --prefix=/home/msn/work/cpython/installed/tuple-make-pair-bench
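When comparing numbers across machines, it can help to confirm that the interpreter under test was built with these flags. A small sketch using sysconfig follows. It assumes a configure-based build; the `CONFIG_ARGS` variable may be missing on other platforms such as Windows, in which case both flags are reported as False. The function name is illustrative.

```python
import sysconfig


def build_flags():
    """Report whether the interpreter's configure arguments mention PGO and LTO."""
    # CONFIG_ARGS is only populated for builds driven by ./configure;
    # it may be None elsewhere, hence the fallback to an empty string.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        "enable-optimizations": "--enable-optimizations" in config_args,
        "with-lto": "--with-lto" in config_args,
    }


flags = build_flags()
print(flags)
```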

Legend:

n - PyTuple_New + PyTuple_SetItem
s - PyTuple_New + PyTuple_SET_ITEM
p - PyTuple_Pack
a - PyTuple_FromArray
m - PyTuple_Make[Single,Pair]

tn, ts, tp, ta, tm - the same variants, where each item of the created tuple is an EmptyTuple.

@sergey-miryanov
Contributor Author

@eendebakpt I have updated microbenchmarks. Could you please take a look? Are they fair enough now?

@sergey-miryanov sergey-miryanov marked this pull request as ready for review October 24, 2025 05:13
@sergey-miryanov
Contributor Author

@vstinner This is ready for review. Could you please take a look?

Comment on lines +127 to +129
# because we only check type for gc support can't untrack tuple of
# immutable tuples, see maybe_tracked
self.assertTrue(gc.is_tracked(make_single((1, 2))))
Member

IMO, it would be better not to check this, since we may make a different decision in the future.

Contributor Author

We have a small disagreement here.
I think the tests should explicitly pin the current behavior.
@efimov-mikhail would rather not pin behavior that may change in the near future, because changing it would then require updating the tests.

We need another opinion on this.
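To make the two positions concrete, here is a sketch of how each would look in a test. `make_single` below is a hypothetical pure-Python stand-in for the C helper under review, and the test class is illustrative, not the PR's test code. The first test asserts only on a case where tracking is required for correctness; the second shows where the disputed assertion would go.

```python
import gc
import unittest


def make_single(item):
    """Hypothetical stand-in for the C helper under test."""
    return (item,)


class MakeSingleTrackingTest(unittest.TestCase):
    def test_container_item_is_tracked(self):
        # A tuple holding a list can take part in a reference cycle,
        # so the collector has to track it.
        self.assertTrue(gc.is_tracked(make_single([])))

    def test_tuple_of_immutable_tuple(self):
        result = make_single((1, 2))
        # Pinning the current behavior would add:
        #     self.assertTrue(gc.is_tracked(result))
        # Leaving the decision open means checking the value only.
        self.assertEqual(result, ((1, 2),))
```

The class can be run with `python -m unittest` from the file that contains it.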

@eendebakpt
Contributor

> @eendebakpt I have updated microbenchmarks. Could you please take a look? Are they fair enough now?

The benchmarks seem fine, although I still find the results surprising (why is PyTuple_Pack faster than PyTuple_New + PyTuple_SET_ITEM? Maybe it depends on whether the object one that you add in the performance tests is tracked by the GC).

But I am +1 on the PR, even if we don't go looking into tiny performance details: the methods PyTuple_MakeSingle/PyTuple_MakePair have a clean interface, they can be used quite a bit in the codebase and are faster than the alternative PyTuple_Pack.

@vstinner
Member

I created capi-workgroup/decisions#84 decision issue for the C API Working Group.
