We are still seeing unexpected results in the pystats diffs.
@markshannon suggested I look at a recent PR that adds a globals-to-constants pass, where there should be some changes, but not at the level we are seeing. The original stats diff for that PR didn't include the per-benchmark results, so I re-ran it.
These two sets of results (Mark's run, and my later run of the same commits) are in strong agreement, so there doesn't seem to be anything attributable to randomness or things that change between runs. I also ruled out problems with summation (i.e. the totals across all benchmarks not being equal to the sum of all benchmarks). I also don't think there is cross-benchmark contamination -- each benchmark is run with a separate invocation of `pyperformance`, and the `/tmp/py_stats` directory is empty in between (I added some asserts to the run to confirm this).
Drilling down on the numbers, the most changed uop in terms of execution count is `TO_BOOL_ALWAYS_TRUE`:
Name | Base Count | Head Count | Change |
---|---|---|---|
`TO_BOOL_ALWAYS_TRUE` | 12,145,706 | 30,824,186 | 153.8% |
This difference is entirely attributable to two benchmarks:
Benchmark | Base | Head |
---|---|---|
go | 5,840 | 129,400 |
pycparser | 11,120,400 | 29,675,320 |
The `go` one is nice to work with because it has no dependencies. Running that benchmark 10 times against the head and base branches produces these numbers exactly every time, so I don't think there is anything non-deterministic in the benchmark.
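Concretely, the determinism check amounts to a loop like this (my own sketch, not the actual harness; it assumes pystats dumps land in `/tmp/py_stats` as `key : value` lines, and the key substrings are guesses):

```python
import os
import shutil
import subprocess

PYTHON = "/path/to/cpython/python"  # hypothetical path to an --enable-pystats build
STATS_DIR = "/tmp/py_stats"


def summed_counter(substring):
    """Sum every pystats 'key : value' line whose key contains `substring`."""
    total = 0
    for name in os.listdir(STATS_DIR):
        with open(os.path.join(STATS_DIR, name)) as f:
            for line in f:
                key, sep, value = line.rpartition(":")
                if sep and substring.lower() in key.lower():
                    total += int(value)
    return total


seen = set()
for _ in range(10):
    shutil.rmtree(STATS_DIR, ignore_errors=True)
    os.makedirs(STATS_DIR)
    subprocess.run([PYTHON, "-m", "pyperformance", "run", "-b", "go"], check=True)
    seen.add((summed_counter("JUMP_BACKWARD"), summed_counter("optimization attempt")))

# The same pair of numbers comes out on every single run.
assert len(seen) == 1, seen
```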
The other thing that I think @markshannon mentioned should be completely unchanged by the PR is the number of optimization attempts. Yet many more benchmarks contribute to a change there:
Benchmark | Base | Head |
---|---|---|
async_generators | 1060 | 1260 |
asyncio_websockets | 420 | 480 |
concurrent_imap | 4462 | 4465 |
dask | 4274 | 4249 |
deltablue | 440 | 18900 |
docutils | 11920 | 11980 |
genshi | 35560 | 35640 |
go | 860 | 74920 |
html5lib | 1020 | 1040 |
mypy2 | 16536 | 16597 |
pycparser | 1200 | 3320 |
regex_v8 | 1560 | 2340 |
sqlglot | 3280 | 3320 |
sqlglot_optimize | 5160 | 5220 |
sqlglot_parse | 380 | 440 |
sqlglot_transpile | 1340 | 1400 |
sympy | 13798 | 13903 |
tornado_http | 1080 | 1140 |
typing_runtime_protocols | 700 | 780 |
Again, looking at the `go` benchmark, I can reproduce these numbers exactly locally in isolation.
Since "optimization attempts" are counted in "JUMP_BACKWARD" (when reaching a threshold), I also compared that, and I get the following Tier 1 counts for "JUMP_BACKWARD":
| | Base | Head |
|---|---|---|
| Optimization attempts | 860 | 74,920 |
| `JUMP_BACKWARD` | 14,860 | 28,402,880 |
These numbers are not proportional (optimization attempts go up roughly 87x, while `JUMP_BACKWARD` goes up roughly 1,900x), but they do at least move in the same direction.
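If I understand the Tier 1 to Tier 2 warm-up correctly, that loose coupling is expected: each `JUMP_BACKWARD` site counts down to a threshold, and a failed optimization attempt resets the counter with exponential backoff, so attempts grow far more slowly than raw executions. A conceptual sketch (not CPython source; the threshold and backoff values are made up):

```python
class BackedgeSite:
    """Toy model of one JUMP_BACKWARD instruction's warm-up counter."""

    INITIAL_THRESHOLD = 16  # illustrative only, not CPython's real value

    def __init__(self, stats):
        self.stats = stats
        self.backoff = self.INITIAL_THRESHOLD
        self.countdown = self.INITIAL_THRESHOLD

    def jump_backward(self):
        self.stats["JUMP_BACKWARD"] += 1
        self.countdown -= 1
        if self.countdown <= 0:
            self.stats["Optimization attempts"] += 1
            if not self.try_optimize():
                # Failed attempt: back off exponentially before trying again,
                # so attempts stay well below the raw execution count.
                self.backoff *= 2
                self.countdown = self.backoff

    def try_optimize(self):
        return False  # stand-in for projecting/entering a Tier 2 trace
```

Under that kind of scheme the two counters should move together without being proportional, which matches the table -- but it still doesn't explain why `JUMP_BACKWARD` itself moves at all.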
I did confirm the obvious: the benchmark is doing the same amount of work and running the same number of times in both cases (just by adding `print`s and counting).
I'm completely stumped as to why that PR changes the number of `JUMP_BACKWARD` executions and thus the number of optimization attempts -- it doesn't seem like that should be affected at all. But it does seem like that could be the cause of a lot of changes "downstream".
I've created a gist to reproduce this that may be helpful. Given a path to a CPython checkout with an `--enable-pystats` build, it runs the `go` benchmark and reports the number of optimization attempts and executions of `JUMP_BACKWARD`.
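A driver of that shape might look roughly like this (my sketch, not the gist itself; the pystats key substrings are assumptions):

```python
#!/usr/bin/env python3
"""Sketch of such a driver: given a CPython checkout built with
--enable-pystats, run the `go` benchmark and report optimization attempts
and JUMP_BACKWARD executions.  Key names below are guesses."""

import collections
import os
import shutil
import subprocess
import sys

STATS_DIR = "/tmp/py_stats"


def collect_totals():
    """Sum the 'key : value' lines across all per-process pystats dumps."""
    totals = collections.Counter()
    for name in os.listdir(STATS_DIR):
        with open(os.path.join(STATS_DIR, name)) as f:
            for line in f:
                key, sep, value = line.rpartition(":")
                if sep:
                    try:
                        totals[key.strip()] += int(value)
                    except ValueError:
                        pass  # skip any non-numeric lines
    return totals


def main(cpython_dir):
    python = os.path.join(cpython_dir, "python")
    shutil.rmtree(STATS_DIR, ignore_errors=True)
    os.makedirs(STATS_DIR)

    subprocess.run([python, "-m", "pyperformance", "run", "-b", "go"], check=True)

    for key, value in sorted(collect_totals().items()):
        lowered = key.lower()
        if "jump_backward" in lowered or "optimization attempt" in lowered:
            print(f"{key}: {value:,}")


if __name__ == "__main__":
    main(sys.argv[1])
```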