mhsmith (Member) commented Aug 31, 2025

--randomize can discover real problems, but they're often ordering dependencies between tests, which are difficult to diagnose, and usually have nothing to do with the PR on which they occur.

I assume this was the reason for removing --fail-rerun from the --fast-ci and --slow-ci arguments in #110849. However, this renders --randomize almost useless, because the failing test will usually pass on the rerun in a fresh process, and nobody will ever know that there was a failure.
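The failure mode described above can be shown with a toy model. Two tests that share mutable state pass in one order and fail in the other, and rerunning the failed test against clean state makes it pass again. This is a minimal Python sketch under stated assumptions: the case names, the shared `cache`, and the helper functions are hypothetical, and this is not how libregrtest runs or reruns tests.

```python
# Hypothetical model of an ordering dependency between two tests.
# Nothing here is taken from CPython's test suite or libregrtest.


def make_state():
    # Fresh "process" state: the shared cache starts empty.
    return {"cache": {}}


def writer_case(state):
    # Stores a value in shared state and never cleans it up.
    state["cache"]["mode"] = "legacy"
    return True


def reader_case(state):
    # Silently assumes nothing has touched the cache yet.
    return "mode" not in state["cache"]


CASES = {"writer": writer_case, "reader": reader_case}


def run_in_order(order, state):
    """Run the named cases in order against one shared state."""
    return {name: CASES[name](state) for name in order}


def rerun_in_fresh_process(name):
    """Model a rerun in a new worker process: state starts clean."""
    return CASES[name](make_state())


print(run_in_order(["reader", "writer"], make_state()))
# -> {'reader': True, 'writer': True}
print(run_in_order(["writer", "reader"], make_state()))
# -> {'writer': True, 'reader': False}
print(rerun_in_fresh_process("reader"))
# -> True
```

Only the order changes between the two runs, so the failure comes from an interaction between tests rather than from either test alone. A clean rerun then hides it.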

Also, --rerun without --fail-rerun means that a test which ALWAYS fails the first time and passes the second time will still be treated as a pass. This seems unsafe.
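The rule being questioned here can be written as a small decision function. This is a hypothetical model of how a first run and a rerun might combine into one verdict, assuming simple boolean pass/fail results. `final_verdict` and `Verdict` are illustrative names, not libregrtest's API.

```python
# Hypothetical model of how a test's final verdict could depend on
# the rerun options; not libregrtest's actual code.
from enum import Enum


class Verdict(Enum):
    PASSED = "passed"
    FAILED = "failed"


def final_verdict(first_passed, rerun_passed, *, rerun, fail_rerun):
    """Combine a first run and an optional rerun into a single verdict.

    rerun_passed only matters if the first run failed and rerun is on.
    """
    if first_passed:
        return Verdict.PASSED
    if not rerun or not rerun_passed:
        # No rerun, or the rerun failed too: the failure stands.
        return Verdict.FAILED
    # Failed first, passed on rerun: only fail_rerun keeps it a failure.
    return Verdict.FAILED if fail_rerun else Verdict.PASSED


# A test that always fails the first time and passes the second time:
print(final_verdict(False, True, rerun=True, fail_rerun=False))  # -> Verdict.PASSED
print(final_verdict(False, True, rerun=True, fail_rerun=True))   # -> Verdict.FAILED
```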

So I propose removing --randomize and restoring --fail-rerun.

This will also allow iOS and Android to switch to --fast-ci on GitHub Actions and --slow-ci on the buildbots. They were previously unable to do this because of the frequent failures caused by --randomize, which were not hidden on the rerun because these platforms use --single-process mode.
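The platform difference comes down to the state the rerun starts from. The following is a hypothetical Python model, not libregrtest code. It assumes that a worker-process rerun begins with clean state, while a single-process rerun inherits whatever earlier tests left behind. The case names and `run_with_rerun` are illustrative.

```python
# Hypothetical model: an order-dependent pair of tests, with failures
# rerun either in a fresh worker process or in the same process.


def writer_case(state):
    # Leaves a value behind and never cleans it up.
    state["mode"] = "legacy"
    return True


def reader_case(state):
    # Passes only if no earlier case touched the shared state.
    return "mode" not in state


def run_with_rerun(order, single_process):
    """Run cases in order, then rerun any failures once."""
    state = {}  # shared by every case in this (modelled) process
    results = {case.__name__: case(state) for case in order}
    for case in order:
        if not results[case.__name__]:
            # A worker-process rerun starts clean; a single-process
            # rerun inherits whatever earlier cases left behind.
            rerun_state = state if single_process else {}
            results[case.__name__] = case(rerun_state)
    return results


print(run_with_rerun([writer_case, reader_case], single_process=False))
# -> {'writer_case': True, 'reader_case': True}
print(run_with_rerun([writer_case, reader_case], single_process=True))
# -> {'writer_case': True, 'reader_case': False}
```

With `single_process=False` the order-dependent failure disappears on rerun. With `single_process=True` it reproduces, which matches the behaviour described for iOS and Android.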

mhsmith requested a review from vstinner August 31, 2025 18:17
mhsmith requested a review from freakboy3742 as a code owner August 31, 2025 18:17
mhsmith added the labels skip news, needs backport to 3.13, and needs backport to 3.14 Aug 31, 2025
mhsmith removed the request for review from freakboy3742 August 31, 2025 18:17
picnixz added the infra (CI, GitHub Actions, buildbots, Dependabot, etc.) label Aug 31, 2025
mhsmith changed the title from "Change CI arguments: remove --randomize, add --fail-rerun" to "gh-137242: Change CI arguments: remove --randomize, add --fail-rerun" Aug 31, 2025

mhsmith (Member, Author) commented Aug 31, 2025

> Also, --rerun without --fail-rerun means that a test which ALWAYS fails the first time and passes the second time will still be treated as a pass.

It looks like some tests are already doing that on some runners. I'll switch this PR back to draft until this has been resolved.

mhsmith marked this pull request as draft August 31, 2025 18:57

vstinner (Member) left a comment

I dislike changing CI defaults.

if ns.use_mp is None:
    ns.use_mp = 0
ns.randomize = True

I would prefer to keep randomization for --fast-ci and --slow-ci options.

vstinner (Member) commented Sep 3, 2025

I would prefer adding an option to disable randomization, which could be used with --fast-ci / --slow-ci.
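An opt-out of this kind could coexist with the CI presets if the parser distinguishes "not specified" from an explicit choice. This is a minimal `argparse` sketch under that assumption. The `--no-randomize` flag and this `parse_args` function are hypothetical, and this is not regrtest's actual command-line parser. `--fail-rerun` stays opt-in here, as it currently is.

```python
# Hypothetical sketch: CI presets that fill in defaults, plus an explicit
# opt-out for randomization. "--no-randomize" is an assumed flag name.
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--fast-ci", action="store_true")
    parser.add_argument("--slow-ci", action="store_true")
    # Both flags share one dest; default=None means "not specified".
    parser.add_argument("--randomize", dest="randomize",
                        action="store_true", default=None)
    parser.add_argument("--no-randomize", dest="randomize",
                        action="store_false", default=None)
    parser.add_argument("--fail-rerun", action="store_true")
    ns = parser.parse_args(argv)
    if (ns.fast_ci or ns.slow_ci) and ns.randomize is None:
        # The preset enables randomization only if the user
        # did not choose either way.
        ns.randomize = True
    if ns.randomize is None:
        ns.randomize = False
    return ns


print(parse_args(["--fast-ci"]).randomize)                    # -> True
print(parse_args(["--fast-ci", "--no-randomize"]).randomize)  # -> False
print(parse_args(["--fast-ci"]).fail_rerun)                   # -> False
```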

ns.fail_env_changed = True
if ns.python is None:
    ns.rerun = True
ns.fail_rerun = True

You can already pass --fail-rerun. Making it the default would make all CIs much stricter, and I dislike this idea. When I tried it a few years ago, I discovered tons of flaky tests, and it was a pain to fix all of them.

Labels: infra (CI, GitHub Actions, buildbots, Dependabot, etc.), needs backport to 3.13, needs backport to 3.14, skip news
3 participants