Check precise Windows `GIX_TEST_IGNORE_ARCHIVES` expectations on CI #1663

EliahKagan · 2024-11-09T01:18:42Z

This modifies the CI test-fixtures-windows job introduced in #1657, by making it so that, instead of checking that no more than the expected number of test cases fail, it checks that the exact test cases expected to fail are the ones that do fail. (The other check done before as part of this job, that no tests report errors, is still done.) The approach taken is:

To collect actual failures, parse the XML output saved by `cargo nextest` to get the package name and test name of each failing test and save them, one per line, to a file.

Getting this information is pretty straightforward in PowerShell. When iterating through `testcase` nodes, which are filtered by the presence of a `failure` subnode, `$_.name` gives the name of the test case, including `::` qualifications within the containing package, and `$_.classname` gives the name of the containing package. Thus `"$($_.classname) $($_.name)"` identifies a test in the same way it is identified in human-readable `cargo nextest` output.

The unintuitive attribute name `classname` is because `cargo nextest` XML is in the JUnit format, which originated for Java tests where test cases are methods in classes. An alternative approach is to use `$_.ParentNode.name` instead of `$_.classname`. I've verified that this works too, but I don't know of any strong reason to prefer one over another, so I went with the more compact `$_.classname`.
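
As a rough illustration of the step above, here is a minimal Python sketch of the same idea (the job itself does this in PowerShell). The sample XML and the crate and test names in it are made up for the example; only the `testcase`, `failure`, `name`, and `classname` names come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style report; crate and test names are invented.
SAMPLE_JUNIT = """\
<testsuites name="nextest-run">
  <testsuite name="demo-crate::integration">
    <testcase name="module::passes" classname="demo-crate::integration"/>
    <testcase name="module::fails" classname="demo-crate::integration">
      <failure type="test failure">assertion failed</failure>
    </testcase>
  </testsuite>
</testsuites>
"""


def failing_tests(junit_xml: str) -> list[str]:
    """Return "classname name" for each testcase that has a <failure> child."""
    root = ET.fromstring(junit_xml)
    return sorted(
        f"{case.get('classname')} {case.get('name')}"
        for case in root.iter("testcase")
        if case.find("failure") is not None
    )


# One identifier per line, as the step describes.
print("\n".join(failing_tests(SAMPLE_JUNIT)))
```

Test cases with no `failure` child are skipped, so passing tests never reach the output.
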

To collect expected failures, use `gh` (the GitHub CLI) to download the text of #1358, and parse the first fenced code block tagged `text` with regex, extracting the list of failing tests and saving them, one per line, to a file.

One of the tests currently listed there, `gix-ref-tests::refs packed::iter::performance`, is a performance test that seems not to fail on CI (anymore?), though I still find it to fail locally. If it were to start failing on CI, we would want to know about it. So, before these changes, it was not included in the count. I've carried that over to the precise matching done here: tests that appear to be performance tests due to having `performance` in their names (not as part of any longer `\w+` word) are not regarded as expected to fail.

It is not obvious that the general approach of downloading the information from an issue is the best one. Although I like this approach and I think it's not too cumbersome, this is admittedly the less elegant step. Other approaches, including hard-coding the expected failures in the workflow or another file, could be used if this approach is not wanted.
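
The extraction part of this step might look roughly like the following Python sketch. It assumes the issue body is already available as a string (the job fetches it with `gh`) and that the block holds one test identifier per line; the sample body, the test names, and the exact regex are assumptions for illustration, not the workflow's actual code.

```python
import re

FENCE = "`" * 3  # three backticks, built here so the example nests cleanly

# Hypothetical issue body; the test names are invented.
ISSUE_BODY = f"""\
Some introductory prose.

{FENCE}text
demo-crate::integration module::fails
demo-crate::integration other::fails_too
{FENCE}

{FENCE}text
a later block that should be ignored
{FENCE}
"""

# Lazily capture everything between the first opening fence tagged "text"
# and the next line that consists only of a closing fence.
FIRST_TEXT_BLOCK = re.compile(
    rf"^{FENCE}text\r?\n(.*?)^{FENCE}\s*$", re.MULTILINE | re.DOTALL
)


def expected_from_issue(body: str) -> list[str]:
    """Return the non-blank lines of the first text-tagged code block."""
    match = FIRST_TEXT_BLOCK.search(body)
    if match is None:
        raise ValueError("no text code block found in issue body")
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


print("\n".join(expected_from_issue(ISSUE_BODY)))
```

Raising when nothing matches makes an empty or missing download fail loudly instead of producing an empty expected list.
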

To compare them, use `git diff --no-index` with various other options, printing the entire list of failures as context if they differ. The job fails if there are any differences.

Because this is neither a required check for PR auto-merge nor a dependency of one, I think failing not only if tests unexpectedly fail but also if any unexpectedly pass is unlikely to cause any significant problems, and that knowing whenever the set of failing tests changes in any way is worthwhile.

I intentionally excluded performance tests late in the process, so that there would be a commit whose CI results could be inspected that would verify that the check is capable of failing when the diff is nonempty (and that the output would display in a useful way, with effective colorization, etc., in this situation). That can be observed in this run output and compared to passing runs, such as the runs in this PR.
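
The comparison step can be sketched in Python as well. This version swaps in the standard library's `difflib` for `git diff --no-index`, so it only approximates the job's output; the test names are invented for the example.

```python
import difflib


def diff_failures(expected: list[str], actual: list[str]) -> list[str]:
    """Unified diff with the whole list as context; empty when the lists agree."""
    context = max(len(expected), len(actual))
    return list(
        difflib.unified_diff(
            sorted(expected),
            sorted(actual),
            fromfile="expected",
            tofile="actual",
            n=context,  # enough context to show every entry
            lineterm="",
        )
    )


# Hypothetical lists: one test stopped failing, one started failing.
expected = ["crate-a t::one", "crate-a t::two", "crate-b t::three"]
actual = ["crate-a t::one", "crate-b t::three", "crate-b t::four"]

diff = diff_failures(expected, actual)
print("\n".join(diff))

# A CI step would exit nonzero whenever the diff is nonempty.
exit_code = 1 if diff else 0
```

With expected failures on the left and actual failures on the right, a `-` line is a test that no longer fails and a `+` line is a test that newly fails.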

This modifies the `test-fixtures-windows` job that tests on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` so that, instead of checking that no more than 14 failures occur, it checks that the failing tests are exactly those that are documented in GitoxideLabs#1358 as expected to fail.

The initial check that no tests have *error* status is preserved, with only stylistic changes, and kept separate from the subsequent logic so that the output is clearer.

The new steps are no longer conditional on `nextest` having exited with a failure status, since (a) that was probably unnecessary before and definitely unnecessary now, (b) at least for now, the comparison is precise, so it would be strange to pass if the diff were to have changes on *all* lines, and (c) this makes it slightly less likely that GitoxideLabs#1358 will accidentally stay open even once fixed.

The current approach is to actually retrieve the list of tests expected to fail on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` from the GitoxideLabs#1358 issue body. This has the advantage that it automatically keeps up to date with changes made to that issue description, but this is of course not the only possible approach for populating the expected value.

Two changes should be made before this is ready:

- As noted in the "FIXME" comment, the job should currently fail because the performance test reported to fail in GitoxideLabs#1358 is not being filtered out from the expected failures list. It's left in as of this commit, to verify that the job is capable of failing. (After that, the performance test should either be filtered out or removed from the list in GitoxideLabs#1358, but the former approach is currently preferable because I have not done diverse enough testing to check if the failure on my main Windows system is due to that system being too slow rather than a performance bug.)
- The scratchwork file should be removed once no longer needed.

This is to fix the error:

    gh: To use GitHub CLI in a GitHub Actions workflow, set the GH_TOKEN environment variable. Example:
      env:
        GH_TOKEN: ${{ github.token }}
    InvalidOperation: D:\a\_temp\ba9d92b7-3a94-4f7c-b541-d19003f40e19.ps1:9
    Line |
       9 |  $expected_failures = $match_info.Matches.Groups[1].Value -split "`n" …
         |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         | Cannot index into a null array.
    Error: Process completed with exit code 1.


This omits tests containing `performance` (and not as part of a larger "word", not even with `_`) from being expected to fail on CI with `GIX_TEST_IGNORE_ARCHIVES=1` on Windows. Currently there is one such test listed in GitoxideLabs#1358, `gix-ref-tests::refs packed::iter::performance`.
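
One way to express that rule is a regex word boundary, since `_` counts as a word character. The Python sketch below is an assumption about how such a filter could look, not the pattern the workflow actually uses; apart from the test named above, the identifiers are hypothetical.

```python
import re

# "performance" as a whole word: a neighboring letter, digit, or "_" on
# either side means there is no boundary, so the name is not matched.
PERFORMANCE_WORD = re.compile(r"\bperformance\b")


def without_performance_tests(tests: list[str]) -> list[str]:
    """Drop tests whose identifier contains `performance` as a whole word."""
    return [t for t in tests if not PERFORMANCE_WORD.search(t)]


tests = [
    "gix-ref-tests::refs packed::iter::performance",  # named in the commit message
    "demo-crate::integration module::performance_counters",  # hypothetical
    "demo-crate::integration module::fails",  # hypothetical
]

print("\n".join(without_performance_tests(tests)))
```

Here only the first entry is dropped; `performance_counters` stays because the trailing `_` makes it part of a longer word.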


Byron

Thanks a lot for making this happen!

I am very impressed by what can be done with PowerShell; it clearly is .NET in a shell :).

And indeed, as predicted, reading the text block from the issue breaks isolation too much, which is why I'd love to see these extracted into a file, for example, if that's OK with you.

Thanks again.

.github/workflows/ci.yml


Byron

Wonderful, thanks so much!

Since GitoxideLabs#1663, the `test-fixtures-windows` CI job checks actual failures against a list of specific tests that are known to fail on Windows when `GIX_TEST_IGNORE_ARCHIVES=1`. It is therefore capable of providing useful information about new failures, or newly passing tests that should be removed from the list, if the job ever does fail.

The job also seems not to fail. This is to say that while GitoxideLabs#1358 is not fixed, the `test-fixtures-windows` job has a very low rate of failure and, if it does fail, something new and interesting would be happening such that we would want to know about it and probably not immediately merge a PR that caused it without checking how and why that happened.

This adds `test-fixtures-windows` to the list of jobs that are dependencies of a required check for branch protection based PR auto-merge.

On CI, `test-fixtures-windows` does `GIX_TEST_IGNORE_ARCHIVES=1` test runs on Windows, which have a number of known and expected failures, as described in GitoxideLabs#1358. Since GitoxideLabs#1663, these are listed in `etc/test-fixtures-windows-expected-failures-see-issue-1358.txt` and `test-fixtures-windows` compares those expected failures to the actual failures of a run. If there are any differences, it shows a unified diff with full context and fails the run.

The default Git foreground color scheme has been used for the diff, where any tests that are no longer failing are shown as `-` lines in red, and any tests that are newly failing are shown as `+` lines in green. These are the usual default `git diff` colors; we did not pick those colors specifically to be meaningful to this scenario. While the use of `-` and `+` lines and the specific unified diff format chosen are intuitive and informative, this is less so for the color scheme. That is for two reasons:

1. By default and unless another theme is chosen explicitly or selected based on device settings, the GitHub web interface uses a light theme. In this theme, as currently coded/styled, on the pale background in GitHub Actions, foreground normal-styled green text can be hard to distinguish visually from foreground normal-styled black text. So it is often not immediately clear which test failures are new. (New test failures are the typical way a `test-fixtures-windows` job fails, when it does.)
2. A test that is no longer failing would usually mean that a test or implementation bug has been fixed, or that compatibility has otherwise been improved. `test-fixtures-windows` fails on these so that we attend to them, usually by dropping newly passing tests from the list of expected failures. Likewise, a test that is failing, especially a newly failing test, usually if not always means that something is not right, even if it is merely the test itself that needs to be improved. But showing things that have gotten better in red, and things that have gotten worse in green, is unintuitive. This is partly because these colors have the opposite meaning in numerous contexts, including in some parts of the GitHub Actions interface. It is more specifically because they have the opposite meaning in the context of previous step output, where `nextest` shows stderr in red (and specifically shows stderr on tests that have failed), shows `FAIL` in red, and shows `PASS` in green.

It may be possible to make red and green show up better. These colors are clear in the `nextest` output. However, the options for `diff` colors, at least when configuring them with `color.diff.*`, are limited. Furthermore, that would only improve (1), not (2).

This bolds both `-` and `+` lines, shows `-` lines in magenta, and shows `+` lines in blue. These colors seem to be visible on both light and dark backgrounds, and they seem to be at least as distinguishable from each other as are red and green, in general.

This change only applies to the specific diff that shows newly passing and newly failing tests in `test-fixtures-windows`.
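
To make the color mapping concrete, here is a small Python sketch that applies the same choices with standard ANSI escape codes. The commit itself configures Git's diff colors rather than emitting escape codes by hand, so this shows only the resulting scheme; the function name and sample lines are hypothetical.

```python
BOLD_MAGENTA = "\x1b[1;35m"  # ANSI: bold (1) + magenta foreground (35)
BOLD_BLUE = "\x1b[1;34m"     # ANSI: bold (1) + blue foreground (34)
RESET = "\x1b[m"


def colorize(diff_line: str) -> str:
    """Color one unified-diff line: `-` bold magenta, `+` bold blue."""
    # Leave the "--- expected" / "+++ actual" file headers uncolored.
    if diff_line.startswith(("--- ", "+++ ")):
        return diff_line
    if diff_line.startswith("-"):
        return f"{BOLD_MAGENTA}{diff_line}{RESET}"
    if diff_line.startswith("+"):
        return f"{BOLD_BLUE}{diff_line}{RESET}"
    return diff_line  # context lines and hunk headers stay as they are


for line in ["--- expected", "+++ actual", " crate-a t::one",
             "-crate-a t::two", "+crate-b t::four"]:
    print(colorize(line))
```
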

EliahKagan added 6 commits November 8, 2024 16:32

Colorize the diff

63473bc

Including color in the diff of expected vs. actual failed tests makes the output easier to see (much as color helps in `nextest`). This commit also makes some stylistic changes to the command so it is easier to read.

Remove scratchwork

b2ce048

The relevant code is in the `test-fixtures-windows` CI job and it is working (both to fail the job when there is a mismatch and to have the job succeed when there is agreement).

Clarify comment and code style

067e7d2

Byron requested changes Nov 9, 2024


Use a file with test-fixtures-windows expected failures

99238a7

Instead of retrieving them from GitoxideLabs#1358. (See discussion in GitoxideLabs#1663.)

EliahKagan requested a review from Byron November 9, 2024 08:32

Byron approved these changes Nov 9, 2024


Byron merged commit 1df68e4 into GitoxideLabs:main Nov 9, 2024
17 checks passed

EliahKagan deleted the run-ci/test-fixtures-windows-precise branch November 9, 2024 09:34

EliahKagan mentioned this pull request Nov 11, 2024

Improve CI permissions, auto-merge maintainability, and clarity #1668

Merged

EliahKagan mentioned this pull request Jan 22, 2025

Make test-fixtures-windows required for PR auto-merge #1793

Merged

EliahKagan mentioned this pull request Feb 19, 2025

3 new Windows GIX_TEST_IGNORE_ARCHIVES=1 failures with Git 2.48.1 #1849

Closed
