Skip to content

Commit 6498804

Browse files
authored
[autorevert] treat test failures as successes in job-track Signals (#7399)
Currently, we have a slight overlap between job and test Signals. Specifically, when there is any non-test related failure in the job within the lookback window (e.g. any infra flake), the job signal is extracted, and it extracts ANY job failures, including test failures as the job signal. This potentially leads to the duplication, when the same test failure is processed both as test and job-track Signals, and it increases the chance of false positives. This PR makes the separation explicit: 1. test failures are extracted as test-track Signals 2. job-track Signals deliberately **ignore** job failures caused **exclusively** by tests (such events are extracted as successes) --- The intended outcome of this change: 1. job-track signals will only try to revert infra-breaking changes and ignore test failures 2. test-track signal processing remains as it is
1 parent 904f6d6 commit 6498804

File tree

3 files changed

+62
-7
lines changed

3 files changed

+62
-7
lines changed

aws/lambda/pytorch-auto-revert/SIGNAL_EXTRACTION.md

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,22 @@ Notes
8282
- Each commit holds a list of `SignalEvent`s (time‑ordered by `started_at`).
8383
Ordering: dicts in Python 3.7+ preserve insertion order. Phase A inserts commit keys in push‑timestamp DESC order, so iterating the mapping yields newest→older commits without extra sorting.
8484

85+
### Test‑track semantics
86+
- Source of truth for SUCCESS/FAILURE is `default.test_run_s3` per test id.
87+
- When a test row exists for an attempt:
88+
- Emit at most one FAILURE if any failed runs exist; at most one SUCCESS if any successful runs exist.
89+
- When no test rows exist for an attempt and any grouped job for that attempt is pending → emit PENDING.
90+
- Otherwise (no test rows and not pending) → no event for that attempt.
91+
92+
### Job‑track semantics (non‑test)
93+
- Build per normalized job base across commits; aggregate shards by `(wf_run_id, run_attempt)`.
94+
- Event mapping per attempt uses aggregated job meta with test‑failure filtering:
95+
- FAILURE only when the attempt had non‑test failures (e.g. infra‑related).
96+
- PENDING when the attempt is still running.
97+
- SUCCESS otherwise, including when failures are exclusively test‑caused (these are handled by test‑track).
98+
- Cancelled attempts are treated as missing (no event).
99+
- Emit a job‑track Signal only when at least one attempt/commit shows a non‑test (infra) failure within the window.
100+
85101
Event naming (for debuggability):
86102
- Consistent key=value format: `wf=<workflow> kind=<test|job> id=<test_id|job_base> run=<wf_run_id> attempt=<run_attempt>`
87103
- Examples:

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/signal_extraction.py

Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -482,16 +482,15 @@ def _build_non_test_signals(
482482
# Map aggregation verdict to outer SignalStatus
483483
if meta.status is None:
484484
continue
485-
if meta.status == AggStatus.FAILURE:
485+
if meta.status == AggStatus.FAILURE and meta.has_non_test_failures:
486486
# mark presence of non-test failures (relevant for job track)
487-
if meta.has_non_test_failures:
488-
has_relevant_failures = True
489-
487+
has_relevant_failures = True
490488
ev_status = SignalStatus.FAILURE
491-
elif meta.status == AggStatus.SUCCESS:
492-
ev_status = SignalStatus.SUCCESS
493-
else:
489+
elif meta.status == AggStatus.PENDING:
494490
ev_status = SignalStatus.PENDING
491+
else:
492+
# Note: when all failures are caused by tests, we do NOT emit job-level failures
493+
ev_status = SignalStatus.SUCCESS
495494

496495
# Extract wf_run_id/run_attempt from the attempt key
497496
_, _, _, wf_run_id, run_attempt = akey

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/tests/test_signal_extraction.py

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -319,6 +319,46 @@ def test_non_test_inclusion_gate(self):
319319
self._find_job_signal(signals_b, "trunk", jobs_b[0].base_name)
320320
)
321321

322+
def test_job_track_treats_test_failures_as_success(self):
323+
# When a base has a non-test (infra) failure somewhere (so a job signal is emitted),
324+
# attempts that fail due to tests should NOT appear as FAILURES in the job track.
325+
# They should be treated as SUCCESS at the job-track level, leaving the failure to test-track.
326+
jobs = [
327+
# Newer commit: infra-caused failure (non-test classification)
328+
J(
329+
sha="Z2",
330+
run=9100,
331+
job=801,
332+
attempt=1,
333+
started_at=ts(self.t0, 20),
334+
conclusion="failure",
335+
rule="infra", # non-test
336+
),
337+
# Older commit: failure caused by tests (test classification)
338+
J(
339+
sha="Z1",
340+
run=9000,
341+
job=800,
342+
attempt=1,
343+
started_at=ts(self.t0, 10),
344+
conclusion="failure",
345+
rule="pytest failure", # test-caused
346+
),
347+
]
348+
349+
signals = self._extract(jobs, tests=[])
350+
base = jobs[0].base_name
351+
sig = self._find_job_signal(signals, "trunk", base)
352+
self.assertIsNotNone(sig)
353+
# Expect commits newest->older
354+
self.assertEqual([c.head_sha for c in sig.commits], ["Z2", "Z1"])
355+
# Newer infra failure remains FAILURE
356+
self.assertEqual(len(sig.commits[0].events), 1)
357+
self.assertEqual(sig.commits[0].events[0].status, SignalStatus.FAILURE)
358+
# Older test-caused failure is mapped to SUCCESS in job track
359+
self.assertEqual(len(sig.commits[1].events), 1)
360+
self.assertEqual(sig.commits[1].events[0].status, SignalStatus.SUCCESS)
361+
322362
def test_commits_without_jobs_are_included(self):
323363
# Verify that commits with no jobs at all are still included in signals
324364
# Simulate case where C2 has a failure, C3 has no jobs (e.g., periodic workflow),

0 commit comments

Comments
 (0)