Skip to content

Commit 355694d

Browse files
chore(ci_visibility): move retry logic to pytest_runtest_protocol (#13376)
Currently we let pytest's builtin `pytest_runtest_protocol` hook to run the test, and we check whether to retry at the makereport stage. This has a number of consequences: - We don't have access to the setup, call, teardown reports all at once; we see one at a time during `makereport`, and we have to patch them and stash information around to keep state across each time we pass through the reports for a given test. - pytest logs each report as it is created, so we have to patch them at the right time so they get printed correctly (to e.g. change the outcome from `FAILED` to `ATR INITIAL ATTEMPT FAILED`; this affects not only printing, but also the session exit status). - In particular, for EFD when the initial attempt passes, we run too late and the `PASSED` status was already logged by the time we patch it, so it shows `PASSED` instead of `EFD INITIAL ATTEMPT PASSED`. Not only that, but we have to [handle this case specially](https://github.com/DataDog/dd-trace-py/blob/0efbb6c3cc6ab4c59a838771f2db92f1b729e87a/ddtrace/contrib/internal/pytest/_efd_utils.py#L237) when generating the terminal summary. This PR moves the retry logic from the `pytest_runtest_makereport` hook to the `pytest_runtest_protocol` hook. This means we _replace_ pytest's own `pytest_runtest_protocol` with our own. We invoke pytest's internal `runtestprotocol` function directly from our hook, so the behavior of our hook is similar to the pytest's own hook. The difference is that we call this function with `log=False`, so pytest doesn't log the setup, call, teardown reports as they are created. Instead, we collect all reports, patch them as needed, and _then_ print them out. This means we can write the logic having full knowledge of the final status of a test run, instead of patching things as we see them during setup, call and teardown. For retries, the responsibility for logging the statuses is moved to the retry handlers themselves, so they can delay printing to after the reports have been patched. In principle, we could even decide to _not_ print the retry results individually and only print the final status (which would make for a cleaner output), but this can come in a future version. For EFD, the special final outcomes (`dd_efd_final_passed`, etc.) are replaced with plain `passed`, `failed`, `skipped` states, which xdist can handle, and the final states are only used in `efd_get_teststatus` (called from the `pytest_report_teststatus` hook). Future work: - Attempt to Fix has to be modified in similar ways to EFD, but it also has to handle quarantine, so it's a bit more involved. - xdist still prints `EFD INITIAL ATTEMPT` for all atempts (not just the first one). - The whole retry logic outside of pytest should be refactored (see #13224), but this PR is a first step to make the rest possible. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
1 parent 01bb3a5 commit 355694d

File tree

7 files changed

+238
-186
lines changed

7 files changed

+238
-186
lines changed

ddtrace/contrib/internal/pytest/_atr_utils.py

Lines changed: 20 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -4,14 +4,18 @@
44
import pytest
55

66
from ddtrace.contrib.internal.pytest._retry_utils import RetryOutcomes
7+
from ddtrace.contrib.internal.pytest._retry_utils import RetryReason
8+
from ddtrace.contrib.internal.pytest._retry_utils import UserProperty
79
from ddtrace.contrib.internal.pytest._retry_utils import _get_outcome_from_retry
810
from ddtrace.contrib.internal.pytest._retry_utils import _get_retry_attempt_string
911
from ddtrace.contrib.internal.pytest._retry_utils import set_retry_num
1012
from ddtrace.contrib.internal.pytest._types import _pytest_report_teststatus_return_type
1113
from ddtrace.contrib.internal.pytest._types import pytest_TestReport
1214
from ddtrace.contrib.internal.pytest._utils import _PYTEST_STATUS
15+
from ddtrace.contrib.internal.pytest._utils import TestPhase
1316
from ddtrace.contrib.internal.pytest._utils import _get_test_id_from_item
1417
from ddtrace.contrib.internal.pytest._utils import _TestOutcome
18+
from ddtrace.contrib.internal.pytest._utils import get_user_property
1519
from ddtrace.ext.test_visibility.api import TestStatus
1620
from ddtrace.internal.logger import get_logger
1721
from ddtrace.internal.test_visibility._internal_item_ids import InternalTestId
@@ -50,11 +54,14 @@ class _QUARANTINE_ATR_RETRY_OUTCOMES(_ATR_RETRY_OUTCOMES):
5054
def atr_handle_retries(
5155
test_id: InternalTestId,
5256
item: pytest.Item,
53-
when: str,
54-
original_result: pytest_TestReport,
57+
test_reports: t.Dict[str, pytest_TestReport],
5558
test_outcome: _TestOutcome,
5659
is_quarantined: bool = False,
5760
):
61+
setup_report = test_reports.get(TestPhase.SETUP)
62+
call_report = test_reports.get(TestPhase.CALL)
63+
teardown_report = test_reports.get(TestPhase.TEARDOWN)
64+
5865
if is_quarantined:
5966
retry_outcomes = _QUARANTINE_ATR_RETRY_OUTCOMES
6067
final_outcomes = _QUARANTINE_FINAL_OUTCOMES
@@ -70,11 +77,14 @@ def atr_handle_retries(
7077
XPASS=retry_outcomes.ATR_ATTEMPT_FAILED,
7178
)
7279

80+
item.ihook.pytest_runtest_logreport(report=setup_report)
81+
7382
# Overwrite the original result to avoid double-counting when displaying totals in final summary
74-
if when == "call":
83+
if call_report:
7584
if test_outcome.status == TestStatus.FAIL:
76-
original_result.outcome = outcomes.FAILED
77-
return
85+
call_report.outcome = outcomes.FAILED
86+
87+
item.ihook.pytest_runtest_logreport(report=call_report)
7888

7989
atr_outcome = _atr_do_retries(item, outcomes)
8090
longrepr = InternalTest.stash_get(test_id, "failure_longrepr")
@@ -83,19 +93,14 @@ def atr_handle_retries(
8393
nodeid=item.nodeid,
8494
location=item.location,
8595
keywords={k: 1 for k in item.keywords},
86-
when="call",
96+
when=TestPhase.CALL,
8797
longrepr=longrepr,
8898
outcome=final_outcomes[atr_outcome],
89-
user_properties=item.user_properties + [("dd_retry_reason", "auto_test_retry")],
99+
user_properties=item.user_properties + [(UserProperty.RETRY_REASON, RetryReason.AUTO_TEST_RETRY)],
90100
)
91101
item.ihook.pytest_runtest_logreport(report=final_report)
92102

93-
94-
def get_user_property(report, key, default=None):
95-
for k, v in report.user_properties:
96-
if k == key:
97-
return v
98-
return default
103+
item.ihook.pytest_runtest_logreport(report=teardown_report)
99104

100105

101106
def atr_get_failed_reports(terminalreporter: _pytest.terminal.TerminalReporter) -> t.List[pytest_TestReport]:
@@ -127,12 +132,12 @@ def _atr_write_report_for_status(
127132
markedup_strings: t.List[str],
128133
color: str,
129134
delete_reports: bool = True,
130-
retry_reason: str = "auto_test_retry",
135+
retry_reason: str = RetryReason.AUTO_TEST_RETRY,
131136
):
132137
reports = [
133138
report
134139
for report in terminalreporter.getreports(report_outcome)
135-
if get_user_property(report, "dd_retry_reason") == retry_reason
140+
if get_user_property(report, UserProperty.RETRY_REASON) == retry_reason
136141
]
137142
markup_kwargs = {color: True}
138143
if reports:

ddtrace/contrib/internal/pytest/_attempt_to_fix.py

Lines changed: 18 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@
1010
from ddtrace.contrib.internal.pytest._retry_utils import set_retry_num
1111
from ddtrace.contrib.internal.pytest._types import _pytest_report_teststatus_return_type
1212
from ddtrace.contrib.internal.pytest._types import pytest_TestReport
13+
from ddtrace.contrib.internal.pytest._utils import TestPhase
1314
from ddtrace.contrib.internal.pytest._utils import _get_test_id_from_item
1415
from ddtrace.contrib.internal.pytest._utils import _TestOutcome
1516
from ddtrace.contrib.internal.pytest.constants import USER_PROPERTY_QUARANTINED
@@ -41,10 +42,14 @@ class _RETRY_OUTCOMES:
4142
def attempt_to_fix_handle_retries(
4243
test_id: InternalTestId,
4344
item: pytest.Item,
44-
when: str,
45-
original_result: pytest_TestReport,
45+
test_reports: t.Dict[str, pytest_TestReport],
4646
test_outcome: _TestOutcome,
47+
is_quarantined: bool = False,
4748
):
49+
setup_report = test_reports.get(TestPhase.SETUP)
50+
call_report = test_reports.get(TestPhase.CALL)
51+
teardown_report = test_reports.get(TestPhase.TEARDOWN)
52+
4853
retry_outcomes = _RETRY_OUTCOMES
4954
final_outcomes = _FINAL_OUTCOMES
5055

@@ -56,28 +61,33 @@ def attempt_to_fix_handle_retries(
5661
XPASS=retry_outcomes.ATTEMPT_FAILED,
5762
)
5863

64+
item.ihook.pytest_runtest_logreport(report=setup_report)
65+
5966
# Overwrite the original result to avoid double-counting when displaying totals in final summary
60-
if when == "call":
67+
if call_report:
6168
if test_outcome.status == TestStatus.FAIL:
62-
original_result.outcome = outcomes.FAILED
69+
call_report.outcome = outcomes.FAILED
6370
elif test_outcome.status == TestStatus.SKIP:
64-
original_result.outcome = outcomes.SKIPPED
65-
return
71+
call_report.outcome = outcomes.SKIPPED
72+
73+
item.ihook.pytest_runtest_logreport(report=call_report)
6674

6775
retries_outcome = _do_retries(item, outcomes)
6876
longrepr = InternalTest.stash_get(test_id, "failure_longrepr")
6977

7078
final_report = RetryTestReport(
7179
nodeid=item.nodeid,
7280
location=item.location,
73-
keywords=item.keywords,
74-
when="call",
81+
keywords={k: 1 for k in item.keywords},
82+
when=TestPhase.CALL,
7583
longrepr=longrepr,
7684
outcome=final_outcomes[retries_outcome],
7785
user_properties=item.user_properties,
7886
)
7987
item.ihook.pytest_runtest_logreport(report=final_report)
8088

89+
item.ihook.pytest_runtest_logreport(report=teardown_report)
90+
8191

8292
def _do_retries(item: pytest.Item, outcomes: RetryOutcomes) -> TestStatus:
8393
test_id = _get_test_id_from_item(item)

ddtrace/contrib/internal/pytest/_efd_utils.py

Lines changed: 54 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -4,15 +4,18 @@
44
import pytest
55

66
from ddtrace.contrib.internal.pytest._retry_utils import RetryOutcomes
7-
from ddtrace.contrib.internal.pytest._retry_utils import RetryTestReport
7+
from ddtrace.contrib.internal.pytest._retry_utils import RetryReason
8+
from ddtrace.contrib.internal.pytest._retry_utils import UserProperty
89
from ddtrace.contrib.internal.pytest._retry_utils import _get_outcome_from_retry
910
from ddtrace.contrib.internal.pytest._retry_utils import _get_retry_attempt_string
1011
from ddtrace.contrib.internal.pytest._retry_utils import set_retry_num
1112
from ddtrace.contrib.internal.pytest._types import _pytest_report_teststatus_return_type
1213
from ddtrace.contrib.internal.pytest._types import pytest_TestReport
1314
from ddtrace.contrib.internal.pytest._utils import PYTEST_STATUS
15+
from ddtrace.contrib.internal.pytest._utils import TestPhase
1416
from ddtrace.contrib.internal.pytest._utils import _get_test_id_from_item
1517
from ddtrace.contrib.internal.pytest._utils import _TestOutcome
18+
from ddtrace.contrib.internal.pytest._utils import get_user_property
1619
from ddtrace.ext.test_visibility.api import TestStatus
1720
from ddtrace.internal.logger import get_logger
1821
from ddtrace.internal.test_visibility._efd_mixins import EFDTestStatus
@@ -37,67 +40,76 @@ class _EFD_RETRY_OUTCOMES:
3740
_EFD_FLAKY_OUTCOME = "flaky"
3841

3942
_FINAL_OUTCOMES: t.Dict[EFDTestStatus, str] = {
40-
EFDTestStatus.ALL_PASS: _EFD_RETRY_OUTCOMES.EFD_FINAL_PASSED,
41-
EFDTestStatus.ALL_FAIL: _EFD_RETRY_OUTCOMES.EFD_FINAL_FAILED,
42-
EFDTestStatus.ALL_SKIP: _EFD_RETRY_OUTCOMES.EFD_FINAL_SKIPPED,
43-
EFDTestStatus.FLAKY: _EFD_RETRY_OUTCOMES.EFD_FINAL_FLAKY,
43+
EFDTestStatus.ALL_PASS: PYTEST_STATUS.PASSED,
44+
EFDTestStatus.ALL_FAIL: PYTEST_STATUS.FAILED,
45+
EFDTestStatus.ALL_SKIP: PYTEST_STATUS.SKIPPED,
46+
EFDTestStatus.FLAKY: PYTEST_STATUS.PASSED,
4447
}
4548

4649

4750
def efd_handle_retries(
4851
test_id: InternalTestId,
4952
item: pytest.Item,
50-
when: str,
51-
original_result: pytest_TestReport,
53+
test_reports: t.Dict[str, pytest_TestReport],
5254
test_outcome: _TestOutcome,
55+
is_quarantined: bool = False,
5356
):
57+
setup_report = test_reports.get(TestPhase.SETUP)
58+
call_report = test_reports.get(TestPhase.CALL)
59+
teardown_report = test_reports.get(TestPhase.TEARDOWN)
60+
5461
# Overwrite the original result to avoid double-counting when displaying totals in final summary
55-
if when == "call":
62+
if call_report:
5663
if test_outcome.status == TestStatus.FAIL:
57-
original_result.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_FAILED
64+
call_report.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_FAILED
5865
elif test_outcome.status == TestStatus.PASS:
59-
original_result.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_PASSED
66+
call_report.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_PASSED
6067
elif test_outcome.status == TestStatus.SKIP:
61-
original_result.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_SKIPPED
62-
return
68+
call_report.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_SKIPPED
69+
6370
if InternalTest.get_tag(test_id, "_dd.ci.efd_setup_failed"):
6471
log.debug("Test item %s failed during setup, will not be retried for Early Flake Detection")
6572
return
73+
6674
if InternalTest.get_tag(test_id, "_dd.ci.efd_teardown_failed"):
6775
# NOTE: tests that passed their call but failed during teardown are not retried
6876
log.debug("Test item %s failed during teardown, will not be retried for Early Flake Detection")
6977
return
7078

7179
# If the test skipped (can happen either in setup or call depending on mark vs calling .skip()), we set the original
7280
# status as skipped and then continue handling retries because we may not return
73-
if test_outcome.status == TestStatus.SKIP and when in ["setup", "call"]:
74-
original_result.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_SKIPPED
75-
# We don't return for when == call when skip happens during setup, so we need to log it and make sure the status
76-
# of the test is set
77-
if when == "setup":
78-
item.ihook.pytest_runtest_logreport(
79-
nodeid=item.nodeid,
80-
locationm=item.location,
81-
keywords=item.keywords,
82-
when="setup",
83-
longrepr=None,
84-
outcome=_EFD_RETRY_OUTCOMES.EFD_ATTEMPT_SKIPPED,
85-
)
86-
InternalTest.mark_skip(test_id)
81+
if test_outcome.status == TestStatus.SKIP:
82+
if call_report:
83+
call_report.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_SKIPPED
84+
else:
85+
# When skip happens during setup, we don't have a call report.
86+
setup_report.outcome = _EFD_RETRY_OUTCOMES.EFD_ATTEMPT_SKIPPED
87+
88+
item.ihook.pytest_runtest_logreport(report=setup_report)
89+
90+
if call_report:
91+
item.ihook.pytest_runtest_logreport(report=call_report)
8792

8893
efd_outcome = _efd_do_retries(item)
8994
longrepr = InternalTest.stash_get(test_id, "failure_longrepr")
9095

91-
final_report = RetryTestReport(
96+
final_report = pytest_TestReport(
9297
nodeid=item.nodeid,
9398
location=item.location,
94-
keywords=item.keywords,
95-
when="call",
99+
keywords={k: 1 for k in item.keywords},
100+
when=TestPhase.CALL,
96101
longrepr=longrepr,
97102
outcome=_FINAL_OUTCOMES[efd_outcome],
103+
user_properties=item.user_properties
104+
+ [
105+
(UserProperty.RETRY_REASON, RetryReason.EARLY_FLAKE_DETECTION),
106+
(UserProperty.RETRY_FINAL_OUTCOME, efd_outcome.value),
107+
],
98108
)
99109
item.ihook.pytest_runtest_logreport(report=final_report)
100110

111+
item.ihook.pytest_runtest_logreport(report=teardown_report)
112+
101113

102114
def efd_get_failed_reports(terminalreporter: _pytest.terminal.TerminalReporter) -> t.List[pytest_TestReport]:
103115
return terminalreporter.getreports(_EFD_RETRY_OUTCOMES.EFD_ATTEMPT_FAILED)
@@ -325,14 +337,17 @@ def efd_get_teststatus(report: pytest_TestReport) -> _pytest_report_teststatus_r
325337
"s",
326338
(f"EFD RETRY {_get_retry_attempt_string(report.nodeid)}SKIPPED", {"yellow": True}),
327339
)
328-
if report.outcome == _EFD_RETRY_OUTCOMES.EFD_FINAL_PASSED:
329-
return (_EFD_RETRY_OUTCOMES.EFD_FINAL_PASSED, ".", ("EFD FINAL STATUS: PASSED", {"green": True}))
330-
if report.outcome == _EFD_RETRY_OUTCOMES.EFD_FINAL_FAILED:
331-
return (_EFD_RETRY_OUTCOMES.EFD_FINAL_FAILED, "F", ("EFD FINAL STATUS: FAILED", {"red": True}))
332-
if report.outcome == _EFD_RETRY_OUTCOMES.EFD_FINAL_SKIPPED:
333-
return (_EFD_RETRY_OUTCOMES.EFD_FINAL_SKIPPED, "S", ("EFD FINAL STATUS: SKIPPED", {"yellow": True}))
334-
if report.outcome == _EFD_RETRY_OUTCOMES.EFD_FINAL_FLAKY:
335-
# Flaky tests are the only one that have a pretty string because they are intended to be displayed in the final
336-
# count of terminal summary
337-
return (_EFD_FLAKY_OUTCOME, "K", ("EFD FINAL STATUS: FLAKY", {"yellow": True}))
340+
341+
if get_user_property(report, UserProperty.RETRY_REASON) == RetryReason.EARLY_FLAKE_DETECTION:
342+
efd_outcome = get_user_property(report, UserProperty.RETRY_FINAL_OUTCOME)
343+
if efd_outcome == "passed":
344+
return (_EFD_RETRY_OUTCOMES.EFD_FINAL_PASSED, ".", ("EFD FINAL STATUS: PASSED", {"green": True}))
345+
if efd_outcome == "failed":
346+
return (_EFD_RETRY_OUTCOMES.EFD_FINAL_FAILED, "F", ("EFD FINAL STATUS: FAILED", {"red": True}))
347+
if efd_outcome == "skipped":
348+
return (_EFD_RETRY_OUTCOMES.EFD_FINAL_SKIPPED, "S", ("EFD FINAL STATUS: SKIPPED", {"yellow": True}))
349+
if efd_outcome == "flaky":
350+
# Flaky tests are the only one that have a pretty string because they are intended to be displayed in the
351+
# final count of terminal summary
352+
return (_EFD_FLAKY_OUTCOME, "K", ("EFD FINAL STATUS: FLAKY", {"yellow": True}))
338353
return None

0 commit comments

Comments
 (0)