chore(ci_visibility): refactor test retry logic #13224

vitor-de-araujo · 2025-04-17T14:15:34Z

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2025-04-17T14:16:11Z

CODEOWNERS have been resolved as:

ddtrace/internal/ci_visibility/api/_retry.py                            @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_atr_utils.py                           @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_plugin_v2.py                           @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_retry_utils.py                         @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_utils.py                               @DataDog/ci-app-libraries
ddtrace/internal/ci_visibility/api/_test.py                             @DataDog/ci-app-libraries
ddtrace/internal/ci_visibility/recorder.py                              @DataDog/ci-app-libraries
ddtrace/internal/test_visibility/_atr_mixins.py                         @DataDog/ci-app-libraries
ddtrace/internal/test_visibility/api.py                                 @DataDog/ci-app-libraries

datadog-datadog-prod-us1 · 2025-04-17T14:16:12Z

ddtrace/contrib/internal/pytest/_retry_utils.py

-    if setup_report.outcome == outcomes.PASSED:
-        call_call, call_report = _retry_run_when(item, "call", outcomes)
-        if call_report.outcome == outcomes.FAILED:
+    if setup_outcome == outcomes.PASSED:


🟠 Code Quality Violation

Found too many nested ifs within this condition (...read more)

Too many nested loops make the code hard to read and understand. Simplify your code by removing nesting levels and separate code in small units.

datadog-datadog-prod-us1 · 2025-04-17T14:16:12Z

ddtrace/contrib/internal/pytest/_retry_utils.py

    # Teardown does not happen if setup skipped
-    if not setup_report.skipped:
-        teardown_call, teardown_report = _retry_run_when(item, "teardown", outcomes)
+    if not setup_outcome == outcomes.SKIPPED:


🟠 Code Quality Violation

Found too many nested ifs within this condition (...read more)

Too many nested loops make the code hard to read and understand. Simplify your code by removing nesting levels and separate code in small units.

github-actions · 2025-04-17T14:36:22Z

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 252 ± 5 ms.

The average import time from base is: 256 ± 7 ms.

The import time difference between this PR and base is: -4.1 ± 0.3 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.215 ms (0.88%)

ddtrace.bootstrap.sitecustomize 1.511 ms (0.60%)

ddtrace.bootstrap.preload 1.511 ms (0.60%)

ddtrace.internal.products 1.511 ms (0.60%)

ddtrace.internal.remoteconfig.client 0.697 ms (0.28%)

ddtrace 0.704 ms (0.28%)

pr-commenter · 2025-04-17T15:38:35Z

Benchmarks

Benchmark execution time: 2025-04-23 18:12:33

Comparing candidate commit 931fb82 in PR branch vitor-de-araujo/SDTEST-1850/refactor-retry-logic with baseline commit 7a1745d in branch main.

Found 5 performance improvements and 2 performance regressions! Performance is the same for 483 metrics, 2 unstable metrics.

scenario:ddtracerun-auto_profiling

🟥 execution_time [+171.689ms; +175.100ms] or [+26.236%; +26.757%]

scenario:iast_aspects-swapcase_aspect

🟥 execution_time [+198.539ns; +247.071ns] or [+8.039%; +10.004%]

scenario:otelspan-start-finish

🟩 execution_time [-19.006ms; -18.689ms] or [-24.111%; -23.709%]

scenario:otelspan-start-finish-telemetry

🟩 execution_time [-19.271ms; -18.970ms] or [-24.004%; -23.629%]

scenario:span-start-finish

🟩 execution_time [-15.704ms; -15.233ms] or [-32.651%; -31.670%]

scenario:span-start-finish-telemetry

🟩 execution_time [-16.378ms; -15.775ms] or [-33.111%; -31.893%]

scenario:span-start-finish-traceid128

🟩 execution_time [-18.861ms; -18.358ms] or [-36.739%; -35.758%]

datadog-datadog-prod-us1 · 2025-04-21T11:01:00Z

ddtrace/internal/ci_visibility/api/_retry.py

+log = get_logger(__name__)
+
+
+class RetryManager:


⚪ Code Quality Violation

Class RetryManager should have an init method (...read more)

Ensure that a class has an __init__ method. This check is bypassed when the class is a data class (annotated with @dataclass).

…3376) Currently we let pytest's builtin `pytest_runtest_protocol` hook to run the test, and we check whether to retry at the makereport stage. This has a number of consequences: - We don't have access to the setup, call, teardown reports all at once; we see one at a time during `makereport`, and we have to patch them and stash information around to keep state across each time we pass through the reports for a given test. - pytest logs each report as it is created, so we have to patch them at the right time so they get printed correctly (to e.g. change the outcome from `FAILED` to `ATR INITIAL ATTEMPT FAILED`; this affects not only printing, but also the session exit status). - In particular, for EFD when the initial attempt passes, we run too late and the `PASSED` status was already logged by the time we patch it, so it shows `PASSED` instead of `EFD INITIAL ATTEMPT PASSED`. Not only that, but we have to [handle this case specially](https://github.com/DataDog/dd-trace-py/blob/0efbb6c3cc6ab4c59a838771f2db92f1b729e87a/ddtrace/contrib/internal/pytest/_efd_utils.py#L237) when generating the terminal summary. This PR moves the retry logic from the `pytest_runtest_makereport` hook to the `pytest_runtest_protocol` hook. This means we _replace_ pytest's own `pytest_runtest_protocol` with our own. We invoke pytest's internal `runtestprotocol` function directly from our hook, so the behavior of our hook is similar to the pytest's own hook. The difference is that we call this function with `log=False`, so pytest doesn't log the setup, call, teardown reports as they are created. Instead, we collect all reports, patch them as needed, and _then_ print them out. This means we can write the logic having full knowledge of the final status of a test run, instead of patching things as we see them during setup, call and teardown. For retries, the responsibility for logging the statuses is moved to the retry handlers themselves, so they can delay printing to after the reports have been patched. In principle, we could even decide to _not_ print the retry results individually and only print the final status (which would make for a cleaner output), but this can come in a future version. For EFD, the special final outcomes (`dd_efd_final_passed`, etc.) are replaced with plain `passed`, `failed`, `skipped` states, which xdist can handle, and the final states are only used in `efd_get_teststatus` (called from the `pytest_report_teststatus` hook). Future work: - Attempt to Fix has to be modified in similar ways to EFD, but it also has to handle quarantine, so it's a bit more involved. - xdist still prints `EFD INITIAL ATTEMPT` for all atempts (not just the first one). - The whole retry logic outside of pytest should be refactored (see #13224), but this PR is a first step to make the rest possible. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

…3448) This is essentially #13376, recreated now that the fix in the unittest suite (#13445) has been merged, and with the previous `pytest_runtest_protocol` logic wrapped in a try/except to avoid breaking the pipeline in case of an internal error in CI Visibility. ___ Currently we let pytest's builtin `pytest_runtest_protocol` hook to run the test, and we check whether to retry at the makereport stage. This has a number of consequences: - We don't have access to the setup, call, teardown reports all at once; we see one at a time during `makereport`, and we have to patch them and stash information around to keep state across each time we pass through the reports for a given test. - pytest logs each report as it is created, so we have to patch them at the right time so they get printed correctly (to e.g. change the outcome from `FAILED` to `ATR INITIAL ATTEMPT FAILED`; this affects not only printing, but also the session exit status). - In particular, for EFD when the initial attempt passes, we run too late and the `PASSED` status was already logged by the time we patch it, so it shows `PASSED` instead of `EFD INITIAL ATTEMPT PASSED`. Not only that, but we have to [handle this case specially](https://github.com/DataDog/dd-trace-py/blob/0efbb6c3cc6ab4c59a838771f2db92f1b729e87a/ddtrace/contrib/internal/pytest/_efd_utils.py#L237) when generating the terminal summary. This PR moves the retry logic from the `pytest_runtest_makereport` hook to the `pytest_runtest_protocol` hook. This means we _replace_ pytest's own `pytest_runtest_protocol` with our own. We invoke pytest's internal `runtestprotocol` function directly from our hook, so the behavior of our hook is similar to the pytest's own hook. The difference is that we call this function with `log=False`, so pytest doesn't log the setup, call, teardown reports as they are created. Instead, we collect all reports, patch them as needed, and _then_ print them out. This means we can write the logic having full knowledge of the final status of a test run, instead of patching things as we see them during setup, call and teardown. For retries, the responsibility for logging the statuses is moved to the retry handlers themselves, so they can delay printing to after the reports have been patched. In principle, we could even decide to _not_ print the retry results individually and only print the final status (which would make for a cleaner output), but this can come in a future version. For EFD, the special final outcomes (`dd_efd_final_passed`, etc.) are replaced with plain `passed`, `failed`, `skipped` states, which xdist can handle, and the final states are only used in `efd_get_teststatus` (called from the `pytest_report_teststatus` hook). Future work: - Attempt to Fix has to be modified in similar ways to EFD, but it also has to handle quarantine, so it's a bit more involved. - xdist still prints `EFD INITIAL ATTEMPT` for all atempts (not just the first one). - The whole retry logic outside of pytest should be refactored (see #13224), but this PR is a first step to make the rest possible. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

github-actions · 2025-05-25T00:09:12Z

This pull request has been automatically closed after a period of inactivity.
After this much time, it will likely be easier to open a new pull request with the
same changes than to update this one from the base branch. Please comment or reopen
if you think this pull request was closed in error.

vitor-de-araujo added 5 commits April 14, 2025 20:38

first step into retry refactor

65bcf4d

intermediary shenanigans

dcbc21d

go back

6d2d2b8

moving attempt count logic

7614292

every test counts

df60b61

datadog-datadog-prod-us1 bot reviewed Apr 17, 2025

View reviewed changes

efd: first steps

b003dc1

datadog-datadog-prod-us1 bot reviewed Apr 21, 2025

View reviewed changes

vitor-de-araujo added 9 commits April 21, 2025 11:13

a

214013c

a

a1d18f2

a

7f1913d

a

4437cb0

retry number as a user property works better with xdist for some reason

4614406

curb your enthusiasm

b6f33d9

flaky reports, part 1/∞

6ca8693

almost not terrible

b2b5fd7

correct count, weird method

931fb82

vitor-de-araujo mentioned this pull request May 14, 2025

chore(ci_visibility): move retry logic to pytest_runtest_protocol #13376

Merged

2 tasks

vitor-de-araujo mentioned this pull request May 19, 2025

chore(ci_visibility): move retry logic to pytest_runtest_protocol #13448

Merged

2 tasks

github-actions bot added the stale label May 24, 2025

github-actions bot closed this May 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore(ci_visibility): refactor test retry logic #13224

chore(ci_visibility): refactor test retry logic #13224

Uh oh!

vitor-de-araujo commented Apr 17, 2025

Uh oh!

github-actions bot commented Apr 17, 2025

Uh oh!

datadog-datadog-prod-us1 bot Apr 17, 2025

Uh oh!

datadog-datadog-prod-us1 bot Apr 17, 2025

Uh oh!

github-actions bot commented Apr 17, 2025 •

edited

Loading

Uh oh!

pr-commenter bot commented Apr 17, 2025 •

edited

Loading

Uh oh!

datadog-datadog-prod-us1 bot Apr 21, 2025

Uh oh!

github-actions bot commented May 25, 2025

Uh oh!

Uh oh!

chore(ci_visibility): refactor test retry logic #13224

chore(ci_visibility): refactor test retry logic #13224

Uh oh!

Conversation

vitor-de-araujo commented Apr 17, 2025

Checklist

Reviewer Checklist

Uh oh!

github-actions bot commented Apr 17, 2025

Uh oh!

datadog-datadog-prod-us1 bot Apr 17, 2025

Choose a reason for hiding this comment

🟠 Code Quality Violation

Uh oh!

datadog-datadog-prod-us1 bot Apr 17, 2025

Choose a reason for hiding this comment

🟠 Code Quality Violation

Uh oh!

github-actions bot commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bootstrap import analysis

Summary

Import time breakdown

Uh oh!

pr-commenter bot commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks

scenario:ddtracerun-auto_profiling

scenario:iast_aspects-swapcase_aspect

scenario:otelspan-start-finish

scenario:otelspan-start-finish-telemetry

scenario:span-start-finish

scenario:span-start-finish-telemetry

scenario:span-start-finish-traceid128

Uh oh!

datadog-datadog-prod-us1 bot Apr 21, 2025

Choose a reason for hiding this comment

⚪ Code Quality Violation

Uh oh!

github-actions bot commented May 25, 2025

Uh oh!

Uh oh!

github-actions bot commented Apr 17, 2025 •

edited

Loading

pr-commenter bot commented Apr 17, 2025 •

edited

Loading