Skip to content

chore(ci_visibility): refactor test retry logic #13224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

vitor-de-araujo
Copy link
Contributor

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

Copy link
Contributor

CODEOWNERS have been resolved as:

ddtrace/internal/ci_visibility/api/_retry.py                            @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_atr_utils.py                           @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_plugin_v2.py                           @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_retry_utils.py                         @DataDog/ci-app-libraries
ddtrace/contrib/internal/pytest/_utils.py                               @DataDog/ci-app-libraries
ddtrace/internal/ci_visibility/api/_test.py                             @DataDog/ci-app-libraries
ddtrace/internal/ci_visibility/recorder.py                              @DataDog/ci-app-libraries
ddtrace/internal/test_visibility/_atr_mixins.py                         @DataDog/ci-app-libraries
ddtrace/internal/test_visibility/api.py                                 @DataDog/ci-app-libraries

if setup_report.outcome == outcomes.PASSED:
call_call, call_report = _retry_run_when(item, "call", outcomes)
if call_report.outcome == outcomes.FAILED:
if setup_outcome == outcomes.PASSED:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Code Quality Violation

Found too many nested ifs within this condition (...read more)

Too many nested loops make the code hard to read and understand. Simplify your code by removing nesting levels and separate code in small units.

View in Datadog  Leave us feedback  Documentation

# Teardown does not happen if setup skipped
if not setup_report.skipped:
teardown_call, teardown_report = _retry_run_when(item, "teardown", outcomes)
if not setup_outcome == outcomes.SKIPPED:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Code Quality Violation

Found too many nested ifs within this condition (...read more)

Too many nested loops make the code hard to read and understand. Simplify your code by removing nesting levels and separate code in small units.

View in Datadog  Leave us feedback  Documentation

Copy link
Contributor

github-actions bot commented Apr 17, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 252 ± 5 ms.

The average import time from base is: 256 ± 7 ms.

The import time difference between this PR and base is: -4.1 ± 0.3 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.215 ms (0.88%)
ddtrace.bootstrap.sitecustomize 1.511 ms (0.60%)
ddtrace.bootstrap.preload 1.511 ms (0.60%)
ddtrace.internal.products 1.511 ms (0.60%)
ddtrace.internal.remoteconfig.client 0.697 ms (0.28%)
ddtrace 0.704 ms (0.28%)

@pr-commenter
Copy link

pr-commenter bot commented Apr 17, 2025

Benchmarks

Benchmark execution time: 2025-04-23 18:12:33

Comparing candidate commit 931fb82 in PR branch vitor-de-araujo/SDTEST-1850/refactor-retry-logic with baseline commit 7a1745d in branch main.

Found 5 performance improvements and 2 performance regressions! Performance is the same for 483 metrics, 2 unstable metrics.

scenario:ddtracerun-auto_profiling

  • 🟥 execution_time [+171.689ms; +175.100ms] or [+26.236%; +26.757%]

scenario:iast_aspects-swapcase_aspect

  • 🟥 execution_time [+198.539ns; +247.071ns] or [+8.039%; +10.004%]

scenario:otelspan-start-finish

  • 🟩 execution_time [-19.006ms; -18.689ms] or [-24.111%; -23.709%]

scenario:otelspan-start-finish-telemetry

  • 🟩 execution_time [-19.271ms; -18.970ms] or [-24.004%; -23.629%]

scenario:span-start-finish

  • 🟩 execution_time [-15.704ms; -15.233ms] or [-32.651%; -31.670%]

scenario:span-start-finish-telemetry

  • 🟩 execution_time [-16.378ms; -15.775ms] or [-33.111%; -31.893%]

scenario:span-start-finish-traceid128

  • 🟩 execution_time [-18.861ms; -18.358ms] or [-36.739%; -35.758%]

log = get_logger(__name__)


class RetryManager:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Class RetryManager should have an init method (...read more)

Ensure that a class has an __init__ method. This check is bypassed when the class is a data class (annotated with @dataclass).

View in Datadog  Leave us feedback  Documentation

vitor-de-araujo added a commit that referenced this pull request May 14, 2025
…3376)

Currently we let pytest's builtin `pytest_runtest_protocol` hook to run
the test, and we check whether to retry at the makereport stage. This
has a number of consequences:

- We don't have access to the setup, call, teardown reports all at once;
we see one at a time during `makereport`, and we have to patch them and
stash information around to keep state across each time we pass through
the reports for a given test.
- pytest logs each report as it is created, so we have to patch them at
the right time so they get printed correctly (to e.g. change the outcome
from `FAILED` to `ATR INITIAL ATTEMPT FAILED`; this affects not only
printing, but also the session exit status).
- In particular, for EFD when the initial attempt passes, we run too
late and the `PASSED` status was already logged by the time we patch it,
so it shows `PASSED` instead of `EFD INITIAL ATTEMPT PASSED`. Not only
that, but we have to [handle this case
specially](https://github.com/DataDog/dd-trace-py/blob/0efbb6c3cc6ab4c59a838771f2db92f1b729e87a/ddtrace/contrib/internal/pytest/_efd_utils.py#L237)
when generating the terminal summary.

This PR moves the retry logic from the `pytest_runtest_makereport` hook
to the `pytest_runtest_protocol` hook. This means we _replace_ pytest's
own `pytest_runtest_protocol` with our own. We invoke pytest's internal
`runtestprotocol` function directly from our hook, so the behavior of
our hook is similar to the pytest's own hook. The difference is that we
call this function with `log=False`, so pytest doesn't log the setup,
call, teardown reports as they are created. Instead, we collect all
reports, patch them as needed, and _then_ print them out. This means we
can write the logic having full knowledge of the final status of a test
run, instead of patching things as we see them during setup, call and
teardown.

For retries, the responsibility for logging the statuses is moved to the
retry handlers themselves, so they can delay printing to after the
reports have been patched. In principle, we could even decide to _not_
print the retry results individually and only print the final status
(which would make for a cleaner output), but this can come in a future
version.

For EFD, the special final outcomes (`dd_efd_final_passed`, etc.) are
replaced with plain `passed`, `failed`, `skipped` states, which xdist
can handle, and the final states are only used in `efd_get_teststatus`
(called from the `pytest_report_teststatus` hook).

Future work:

- Attempt to Fix has to be modified in similar ways to EFD, but it also
has to handle quarantine, so it's a bit more involved.
- xdist still prints `EFD INITIAL ATTEMPT` for all atempts (not just the
first one).
- The whole retry logic outside of pytest should be refactored (see
#13224), but this PR is a
first step to make the rest possible.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
vitor-de-araujo added a commit that referenced this pull request May 19, 2025
…3448)

This is essentially #13376, recreated now that the fix in the unittest
suite (#13445) has been merged, and with the previous
`pytest_runtest_protocol` logic wrapped in a try/except to avoid
breaking the pipeline in case of an internal error in CI Visibility.

___

Currently we let pytest's builtin `pytest_runtest_protocol` hook to run
the test, and we check whether to retry at the makereport stage. This
has a number of consequences:

- We don't have access to the setup, call, teardown reports all at once;
we see one at a time during `makereport`, and we have to patch them and
stash information around to keep state across each time we pass through
the reports for a given test.
- pytest logs each report as it is created, so we have to patch them at
the right time so they get printed correctly (to e.g. change the outcome
from `FAILED` to `ATR INITIAL ATTEMPT FAILED`; this affects not only
printing, but also the session exit status).
- In particular, for EFD when the initial attempt passes, we run too
late and the `PASSED` status was already logged by the time we patch it,
so it shows `PASSED` instead of `EFD INITIAL ATTEMPT PASSED`. Not only
that, but we have to [handle this case
specially](https://github.com/DataDog/dd-trace-py/blob/0efbb6c3cc6ab4c59a838771f2db92f1b729e87a/ddtrace/contrib/internal/pytest/_efd_utils.py#L237)
when generating the terminal summary.

This PR moves the retry logic from the `pytest_runtest_makereport` hook
to the `pytest_runtest_protocol` hook. This means we _replace_ pytest's
own `pytest_runtest_protocol` with our own. We invoke pytest's internal
`runtestprotocol` function directly from our hook, so the behavior of
our hook is similar to the pytest's own hook. The difference is that we
call this function with `log=False`, so pytest doesn't log the setup,
call, teardown reports as they are created. Instead, we collect all
reports, patch them as needed, and _then_ print them out. This means we
can write the logic having full knowledge of the final status of a test
run, instead of patching things as we see them during setup, call and
teardown.

For retries, the responsibility for logging the statuses is moved to the
retry handlers themselves, so they can delay printing to after the
reports have been patched. In principle, we could even decide to _not_
print the retry results individually and only print the final status
(which would make for a cleaner output), but this can come in a future
version.

For EFD, the special final outcomes (`dd_efd_final_passed`, etc.) are
replaced with plain `passed`, `failed`, `skipped` states, which xdist
can handle, and the final states are only used in `efd_get_teststatus`
(called from the `pytest_report_teststatus` hook).

Future work:

- Attempt to Fix has to be modified in similar ways to EFD, but it also
has to handle quarantine, so it's a bit more involved.
- xdist still prints `EFD INITIAL ATTEMPT` for all atempts (not just the
first one).
- The whole retry logic outside of pytest should be refactored (see
#13224), but this PR is a
first step to make the rest possible.


## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
@github-actions github-actions bot added the stale label May 24, 2025
Copy link
Contributor

This pull request has been automatically closed after a period of inactivity.
After this much time, it will likely be easier to open a new pull request with the
same changes than to update this one from the base branch. Please comment or reopen
if you think this pull request was closed in error.

@github-actions github-actions bot closed this May 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant