Skip to content

feat(related_issues): Trace connected errors #69237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Apr 24, 2024

Conversation

armenzg
Copy link
Member

@armenzg armenzg commented Apr 18, 2024

Given a group, we look for the recommended event and search for any other errors in the trace.

These trace-connected groups will be shown under the Related Issues tab. See the current look (more UI changes will be needed).

image

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 18, 2024
Copy link

codecov bot commented Apr 18, 2024

Codecov Report

Attention: Patch coverage is 92.10526% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 79.48%. Comparing base (85558cb) to head (16914f5).
Report is 22 commits behind head on master.

❗ Current head 16914f5 differs from pull request most recent head 9371263. Consider uploading reports for the commit 9371263 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #69237      +/-   ##
==========================================
- Coverage   79.65%   79.48%   -0.17%     
==========================================
  Files        6476     6477       +1     
  Lines      287490   287527      +37     
  Branches    49547    49554       +7     
==========================================
- Hits       228990   228549     -441     
- Misses      58132    58610     +478     
  Partials      368      368              
Files Coverage Δ
src/sentry/eventstore/models.py 96.73% <100.00%> (+0.04%) ⬆️
src/sentry/issues/related/__init__.py 100.00% <100.00%> (ø)
src/sentry/issues/related/same_root_cause.py 100.00% <100.00%> (ø)
src/sentry/search/events/builder/discover.py 96.31% <100.00%> (+<0.01%) ⬆️
src/sentry/snuba/referrer.py 100.00% <100.00%> (ø)
src/sentry/testutils/cases.py 89.43% <100.00%> (+0.07%) ⬆️
src/sentry/testutils/factories.py 96.00% <100.00%> (+<0.01%) ⬆️
.../sentry/api/endpoints/organization_events_trace.py 24.68% <85.71%> (-65.91%) ⬇️
src/sentry/issues/related/trace_connected.py 85.71% <85.71%> (ø)

... and 9 files with indirect coverage changes

Base automatically changed from ref/trace_test_case/armenzg to master April 22, 2024 14:37
armenzg added 7 commits April 22, 2024 10:41
Given an issue, if we can find an event with a trace, we may be able to find other issue in its trace.

This can allow the customer to act on them together.
auto_fields=False,
),
)
error_query = find_errors_for_trace_id(params, trace_id, limit)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to call find_errors_for_trace_id from my endpoint, thus, making it its own function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer you re-implement this function rather than re-use here so that I can still make changes here without potentially breaking your endpoint.

Especially because the query pattern here is very dependent on feeding the latter augment_transactions_with_spans

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wfm

@@ -11,7 +11,10 @@ def same_root_cause_analysis(group: Group) -> list[int]:
"""Analyze and create a group set if the group was caused by the same root cause."""
# Querying the data field (which is a GzippedDictField) cannot be done via
# Django's ORM, thus, we do so via compare_groups
project_groups = RangeQuerySetWrapper(Group.objects.filter(project=group.project_id), limit=100)
project_groups = RangeQuerySetWrapper(
Group.objects.filter(project=group.project_id).exclude(id=group.id),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change to exclude the group ID we're querying about.

This is the related test change for it:
image

endpoint = "sentry-api-0-issues-related-issues"
FEATURES: list[str] = []
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed in one of the test cases. Ideally, I fix it up there.


def setUp(self) -> None:
super().setUp()
self.login_as(user=self.user)
self.organization = self.create_organization(owner=self.user)
self.error_type = "ApiTimeoutError"
self.error_value = "Timed out attempting to reach host: api.github.com"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change. I could move it into its own PR.

@armenzg armenzg requested a review from a team April 23, 2024 19:52
@armenzg armenzg changed the title WIP feat(related_issues): Trace connected errors feat(related_issues): Trace connected errors Apr 23, 2024
@armenzg armenzg marked this pull request as ready for review April 23, 2024 19:53
@armenzg armenzg requested a review from a team as a code owner April 23, 2024 19:53
@armenzg armenzg self-assigned this Apr 23, 2024
Comment on lines +23 to +32
query = QueryBuilder(
Dataset.Events,
{"start": start, "end": end, "organization_id": org_id, "project_id": project_ids},
query=f"trace:{event.trace_id}",
selected_columns=["id", "issue.id"],
# Don't add timestamp to this orderby as snuba will have to split the time range up and make multiple queries
orderby=["id"],
limit=100,
config=QueryBuilderConfig(auto_fields=False),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like potentially rather heavy query, maybe add some read-through cache?

@armenzg armenzg requested a review from wmak April 23, 2024 21:31
org_id = group.project.organization_id
# XXX: Test without a list and validate the data type
project_ids = list(Project.objects.filter(organization_id=org_id).values_list("id", flat=True))
start, end = default_start_end_dates() # Today to 90 days back
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Bartek's comment here, this is a giant query if its spanning multiple projects and ninety days, I'm not familiar with what you're building, but do you really need to scan 90d of data? Traces rarely span more than a few minutes, and even more rarely span multiple days.

That is to say, you have an event do you know that events timestamp? can you use that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will follow up.
We can just use the timestamp of the event as the start date.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that a day in either direction seems like it would cover 99% of useful cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd want to see the trace connected issues for downstream issues as well (e.g. the second issue in a trace). I think you could limit this safely to several hours either direction of the occurrence.

@armenzg armenzg enabled auto-merge (squash) April 24, 2024 11:57
@armenzg armenzg merged commit 34d9222 into master Apr 24, 2024
48 checks passed
@armenzg armenzg deleted the feat/trace_connected_errors/armenzg branch April 24, 2024 12:22
armenzg added a commit that referenced this pull request Apr 24, 2024
Given a group, we look for the recommended event and search for any
other errors in the trace.

These trace-connected groups will be shown under the Related Issues tab.
See the current look (more UI changes will be needed).

<img width="933" alt="image"
src="https://github.com/getsentry/sentry/assets/44410/b3f774d7-dc71-40b9-b144-65f143b6e981">
MichaelSun48 pushed a commit that referenced this pull request Apr 25, 2024
Given a group, we look for the recommended event and search for any
other errors in the trace.

These trace-connected groups will be shown under the Related Issues tab.
See the current look (more UI changes will be needed).

<img width="933" alt="image"
src="https://github.com/getsentry/sentry/assets/44410/b3f774d7-dc71-40b9-b144-65f143b6e981">
@armenzg armenzg added this to the Related Issues - V1 milestone Apr 25, 2024
@armenzg armenzg linked an issue Apr 25, 2024 that may be closed by this pull request
2 tasks
armenzg added a commit that referenced this pull request Apr 25, 2024
Copy link

sentry-io bot commented Apr 25, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ AssertionError: assert 0 == 3 pytest.runtest.protocol tests/snuba/api/endpoin... View Issue

Did you find this useful? React with a 👍 or 👎

@armenzg armenzg mentioned this pull request Apr 25, 2024
5 tasks
armenzg added a commit that referenced this pull request Apr 26, 2024
**NOTE:** This is not the final UI but a step towards it.

This shows the recently added trace-connected errors. This is a follow-up to the backend work: #69237 

![image](https://github.com/getsentry/sentry/assets/44410/a354acca-f6b4-4f9d-9ea6-9c0212f478c9)
@github-actions github-actions bot locked and limited conversation to collaborators May 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BE - ⏫ Trace-connected related issues
4 participants