feat(grouping): Add hashing_metadata field to GroupHashMetadata table #80531
Merged
lobsterkatie merged 2 commits into master from kmclb-add-hashing-metadata-types-and-field on Nov 15, 2024
34 changes: 34 additions & 0 deletions
src/sentry/migrations/0791_add_hashing_metadata_to_grouphash_metadata.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Generated by Django 5.1.1 on 2024-11-14 22:09 | ||
|
||
from django.db import migrations | ||
|
||
import sentry.db.models.fields.jsonfield | ||
from sentry.new_migrations.migrations import CheckedMigration | ||
|
||
|
||
class Migration(CheckedMigration): | ||
# This flag is used to mark that a migration shouldn't be automatically run in production. | ||
# This should only be used for operations where it's safe to run the migration after your | ||
# code has deployed. So this should not be used for most operations that alter the schema | ||
# of a table. | ||
# Here are some things that make sense to mark as post deployment: | ||
# - Large data migrations. Typically we want these to be run manually so that they can be | ||
# monitored and not block the deploy for a long period of time while they run. | ||
# - Adding indexes to large tables. Since this can take a long time, we'd generally prefer to | ||
# run this outside deployments so that we don't block them. Note that while adding an index | ||
# is a schema change, it's completely safe to run the operation after the code has deployed. | ||
# Once deployed, run these manually via: https://develop.sentry.dev/database-migrations/#migration-deployment | ||
|
||
is_post_deployment = False | ||
|
||
dependencies = [ | ||
("sentry", "0790_delete_dashboard_perms_col"), | ||
] | ||
|
||
operations = [ | ||
migrations.AddField( | ||
model_name="grouphashmetadata", | ||
name="hashing_metadata", | ||
field=sentry.db.models.fields.jsonfield.JSONField(null=True), | ||
), | ||
] |
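The migration above adds a nullable JSON column, so rows that already exist need no backfill and simply read back as null. A minimal sketch of that behavior, using the standard-library `sqlite3` module rather than Django; the table name `grouphash_metadata`, the `grouphash_id` column, and the `TEXT` column type are assumptions made for illustration, not the schema Sentry uses:

```python
import json
import sqlite3

# Hypothetical stand-in table; the real table is managed by Django migrations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grouphash_metadata (id INTEGER PRIMARY KEY, grouphash_id INTEGER)"
)
# A row written before the new column exists.
conn.execute("INSERT INTO grouphash_metadata (grouphash_id) VALUES (1)")

# Rough analogue of AddField(..., JSONField(null=True)): a nullable column
# with no default, so existing rows are left untouched.
conn.execute("ALTER TABLE grouphash_metadata ADD COLUMN hashing_metadata TEXT")

# A row written after the column exists, storing a flat JSON blob.
new_metadata = {"message_source": "message", "message_parameterized": False}
conn.execute(
    "INSERT INTO grouphash_metadata (grouphash_id, hashing_metadata) VALUES (?, ?)",
    (2, json.dumps(new_metadata)),
)

rows = conn.execute(
    "SELECT grouphash_id, hashing_metadata FROM grouphash_metadata ORDER BY id"
).fetchall()
old_row, new_row = rows
```

Here `old_row` carries a null `hashing_metadata`, which is why code reading the field has to treat it as optional.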
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,159 @@ | ||
from __future__ import annotations | ||
|
||
from typing import NotRequired, TypedDict | ||
|
||
# NOTE: The structure in these metadata types is intentionally flat, to make it easier to query in | ||
# Redash or BigQuery, and they are all merged into a single flat JSON blob (which is then stored in | ||
# `GroupHashMetadata.hashing_metadata`). Therefore, if entries are added, they should be namespaced | ||
# according to their corresponding hash basis (so, for example, `fingerprint_source` and | ||
# `message_source`, rather than just `source`), both for clarity and to avoid collisions. | ||
|
||
|
||
class FingerprintHashingMetadata(TypedDict): | ||
|
||
""" | ||
Fingerprint data, gathered both during stand-alone custom/built-in fingerprinting and hybrid | ||
fingerprinting involving message, stacktrace, security, or template hashing | ||
""" | ||
|
||
# The fingerprint value | ||
fingerprint: str | ||
# Either "client", "server_builtin_rule", or "server_custom_rule". (We don't have a "none of the | ||
# above" option here because we only record fingerprint metadata in cases where there's some | ||
# sort of custom fingerprint.) | ||
fingerprint_source: str | ||
# The fingerprint value set in the SDK, if anything other than ["{{ default }}"]. Note that just | ||
# because this is set doesn't mean we necessarily used it for grouping, since server-side rules | ||
# take precedence over client fingerprints. See `fingerprint_source` above. | ||
client_fingerprint: NotRequired[str] | ||
# The server-side rule applied, if any | ||
matched_fingerprinting_rule: NotRequired[str] | ||
# Whether or not a hybrid fingerprint (one involving both the signal value `{{ default }}` and a | ||
# custom value) was used. In that case, we group as we normally would, but then split the events | ||
# into more granular groups based on the custom value. | ||
is_hybrid_fingerprint: bool | ||
|
||
|
||
class MessageHashingMetadata(TypedDict): | ||
""" | ||
Data gathered when an event is grouped by log message or error type and value | ||
""" | ||
|
||
# Either "message" (from "message" or "logentry") or "exception" (error type and value, in cases | ||
# where there's no stacktrace) | ||
message_source: str | ||
# Whether we've done any parameterization of the message, such as replacing a number with "<int>" | ||
message_parameterized: bool | ||
|
||
|
||
class SaltedMessageHashingMetadata(MessageHashingMetadata, FingerprintHashingMetadata): | ||
""" | ||
Data from message-based hybrid fingerprinting | ||
""" | ||
|
||
pass | ||
|
||
|
||
class StacktraceHashingMetadata(TypedDict): | ||
""" | ||
Data gathered when an event is grouped based on a stacktrace found in an exception, a thread, or | ||
directly in the event | ||
""" | ||
|
||
# Either "in-app" or "system" | ||
stacktrace_type: str | ||
# Where in the event data the stacktrace was found - either "exception", "thread", or | ||
# "top-level" | ||
stacktrace_location: str | ||
# The number of stacktraces used for grouping (will be more than 1 in cases of chained | ||
# exceptions) | ||
num_stacktraces: int | ||
|
||
|
||
class SaltedStacktraceHashingMetadata(StacktraceHashingMetadata, FingerprintHashingMetadata): | ||
""" | ||
Data from stacktrace-based hybrid fingerprinting | ||
""" | ||
|
||
pass | ||
|
||
|
||
class SecurityHashingMetadata(TypedDict): | ||
""" | ||
Data gathered when grouping browser-based security (Content Security Policy, Certificate | ||
Transparency, Online Certificate Status Protocol Stapling, or HTTP Public Key Pinning) reports | ||
""" | ||
|
||
# Either "csp", "expect-ct", "expect-staple", or "hpkp" | ||
security_report_type: str | ||
# Domain name of the blocked address | ||
blocked_host: str | ||
# The CSP directive which was violated | ||
csp_directive: NotRequired[str] | ||
# In the case of a local `script-src` violation, whether it's an `unsafe-inline` or an | ||
# `unsafe-eval` violation | ||
csp_script_violation: NotRequired[str] | ||
|
||
|
||
class SaltedSecurityHashingMetadata(SecurityHashingMetadata, FingerprintHashingMetadata): | ||
""" | ||
Data from security-report-based hybrid fingerprinting | ||
""" | ||
|
||
pass | ||
|
||
|
||
class TemplateHashingMetadata(TypedDict): | ||
""" | ||
Data gathered when grouping errors generated by Django templates | ||
""" | ||
|
||
# The name of the template with the invalid template variable | ||
template_name: NotRequired[str] | ||
# The text of the line in the template containing the invalid variable | ||
template_context_line: NotRequired[str] | ||
|
||
|
||
class SaltedTemplateHashingMetadata(TemplateHashingMetadata, FingerprintHashingMetadata): | ||
""" | ||
Data from template-based hybrid fingerprinting | ||
""" | ||
|
||
pass | ||
|
||
|
||
class ChecksumHashingMetadata(TypedDict): | ||
""" | ||
Data gathered when legacy checksum grouping (wherein a hash is provided directly in the event) | ||
is used | ||
""" | ||
|
||
# The checksum used for grouping | ||
checksum: str | ||
# The incoming checksum value, if it was something other than a 32-digit hex value and we | ||
# therefore had to hash it before using it | ||
raw_checksum: NotRequired[str] | ||
|
||
|
||
class FallbackHashingMetadata(TypedDict): | ||
""" | ||
Data gathered when no other grouping method produces results | ||
""" | ||
|
||
# Whether we landed in the fallback because of a lack of data, because we had a stacktrace but | ||
# all frames were ignored, or some other reason | ||
fallback_reason: str | ||
|
||
|
||
HashingMetadata = ( | ||
FingerprintHashingMetadata | ||
| MessageHashingMetadata | ||
| SaltedMessageHashingMetadata | ||
| StacktraceHashingMetadata | ||
| SaltedStacktraceHashingMetadata | ||
| SecurityHashingMetadata | ||
| SaltedSecurityHashingMetadata | ||
| TemplateHashingMetadata | ||
| SaltedTemplateHashingMetadata | ||
| ChecksumHashingMetadata | ||
| FallbackHashingMetadata | ||
) |
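The "salted" types in the diff above combine two `TypedDict` classes through multiple inheritance, and the result is still a single flat set of keys. A minimal sketch of how that plays out, using trimmed copies of two of the classes (only their required keys, so `NotRequired` is not needed); the example values are hypothetical and are not taken from real events:

```python
import json
from typing import TypedDict


class FingerprintHashingMetadata(TypedDict):
    # Trimmed copy: only the required keys from the diff above.
    fingerprint: str
    fingerprint_source: str
    is_hybrid_fingerprint: bool


class MessageHashingMetadata(TypedDict):
    message_source: str
    message_parameterized: bool


class SaltedMessageHashingMetadata(MessageHashingMetadata, FingerprintHashingMetadata):
    pass


# Hypothetical values for a hybrid fingerprint on a message-grouped event.
metadata: SaltedMessageHashingMetadata = {
    "fingerprint": '["{{ default }}", "checkout"]',
    "fingerprint_source": "client",
    "is_hybrid_fingerprint": True,
    "message_source": "message",
    "message_parameterized": False,
}

# The combined type exposes the union of both parents' keys at one level.
salted_keys = sorted(SaltedMessageHashingMetadata.__required_keys__)

# Every value is a scalar, so the stored JSON blob has no nesting.
blob = json.dumps(metadata, sort_keys=True)
is_flat = all(not isinstance(value, (dict, list)) for value in metadata.values())
```

Because both parents contribute keys to the same level, a bare `source` key in each would collide; the `fingerprint_` and `message_` prefixes keep them apart, which is the point of the namespacing note at the top of the file.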
why does this need to be repeated (`HashingMetadata | None, HashingMetadata | None`)?
Not well documented (that I could find), but this is the setter type and the getter type: https://github.com/typeddjango/django-stubs/blob/ab4dcfa002dbe138b7548086328f13cbde3c41b3/django-stubs/db/models/fields/__init__.pyi#L46-L51.
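To illustrate the setter/getter distinction, here is a minimal sketch of a descriptor that is generic over two type parameters, one for the type accepted on assignment and one for the type returned on access. The `Field` class, the `_ST`/`_GT` names, the `HashingMetadata` alias, and the model class below are hypothetical stand-ins written for this example; they only mimic the shape the linked stubs appear to use and are not Django's or Sentry's code:

```python
from typing import Any, Dict, Generic, Optional, TypeVar

_ST = TypeVar("_ST")  # type accepted when the attribute is assigned
_GT = TypeVar("_GT")  # type returned when the attribute is read


class Field(Generic[_ST, _GT]):
    """Hypothetical two-parameter field descriptor."""

    def __set_name__(self, owner: type, name: str) -> None:
        self._storage_name = "_" + name

    def __get__(self, instance: Any, owner: Optional[type] = None) -> _GT:
        if instance is None:
            return self  # type: ignore[return-value]
        # Unset values read back as None, like a nullable column.
        return instance.__dict__.get(self._storage_name)

    def __set__(self, instance: Any, value: _ST) -> None:
        instance.__dict__[self._storage_name] = value


# Stand-in for the union of TypedDicts defined in the diff.
HashingMetadata = Dict[str, Any]


class GroupHashMetadata:
    # Both parameters are spelled out: first the set type, then the get type.
    hashing_metadata: Field[Optional[HashingMetadata], Optional[HashingMetadata]] = Field()


record = GroupHashMetadata()
initial = record.hashing_metadata
record.hashing_metadata = {"message_source": "message", "message_parameterized": False}
stored = record.hashing_metadata
```

The two parameters happen to be identical here, which is why the annotation looks repeated; a field that accepts a wider type on assignment than it returns on access would use different types in the two slots.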