-
Notifications
You must be signed in to change notification settings - Fork 55
feat: trace improvements #450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ging into trace_improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm: high art of regex, bravo!
_DJANGO_TRACE_HEADER = "HTTP_X_CLOUD_TRACE_CONTEXT" | ||
_DJANGO_XCLOUD_TRACE_HEADER = "HTTP_X_CLOUD_TRACE_CONTEXT" | ||
_DJANGO_TRACEPARENT = "HTTP_TRACEPARENT" | ||
_DJANGO_USERAGENT_HEADER = "HTTP_USER_AGENT" | ||
_DJANGO_REMOTE_ADDR_HEADER = "REMOTE_ADDR" | ||
_DJANGO_REFERER_HEADER = "HTTP_REFERER" | ||
_FLASK_TRACE_HEADER = "X_CLOUD_TRACE_CONTEXT" | ||
_FLASK_XCLOUD_TRACE_HEADER = "X_CLOUD_TRACE_CONTEXT" | ||
_FLASK_TRACEPARENT = "TRACEPARENT" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since these headers are standardized, is there a need to maintain an instance per web server framework?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for some reason django uses "HTTP_" as a prefix when reading most HTTP headers.
At some point as we add more framework support, we may want to revisit this to add the prefix later in django code instead of in the constant, but for now I'd prefer to leave it as-is
Data related to the current http request, trace_id, and span_id for | ||
the request. All fields will be None if a django request isn't | ||
found. | ||
Tuple[Optional[dict], Optional[str], Optional[str], bool]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shouldn't bool be optional too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the LogEntry spec says if the sampling decision is uncertain, you should default the trace_sampled
field to False. So this function will always return True or False for that field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the code immediately below returns None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good catch! I started with it being optional before switching. I'll look through the code again
Data related to the current http request, trace_id, and span_id for | ||
the request. All fields will be None if a django request isn't | ||
found. | ||
Tuple[Optional[dict], Optional[str], Optional[str], bool]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same question about Optional[] here
# see https://www.w3.org/TR/trace-context/ for W3C traceparent format | ||
if header: | ||
try: | ||
VERSION_PART = r"(?!ff)[a-f\d]{2}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: right now the version must be 00
. other versions are considered invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are you handling other versions? Just ignoring anything other than 00
?
It looks like nodejs is assuming forward-compatibility, based on this part of the spec ("The current version of this specification assumes that future versions of the traceparent header will be additive to the current one."). I'm fine with locking to version 00 for now though if we decide that's safer. Let me know if you have thoughts.
header_suffix = split_header[1] | ||
# the span is the set of alphanumeric characters after the / | ||
span_id = re.findall(r"^\w+", header_suffix)[0] | ||
regex = r"([\w-]+)?(\/?([\w-]+))?(;?o=(\d))?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: according to our doc TRACE_ID is 32-character hexadec value that can be presented via acceptible UUID formats. SPAN_ID is a decimal value. adding it because tracing of the w3c was more strict. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm kind of curious on your thoughts here. I made the w3c code more strict, because the spec is pretty explicit about dropping invalid headers (which makes sense for an open spec if you want to ensure all implementations are interoperable).
I thought we might want to be a bit more permissible in the X-Cloud-Trace-Context
parsing, and leave the correct formatting to the user. But maybe it would make sense to be strict for both?
@@ -534,6 +534,7 @@ def test_handlers_w_extras(self): | |||
extra = { | |||
"trace": "123", | |||
"span_id": "456", | |||
"trace_sampled": True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do you want to add a regression test (sampled=false or omitted)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I added a couple more assertions to this file
expected_agent = "Mozilla/5.0" | ||
expected_trace = "4bf92f3577b34da6a3ce929d0e0e4736" | ||
expected_span = "00f067aa0ba902b7" | ||
combined_trace = f"00-{expected_trace}-{expected_span}-03" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you might want to set the flags field to 01
to keep aligned with other flags of the standard.
on the same account would you like to add sampling=false test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trace_sampled should be True if the last bit in the flags byte string is True. I wanted to test the more complex case of "0...011" instead of the simple case of 0...001
. (The 01 case and others are thoroughly tested elsewhere in test__helpers. test_handlers is slightly more high level)
expected_agent = "Mozilla/5.0" | ||
expected_trace = "4bf92f3577b34da6a3ce929d0e0e4736" | ||
expected_span = "00f067aa0ba902b7" | ||
trace_header = f"00-{expected_trace}-{expected_span}-09" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: does flags=09 has a special meaning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, I wanted to test with a flags field with multiple bits set (b1001
), to make it more interesting. Most of the trace_sampled tests are in the test__helpers.py
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR improves trace detection by adding the following features:
traceparent
header formattrace_sampled
fieldThese changes should bring the python library in line with how nodejs currently works.
I also added a number of unit, system, and environment tests to make sure the new features work as expected throughout the stack
Fixes #321