Fix incorrect parent id for the aws.lambda span #166

Merged
tianchu merged 2 commits into main from tian.chu/fix-incorrect-parent-id on Feb 17, 2021

Conversation

tianchu
Contributor

@tianchu tianchu commented Feb 17, 2021

What does this PR do?

The currentTraceHeaders method calls currentTraceContext, which always returns the current trace context. This method is well suited to getting the trace headers to inject into downstream calls, because it always sets the parent id to the currently active Datadog span or X-Ray segment. However, it shouldn't be used to determine the parent of the aws.lambda span, because that parent must not change when the user creates a custom span or segment. The aws.lambda span should always be parented to the root trace context, i.e., the parent span should come from either the incoming trace context or the X-Ray root segment.

Perhaps this wasn't an issue before because getXraySegment didn't return anything unless X-Ray active tracing was enabled, but it now always returns a segment? Either way, it no longer matters: getXraySegment shouldn't be called when determining the parent of the aws.lambda span anyway.

To fix the issue, rootTraceHeaders was added to TraceContextService; it always returns the trace headers from the root trace context. When determining the aws.lambda span's parent in listener.js, rootTraceHeaders is now called instead of currentTraceHeaders.
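The difference between the two getters can be illustrated with a minimal sketch. It is not the library's actual code: the class name SketchTraceContextService, the activeSpanID field, and the simplified TraceContext shape are assumptions made for illustration. Only the getter names and the Datadog propagation header names are taken from the real project.

```typescript
// Hypothetical, simplified trace context; the library's real type may differ.
interface TraceContext {
  traceID: string;
  parentID: string;
  sampleMode: number;
}

type TraceHeaders = Record<string, string>;

// Datadog trace propagation header names.
const TRACE_ID_HEADER = "x-datadog-trace-id";
const PARENT_ID_HEADER = "x-datadog-parent-id";
const SAMPLING_PRIORITY_HEADER = "x-datadog-sampling-priority";

function toHeaders(ctx: TraceContext | undefined): Partial<TraceHeaders> {
  if (ctx === undefined) {
    return {};
  }
  return {
    [TRACE_ID_HEADER]: ctx.traceID,
    [PARENT_ID_HEADER]: ctx.parentID,
    [SAMPLING_PRIORITY_HEADER]: ctx.sampleMode.toString(10),
  };
}

// Illustrative stand-in for TraceContextService (assumption, not the real class).
class SketchTraceContextService {
  // Set once per invocation from the incoming trace context or the X-Ray root segment.
  rootTraceContext?: TraceContext;
  // Stand-in for the id of the currently active Datadog span or X-Ray segment.
  activeSpanID?: string;

  // Follows the active span, so the parent id changes when the user opens a custom span.
  get currentTraceContext(): TraceContext | undefined {
    if (this.rootTraceContext === undefined) {
      return undefined;
    }
    return {
      ...this.rootTraceContext,
      parentID: this.activeSpanID ?? this.rootTraceContext.parentID,
    };
  }

  // Suitable for injection into downstream calls.
  get currentTraceHeaders(): Partial<TraceHeaders> {
    return toHeaders(this.currentTraceContext);
  }

  // Suitable for parenting the aws.lambda span: ignores any active custom span.
  get rootTraceHeaders(): Partial<TraceHeaders> {
    return toHeaders(this.rootTraceContext);
  }
}
```

In this sketch, opening a custom span changes the parent id reported by currentTraceHeaders but leaves rootTraceHeaders untouched, which is the property the aws.lambda span relies on.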

Other changes included:

  • Upgrade dd-trace to version 0.31.0 so that the Datadog trace context propagates through AWS SDK lambda.invoke().
  • Additional comments
  • Additional debugging logs

Motivation

When the incoming Lambda event object or context object contains a Datadog trace context, the automatically generated aws.lambda span should be parented to it. However, it currently parents the span to the X-Ray segment instead, even when X-Ray active tracing is not enabled. This results in the aws.lambda span being an orphan (its parent id points to a non-existent span).
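As a rough way to spot the symptom described above, the helper below flags spans whose parent id matches neither another span in the same payload nor a known upstream span. The function name findOrphanSpans and the knownUpstreamIDs parameter are hypothetical. The field names follow the trace payload logged by the integration tests, and treating a parent id of "0" as "no parent" is an assumption of this sketch.

```typescript
// Minimal span shape, using the field names from the logged trace payload.
interface SpanLike {
  span_id: string;
  parent_id: string;
  name: string;
}

// Returns spans whose parent_id does not resolve to any known span.
// Assumption: a parent_id of "0" marks a span that has no parent at all.
function findOrphanSpans(
  spans: SpanLike[],
  knownUpstreamIDs: Set<string> = new Set<string>(),
): SpanLike[] {
  const localIDs = new Set(spans.map((span) => span.span_id));
  return spans.filter(
    (span) =>
      span.parent_id !== "0" &&
      !localIDs.has(span.parent_id) &&
      !knownUpstreamIDs.has(span.parent_id),
  );
}
```

For example, an aws.lambda span parented to an X-Ray segment id that was never reported would be returned by this helper, while the same span parented to the incoming Datadog span id (passed in knownUpstreamIDs) would not.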

Testing Guidelines

Additional Notes

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

@tianchu tianchu requested a review from a team as a code owner February 17, 2021 17:24
@tianchu tianchu force-pushed the tian.chu/fix-incorrect-parent-id branch from 8ef2935 to ca0d0ad Compare February 17, 2021 17:27
@codecov-io
codecov-io commented Feb 17, 2021

Codecov Report

Merging #166 (7f0363f) into main (1212716) will decrease coverage by 0.20%.
The diff coverage is 78.94%.

@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
- Coverage   86.73%   86.53%   -0.21%     
==========================================
  Files          31       31              
  Lines        1184     1203      +19     
  Branches      235      236       +1     
==========================================
+ Hits         1027     1041      +14     
- Misses        102      106       +4     
- Partials       55       56       +1     
Impacted Files                        Coverage Δ
src/handler.ts                        0.00% <0.00%> (ø)
src/trace/trace-context-service.ts    93.02% <87.50%> (-4.28%) ⬇️
src/trace/listener.ts                 88.57% <88.88%> (ø)
src/trace/context.ts                  90.51% <93.75%> (+0.42%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1212716...7f0363f.

@@ -1,4 +1,3 @@
XXXX-XX-XX XX:XX:XX.XXX XXXX-XXXX-XXXX-XXXX-XXXX INFO Snapshot test http requests successfully made to URLs: https://ip-ranges.datadoghq.com,https://ip-ranges.datadoghq.eu
Contributor Author

Comparing with the logs below, I don't think this line should be there. It was probably missed in previous integration test updates.

XXXX-XX-XX XX:XX:XX.XXX XXXX-XXXX-XXXX-XXXX-XXXX INFO [dd.trace_id=XXXX dd.span_id=XXXX] Processed SNS request
{"traces":[[{"trace_id":"XXXX","span_id":"XXXX","parent_id":"XXXX","name":"aws.lambda","resource":"integration-plugin-dev-async-metrics_node12_with_plugin","error":0,"meta":{"_dd.origin":"lambda","service":"integration-plugin-dev-async-metrics_node12_with_plugin","cold_start":"false","function_arn":"XXXX_node12_with_plugin","function_version":"$LATEST","request_id":"XXXX","resource_names":"integration-plugin-dev-async-metrics_node12_with_plugin","datadog_lambda":"XXXX","dd_trace":"XXXX","_dd.parent_source":"xray","function_trigger.event_source":"sns","function_trigger.event_source_arn":"arn:aws:sns:us-east-2:123456789012:sns-lambda"},"metrics":{"_sample_rate":1,"_sampling_priority_v1":2},"start":XXXX,"duration":XXXX,"service":"aws.lambda","type":"serverless"}]]}
Contributor Author

Perhaps the new version of the tracer logs the traces a little later than the previous version. I'm not sure, but it probably doesn't hurt.


// Get the trace headers from the root trace context.
get rootTraceHeaders(): Partial<TraceHeaders> {
const rootTraceContext = this.rootTraceContext;
Contributor

It looks like we set the root trace context to being the incoming Datadog trace headers. However, when hybrid tracing is enabled, we want to use the PARENT ID from X-Ray but the TRACE ID from Datadog. Is that accounted for here?

Contributor Author

Yes, the root trace context is derived from X-Ray when there are no incoming Datadog trace headers.

Contributor

I think Nick is right about the desired behaviour. Otherwise the parent of the root span swaps between the incoming Datadog span (when it's available) and the root X-Ray segment. This was an existing issue, so we don't have to fix it in this PR, but I think we should change the logic here: https://github.com/DataDog/datadog-lambda-js/blob/main/src/trace/context.ts#L88, to set the parentId to the one from X-Ray when merging X-Ray traces is enabled.
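A minimal sketch of the behaviour suggested in this thread, under stated assumptions: the function name resolveRootTraceContext, its parameters, and the simplified TraceContext shape are hypothetical and do not reflect the actual code in context.ts. It keeps the Datadog trace id but takes the parent id from X-Ray when merging is enabled.

```typescript
// Hypothetical, simplified trace context; the library's real type may differ.
interface TraceContext {
  traceID: string;
  parentID: string;
  sampleMode: number;
}

// Picks the root trace context for an invocation.
// Assumption: when X-Ray merging is enabled and both contexts exist, the trace id
// (and sampling decision) come from Datadog and the parent id comes from X-Ray.
function resolveRootTraceContext(
  datadogContext: TraceContext | undefined,
  xrayContext: TraceContext | undefined,
  mergeXrayTraces: boolean,
): TraceContext | undefined {
  if (datadogContext === undefined) {
    // No incoming Datadog headers: fall back to the context derived from X-Ray.
    return xrayContext;
  }
  if (mergeXrayTraces && xrayContext !== undefined) {
    return { ...datadogContext, parentID: xrayContext.parentID };
  }
  return datadogContext;
}
```

With this rule the parent of the root span would no longer depend on whether an incoming Datadog span happens to be present while merging is enabled.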

Contributor

@DarcyRaynerDD DarcyRaynerDD left a comment

LGTM, Nick's comment is a potential issue we should fix, but I don't think we have to do it in this PR.

@tianchu tianchu merged commit a150798 into main Feb 17, 2021
@tianchu tianchu deleted the tian.chu/fix-incorrect-parent-id branch February 17, 2021 22:43