Improve recorded detail of stacktraces #70


Merged 2 commits into aws:master on Sep 5, 2018

Conversation

@larsbutler (Contributor) commented Jun 22, 2018

In many X-Ray SDK use cases, including the Django/Flask middleware and
aws_xray_sdk.core.xray_recorder.capture and .capture_async cases, there
is a broad try/except "catch-all" for exceptions which records useful
information in an X-Ray trace in the event of an unexpected exception.

However, the approach (prior to this patch) of using
traceback.extract_stack in the except block to extract stack trace
information isn't terribly useful when viewing X-Ray traces for
debugging purposes. traceback.extract_stack only shows the stack state
up to and including the traceback.extract_stack call in the except
block; it does not include the stack trace information from the caught
exception--which is arguably the most important detail to include in
the X-Ray trace! Similarly, including only the exception's stack trace
information can omit important code path context if the
exception-producing code is wrapped in a decorator. This is a very
subtle but important detail. There is a good discussion of exactly this
problem--specifically in the context of decorators with a "catch-all"
try/except--on Stack Overflow:
https://stackoverflow.com/questions/14527819/traceback-shows-up-until-decorator.
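
A tiny standalone demo of the gap (illustration only, not code from this patch):

```
import sys
import traceback


def bar():
    raise RuntimeError('oops')


def capture():
    try:
        bar()
    except RuntimeError:
        # Current stack only: ['<module>', 'capture'] -- bar(), where
        # the exception was actually raised, is missing.
        print([f[2] for f in traceback.extract_stack()])
        # Exception frames only: ['capture', 'bar'] -- the outer caller
        # context (e.g. a decorator wrapper) is missing.
        print([f[2] for f in traceback.extract_tb(sys.exc_info()[2])])


capture()
```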

Description of changes:

This patch introduces the stacktrace utility module and is an attempt
to include more relevant info in recorded X-Ray traces while respecting
the max_trace_back behavior expressed in many places to avoid bloated
trace messages.
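
In sketch form, the idea is roughly this (simplified; the exact truncation semantics for max_trace_back may differ in the actual patch):

```
import sys
import traceback


def get_stacktrace(limit=None):
    # Frames from the current call stack, starting at our caller so
    # this helper itself doesn't show up in the recorded trace.
    stack = traceback.extract_stack(sys._getframe().f_back)
    # If an exception is currently being handled, append its frames;
    # they cover the path from the try/except down to the raise site.
    exc_tb = sys.exc_info()[2]
    if exc_tb is not None:
        stack += traceback.extract_tb(exc_tb)
    # Honor the configured limit, keeping the frames nearest the error
    # (the actual patch may truncate differently).
    if limit is not None and limit > 0:
        stack = stack[-limit:]
    return stack
```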

To illustrate the difference this makes, consider a trivial Lambda function written in Python, in a single module called myfunction.py:

```
from aws_xray_sdk.core import xray_recorder


@xray_recorder.capture('handler')
def handler(event, context):
    data = foo()
    return data


def foo():
    return bar()


def bar():
    raise Exception('Something went wrong!')
    return {'data': [1, 2, 3]}
```

There are multiple function calls here in order to create an interesting call stack for tracing purposes. To test this function, I've simply built a function zip with the appropriate version of the X-Ray SDK and invoked the function with lambda:Invoke.
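
For reference, the invoke call from Python looks roughly like this ('myfunction' is a stand-in for whatever name the zip was deployed under):

```
import boto3

client = boto3.client('lambda')
# The payload doesn't matter here since the handler raises unconditionally.
response = client.invoke(FunctionName='myfunction', Payload=b'{}')
print(response['StatusCode'])
```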

With the current implementation of aws_xray_sdk, here's the trace that gets recorded by X-Ray:

```
Exception: Something went wrong!
at <module> (bootstrap.py:538)
at main (bootstrap.py:533)
at handle_event_request (bootstrap.py:250)
```

With this patch applied, we get a lot more information about the call stack, as we would expect:

```
Exception: Something went wrong!
at <module> (bootstrap.py:538)
at main (bootstrap.py:533)
at handle_event_request (bootstrap.py:250)
at handler (myfunction.py:6)
at foo (myfunction.py:11)
at bar (myfunction.py:15)
```

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


I'm not sure how best to test this (since stack information is very runtime-specific), so ideas are welcome; one possibility is sketched below. My patch also assumes that this fix implements the originally intended behavior, so it shouldn't be too controversial, but I'm naturally happy to discuss and attempt alternative solutions. Please let me know if the problem I've described above isn't clear.
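
One option is to assert on function names rather than file paths and line numbers, since names are stable across runtimes. A rough sketch, assuming the new module lands at aws_xray_sdk.core.utils.stacktrace and exposes a get_stacktrace helper (both names are illustrative):

```
from aws_xray_sdk.core.utils import stacktrace


def test_stacktrace_includes_exception_frames():
    def bar():
        raise ValueError('boom')

    def foo():
        return bar()

    try:
        foo()
    except ValueError:
        stack = stacktrace.get_stacktrace(limit=10)

    # Frame entries behave like (filename, lineno, name, line) tuples,
    # so index 2 is the function name.
    names = [frame[2] for frame in stack]
    assert 'foo' in names
    assert 'bar' in names
```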

@larsbutler (Contributor Author)

Hey @haotianw465, the CI build failures (related to aiohttp) don't appear to have anything to do with this patch. The issue seems to exist in the previous commit as well: https://travis-ci.org/aws/aws-xray-sdk-python/builds/389484429?utm_source=github_status&utm_medium=notification.

It seems the problem is related to aiohttp versions, but I haven't delved into it yet. Do you have any ideas?

@haotianw465 (Contributor)

Thank you for submitting the PR. There is a known issue on the CI dependencies and I'm working on a fix. I will get back to the PR once the CI is back running. Thank you for your patience.

@larsbutler (Contributor Author)

@haotianw465 Okay, cool. I'll check back later.

@haotianw465 (Contributor)

Sorry for the late follow-up. Thank you for diving deep and for the contribution.

Do you mind putting two example subsegment outputs with/without the fix recording the same exception? This is both for clarification and future reference should anyone want to know more about this change.

@larsbutler (Contributor Author)

@haotianw465 Hey, thanks for fixing the build. I'll rebase on that.

> Do you mind putting two example subsegment outputs with/without the fix recording the same exception? This is both for clarification and future reference should anyone want to know more about this change.

Sure, I'll come up with a small sample Lambda function and post the before/after traceback details.

Do you want me to record that information in a commit message as well (so that folks digging through the code can find it easier), or is it sufficient to just post it here?

In many X-Ray SDK use cases, including the Django/Flask middleware and
aws_xray_sdk.core.xray_recorder.capture and .capture_async cases, there
is a broad try/except "catch-all" for exceptions which records useful
information in an X-Ray trace in the event of an unexpected exception.

However, the approach (prior to this patch) of using
`traceback.extract_stack` in the `except` block to extract stack trace
information isn't terribly useful when viewing X-Ray traces for
debugging purposes. `traceback.extract_stack` only shows the stack state
up to and including the `traceback.extract_stack` call in the `except`
block; it does not include the stack trace information from the caught
exception--which is arguably the most important detail to include in
the X-Ray trace! Similarly, including _only_ the exception's stack trace
information can omit important code path context if the
exception-producing code is wrapped in a decorator. This is a very
subtle but important detail. There is a good discussion of exactly this
problem--specifically in the context of decorators with a "catch-all"
try/except--on Stack Overflow:
https://stackoverflow.com/questions/14527819/traceback-shows-up-until-decorator.

This patch introduces the `stacktrace` utility module and is an attempt
to include more relevant info in recorded X-Ray traces while respecting
the `max_trace_back` behavior expressed in many places to avoid bloated
trace messages.

To illustrate the difference this makes, consider a trivial Lambda
function written in Python with a single module called `myfunction.py`:

```
from aws_xray_sdk.core import xray_recorder


@xray_recorder.capture('handler')
def handler(event, context):
    data = foo()
    return data


def foo():
    return bar()


def bar():
    raise Exception('Something went wrong!')
    return {'data': [1, 2, 3]}
```

There are multiple function calls here in order to create an
interesting call stack for tracing purposes.

With the current implementation of aws_xray_sdk, here's the trace that
gets recorded by X-Ray:

    Exception: Something went wrong!
    at <module> (bootstrap.py:538)
    at main (bootstrap.py:533)
    at handle_event_request (bootstrap.py:250)

With this patch applied, we get a lot more information about the call
stack, as we would expect:

    Exception: Something went wrong!
    at <module> (bootstrap.py:538)
    at main (bootstrap.py:533)
    at handle_event_request (bootstrap.py:250)
    at handler (myfunction.py:6)
    at foo (myfunction.py:11)
    at bar (myfunction.py:15)

With this we can much more easily trace the origin of a runtime error.

@larsbutler (Contributor Author)

@haotianw465 I updated the PR text and added the examples as you requested. I also went ahead and included a summary of the before/after trace differences in the commit message, so that folks digging through the code can find it more easily.

Does that help?

@haotianw465 (Contributor)

Yes, that helps, thank you for the contribution! I'm merging it in and will prepare a new release soon.

@haotianw465 merged commit 4d79dd5 into aws:master on Sep 5, 2018