-
Notifications
You must be signed in to change notification settings - Fork 145
Improve recorded detail of stacktraces #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hey @haotianw465, the CI build failures (related to It seems the problem is related to aiohttp versions, but I haven't delved into it yet. Do you have any ideas? |
Thank you for submitting the PR. There is a known issue on the CI dependencies and I'm working on a fix. I will get back to the PR once the CI is back running. Thank you for your patience. |
@haotianw465 Okay, cool. I'll check back later. |
Sorry for the late follow up. Thank you for diving deep and contribution. Do you mind putting two example subsegment outputs with/without the fix recording the same exception? This is both for clarification and future reference should anyone want to know more about this change. |
@haotianw465 Hey, thanks for fixing the build. I'll rebase on that.
Sure, I'll come up with a small sample Lambda function and post the before/after traceback details. Do you want me to record that information in a commit message as well (so that folks digging through the code can find it easier), or is it sufficient to just post it here? |
7d02b58
to
2f0e153
Compare
In many X-Ray SDK use cases, including the Django/Flask middleware and aws_xray_sdk.core.xray_recorder.capture and .capture_async cases, there is a broad try/except "catch-all" for exceptions which records useful information in an X-Ray trace in the event of an unexpected exception. However, the approach (prior to this patch) of using `traceback.extract_stack` in the `except` block to extract stack trace information isn't terribly useful when viewing X-Ray traces for debugging purposes. `traceback.extract_stack` only shows the stack state up to and including the `traceback.extract_stack` call in the `except` block; it does not include the stack trace information from the caught exception--which are arguably the most important details to include in the X-Ray trace! Similarly, including _only_ the exception's stack trace information can omit important code path context if the exeception-producing code is wrapped in a decorator. This is a very subtle but important detail. There is a good discussion of exactly this problem--specificaly in the context of decorators with a "catch-all" try/except--on Stackoverflow: https://stackoverflow.com/questions/14527819/traceback-shows-up-until-decorator. This patch introduces the `stacktrace` utility module and is an attempt to include more relevant info in recorded X-Ray traces while respecting the `max_trace_back` behavior expressed in many places to avoid bloated trace messages. To illustrate the differencethis makes, consider a trivial Lambda function written in Python with a single module called `myfunction.py`: ``` from aws_xray_sdk.core import xray_recorder @xray_recorder.capture('handler') def handler(event, context): data = foo() return data def foo(): return bar() def bar(): raise Exception('Something went wrong!') return {'data': [1,2,3]} ``` There are multiple function calls here in order to create an interesting call stack for tracing purposes. With the current implementation of aws_xray_sdk, here's the trace that gets recorded by X-Ray: Exception: Something went wrong! at <module> (bootstrap.py:538) at main (bootstrap.py:533) at handle_event_request (bootstrap.py:250) With this patch applied, we get a lot more information about the call stack, as we would expect: Exception: Something went wrong! at <module> (bootstrap.py:538) at main (bootstrap.py:533) at handle_event_request (bootstrap.py:250) at handler (myfunction.py:6) at foo (myfunction.py:11) at bar (myfunction.py:15) With this we can much more easily trace the origin of a runtime error.
2f0e153
to
63fbd6d
Compare
@haotianw465 I updated the PR text and added the examples as you've request. I also went ahead and decided it would be useful to include in the commit message a summary of the trace differences before and after this patch is applied. Does that help? |
Yes that would be great! Thank you for the contribution. I'm merging it in and prepare a new release soon. |
In many X-Ray SDK use cases, including the Django/Flask middleware and
aws_xray_sdk.core.xray_recorder.capture and .capture_async cases, there
is a broad try/except "catch-all" for exceptions which records useful
information in an X-Ray trace in the event of an unexpected exception.
However, the approach (prior to this patch) of using
traceback.extract_stack
in theexcept
block to extract stack traceinformation isn't terribly useful when viewing X-Ray traces for
debugging purposes.
traceback.extract_stack
only shows the stack stateup to and including the
traceback.extract_stack
call in theexcept
block; it does not include the stack trace information from the caught
exception--which are arguably the most important details to include in
the X-Ray trace! Similarly, including only the exception's stack trace
information can omit important code path context if the
exeception-producing code is wrapped in a decorator. This is a very
subtle but important detail. There is a good discussion of exactly this
problem--specificaly in the context of decorators with a "catch-all"
try/except--on Stackoverflow:
https://stackoverflow.com/questions/14527819/traceback-shows-up-until-decorator.
Description of changes:
This patch introduces the
stacktrace
utility module and is an attemptto include more relevant info in recorded X-Ray traces while respecting
the
max_trace_back
behavior expressed in many places to avoid bloatedtrace messages.
To illustrate the difference this makes, consider a trivial Lambda function written in Python:
There are multiple function calls here in order to create an interesting call stack for tracing purposes. To test this function, I've simply built a function zip with the appropriate version of the X-Ray SDK and invoked the function with
lambda:Invoke
.With the current implementation of
aws_xray_sdk
, here's the trace that gets recorded by X-Ray:With this patch applied, we get a lot more information about the call stack, as we would expect:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
I'm not sure how best to test this (since stack information is very runtime-specific) so ideas are welcome. My patch also assumes that this fix applies the original intended behavior so it shouldn't be too controversial, but I'm naturally happy to discuss and attempt alternative solutions. Please let me know if the problem I've described above isn't clear.