Stream Django SQL queries and add flag to toggle their streaming #111

Merged · 7 commits · Jan 8, 2019
Changes from 3 commits
12 changes: 8 additions & 4 deletions CHANGELOG.rst
@@ -2,6 +2,10 @@
CHANGELOG
=========

unreleased
==========
* feature: Stream dbapi2 SQL queries and add flag to toggle their streaming

2.2.0
=====
* feature: Added context managers on segment/subsegment capture. `PR97 <https://github.com/aws/aws-xray-sdk-python/pull/97>`_.
@@ -32,11 +36,11 @@ CHANGELOG
* **Breaking**: The original sampling modules for local defined rules are moved from `models.sampling` to `models.sampling.local`.
* **Breaking**: The default behavior of `patch_all` changed to selectively patches libraries to avoid double patching. You can use `patch_all(double_patch=True)` to force it to patch ALL supported libraries. See more details on `ISSUE63 <https://github.com/aws/aws-xray-sdk-python/issues/63>`_
* **Breaking**: The latest `botocore` that has new X-Ray service API `GetSamplingRules` and `GetSamplingTargets` are required.
* **Breaking**: Version 2.x doesn't support pynamodb and aiobotocore as it requires botocore >= 1.11.3 which isn’t currently supported by the pynamodb and aiobotocore libraries. Please continue to use version 1.x if you’re using pynamodb or aiobotocore until those have been updated to use botocore >= 1.11.3.
* feature: Environment variable `AWS_XRAY_DAEMON_ADDRESS` now takes an additional notation in `tcp:127.0.0.1:2000 udp:127.0.0.2:2001` to set TCP and UDP destination separately. By default it assumes a X-Ray daemon listening to both UDP and TCP traffic on `127.0.0.1:2000`.
* feature: Added MongoDB python client support. `PR65 <https://github.com/aws/aws-xray-sdk-python/pull/65>`_.
* bugfix: Support binding connection in sqlalchemy as well as engine. `PR78 <https://github.com/aws/aws-xray-sdk-python/pull/78>`_.
* bugfix: Flask middleware safe request teardown. `ISSUE75 <https://github.com/aws/aws-xray-sdk-python/issues/75>`_.


1.1.2
@@ -68,7 +72,7 @@ CHANGELOG
* bugfix: Fixed an issue where arbitrary fields in trace header being dropped when calling downstream.
* bugfix: Fixed a compatibility issue between botocore and httplib patcher. `ISSUE48 <https://github.com/aws/aws-xray-sdk-python/issues/48>`_.
* bugfix: Fixed a typo in sqlalchemy decorators. `PR50 <https://github.com/aws/aws-xray-sdk-python/pull/50>`_.
* Updated `README` with more usage examples.

0.97
====
21 changes: 20 additions & 1 deletion README.md
@@ -251,6 +251,20 @@ with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
pass
```

### Trace SQL queries
By default, if no other value is provided to `.configure()`, SQL trace streaming is enabled
for all the supported DB engines. Those currently are:
- Any engine attached to the Django ORM.
- Any engine attached to SQLAlchemy.
- SQLite3.

The behaviour can be toggled by passing the appropriate `stream_sql` value, for example:
```python
from aws_xray_sdk.core import xray_recorder

xray_recorder.configure(service='fallback_name', stream_sql=True)
```

### Patch third-party libraries

```python
@@ -260,7 +274,8 @@ libs_to_patch = ('boto3', 'mysql', 'requests')
patch(libs_to_patch)
```

### Add Django middleware
### Django
#### Add middleware

In Django's settings.py, use the following.

@@ -275,6 +290,10 @@ MIDDLEWARE = [
# ... other middlewares
]
```
#### SQL tracing
If Django's ORM is patched, either by setting `AUTO_INSTRUMENT = True` in your settings file
or by explicitly calling `patch_db()`, SQL query trace streaming can be enabled or disabled
by updating the `STREAM_SQL` variable in your settings file.
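For illustration, a minimal settings sketch (the `XRAY_RECORDER` settings namespace appears later in this diff; the exact values below are assumptions, not part of this PR):
```python
# settings.py (sketch, not verbatim from the PR)
XRAY_RECORDER = {
    'AUTO_INSTRUMENT': True,  # patch Django's ORM automatically
    'STREAM_SQL': True,       # stream SQL query text into SQL subsegments
    # ... other X-Ray recorder settings
}
```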

### Add Flask middleware

15 changes: 14 additions & 1 deletion aws_xray_sdk/core/recorder.py
@@ -72,6 +72,7 @@ def __init__(self):
self._dynamic_naming = None
self._aws_metadata = copy.deepcopy(XRAY_META)
self._origin = None
self._stream_sql = False

if type(self.sampler).__name__ == 'DefaultSampler':
self.sampler.load_settings(DaemonConfig(), self.context)
@@ -81,7 +82,8 @@ def configure(self, sampling=None, plugins=None,
daemon_address=None, service=None,
context=None, emitter=None, streaming=None,
dynamic_naming=None, streaming_threshold=None,
max_trace_back=None, sampler=None):
max_trace_back=None, sampler=None,
stream_sql=True):
"""Configure global X-Ray recorder.

Configure needs to run before patching third party libraries
@@ -130,6 +132,7 @@ class to have your own implementation of the streaming process.
maximum number of subsegments within a segment.
:param int max_trace_back: The maximum number of stack traces recorded
by auto-capture. Lower this if a single document becomes too large.
:param bool stream_sql: Whether SQL query texts should be streamed.

Environment variables AWS_XRAY_DAEMON_ADDRESS, AWS_XRAY_CONTEXT_MISSING
and AWS_XRAY_TRACING_NAME respectively overrides arguments
@@ -159,6 +162,8 @@ class to have your own implementation of the streaming process.
self.streaming_threshold = streaming_threshold
if max_trace_back:
self.max_trace_back = max_trace_back
if stream_sql is not None:
self.stream_sql = stream_sql

if plugins:
plugin_modules = get_plugin_modules(plugins)
@@ -548,3 +553,11 @@ def max_trace_back(self):
@max_trace_back.setter
def max_trace_back(self, value):
self._max_trace_back = value

@property
def stream_sql(self):
return self._stream_sql

@stream_sql.setter
def stream_sql(self, value):
self._stream_sql = value
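
A minimal usage sketch of the new recorder option (the service name below is illustrative; `configure()` and the `stream_sql` property are as added in this diff):
```python
from aws_xray_sdk.core import xray_recorder

# stream_sql defaults to True in configure() at this revision of the PR
xray_recorder.configure(service='my_service', stream_sql=False)

# The patchers read the property at query time, so it can also be toggled directly.
assert xray_recorder.stream_sql is False
xray_recorder.stream_sql = True
```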
10 changes: 6 additions & 4 deletions aws_xray_sdk/ext/dbapi2.py
@@ -43,23 +43,23 @@ def __enter__(self):
@xray_recorder.capture()
def execute(self, query, *args, **kwargs):

add_sql_meta(self._xray_meta)
add_sql_meta(self._xray_meta, query)
return self.__wrapped__.execute(query, *args, **kwargs)

@xray_recorder.capture()
def executemany(self, query, *args, **kwargs):

add_sql_meta(self._xray_meta)
add_sql_meta(self._xray_meta, query)
return self.__wrapped__.executemany(query, *args, **kwargs)

@xray_recorder.capture()
def callproc(self, proc, args):

add_sql_meta(self._xray_meta)
add_sql_meta(self._xray_meta, proc)
return self.__wrapped__.callproc(proc, args)


def add_sql_meta(meta):
def add_sql_meta(meta, query):

subsegment = xray_recorder.current_subsegment()

Expand All @@ -72,5 +72,7 @@ def add_sql_meta(meta):
sql_meta = copy.copy(meta)
if sql_meta.get('name', None):
del sql_meta['name']
if xray_recorder.stream_sql:
sql_meta['sanitized_query'] = query
subsegment.set_sql(sql_meta)
subsegment.namespace = 'remote'
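
To make the effect concrete, a rough sketch against a patched sqlite3 connection (segment and table names are hypothetical; note that, per the review conversation further down, the dbapi2-level change was later narrowed to Django):
```python
import sqlite3

from aws_xray_sdk.core import patch, xray_recorder

patch(('sqlite3',))  # sqlite3 is one of the dbapi2-based engines listed in the README above
xray_recorder.configure(service='sql-demo', sampling=False, stream_sql=True)

xray_recorder.begin_segment('sql-demo')
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE users (id INTEGER, name TEXT)')
cur.execute('SELECT name FROM users WHERE id = ?', (1,))

# With stream_sql=True, each execute() subsegment's sql metadata carries the
# query text under 'sanitized_query'; with stream_sql=False the key is omitted.
xray_recorder.end_segment()
```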
1 change: 1 addition & 0 deletions aws_xray_sdk/ext/django/apps.py
@@ -36,6 +36,7 @@ def ready(self):
dynamic_naming=settings.DYNAMIC_NAMING,
streaming_threshold=settings.STREAMING_THRESHOLD,
max_trace_back=settings.MAX_TRACE_BACK,
stream_sql=settings.STREAM_SQL,
)

# if turned on subsegment will be generated on
1 change: 1 addition & 0 deletions aws_xray_sdk/ext/django/conf.py
@@ -14,6 +14,7 @@
'DYNAMIC_NAMING': None,
'STREAMING_THRESHOLD': None,
'MAX_TRACE_BACK': None,
'STREAM_SQL': False,
Contributor

Sorry for the late response. If by default the SQL should be captured, should this be set to True?

Contributor Author

No worries, thanks for your comment. I meant it is enabled by default as this is the current behaviour for SQLAlchemy, so I wanted to keep it. But for the rest (in this case Django), I have set it off by default, as it is the behaviour previous to this PR. Hope this makes sense.

Contributor

Hi, I would not consider this enhancement to be a concern. We have some other enhancements that added more data and haven't seen any issues.

The SQLAlchemy query capture was also submitted as a PR. In fact the SDK means to have query capture as the default behavior; it just has a different development and review cycle due to security concerns. I would suggest keeping the query capture behavior consistent, which is to have the parameterized query ready whenever possible. Thoughts?

Contributor Author (@hasier, Dec 27, 2018)

Regarding the default behaviour, I agree, I'll change it so that capture is enabled by default.

Regarding the parametrised queries, afaik there is no way to ensure that the incoming query is a parametrised one... DBAPI2 (PEP 249) recommends this behaviour, but then it is up to the implementation to decide how the queries and parameters are formatted.
Is this the concern you have? Would it be better to attempt the patch in a different place? The only one I can think of is to attempt to patch the Django ORM (I haven't tried, so I don't know how feasible it is). I mention the Django ORM as, for example, psycopg2 also allows different Cursor classes to be used with its connections, which might lead to the same situation again.
But it is also true that this X-Ray SDK is the one that controls which drivers are patched, as the patch needs to be included in a module here. As far as I can see, those that implement DBAPI2 and are included here for now are just the Django ORM, SQLAlchemy and SQLite, so I guess it should not be that big a concern?

Contributor

Yes. That's another thing I'm going to mention besides the default config value. For SQLAlchemy the patcher actually works on the implementation level and there are tests against parameterized queries. https://github.com/aws/aws-xray-sdk-python/blob/master/tests/ext/sqlalchemy/test_query.py.

DBAPI-level query capture probably will not always work, but if it does work for the current patchers (Django or SQLite) I'm OK to move forward with an internal security review. If not, I would suggest having the toggle function just for SQLAlchemy and everything else in separate PRs for each actual patcher.

You can see the SQLAlchemy query capture PR here: #34

Contributor Author

I removed the patch for DBAPI2 and just did so for Django. Please let me know if it makes sense now.

}

XRAY_NAMESPACE = 'XRAY_RECORDER'
3 changes: 2 additions & 1 deletion aws_xray_sdk/ext/sqlalchemy/util/decorators.py
@@ -47,7 +47,8 @@ def wrapper(*args, **kw):
if isinstance(arg, XRayQuery):
try:
sql = parse_bind(arg.session.bind)
sql['sanitized_query'] = str(arg)
if xray_recorder.stream_sql:
sql['sanitized_query'] = str(arg)
except Exception:
sql = None
if sql is not None:
13 changes: 9 additions & 4 deletions tests/ext/flask_sqlalchemy/test_query.py
@@ -22,10 +22,15 @@ class User(db.Model):
password = db.Column(db.String(255), nullable=False)


@pytest.fixture()
def session():
@pytest.fixture(
params=[
False,
True,
],
)
def session(request):
"""Test Fixture to Create DataBase Tables and start a trace segment"""
xray_recorder.configure(service='test', sampling=False, context=Context())
xray_recorder.configure(service='test', sampling=False, context=Context(), stream_sql=request.param)
xray_recorder.clear_trace_entities()
xray_recorder.begin_segment('SQLAlchemyTest')
db.create_all()
@@ -41,8 +46,8 @@ def test_all(capsys, session):
User.query.all()
subsegment = find_subsegment_by_annotation(xray_recorder.current_segment(), 'sqlalchemy', 'sqlalchemy.orm.query.all')
assert subsegment['annotations']['sqlalchemy'] == 'sqlalchemy.orm.query.all'
assert subsegment['sql']['sanitized_query']
assert subsegment['sql']['url']
assert bool(subsegment['sql'].get('sanitized_query', None)) is xray_recorder.stream_sql


def test_add(capsys, session):
Expand Down
25 changes: 22 additions & 3 deletions tests/ext/psycopg2/test_psycopg2.py
@@ -12,20 +12,34 @@
patch(('psycopg2',))


@pytest.fixture(autouse=True)
def construct_ctx():
@pytest.fixture(
autouse=True,
params=[
False,
True,
],
)
def construct_ctx(request):
"""
Clean up context storage on each test run and begin a segment
so that later subsegment can be attached. After each test run
it cleans up context storage again.
"""
xray_recorder.configure(service='test', sampling=False, context=Context())
xray_recorder.configure(service='test', sampling=False, context=Context(), stream_sql=request.param)
xray_recorder.clear_trace_entities()
xray_recorder.begin_segment('name')
yield
xray_recorder.clear_trace_entities()


def _assert_query(sql_meta, query):
if xray_recorder.stream_sql:
assert 'sanitized_query' in sql_meta
assert sql_meta['sanitized_query'] == query
else:
assert 'sanitized_query' not in sql_meta


def test_execute_dsn_kwargs():
q = 'SELECT 1'
with testing.postgresql.Postgresql() as postgresql:
@@ -46,6 +60,7 @@ def test_execute_dsn_kwargs():
assert sql['user'] == dsn['user']
assert sql['url'] == url
assert sql['database_version']
_assert_query(sql, q)


def test_execute_dsn_kwargs_alt_dbname():
@@ -72,6 +87,7 @@ def test_execute_dsn_kwargs_alt_dbname():
assert sql['user'] == dsn['user']
assert sql['url'] == url
assert sql['database_version']
_assert_query(sql, q)


def test_execute_dsn_string():
@@ -94,6 +110,7 @@ def test_execute_dsn_string():
assert sql['user'] == dsn['user']
assert sql['url'] == url
assert sql['database_version']
_assert_query(sql, q)


def test_execute_in_pool():
@@ -117,6 +134,7 @@ def test_execute_in_pool():
assert sql['user'] == dsn['user']
assert sql['url'] == url
assert sql['database_version']
_assert_query(sql, q)


def test_execute_bad_query():
@@ -145,6 +163,7 @@ def test_execute_bad_query():

exception = subsegment.cause['exceptions'][0]
assert exception.type == 'ProgrammingError'
_assert_query(sql, q)


def test_register_extensions():