Commit 6293212

Dan Rogers authored and CircleCI committed
Add support for regular expression matching and sanitizing of headers in WSGI. (open-telemetry#1402)
1 parent 828e7f1 commit 6293212

3 files changed: +140 -48 lines changed

CHANGELOG.md

+3 -1
@@ -16,10 +16,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1616
([#1369](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1369))
1717
- `opentelemetry-instrumentation-system-metrics` add supports to collect system thread count. ([#1339](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1339))
1818
- `opentelemetry-exporter-richconsole` Fixing RichConsoleExpoter to allow multiple traces, fixing duplicate spans and include resources ([#1336](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1336))
19-
- `opentelemetry-instrumentation-asgi` Add support for regular expression matching of HTTP headers.
19+
- `opentelemetry-instrumentation-asgi` Add support for regular expression matching and sanitization of HTTP headers.
2020
([#1333](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1333))
2121
- `opentelemetry-instrumentation-asgi` metrics record target attribute (FastAPI only)
2222
([#1323](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1323))
23+
- `opentelemetry-instrumentation-wsgi` Add support for regular expression matching and sanitization of HTTP headers.
24+
([#1402](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1402))
2325

2426
### Fixed
2527

instrumentation/opentelemetry-instrumentation-wsgi/src/opentelemetry/instrumentation/wsgi/__init__.py

+103 -44
@@ -85,8 +85,15 @@ def GET(self):
8585
Request/Response hooks
8686
**********************
8787
88-
Utilize request/response hooks to execute custom logic to be performed before/after performing a request. Environ is an instance of WSGIEnvironment.
89-
Response_headers is a list of key-value (tuples) representing the response headers returned from the response.
88+
This instrumentation supports request and response hooks. These are functions that get called
89+
right after a span is created for a request and right before the span is finished for the response.
90+
91+
- The client request hook is called with the internal span and an instance of WSGIEnvironment when the method
92+
``receive`` is called.
93+
- The client response hook is called with the internal span, the status of the response and a list of key-value (tuples)
94+
representing the response headers returned from the response when the method ``send`` is called.
95+
96+
For example,
9097
9198
.. code-block:: python
9299
@@ -102,54 +109,93 @@ def response_hook(span: Span, environ: WSGIEnvironment, status: str, response_he
102109
103110
Capture HTTP request and response headers
104111
*****************************************
105-
You can configure the agent to capture predefined HTTP headers as span attributes, according to the `semantic convention <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md#http-request-and-response-headers>`_.
112+
You can configure the agent to capture specified HTTP headers as span attributes, according to the
113+
`semantic convention <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md#http-request-and-response-headers>`_.
106114
107115
Request headers
108116
***************
109-
To capture predefined HTTP request headers as span attributes, set the environment variable ``OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST``
110-
to a comma-separated list of HTTP header names.
117+
To capture HTTP request headers as span attributes, set the environment variable
118+
``OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST`` to a comma delimited list of HTTP header names.
111119
112120
For example,
113-
114121
::
115122
116123
export OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST="content-type,custom_request_header"
117124
118-
will extract ``content-type`` and ``custom_request_header`` from request headers and add them as span attributes.
125+
will extract ``content-type`` and ``custom_request_header`` from the request headers and add them as span attributes.
119126
120-
It is recommended that you should give the correct names of the headers to be captured in the environment variable.
121-
Request header names in wsgi are case insensitive and - characters are replaced by _. So, giving header name as ``CUStom_Header`` in environment variable will be able capture header with name ``custom-header``.
127+
Request header names in WSGI are case-insensitive and ``-`` characters are replaced by ``_``. So, giving the header
128+
name as ``CUStom_Header`` in the environment variable will capture the header named ``custom-header``.
122129
123-
The name of the added span attribute will follow the format ``http.request.header.<header_name>`` where ``<header_name>`` being the normalized HTTP header name (lowercase, with - characters replaced by _ ).
124-
The value of the attribute will be single item list containing all the header values.
130+
Regular expressions may also be used to match multiple headers that correspond to the given pattern. For example:
131+
::
132+
133+
export OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST="Accept.*,X-.*"
134+
135+
Would match all request headers that start with ``Accept`` and ``X-``.
136+
137+
To capture all request headers, set ``OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST`` to ``".*"``.
138+
::
125139
126-
Example of the added span attribute,
140+
export OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST=".*"
141+
142+
The name of the added span attribute will follow the format ``http.request.header.<header_name>`` where ``<header_name>``
143+
is the normalized HTTP header name (lowercase, with ``-`` replaced by ``_``). The value of the attribute will be a
144+
single item list containing all the header values.
145+
146+
For example:
127147
``http.request.header.custom_request_header = ["<value1>,<value2>"]``
128148
129149
Response headers
130150
****************
131-
To capture predefined HTTP response headers as span attributes, set the environment variable ``OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE``
132-
to a comma-separated list of HTTP header names.
151+
To capture HTTP response headers as span attributes, set the environment variable
152+
``OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE`` to a comma delimited list of HTTP header names.
133153
134154
For example,
135-
136155
::
137156
138157
export OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE="content-type,custom_response_header"
139158
140-
will extract ``content-type`` and ``custom_response_header`` from response headers and add them as span attributes.
159+
will extract ``content-type`` and ``custom_response_header`` from the response headers and add them as span attributes.
160+
161+
Response header names in WSGI are case-insensitive. So, giving the header name as ``CUStom-Header`` in the environment
162+
variable will capture the header named ``custom-header``.
163+
164+
Regular expressions may also be used to match multiple headers that correspond to the given pattern. For example:
165+
::
166+
167+
export OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE="Content.*,X-.*"
168+
169+
Would match all response headers that start with ``Content`` and ``X-``.
141170
142-
It is recommended that you should give the correct names of the headers to be captured in the environment variable.
143-
Response header names captured in wsgi are case insensitive. So, giving header name as ``CUStomHeader`` in environment variable will be able capture header with name ``customheader``.
171+
To capture all response headers, set ``OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE`` to ``".*"``.
172+
::
173+
174+
export OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE=".*"
144175
145-
The name of the added span attribute will follow the format ``http.response.header.<header_name>`` where ``<header_name>`` being the normalized HTTP header name (lowercase, with - characters replaced by _ ).
146-
The value of the attribute will be single item list containing all the header values.
176+
The name of the added span attribute will follow the format ``http.response.header.<header_name>`` where ``<header_name>``
177+
is the normalized HTTP header name (lowercase, with ``-`` replaced by ``_``). The value of the attribute will be a
178+
single item list containing all the header values.
147179
148-
Example of the added span attribute,
180+
For example:
149181
``http.response.header.custom_response_header = ["<value1>,<value2>"]``
150182
183+
Sanitizing headers
184+
******************
185+
In order to prevent storing sensitive data such as personally identifiable information (PII), session keys, passwords,
186+
etc, set the environment variable ``OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS``
187+
to a comma delimited list of HTTP header names to be sanitized. Regexes may be used, and all header names will be
188+
matched in a case-insensitive manner.
189+
190+
For example,
191+
::
192+
193+
export OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS=".*session.*,set-cookie"
194+
195+
will replace the value of headers such as ``session-id`` and ``set-cookie`` with ``[REDACTED]`` in the span.
196+
151197
Note:
152-
Environment variable names to capture http headers are still experimental, and thus are subject to change.
198+
The environment variable names used to capture HTTP headers are still experimental, and thus are subject to change.
153199
154200
API
155201
---
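
The documentation hunk above describes three behaviours: selecting headers by regular expression, naming the resulting span attributes, and redacting sensitive values. The following is a minimal standalone sketch of that behaviour, not the `SanitizeValue` implementation from `opentelemetry.util.http`. The names `capture_headers` and `normalise_request_name` are hypothetical, and the sketch assumes each pattern must match the whole header name, case-insensitively.

```python
import re
from typing import Callable, Dict, Iterable, List

REDACTED = "[REDACTED]"


def _compile(patterns: List[str]):
    """Join patterns into one case-insensitive regex, or None if empty."""
    if not patterns:
        return None
    return re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)


def capture_headers(
    headers: Dict[str, str],
    capture_patterns: Iterable[str],
    sanitize_patterns: Iterable[str],
    normalise: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Hypothetical helper: pick headers by regex and redact sensitive ones."""
    capture_re = _compile(list(capture_patterns))
    sanitize_re = _compile(list(sanitize_patterns))
    if capture_re is None:
        return {}

    attributes: Dict[str, List[str]] = {}
    for name, value in headers.items():
        # Assumption: a pattern must match the entire header name.
        if not value or not capture_re.fullmatch(name):
            continue
        if sanitize_re is not None and sanitize_re.fullmatch(name):
            value = REDACTED
        # The attribute value is a single-item list, as the docs describe.
        attributes[normalise(name)] = [value]
    return attributes


def normalise_request_name(name: str) -> str:
    """Hypothetical normaliser: lowercase, with '-' replaced by '_'."""
    return "http.request.header." + name.lower().replace("-", "_")
```

In this sketch a header is only redacted if it is also captured, which matches the updated tests: they list `.*my-secret.*` in both the capture variable and the sanitize variable.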
@@ -172,8 +218,10 @@ def response_hook(span: Span, environ: WSGIEnvironment, status: str, response_he
172218
from opentelemetry.semconv.trace import SpanAttributes
173219
from opentelemetry.trace.status import Status, StatusCode
174220
from opentelemetry.util.http import (
221+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS,
175222
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST,
176223
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE,
224+
SanitizeValue,
177225
get_custom_headers,
178226
normalise_request_header_name,
179227
normalise_response_header_name,
@@ -293,38 +341,49 @@ def collect_custom_request_headers_attributes(environ):
293341
"""Returns custom HTTP request headers which are configured by the user
294342
from the PEP3333-conforming WSGI environ to be used as span creation attributes as described
295343
in the specification https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md#http-request-and-response-headers"""
296-
attributes = {}
297-
custom_request_headers_name = get_custom_headers(
298-
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST
344+
345+
sanitize = SanitizeValue(
346+
get_custom_headers(
347+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS
348+
)
349+
)
350+
351+
headers = {
352+
key[_CARRIER_KEY_PREFIX_LEN:].replace("_", "-"): val
353+
for key, val in environ.items()
354+
if key.startswith(_CARRIER_KEY_PREFIX)
355+
}
356+
357+
return sanitize.sanitize_header_values(
358+
headers,
359+
get_custom_headers(
360+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST
361+
),
362+
normalise_request_header_name,
299363
)
300-
for header_name in custom_request_headers_name:
301-
wsgi_env_var = header_name.upper().replace("-", "_")
302-
header_values = environ.get(f"HTTP_{wsgi_env_var}")
303-
if header_values:
304-
key = normalise_request_header_name(header_name)
305-
attributes[key] = [header_values]
306-
return attributes
307364

308365

309366
def collect_custom_response_headers_attributes(response_headers):
310367
"""Returns custom HTTP response headers which are configured by the user from the
311368
PEP3333-conforming WSGI environ as described in the specification
312369
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md#http-request-and-response-headers"""
313-
attributes = {}
314-
custom_response_headers_name = get_custom_headers(
315-
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE
370+
371+
sanitize = SanitizeValue(
372+
get_custom_headers(
373+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS
374+
)
316375
)
317376
response_headers_dict = {}
318377
if response_headers:
319-
for header_name, header_value in response_headers:
320-
response_headers_dict[header_name.lower()] = header_value
321-
322-
for header_name in custom_response_headers_name:
323-
header_values = response_headers_dict.get(header_name.lower())
324-
if header_values:
325-
key = normalise_response_header_name(header_name)
326-
attributes[key] = [header_values]
327-
return attributes
378+
response_headers_dict = dict(response_headers)
379+
380+
return sanitize.sanitize_header_values(
381+
response_headers_dict,
382+
get_custom_headers(
383+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE
384+
),
385+
normalise_response_header_name,
386+
)
328387

329388

330389
def _parse_status_code(resp_status):
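
The new `collect_custom_request_headers_attributes` rebuilds header names from the WSGI environ before matching, instead of looking up one environ key per configured header. A minimal sketch of that conversion is shown below. `headers_from_environ` and `HTTP_PREFIX` are hypothetical names, and the `"HTTP_"` prefix is an assumption based on PEP 3333, since the diff does not show the value of `_CARRIER_KEY_PREFIX`.

```python
from typing import Any, Dict

# Assumption: per PEP 3333, client-supplied request headers appear in the
# environ as keys prefixed with "HTTP_".
HTTP_PREFIX = "HTTP_"


def headers_from_environ(environ: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map HTTP_* environ keys back to header names."""
    return {
        # Drop the prefix and turn "_" back into "-"; the case stays upper,
        # so later matching has to be case-insensitive.
        key[len(HTTP_PREFIX):].replace("_", "-"): value
        for key, value in environ.items()
        if key.startswith(HTTP_PREFIX)
    }
```

Under PEP 3333, `Content-Type` and `Content-Length` are exposed as `CONTENT_TYPE` and `CONTENT_LENGTH` without the `HTTP_` prefix, so a prefix-based scan like this one would not pick them up. A header whose original name contained `_` also comes back with `-` instead.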

instrumentation/opentelemetry-instrumentation-wsgi/tests/test_wsgi_middleware.py

+34 -3
@@ -30,6 +30,7 @@
3030
from opentelemetry.test.wsgitestutil import WsgiTestBase
3131
from opentelemetry.trace import StatusCode
3232
from opentelemetry.util.http import (
33+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS,
3334
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST,
3435
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE,
3536
)
@@ -98,6 +99,15 @@ def wsgi_with_custom_response_headers(environ, start_response):
9899
("content-type", "text/plain; charset=utf-8"),
99100
("content-length", "100"),
100101
("my-custom-header", "my-custom-value-1,my-custom-header-2"),
102+
(
103+
"my-custom-regex-header-1",
104+
"my-custom-regex-value-1,my-custom-regex-value-2",
105+
),
106+
(
107+
"My-Custom-Regex-Header-2",
108+
"my-custom-regex-value-3,my-custom-regex-value-4",
109+
),
110+
("My-Secret-Header", "My Secret Value"),
101111
],
102112
)
103113
return [b"*"]
@@ -521,7 +531,8 @@ def iterate_response(self, response):
521531
@mock.patch.dict(
522532
"os.environ",
523533
{
524-
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST: "Custom-Test-Header-1,Custom-Test-Header-2,Custom-Test-Header-3",
534+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS: ".*my-secret.*",
535+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST: "Custom-Test-Header-1,Custom-Test-Header-2,Custom-Test-Header-3,Regex-Test-Header-.*,Regex-Invalid-Test-Header-.*,.*my-secret.*",
525536
},
526537
)
527538
def test_custom_request_headers_non_recording_span(self):
@@ -531,6 +542,9 @@ def test_custom_request_headers_non_recording_span(self):
531542
{
532543
"HTTP_CUSTOM_TEST_HEADER_1": "Test Value 2",
533544
"HTTP_CUSTOM_TEST_HEADER_2": "TestValue2,TestValue3",
545+
"HTTP_REGEX_TEST_HEADER_1": "Regex Test Value 1",
546+
"HTTP_REGEX_TEST_HEADER_2": "RegexTestValue2,RegexTestValue3",
547+
"HTTP_MY_SECRET_HEADER": "My Secret Value",
534548
}
535549
)
536550
app = otel_wsgi.OpenTelemetryMiddleware(
@@ -544,14 +558,18 @@ def test_custom_request_headers_non_recording_span(self):
544558
@mock.patch.dict(
545559
"os.environ",
546560
{
547-
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST: "Custom-Test-Header-1,Custom-Test-Header-2,Custom-Test-Header-3"
561+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS: ".*my-secret.*",
562+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST: "Custom-Test-Header-1,Custom-Test-Header-2,Custom-Test-Header-3,Regex-Test-Header-.*,Regex-Invalid-Test-Header-.*,.*my-secret.*",
548563
},
549564
)
550565
def test_custom_request_headers_added_in_server_span(self):
551566
self.environ.update(
552567
{
553568
"HTTP_CUSTOM_TEST_HEADER_1": "Test Value 1",
554569
"HTTP_CUSTOM_TEST_HEADER_2": "TestValue2,TestValue3",
570+
"HTTP_REGEX_TEST_HEADER_1": "Regex Test Value 1",
571+
"HTTP_REGEX_TEST_HEADER_2": "RegexTestValue2,RegexTestValue3",
572+
"HTTP_MY_SECRET_HEADER": "My Secret Value",
555573
}
556574
)
557575
app = otel_wsgi.OpenTelemetryMiddleware(simple_wsgi)
@@ -563,6 +581,11 @@ def test_custom_request_headers_added_in_server_span(self):
563581
"http.request.header.custom_test_header_2": (
564582
"TestValue2,TestValue3",
565583
),
584+
"http.request.header.regex_test_header_1": ("Regex Test Value 1",),
585+
"http.request.header.regex_test_header_2": (
586+
"RegexTestValue2,RegexTestValue3",
587+
),
588+
"http.request.header.my_secret_header": ("[REDACTED]",),
566589
}
567590
self.assertSpanHasAttributes(span, expected)
568591

@@ -595,7 +618,8 @@ def test_custom_request_headers_not_added_in_internal_span(self):
595618
@mock.patch.dict(
596619
"os.environ",
597620
{
598-
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE: "content-type,content-length,my-custom-header,invalid-header"
621+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS: ".*my-secret.*",
622+
OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE: "content-type,content-length,my-custom-header,invalid-header,my-custom-regex-header-.*,invalid-regex-header-.*,.*my-secret.*",
599623
},
600624
)
601625
def test_custom_response_headers_added_in_server_span(self):
@@ -613,6 +637,13 @@ def test_custom_response_headers_added_in_server_span(self):
613637
"http.response.header.my_custom_header": (
614638
"my-custom-value-1,my-custom-header-2",
615639
),
640+
"http.response.header.my_custom_regex_header_1": (
641+
"my-custom-regex-value-1,my-custom-regex-value-2",
642+
),
643+
"http.response.header.my_custom_regex_header_2": (
644+
"my-custom-regex-value-3,my-custom-regex-value-4",
645+
),
646+
"http.response.header.my_secret_header": ("[REDACTED]",),
616647
}
617648
self.assertSpanHasAttributes(span, expected)
618649
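
The tests above configure the instrumentation by patching `os.environ` with `mock.patch.dict`, so each test sees its own comma-delimited pattern list. The sketch below shows that pattern in isolation. `read_header_patterns` is a hypothetical stand-in for the library's `get_custom_headers`, and it assumes the variable is split on commas with surrounding whitespace stripped.

```python
import os
from typing import List
from unittest import mock

ENV_NAME = "OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST"


def read_header_patterns(env_name: str) -> List[str]:
    """Hypothetical reader: split a comma-delimited env var into patterns."""
    raw = os.environ.get(env_name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


# Patch the environment only for the duration of the block; the original
# environment is restored when the block exits.
with mock.patch.dict(
    "os.environ",
    {ENV_NAME: "Custom-Test-Header-1,Regex-Test-Header-.*"},
):
    patched = read_header_patterns(ENV_NAME)
```

Because the patch is scoped, a pattern such as `Regex-Invalid-Test-Header-.*` can sit in one test's configuration without affecting others; it simply matches nothing when the header is absent.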
