Skip to content

patch upstream ec2 and eks resource detectors #91

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 1, 2024
Merged

patch upstream ec2 and eks resource detectors #91

merged 4 commits into from
Mar 1, 2024

Conversation

srprash
Copy link
Contributor

@srprash srprash commented Mar 1, 2024

Issue:
The upstream resource detector package is quite outdated since the last release of v2.0.1. And there is no plan for a new release for now.
https://github.com/open-telemetry/opentelemetry-python-contrib/commits/main/sdk-extension/opentelemetry-sdk-extension-aws

In our distro we want to consume this particular commit where the timeout for EC2 and EKS resource detectors were significantly shortened to 5 seconds. The current timeouts are quite long (1000s and 2000s respectively) and may cause application fail to start in certain cases.

Description of changes:
Using the existing mechanism of temporarily patching the upstream functions, I'm adding patches for the _aws_http_request methods in the upstream.

Testing:
I couldn't come up with a way of writing unit tests to see if the patching works since the only change in the patched method is the timeout. Mocking the urlopen() to return a custom response from the patch didn't work as the mock would return the same for the original method too.
I did test the change with a sample app where I added a log statement "Hello from the patch_ec2_aws_http_request" to the patch to ensure it was being applied.

(venv) ➜  adotDjangoTest opentelemetry-instrument --traces_exporter console --metrics_exporter none --service_name my_service python manage.py runserver --noreload 8000
Configuration of configurator not loaded because aws_configurator is set by OTEL_PYTHON_CONFIGURATOR
Hello from the patch_ec2_aws_http_request
AwsEcsResourceDetector failed: Missing ECS_CONTAINER_METADATA_URI therefore process is not on ECS.
Failed to get k8s token: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/token'
AwsEksResourceDetector failed: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/token'
AwsEc2ResourceDetector failed: <urlopen error [Errno 65] No route to host>
Performing system checks...

System check identified no issues (0 silenced).

You have 18 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
March 01, 2024 - 17:20:36
Django version 5.0.2, using settings 'adotDjangoTest.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@srprash srprash requested a review from a team March 1, 2024 17:51
Copy link
Contributor

@thpierce thpierce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing:

  1. Did we do an E2E test to ensure that we no longer hit the 1000s timeout? Sounds like it was trivially reproduced in certain contract tests, right?
  2. I can see why unit tests would be tricky. I think it's ok to skip for this time. Lets make sure we have a backlog item (at some priority) to get upstream to merge these changes into latest release.

@srprash
Copy link
Contributor Author

srprash commented Mar 1, 2024

  1. Did we do an E2E test to ensure that we no longer hit the 1000s timeout? Sounds like it was trivially reproduced in certain contract tests, right?

I did test with varying timeout values in local and the wait time was indeed shorter. Will also verify the change with @XinRanZhAWS since he was getting this issue.

Lets make sure we have a backlog item (at some priority) to get upstream to merge these changes into latest release.

Upstream is not maintaining this package and it is in ADOT team's backlog: open-telemetry/opentelemetry-python-contrib/issues/1899

@XinRanZhAWS
Copy link
Contributor

AwsEc2ResourceDetector failed: timed out
Configuration of configurator not loaded, aws_configurator already loaded
(contract_test_base.py:111)
PASSED

Copy link
Contributor

@XinRanZhAWS XinRanZhAWS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional

@srprash srprash merged commit 2ddee07 into aws-observability:main Mar 1, 2024
@srprash srprash deleted the patch-resource-detectors branch March 1, 2024 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants