Missing credentials in config happening intermittently #692
Comments
@davidporter-id-au It looks like the EC2 metadata service is throttling requests from your code. The SDK itself caches credentials fetched from the metadata service, so multiple simultaneous requests don't bombard it. See #448. Is your code part of a shell script that is invoked in a loop of some sort? Hitting the metadata service repeatedly in quick succession can cause the requests to be throttled.
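A minimal sketch of the pattern being described, for illustration: construct one client up front so the cached credentials are shared across calls. The bucket name and region here are placeholders, not from this thread:

```js
// Sketch: create the client once at startup and reuse it, so the SDK
// resolves instance-role credentials a single time and serves later
// calls from its in-memory cache.
var AWS = require('aws-sdk');
var s3 = new AWS.S3({ region: 'ap-southeast-2' }); // region assumed

function saveObject(key, body, callback) {
  // Reuses the cached credentials; the metadata service is only queried
  // again when the cached credentials are close to expiring.
  s3.putObject({ Bucket: 'example-bucket', Key: key, Body: body }, callback);
}
```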
We've been seeing the same issue when our EC2 instances are under heavy load and the application must make many requests to S3 within a short time. We're also using IAM roles applied to EC2 instances, and there are no other applications, cron jobs, or scripts other than a single node.js instance which is using the latest AWS SDK (2.1.49). Sample error message:
@AdityaManohar Your point about the endpoint being throttled was my first thought too. Regarding the script starting it regularly: no, it's a (koajs) webserver, so it starts once and runs indefinitely. I verified this by intentionally creating a worst-case scenario. We have since discovered that a delayed retry appears to resolve the issue; however, this is a kludgy workaround rather than something I'd like to rely on.
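A hypothetical sketch of such a delayed retry (the retried call, the delay, and the error check are assumptions, not the poster's actual code):

```js
// Retry an S3 call once after a short pause if the credentials error
// is seen, on the theory that the metadata service recovers quickly.
function putObjectWithRetry(s3, params, callback) {
  s3.putObject(params, function (err, data) {
    if (err && err.code === 'CredentialsError') {
      // Give the metadata service a moment, then retry once.
      setTimeout(function () {
        s3.putObject(params, callback);
      }, 2000);
    } else {
      callback(err, data);
    }
  });
}
```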
I tracked down a detailed error message for my case:

```json
{
  "message": "Missing credentials in config",
  "code": "CredentialsError",
  "time": "Thu Sep 03 2015 17:17:33 GMT+0000 (UTC)",
  "originalError": {
    "message": "Could not load credentials from any providers",
    "code": "CredentialsError",
    "time": "Thu Sep 03 2015 17:17:33 GMT+0000 (UTC)",
    "originalError": {
      "message": "Connection timed out after 1000ms",
      "code": "TimeoutError",
      "time": "Thu Sep 03 2015 17:17:33 GMT+0000 (UTC)"
    }
  }
}
```

I see we're getting a connection timeout when trying to load credentials instead of a connection refused. That might be a different issue, even though the top-level error is the same.
I'm seeing exactly this issue too. It is easily reproduced by having a script that spawns a bunch of processes, each creating a heap of aws-sdk instances and then consuming an API endpoint on them. (I realise this is not a realistic situation, but it allows an intermittent issue to be reproduced reliably.) The example code I use to reproduce this on a t2.micro instance is:
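The snippet itself is not preserved here; a minimal sketch of that kind of stress test might look like the following. The API call, the count, and the require-cache trick are assumptions, not the original script:

```js
// Create many independent copies of the SDK, each of which must resolve
// instance-role credentials from the metadata service on first use.
var CLIENTS = 100; // assumed count

for (var i = 0; i < CLIENTS; i++) {
  // Dropping the require cache yields a fresh copy of the SDK, and with
  // it a fresh credential chain that hits the metadata service again.
  delete require.cache[require.resolve('aws-sdk')];
  var AWS = require('aws-sdk');
  new AWS.S3().listBuckets(function (err) {
    if (err) console.error(err.code, err.message);
  });
}
```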
If I invoke this code from 10 different node processes simultaneously, I can pretty much guarantee that the error will be raised (returned in the err argument of the callback).

There is a bigger problem associated with this situation, however. I have found that after encountering the issue:

a) The EC2 instance becomes unreliable and is typically a write-off. Usually I cannot SSH into the machine, and the only recourse has been to terminate it (even a restart often fails).

b) The biggest issue of all: even though the EC2 instance is effectively dead and unreachable, the EC2 console still reports it as healthy, AND therefore any autoscaler that instantiated the instance is unaware of the failure and does not replace it. In my use case, I'm using an autoscaling group with desired = 1 to ensure failover on my instances. Due to this issue I CANNOT rely on instance monitoring on autoscalers.

It occurs to me that the resolution to this problem ought to be relatively trivial in the aws-sdk (surely just an incrementally backing-off retry on retrieving the credentials), but I'm concerned that the EC2 instance issues I'm seeing are symptomatic of a deeper underlying bug in the credentials endpoint code on the instance itself.
If you are spawning multiple Node.js processes, you are more likely to be throttled by the EC2 metadata service; the SDK caches credentials after the first fetch, but only within a single process. It looks like some of the other issues you are having are related to the EC2 instance itself and not the SDK. I would recommend opening an issue on the Amazon EC2 Forum. In the meantime, we can definitely look at adding retries and exponential backoff to the EC2 metadata service requests.
Yep, I understand why I see the issue; I built the scenario explicitly to expose it! The simple fact is: it is possible, in fact inevitable, using only AWS products (EC2 and the SDK), to bring an EC2 instance to its knees. The exact steps to reproduce the situation are outlined above. What's frustrating to me, as a customer, is the difficulty I'm having raising this as a bug report. I assumed there would be an internal process to route it to the appropriate place, but instead I keep getting redirected myself.
@stemail23 Just as an aside, the healthcheck behaviour you're seeing is, I think, expected. You need to switch your autoscaling group to use ELB health checks rather than EC2 status checks. @AdityaManohar We have had some success in addressing the issue with crude retries. If the SDK were able to do this without intervention, while also handling backoff, that would be good.
We've also been hit by this today intermittently on code that was working fine before...
Thanks for the suggestion. Unfortunately, in my case, I don't have an ELB in the equation on these instances (they're job-handler machines lifting messages from SQS). I'm exploring other options where a monitor machine attempts to recognise the dead instances and terminate them, but it's frustrating to have to expend this effort!
Exactly, which is why I suspect that some change in EC2 is complicit in the situation, rather than this being solely an aws-sdk issue.
I suppose there are two issues: there is the single point of failure this reveals in the SDK for this kind of authentication, and there's the probable infrastructure problem we're seeing where the metadata endpoint is subject to transient failure. For the latter I had created a ticket, but let it expire; I'll follow that up. @stemail23 @seriousben I notice you're responding when I am. Are you in the same region?
@davidporter-id-au Yes, I'm in Sydney
@davidporter-id-au - us-east for us.
@davidporter-id-au @stemail23 @seriousben You can increase the connection timeout on the metadata service request:

```js
var AWS = require('aws-sdk');
AWS.config.credentials = new AWS.EC2MetadataCredentials({
  httpOptions: { timeout: 4000 } // raise the default 1000ms connection timeout
});
```

This should help alleviate some of the issues with a slow-responding metadata service.
We started seeing the issue this week as well. For us, it happened when we updated our Node install from 0.10.17 to 4.2.2. Our process runs on a cron tab every 15 minutes and sends about 20K messages to SQS. With 0.10.17, we ran with no issues; within 30 minutes of updating to 4.2.2 we started seeing the intermittent issues. In both cases, we had the same 2.2.18 version of the SDK. A similar issue was discussed here in the past: #445. @willwhite and @mick, have you seen any similar issues since your update was added to the SDK?
This started happening for us recently. Sporadically, when uploading to S3 from the Node.js SDK (v2.2.11 and v2.2.33), we would get the same error posted in #692 (comment). Increasing the timeout to 4000 ms didn't fix it; increasing it to 10000 ms did. We're also not hammering the endpoint (in fact, our test server was making a single request at a time), so it seems like it's a laggy metadata provider endpoint, given that raising the timeout alleviates it.
Also having this issue +1
We fixed this by using only one instance of the SDK.
We are seeing this too. It can't be a throttling issue; it is on a staging instance that is only hit a few times per hour. Additionally, it is happening at application startup, so the server never starts.
Also experiencing this issue. I tried increasing the timeout to 10 seconds to no avail. @bbarney, have you found any workarounds? I am experiencing the same issue on startup, every single time.
Also having this issue
Having the same problem here. The problem comes and goes, especially when I register a new user and log them in.
Having the same issue. Increasing the timeout didn't help.
Same issue while using SQS for us. We're using a single instance of the SDK object.
Same here, single instance of the SDK, still problems.
Same here, it is happening with S3... Strangely enough, it works on Ubuntu but not on Mac; I'll have to check my network settings.
We have an application running as a cron job executed every 10s on an EC2 instance, and we see this issue very frequently. Since the application runs for about 3-4s every 10s, we require('aws-sdk') each time we start the app. Is there any way around this issue for a scenario like this?
You could change this code to start your loop only after the SDK has finished loading credentials, as sketched below.
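A sketch of that suggestion using the SDK's getCredentials hook; startWorkLoop is a stand-in for the application's actual entry point:

```js
var AWS = require('aws-sdk');

AWS.config.getCredentials(function (err) {
  if (err) {
    // Credentials never resolved; fail fast instead of letting every
    // later API call error out with "Missing credentials in config".
    console.error('Unable to load credentials:', err);
    process.exit(1);
  } else {
    startWorkLoop(); // hypothetical entry point
  }
});
```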
Indeed, that's essentially what I did. I have updated my comment above with the solution.
Making sure credentials are loaded first.
Oddly enough, this is still happening to me even when specifying the credentials in environment variables.
We're seeing this too now on apps running in Node under ECS (using role credentials, of course). Are there any signs this will be fixed in the future?
Hi, we are actively still looking at this issue and appreciate your patience. @dnorth98 Is the error you're getting when running on ECS the same "missing credentials in config" error, and is it also intermittent? Can you confirm that the SDK is hitting the ECS credential endpoint rather than the EC2 Metadata service? Thanks
@LiuJoyceC We are getting the same credentials error (it's actually when making a DynamoDB call).
Regarding how we get the credentials, we're not hitting the metadata service directly. We're just initializing the DynamoDB object without passing in specific credentials (i.e. using the role creds).
Hi @codan84, the reason for this is due to the implementation of the credentials provider. @stemail23 That said, even though only one request to the Metadata Service can be in flight at a time, it is possible to hit the Metadata Service too many times in a short time span (you can keep hitting it as soon as the previous response comes back). Given that this error is intermittent (@dnorth98 mentioned it happens about 1 out of 1000 times), implementing retries with exponential backoff would likely resolve the problem, as it is unlikely that the 2nd or 3rd try will hit the error again. I am actively working on that now and will provide an update when it is finished. Since it was reported above that this problem has also occurred on ECS, I can also implement the exponential backoff in the ECS credential provider.
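For illustration, a sketch of what backed-off retries around a credential refresh could look like; this is not the SDK's internal implementation, and the function name and delays are assumptions:

```js
// Retry credentials.refresh() with exponentially growing delays until it
// succeeds or the attempt budget is exhausted.
function refreshWithBackoff(credentials, attempt, maxAttempts, callback) {
  credentials.refresh(function (err) {
    if (!err) return callback(null);
    if (attempt >= maxAttempts) return callback(err);
    var delayMs = Math.pow(2, attempt) * 100; // 100ms, 200ms, 400ms, ...
    setTimeout(function () {
      refreshWithBackoff(credentials, attempt + 1, maxAttempts, callback);
    }, delayMs);
  });
}

// e.g. refreshWithBackoff(new AWS.EC2MetadataCredentials(), 0, 5, done);
```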
@LiuJoyceC Thanks for the feedback. I was able to reliably reproduce the issue with the code I provided, but I admit I haven't looked into it since, so it's possible that things have changed. I notice that you don't mention running multiple processes, however, so perhaps that indicates why you couldn't reproduce? To reproduce the issue I needed to run the provided script in up to ten processes concurrently. Thanks for looking into the issue. Hopefully you'll have some success with backed-off retries, and hopefully the suggestions above will help you test a fix if you can reproduce the problem. Cheers!
Hi, The PR for retrying
If that still doesn't work, please let me know!
Thanks @LiuJoyceC
So this is still happening in v2.6.9 on an EC2 instance (utilizing Elastic Beanstalk).

```
{"message":"Missing credentials in config","name":"CredentialsError","stack":"Error: connect ECONNREFUSED 169.254.169.254:80\n    at Object.exports._errnoException (util.js:874:11)\n    at exports._exceptionWithHostPort (util.js:897:20)\n    at TCPConnectWrap.afterConnect [as oncomplete]","code":"CredentialsError"}
```
Ran into this issue locally; it was due to some shenanigans with my local setup. The fix was to manually pass in credentials.
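A sketch of passing credentials in explicitly rather than relying on the provider chain; the environment variable names here are placeholders:

```js
var AWS = require('aws-sdk');

// Supply credentials directly so the SDK never consults the metadata service.
AWS.config.update({
  accessKeyId: process.env.MY_AWS_ACCESS_KEY_ID,         // placeholder name
  secretAccessKey: process.env.MY_AWS_SECRET_ACCESS_KEY  // placeholder name
});
```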
I just hit this issue on AWS ECS (Elastic Container Service), which requires the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable for task-role credentials.
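For ECS, the same timeout knob shown in the EC2MetadataCredentials example above appears applicable via the SDK's ECS credential provider; the 5000ms value is an assumption:

```js
var AWS = require('aws-sdk');

AWS.config.credentials = new AWS.ECSCredentials({
  httpOptions: { timeout: 5000 } // give the task-role endpoint more time
});
```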
@LiuJoyceC Should this credentials timeout configuration be created once per
Still happening for me in ECS with aws-sdk version 2.270.1 and Node.js version 10.11.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
We've been having some difficulties: the SDK is intermittently unable to fetch credentials, and this renders our application unauthorised. The EC2 instance where this is occurring has a particular IAM role, and the SDK therefore reaches out to the metadata endpoint (169.254.169.254) to fetch its keys. However, when it does so, it occasionally appears to throw this type of error:

So, for example, this DynamoDB call was logged by our application with an SDK error:

More recently, this S3 call had a similar error:

We've experienced the problem with multiple applications intermittently, but as frequently as half a dozen times per day on a single EC2 instance. We're using Node.js aws-sdk version 2.1.46 in the example above, with iojs 2.3.1 here and Node.js 0.12.x elsewhere. We're in the ap-southeast-2 region. While it would appear that the connection is being refused, I'd be surprised to see this endpoint actually go down. Is it possible we're doing something stupid with Node to create this, or could there be a genuine issue?