Critical error in FlowLog code #1931
Comments
I don't know why the EC2 API is returning an empty FlowLogs array. Do you see any errors returned by the create step? Also, could you confirm whether the resource is getting created or not? cc-ing @nnbu since he worked on this feature. |
Thanks for looking into this @a-hilaly @nnbu
|
Could you change the log level to debug @mattzech ? I believe the API might be returning an error. |
But still very weird that we reach L251 with 0 elements in the FlowLogs array |
I will check further. But if there is no error at L212, then the resource should have been created. Could you also try executing the corresponding CLI command to create the flow log and see if you get the flow log in the response? (Of course, edit this with the parameters that you provided.) (Taken from https://docs.aws.amazon.com/cli/latest/reference/ec2/create-flow-logs.html ) |
I am assuming VPC and logDestination exist |
Hi @nnbu, yes, it is very strange that it is not hitting the error from L212. {
"ClientToken": "123456789NYzyP0I4Zd9kSRul7eqDlBXPkCGsCFZw=",
"FlowLogIds": [
"fl-0001112223334445555"
],
"Unsuccessful": []
} |
Is the issue reproducible, @mattzech? I'm wondering whether we hit an API hiccup. Also looks like the response has a |
Yes the error is reproducible. Looking at the logs I sent earlier, Debug mode seems to be on already. I tested the command with a nonexistent bucket to see how the {
"ClientToken": "123456789P0I4Zd9kSRul7eqDlBXPkCGsCFZw=",
"FlowLogIds": [],
"Unsuccessful": [
{
"Error": {
"Code": "400",
"Message": "LogDestination: bad-bucket-test does not exist"
},
"ResourceId": "vpc-000112233445566"
}
]
} |
If the error reporting could print the |
What's the HTTP code sent with this response? I assume 200? {
"ClientToken": "123456789P0I4Zd9kSRul7eqDlBXPkCGsCFZw=",
"FlowLogIds": [],
"Unsuccessful": [
{
"Error": {
"Code": "400",
"Message": "LogDestination: bad-bucket-test does not exist"
},
"ResourceId": "vpc-000112233445566"
}
]
} |
This makes more sense now. We should rely on |
This one returned 200. I tried with incorrect permissions and that request returned an error with a 254 Response Code |
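To make the point above concrete, here is a minimal sketch of how a 200 response can still carry per-resource failures. It decodes the JSON shape printed by the CLI earlier in this thread using only the standard library. The type and function names (`createFlowLogsResponse`, `unsuccessfulItem`, `failures`) are made up for this illustration and are not the controller's or the SDK's types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createFlowLogsResponse mirrors the JSON printed by the CLI in this thread.
// Hypothetical type for illustration only.
type createFlowLogsResponse struct {
	ClientToken  string             `json:"ClientToken"`
	FlowLogIds   []string           `json:"FlowLogIds"`
	Unsuccessful []unsuccessfulItem `json:"Unsuccessful"`
}

// unsuccessfulItem is one entry of the Unsuccessful array.
type unsuccessfulItem struct {
	Error struct {
		Code    string `json:"Code"`
		Message string `json:"Message"`
	} `json:"Error"`
	ResourceId string `json:"ResourceId"`
}

// failures decodes a response body and returns one line per Unsuccessful entry.
func failures(raw []byte) ([]string, error) {
	var resp createFlowLogsResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(resp.Unsuccessful))
	for _, u := range resp.Unsuccessful {
		out = append(out, fmt.Sprintf("%s: %s (code %s)", u.ResourceId, u.Error.Message, u.Error.Code))
	}
	return out, nil
}

func main() {
	raw := []byte(`{
		"FlowLogIds": [],
		"Unsuccessful": [{
			"Error": {"Code": "400", "Message": "LogDestination: bad-bucket-test does not exist"},
			"ResourceId": "vpc-000112233445566"
		}]
	}`)
	msgs, err := failures(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```

The call itself reports no error here, so the failure is only visible by inspecting the `Unsuccessful` array.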
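resp, err = rm.sdkapi.CreateFlowLogsWithContext(ctx, input)
rm.metrics.RecordAPICall("CREATE", "CreateFlowLogs", err)
if err != nil {
	return nil, err
}
// Check the length first: Unsuccessful is empty when creation succeeds,
// so indexing [0] unconditionally would panic.
if len(resp.Unsuccessful) > 0 && resp.Unsuccessful[0] != nil {
	return nil, <error message from response>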
} I think checking both |
@mattzech Is your |
I don't think returning early is the correct approach here. We should just skip L251 in case |
It exists, I was able to successfully execute the |
OK. In that case, even after avoiding this crash, your problem would not be solved, because the resource is not getting created anyway. |
If the error message showed the message inside the |
Additionally, the panic is causing the controller pod to crash and restart every time it gets to this point. So avoiding the index out of bounds error would be beneficial to allow the reconciler to continue |
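The index-out-of-bounds panic described here can be avoided with a length check before touching the first element. The following is only a sketch: `firstFlowLogID` is a hypothetical helper, not code from the controller, and it takes a `[]*string` on the assumption that the IDs arrive as a slice of string pointers.

```go
package main

import (
	"errors"
	"fmt"
)

// firstFlowLogID returns the first ID in ids. It returns an error instead of
// panicking when the slice is empty or its first element is nil.
// Hypothetical helper for illustration only.
func firstFlowLogID(ids []*string) (string, error) {
	if len(ids) == 0 || ids[0] == nil {
		return "", errors.New("CreateFlowLogs returned no flow log ID")
	}
	return *ids[0], nil
}

func main() {
	id := "fl-0001112223334445555"

	got, err := firstFlowLogID([]*string{&id})
	fmt.Println(got, err)

	// An empty result produces an error the reconciler can handle,
	// rather than crashing the controller pod.
	got, err = firstFlowLogID(nil)
	fmt.Println(got, err)
}
```

Returning an error keeps the reconciler running and leaves the caller to decide how to report the failed creation.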
@nnbu doesn't this mean that resource creation encountered an issue and that we should retry or set a terminal condition? |
Or, maybe, we could return a re-queue error to force recreation after a few seconds? |
Issues go stale after 180d of inactivity. |
Stale issues rot after 60d of inactivity. |
Rotten issues close after 60d of inactivity. |
@ack-bot: Closing this issue. |
Issue #, if available: [#1931](aws-controllers-k8s/community#1931)
Description of changes: While creating a FlowLog, if the logDestination does not exist, CreateFlowLogsWithContext does not return an error. However, the Unsuccessful field in the response is set and the FlowLog is not created in AWS. Because of this, the controller crashes while accessing the flow log ID from the response (resp.FlowLogIds[0]). This change fixes the issue by checking that FlowLogIds has a valid length before accessing it.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Describe the bug
Any time a FlowLog resource is created, the controller crashes with the following error message:
I can propose a change in the ec2-controller repository pointing out the problem
Steps to reproduce
Create a FlowLog with basic spec
Expected outcome
A FlowLog created successfully
Environment
Openshift 4.13.12