[Bug] Timeout happening even though the function fully runs #1552
Comments
Hi @rokcarl, thanks for reporting. I tried replicating with a sample timer trigger function and don't encounter the timeout issue. The timeout issue can occur though if there's an exception that occurs somewhere within the function app code. Looking at the logs you provided, the timeout error occurred for the first execution, the timer trigger function. Also, while the logs show 8 " Debugging is a little tricky here since there are multiple components involved, but here are a couple suggestions: |
This sounds like a bug? Why would a function time out, what is the runtime waiting for if the code breaks?
There were, but I chose not to show every log line.
That sounds counterintuitive and definitely not the best way to debug an app. I would think I'd be able to debug locally?
Thanks, I'll try that and report back. |
Did what you suggested; it only solidified my assumptions.
New Python code with more logging, fewer orders
new logs
As you can clearly see, the following happens:
|
Thanks for trying this out. I tried to repro with a more simplified example, but I'm not getting the timeout error. Code:
Logs:
One thing that might be a bit different in this example:
One other thought:
|
I am using
So similar to your code, except for encoding.
Yes, my app is I tried creating a smaller example but failed: function_app.py
output
|
The thing that scares me the most about all of this is that we're in the "let's try this or maybe that" mode of debugging. Imagine having a production workload, a complex app, and then a bug that is debugged using this approach. |
When I didn't encode the message, I saw an error. The execution failed though; it wasn't a timeout. I'm assuming you've formatted the input differently in your code than my simple example so you're not hitting this but wanted to bring that up just in case.
From the logs provided, the I'm still a little confused about the |
To make encoding work, I checked a bunch of GitHub issues and Stack Overflow. In the end, the only thing that fixed it for me was to set the message encoding to
Sorry I wasn't clear. I meant I failed to reproduce the problem, i.e. everything was working as it should; the timeout in that case is expected, since I sleep for more than the timeout in the queue trigger function.
Whoops, sorry, this was meant to be |
Since the timeout isn't reproduced with the simpler example, I'm leaning towards the issue being caused by one of the other functions called inside it. The timer trigger function itself isn't returning anything, but it will only finish executing when all of the code inside also finishes. The last log line can still execute even if the function as a whole times out, and that would be more of a Python-related question than a Functions one. Are any of the other methods async, concurrent, blocking, relying on external dependencies, or something similar? For debugging, I would recommend incrementally adding back in the more complicated logic to identify after which step the timeout occurs. |
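One way to make the incremental approach above less painful is to wrap each step in a small timing logger, so that a step which never finishes shows a START line with no matching END. This is a minimal sketch using only the standard library; the `step` helper and the step names `find_orders` and `enqueue_orders` are hypothetical and not taken from the reporter's code.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("timeout_bisect")  # hypothetical logger name


@contextmanager
def step(name, timings):
    """Log when a step starts and ends, and record its elapsed time in `timings`."""
    start = time.monotonic()
    logger.info("START %s", name)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions; a hang never reaches this line,
        # so a START without an END points at the step that got stuck.
        elapsed = time.monotonic() - start
        timings[name] = elapsed
        logger.info("END %s after %.3fs", name, elapsed)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    timings = {}
    with step("find_orders", timings):      # placeholder for the real work
        time.sleep(0.01)
    with step("enqueue_orders", timings):   # placeholder for the real work
        time.sleep(0.02)
    print(sorted(timings))
```

The same wrapper can be moved around as logic is added back in, which keeps the log output comparable between runs.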
As we already saw, the timer trigger function does two things:
From that, I don't see a mechanism where some "other functions being called" would cause a timeout in this timer trigger function. You mention any other async, blocking, etc. methods. But the
I did already try this. But going with some form of bisection on the complex codebase without basic debugging tools, I might as well move to FastAPI + AKS and call it a day. I've already moved away from the promise of Azure Durable Functions because of the many bugs I've encountered (one, two, three). I would expect to be able to debug such timeout issues in some way. If I spend days now hunting this one down using bisection, then I know this will happen again, but at that point we'll be in production and this is basically an outage. |
I was referring to the There are other methods for debugging that might provide assistance as well: |
But just to indulge, I did try with the reduced code, timeout set to 30s, timeouts still occurring. function_app.py
log
|
Apologies for the delay. I was able to repro this scenario. The main issue here is the mixing of, and interaction between, sync (timer) and async (queue) functions, which isn't recommended. I'll see if we can get this highlighted more in our documentation.

Sync and async executions are handled differently. Sync calls are handled through the thread pool, and async calls are handled through an event loop. The worker itself runs in that loop, on a single thread. When the worker receives an invocation request for the queue trigger, which is async, it acts as a blocking call and control switches to that call. Control isn't returned to the sync invocation, so while the worker is waiting on the queue trigger functions to finish executing, the timer trigger function can't finish and return. This results in the timeout.

To address this issue, I would recommend converting the timer function to be async. The async client in the azure-storage-queue package (the azure.storage.queue.aio module) will help with uploading messages to the queue asynchronously. If this function needs to stay synchronous, another solution would be to create a separate function app for the timer trigger. |
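The worker's internals are not shown in this thread, so the following is only a general asyncio illustration of the underlying principle, not the Functions worker's actual code: when a single-threaded event loop is held up, everything else scheduled on it stalls until control comes back. The sketch compares calling a blocking function directly on the loop with handing it to the default thread pool. All names (`heartbeat`, `blocking_work`, and the two `run_*` functions) are hypothetical, and `time.sleep` stands in for any synchronous call.

```python
import asyncio
import time


async def heartbeat(ticks, interval=0.05, count=4):
    """Stand-in for other work on the loop: records a timestamp every `interval` seconds."""
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


def blocking_work(duration=0.3):
    """Stand-in for a synchronous call that does not yield to the event loop."""
    time.sleep(duration)


async def run_blocking_on_loop():
    """Run the blocking call on the loop thread; return the delay until the first heartbeat."""
    ticks = []
    start = time.monotonic()
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)   # give the heartbeat a chance to start
    blocking_work()          # holds the loop thread; the heartbeat cannot run meanwhile
    await task
    return ticks[0] - start


async def run_blocking_in_thread():
    """Run the blocking call in the default thread pool; return the delay until the first heartbeat."""
    ticks = []
    start = time.monotonic()
    task = asyncio.create_task(heartbeat(ticks))
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, blocking_work)  # loop stays free while this runs
    await task
    return ticks[0] - start


if __name__ == "__main__":
    print("first heartbeat, blocking on loop:  ", asyncio.run(run_blocking_on_loop()))
    print("first heartbeat, blocking in thread:", asyncio.run(run_blocking_in_thread()))
```

In the first case the heartbeat's first tick is delayed until the blocking call returns; in the second it arrives roughly on schedule. Whether this matches the exact scheduling inside the worker is an assumption; it only shows why keeping all work on the loop non-blocking, or isolating synchronous work elsewhere, avoids this class of stall.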
The team did try making both functions async, and the problem persisted. While I don't consider this solved and I think this is an active problem you have, we are in the process of moving to Azure Container Apps. |
Expected Behavior
I expect the function to run without a timeout.
Actual Behavior
The function times out.
Steps to Reproduce
Relevant code being tried
Relevant log output
requirements.txt file
Where are you facing this problem?
Local - Core Tools
Function app name
No response
Additional Information
As you can see from the log, the report found 8 orders and then saved to the database 8 times, processing all the orders. This all happens in 6 seconds, then 10 minutes later it times out (I have 10 min in host.json). Curiously, the processing of the orders happens through a queue trigger, and 3 of the 8 orders also time out after 10 minutes, but I decided to show this example where it's pretty clear from the code that everything ran.
I would love to give a smaller example, but I don't see a way for me to debug this and figure out what is causing the timeout, so I'd be able to replicate it on a smaller scale.