Description
Issue: We recently converted one of our Flask + Gunicorn based services from Python 2 to Python 3 (Py2/3-compatible code, minor syntactic changes). Since that deployment, we often observe API calls to this service getting stuck. This is not a load issue: memory and CPU utilization remain low.
We researched the error, but found nothing that helped resolve the issue. We monkey-patch at the very beginning of the service's main file (`from gevent import monkey; monkey.patch_all()`).
Setup details
- Python 3.9.16
- gunicorn==19.9.0
- gevent==23.9.1
- Deployed on AWS ECS
A few observations about the issue
- We could not reproduce this in the QA environment; it happens only in production.
- Occurrence is random; we have observed it during off-peak hours too.
- Memory grows over time, probably as a side effect of this issue.
- We never faced this issue with Python 2.
- We use `ThreadPoolExecutor` in some of our API implementations to call downstream services in parallel:
  `with ThreadPoolExecutor(max_workers=8) as executor: list(executor.map(lambda args: call_apis(*args), argList))`
- Most of the time when the issue happens, there is a gevent traceback in the logs. We have never seen this traceback with Python 2.
```
{"log":"\tHub: <Hub '' at 0x7f5b90267680 epoll default pending=0 ref=1 fileno=6 resolver=<gevent.resolver.thread.Resolver at 0x7f5b9022b2e0 pool=<ThreadPool at 0x7f5b90264660 tasks=0 size=2 maxsize=10 hub=<Hub at 0x7f5b90267680 thread_ident=0x7f5b9311fb80>>> threadpool=<ThreadPool at 0x7f5b90264660 tasks=0 size=2 maxsize=10 hub=<Hub at 0x7f5b90267680 thread_ident=0x7f5b9311fb80>> thread_ident=0x7f5b9311fb80>"}
Traceback (most recent call last):
  File "/usr/local/pyenv/versions/3.9.16/lib/python3.9/site-packages/gevent/monkey.py", line 868, in _shutdown
    orig_shutdown()
  File "/usr/local/pyenv/versions/3.9.16/lib/python3.9/threading.py", line 1447, in _shutdown
    atexit_call()
  File "/usr/local/pyenv/versions/3.9.16/lib/python3.9/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/usr/local/pyenv/versions/3.9.16/lib/python3.9/threading.py", line 1060, in join
    self._wait_for_tstate_lock()
  File "/usr/local/pyenv/versions/3.9.16/lib/python3.9/threading.py", line 1080, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
  File "/usr/local/pyenv/versions/3.9.16/lib/python3.9/site-packages/gevent/thread.py", line 112, in acquire
    acquired = BoundedSemaphore.acquire(self, blocking, timeout)
  File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_semaphore.py", line 259, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_semaphore.py", line 249, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
  File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
```
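Read top to bottom, the traceback shows the process already exiting (`threading._shutdown`), with the exit hook `_python_exit` from `concurrent/futures/thread.py` joining a `ThreadPoolExecutor` worker (`t.join()`). Because `monkey.patch_all()` replaced the threading primitives, that join waits on a gevent lock (`gevent/thread.py`, `acquire`), and gevent raises `LoopExit` when nothing left on the hub could ever release it. What follows is a simplified, stdlib-only model of that exit path using real OS threads. The names `worker` and `exit_hook` are illustrative, not CPython's. The hook wakes each worker with a `None` sentinel and then joins it, so the join can only return once every worker has finished the item it is currently running.

```python
import queue
import threading


def worker(work_queue, results):
    # Like a pool worker: block on the queue, run each callable it receives,
    # and return once the None sentinel arrives.
    while True:
        item = work_queue.get(block=True)
        if item is None:
            return
        results.append(item())


def exit_hook(workers):
    # Like the exit hook: wake every worker with a sentinel, then join.
    # join() returns only after a worker finishes its current item, so a
    # worker stuck in a call that never returns would block here forever.
    for _, work_queue in workers:
        work_queue.put(None)
    for thread, _ in workers:
        thread.join()


results = []
work_queue = queue.Queue()
thread = threading.Thread(target=worker, args=(work_queue, results))
thread.start()
work_queue.put(lambda: 42)  # one unit of work, queued before the sentinel
exit_hook([(thread, work_queue)])
print(results, thread.is_alive())  # [42] False
```

If this model holds, the traceback would be logged whenever a worker process exits while a pooled downstream call is still in flight (for example, a Gunicorn worker being restarted). That would make it possibly a symptom of the stuck calls rather than their cause.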
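The `ThreadPoolExecutor` fan-out from the observations list can also be written as a small function that bounds how long the caller waits for results. This is a sketch under assumptions. `call_apis` below is a hypothetical stub, since its real signature is not shown here. `fan_out` is an illustrative name, and the 10-second `timeout` is an arbitrary example value. `Executor.map` does accept a `timeout` argument and raises `concurrent.futures.TimeoutError` from the result iterator if results are not ready in time.

```python
from concurrent.futures import ThreadPoolExecutor


def call_apis(service, payload):
    # Hypothetical stand-in for the real downstream call.
    return f"{service}:{payload}"


def fan_out(arg_list, timeout=10.0):
    """Call call_apis(*args) for each tuple in arg_list, up to 8 at a time.

    Results come back in the same order as arg_list. If they are not all
    available within `timeout` seconds of calling map(), iterating the
    results raises concurrent.futures.TimeoutError instead of waiting forever.
    """
    with ThreadPoolExecutor(max_workers=8) as executor:
        return list(executor.map(lambda args: call_apis(*args), arg_list, timeout=timeout))


print(fan_out([("users", 1), ("orders", 2)]))  # ['users:1', 'orders:2']
```

The timeout only bounds the wait for results. It does not interrupt a call that is already running, and leaving the `with` block still waits for running calls to finish. The downstream HTTP calls would therefore need their own timeouts for this to keep a request from getting stuck.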