test_asyncio: test_subprocess_consistent_callbacks() fails randomly #108973
Comments
Agreed. Feel free to suggest a PR that fixes this. |
@kumaraditya303: It looks wrong that the "process_exited" event is reported after the "pipe_data_received" event. Any idea on how to make sure that events are delivered in the right order? First, I wanted to just tolerate events in any order in the test. But then I realized that maybe something is wrong here. |
"process_exited" should be the last event. Also, I am unable to reproduce it in any of my own testing. I guess something is special about FreeBSD, but I don't have the bandwidth to investigate such platforms. |
So far, I didn't manage to design a reliable way to trigger the bug. I will have a look at it later. I wanted to open an issue to keep track of it. There are many unstable tests these days in Python, and it annoys me to have flaky CI. |
Oh sure, it's perfectly fine if you don't have the bandwidth for that. |
If you get a reliable way to trigger it on Linux then I will surely look into it. |
Variant on FreeBSD:
build: https://cirrus-ci.com/task/6142659970990080?logs=test#L878 |
One way to reproduce reliably is to have a child process that exits faster than #97009 is insufficient. There is no timing guarantee, and delaying by two event loop iterations only prevents the error from being observable in most cases. I'm unsure what asyncio wants to promise here. Should we call While the bug itself is not serious, if user code is written following the same pattern as A separate bit of inconsistency is to expect the two
In fact I'm surprised that the two data received are in a consistent order (stdout, then stderr) across thousands of runs. That's also something that shouldn't be guaranteed. I suspect it's an artifact of the order in which the waiters are registered. To be pedantic, it's also not guaranteed that all data on a pipe is readable in a single call. For the purposes of this test it's unlikely to ever matter. In short:
|
Just for fun, I switched the order and I could see that every once in a while stderr comes first :) Best use of a multi-core machine is to have fun with accidental expectations of causality! |
Issue seen on s390x RHEL7 LTO 3.x: https://buildbot.python.org/all/#/builders/402/builds/5394 |
Also seen on:
|
Subprocess events can be delivered in any order: tolerate that in the test.
I started to write a fix to ensure that events are always delivered in a specific order. For example, create 4 queues of callbacks:
The problem is that asynchronous programming is hard, system programming is hard, and as @sorcio wrote, multi-core CPUs, multithreading, and different event loop implementations make everything less deterministic. My worry is that if, for some reason, one of these events is not delivered, the other ones stay queued... maybe forever :-( When a process completes, I want to know it as soon as possible. Getting its pipe output is honestly secondary. Maybe the process was killed by a signal. Maybe something bad happened, who knows?
While it's technically possible to ensure that events are delivered in a deterministic way... I'm not sure that it's worth it :-( The drawbacks seem to be expensive and risky to me. Thanks @sorcio for digging deep into the issue and for your long analysis! I'm now more confident that the test... should just accept that async programming makes such events less deterministic and tolerate that these events can be delivered in any order. I wrote PR #109431 for that. |
Is everyone here aware that "process exited" and "pipe closed" are orthogonal events? The process that exits may have shared the file descriptor with a child process that continues to exist. Or the process could close the file descriptor without exiting. I don't think we should enforce or require an order between such events. |
It just kills my idea of ordering events.
Ok, so we are on the same page: my PR #109431 fixes the test by accepting that events are not ordered. With my change, the only thing that is tested is that we get all events, in any order. |
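For illustration only, here is a minimal sketch of what "all events, in any order" can look like as a check. It is not the code from PR #109431: the event tuples, the `EXPECTED` list, and the `same_events()` helper are hypothetical names, and the sketch assumes each pipe delivers its data in a single `pipe_data_received()` call.

```python
from collections import Counter

# Hypothetical event records, named after the SubprocessProtocol callbacks.
# Assumption: each pipe's data arrives in one pipe_data_received() call.
EXPECTED = [
    ("pipe_data_received", 1, b"stdout"),
    ("pipe_data_received", 2, b"stderr"),
    ("pipe_connection_lost", 1),
    ("pipe_connection_lost", 2),
    ("process_exited",),
]


def same_events(received, expected=EXPECTED):
    """Return True if both sequences hold the same events, in any order."""
    # Counter compares multisets: order is ignored, but a missing or
    # duplicated event still makes the comparison fail.
    return Counter(received) == Counter(expected)
```

Comparing multisets rather than lists keeps the check strict about which events arrive while staying silent about their relative order.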
In addition to the change to the test, it would make sense to revert #97009. It introduces unnecessary delay. The issue it was intended to fix was not diagnosed correctly and is not a bug. As I said before (see also Guido's comment), a program should not depend on The original issue also points to the example in the docs which is not correct because it makes the same wrong assumption. |
Done in my (updated) PR.
Example fixed in my (updated) PR. |
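As a rough sketch of the pattern discussed above (not the exact example from the docs or the PR), a protocol can wait for both the pipe closing and the process exiting before it reports completion, so it works whichever callback arrives first. The class name `CollectOutput`, the helper `run_child()`, and the child command are assumptions made for this illustration; only stdout is piped.

```python
import asyncio
import sys


class CollectOutput(asyncio.SubprocessProtocol):
    """Hypothetical protocol: done only when stdout is closed AND the process exited."""

    def __init__(self, done):
        self.done = done            # future resolved once both events arrived
        self.output = bytearray()
        self.pipe_closed = False
        self.exited = False

    def pipe_data_received(self, fd, data):
        # May be called several times, before or after process_exited().
        self.output.extend(data)

    def pipe_connection_lost(self, fd, exc):
        self.pipe_closed = True
        self._maybe_finish()

    def process_exited(self):
        self.exited = True
        self._maybe_finish()

    def _maybe_finish(self):
        # Do not assume which of the two callbacks runs last.
        if self.pipe_closed and self.exited and not self.done.done():
            self.done.set_result(bytes(self.output))


async def run_child():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    # Assumption for the sketch: a trivial child that prints one line.
    transport, _ = await loop.subprocess_exec(
        lambda: CollectOutput(done),
        sys.executable, "-c", "print('hello')",
        stdin=None, stderr=None,
    )
    try:
        return await done
    finally:
        transport.close()


if __name__ == "__main__":
    print(asyncio.run(run_child()))
```

With more than one pipe, the same idea would need one "closed" flag per file descriptor instead of a single boolean.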
Subprocess events can be delivered in any order: tolerate that in the test. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more deterministic: it can still be called before pipe_connection_lost() for example. Co-authored-by: Davide Rizzo <[email protected]>
SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more deterministic: it can still be called before pipe_connection_lost() for example. Co-authored-by: Davide Rizzo <[email protected]>
…ythonGH-109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more deterministic: it can still be called before pipe_connection_lost() for example. (cherry picked from commit ced6924) Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Davide Rizzo <[email protected]>
gh-108973: Fix asyncio SubprocessProtocol doc (#109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the example in the doc. Co-authored-by: Davide Rizzo <[email protected]> (cherry picked from commit ced6924)
…ython#109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more deterministic: it can still be called before pipe_connection_lost() for example. Co-authored-by: Davide Rizzo <[email protected]>
…H-109431) (#109609) gh-108973: Fix asyncio test_subprocess_consistent_callbacks() (GH-109431) SubprocessProtocol process_exited() method can be called before pipe_data_received() and pipe_connection_lost() methods. Document it and adapt the test for that. Revert commit 282edd7. _child_watcher_callback() calls immediately _process_exited(): don't add an additional delay with call_soon(). The reverted change didn't make _process_exited() more deterministic: it can still be called before pipe_connection_lost() for example. (cherry picked from commit ced6924) Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Davide Rizzo <[email protected]>
The following test_asyncio test is unstable and fails randomly on buildbots. I saw failures on Linux and FreeBSD.
build: https://buildbot.python.org/all/#/builders/442/builds/4900