gvisor prevents AMQP sockets from opening (TCP_SYNCNT) #1441
Comments
I believe the real reason this is failing is that we don't support SO_DEFER_ACCEPT.
I1219 22:53:14.348108 2791 x:0] [ 3] python E setsockopt(0x5 socket:[3], 0x1, 0x9, 0x7faf9cffb600, 0x4)
I1219 22:53:14.348725 2791 x:0] [ 3] python E close(0x5 socket:[3])
That said, it's rather strange that it's trying to set SO_DEFER_ACCEPT on a connecting socket and not a listening one. |
Just stumbled on it as well with Celery. Do I understand it right that this syscall is simply not implemented, i.e. it's not a matter of a missing capability that could be granted to the container? Thanks! |
SO_DEFER_ACCEPT is now supported, but that's likely not the reason why it's failing. Could you try again and, if it fails, provide a full strace log and repro steps? |
Actually, never mind, I see the steps above. Let me try them and see. |
Not sure if that gives any insight on the issue, but I'm getting the exact same warning with my Elixir application. |
Hey guys, I'm facing the same problem. Any light on it? |
Apologies, this fell off my radar. Let me spend some time looking at this right now. |
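It doesn't look like an issue with running the image. It fails to connect, and from the error it seems like it's failing to resolve the amqp:// address. Let me take a look.
docker run --runtime=runsc-d-p 8080:8080 celery
Unable to find image 'celery:latest' locally
latest: Pulling from library/celery
ef0380f84d05: Pull complete
ada810c79ed7: Pull complete
4608a1c4fe47: Pull complete
58086cbb21fb: Pull complete
a7bccb4a3faa: Pull complete
9de06a08ec25: Pull complete
ad6feb8c6a6b: Pull complete
7568ca85d492: Pull complete
2d6f458f7411: Pull complete
Digest: sha256:5c236059192a0389a2be21fc42d8db59411d953b7af5457faf501d4eec32dc31
Status: Downloaded newer image for celery:latest
[2020-05-06 15:39:34,841: WARNING/MainProcess] /usr/local/lib/python3.5/site-packages/celery/apps/worker.py:161: CDeprecationWarning:
Starting from version 3.2 Celery will refuse to accept pickle by default.
The pickle serializer is a security concern as it may give attackers
the ability to execute any command. It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.
If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
You must only enable the serializers that you will actually use.
warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
[2020-05-06 15:39:36,647: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@rabbit:5672//: [Errno -2] Name or service not known.
Trying again in 2.00 seconds...
[2020-05-06 15:39:38,857: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@rabbit:5672//: [Errno -2] Name or service not known.
Trying again in 4.00 seconds...
[2020-05-06 15:39:43,046: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@rabbit:5672//: [Errno -2] Name or service not known.
Trying again in 6.00 seconds... |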
Please note that the issue you are seeing with name resolution is not the
one I reported. Name resolution within the container is not the challenge
we are seeing with the failing syscall.
|
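I started up a RabbitMQ server locally and it works. Here are the commands:
docker run -d --hostname localhost --name some-rabbit rabbitmq:3
docker run --runtime=runsc --link some-rabbit:rabbit -p 8080:8080 celery
[2020-05-06 15:50:19,235: WARNING/MainProcess] /usr/local/lib/python3.5/site-packages/celery/apps/worker.py:161: CDeprecationWarning:
Starting from version 3.2 Celery will refuse to accept pickle by default.
The pickle serializer is a security concern as it may give attackers
the ability to execute any command. It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.
If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
You must only enable the serializers that you will actually use.
warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
[2020-05-06 15:50:21,965: WARNING/MainProcess] ***@***.*** ready. |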
@mcowger Hmm, I scanned the logs for the repro I tried and I don't see Celery making that call at all. Could you provide a bit more detail on your RabbitMQ setup and how you are connecting to it? |
If you aren't actually submitting jobs to the AMQP server via Celery (as the
original reproduction steps do), then the test is not complete. The
issue is not running Celery under Docker; the issue is issuing AMQP messages
to Celery from Python running under runsc. There is clear sample code to
reproduce this in the gist linked in the issue description.
|
Ah, apologies, I read the report again. So it's not Celery that's failing but your Python application that is connecting to Celery? |
Let me try that. Sorry about that. |
I0506 16:21:29.306868 175117 strace.go:628] [ 3] python X getsockopt(0x5 socket:[19], SOL_TCP, TCP_DEFER_ACCEPT, 0x7fc1f83fb3c0 {value=0}, 0x7fc1f83fb3c4 {length=4}) = 0x0 (227.514µs)
Looks like it's now failing not on TCP_DEFER_ACCEPT but on another socket option that we don't support yet: TCP_SYNCNT. |
TCP_SYNCNT (since Linux 2.4)
Netstack doesn't control retries using a count of SYNs; it's just bounded by time. Let me stub out this option for now so that it doesn't return an error and see if it makes progress. |
Strange. Now that I've stubbed that out, it flunks on another getsockopt, TCP_WINDOW_CLAMP, but I no longer see it making the TCP_SYNCNT call.
|
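For anyone who wants to check a given runtime without the full Celery setup, here is a minimal sketch that asks a fresh TCP socket for the options named in this thread and reports which getsockopt calls fail. It uses only Python's standard socket module. The name probe_tcp_options is made up for illustration, and the sketch assumes a Linux guest where these constants are defined.

```python
import socket

# Option names mentioned in this thread. They are looked up by name so that a
# platform lacking a constant is reported instead of raising AttributeError.
OPTION_NAMES = ["TCP_DEFER_ACCEPT", "TCP_SYNCNT", "TCP_WINDOW_CLAMP"]


def probe_tcp_options(names=OPTION_NAMES):
    """Return {name: current value, or a string describing the failure}."""
    results = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        for name in names:
            optname = getattr(socket, name, None)
            if optname is None:
                results[name] = "not defined on this platform"
                continue
            try:
                # Same call the strace shows: getsockopt(fd, SOL_TCP, optname, ...)
                results[name] = sock.getsockopt(socket.IPPROTO_TCP, optname)
            except OSError as exc:
                results[name] = f"error: {exc}"
    return results


if __name__ == "__main__":
    for name, outcome in probe_tcp_options().items():
        print(f"{name}: {outcome}")
```

Running it once under the default runtime and once under runsc should make any difference between the two visible in the output.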
Okay, stubbing those two out seems to have fixed it. I now see packets going back and forth with AMQP (that said, the curl request does not return). That seems unrelated to gVisor, though, and is probably just something about my local RabbitMQ + Celery setup not being correct. I see packets like this going back and forth.
|
I will open two bugs to add TCP_SYNCNT and TCP_WINDOW_CLAMP. I think we can probably start by just stubbing them out to unblock this and then look at implementing them properly. |
|
Unless Celery is set up with workers (which it isn't by default), that's expected behavior, as there are no workers to complete the task request. I agree that the curl request isn't relevant to this issue, and that if you've got messages going in and getting accepted then it's probably good to go. |
@prattmic pointed out the following: it looks like the AMQP library queries TCP options to figure out which ones are supported on which platform, so it just skips the setsockopt if a particular one is not supported. https://github.com/celery/py-amqp/blob/ccbe683cfd30aef75118223088984f7c9ace43e4/amqp/transport.py#L215 |
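The "query first, skip what is unsupported" pattern described above can be sketched roughly as follows. This is not py-amqp's actual code; the helper name apply_supported_tcp_options and its return value are assumptions chosen for illustration, and only standard socket module calls are used.

```python
import socket


def apply_supported_tcp_options(sock, wanted):
    """Set each TCP option in `wanted` (name -> value) only if it can be read.

    Returns the list of option names that were skipped, either because the
    platform does not define the constant or because the kernel (or sandbox)
    rejected the getsockopt/setsockopt call.
    """
    skipped = []
    for name, value in wanted.items():
        optname = getattr(socket, name, None)
        if optname is None:
            skipped.append(name)
            continue
        try:
            # Probe with getsockopt first; an unsupported option raises OSError.
            sock.getsockopt(socket.IPPROTO_TCP, optname)
            sock.setsockopt(socket.IPPROTO_TCP, optname, value)
        except OSError:
            skipped.append(name)
    return skipped


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        wanted = {"TCP_NODELAY": 1, "TCP_SYNCNT": 3, "TCP_WINDOW_CLAMP": 65535}
        print("skipped:", apply_supported_tcp_options(s, wanted))
```

With this shape, a runtime that rejects an option leads to it being skipped rather than to a failed connection, which is consistent with the behavior change seen once the options were stubbed out.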
I believe this should now work. @mcowger Could you try and update the bug? |
Unfortunately I don't have the toolset required to build gVisor from source set up, so I'll just accept your results. Thanks! |
May we know approximately when this fix could be available in GKE Sandbox? Thanks! |
Typically GKE sandbox should get the update in the next 2-3 weeks. |
Does this timeline apply to Cloud Run as well? |
Typically yes, but please note these are estimates; actual rollout times do vary a bit. We on the gVisor team have no real control over the rollout processes. The Cloud Run and GKE teams decide when they pick up a particular gVisor version based on the stability of their corresponding systems. |
Thanks, a much appreciated insight! |
When running a simple application to submit jobs via AMQP to a Celery server, gvisor prevents the sockets from being opened to send the data. Debug logs are attached.
A simple base application that listens via HTTP and makes a request, along with the Python requirements and Dockerfile: https://gist.github.com/mcowger/7d4ab07a75dc1ddddd1f1fb20dc5d8fc
When running the built Docker image under regular containerd, it runs fine. When run under the runsc runtime, it fails. When run under Google Cloud Run (where I actually first encountered the issue), the error is easier to see (though gvisor tries to disclaim responsibility):
Container Sandbox: Unsupported syscall getsockopt(0x5,0x6,0xa,0x3ee2d2dfb3c0,0x3ee2d2dfb3c4,0x0). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/getsockopt for more information.
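The raw arguments in that message can be decoded by hand. The sketch below does so with a small lookup table of Linux TCP option numbers (values as defined in linux/tcp.h); the function name decode_tcp_sockopt is hypothetical and the table only covers the first dozen options.

```python
# Linux TCP-level socket option numbers, as defined in <linux/tcp.h>.
LINUX_TCP_OPTIONS = {
    1: "TCP_NODELAY",
    2: "TCP_MAXSEG",
    3: "TCP_CORK",
    4: "TCP_KEEPIDLE",
    5: "TCP_KEEPINTVL",
    6: "TCP_KEEPCNT",
    7: "TCP_SYNCNT",
    8: "TCP_LINGER2",
    9: "TCP_DEFER_ACCEPT",
    10: "TCP_WINDOW_CLAMP",
    11: "TCP_INFO",
    12: "TCP_QUICKACK",
}

SOL_TCP = 0x6  # same value as IPPROTO_TCP


def decode_tcp_sockopt(level, optname):
    """Translate the (level, optname) pair of a getsockopt/setsockopt call."""
    if level != SOL_TCP:
        return f"level {level:#x} (not SOL_TCP), optname {optname:#x}"
    return f"SOL_TCP/{LINUX_TCP_OPTIONS.get(optname, f'unknown option {optname:#x}')}"


if __name__ == "__main__":
    # Second and third arguments from the Cloud Run message: 0x6 and 0xa.
    print(decode_tcp_sockopt(0x6, 0xA))  # SOL_TCP/TCP_WINDOW_CLAMP
```

Under that table the logged call is a getsockopt for TCP_WINDOW_CLAMP, which is one of the two options discussed in the comments.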
Other useful information to include is:
runsc -v:
runsc version release-20191213.0
spec: 1.0.1-dev
docker version (or docker info if more relevant):
Server: Docker Engine - Community
Engine:
Version: 19.03.5
API version: 1.40 (minimum version 1.12)
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:24:29 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
uname -a:
Linux devbox 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux
runsc.bug.zip