[Bug] Long time uptime workers lost the connection with RabbitMQ server #436

Closed
camiloiglesias96 opened this issue Aug 29, 2021 · 9 comments

camiloiglesias96 commented Aug 29, 2021

  • Laravel/Lumen version: Laravel 8.47.0
  • RabbitMQ version: RabbitMQ 3.8.14
  • Package version: vyuldashev/laravel-queue-rabbitmq 11.2.0

Hi guys, I currently run my own RabbitMQ instance for DEV purposes. It does not receive messages every day, but the workers (a Supervisor daemon running the package command) stay up on the server. A few days later I came back to the server and, looking at the logs, saw that the worker had lost the connection and logged the error Broken pipe or closed connection {"exception":"[object] (PhpAmqpLib\\Exception\\AMQPConnectionClosedException(code: 32). Digging into the error and reading some RabbitMQ documentation and php-amqplib issues, I learned that queue clients like this usually have a concept called a heartbeat, which lets the client and the server check that the other side is still available and ready to receive messages. I don't know whether the package config file lets us send this signal to our server, which would probably avoid this error.

Steps To Reproduce

To reproduce, keep the package command that consumes messages running for a long time; the error will eventually appear in your log files.

Current behavior

  • What happens with the worker?

You send a message today and it is consumed by the worker. Three days later you send another message and it isn't consumed: the worker is apparently dead. The only fix is to restart Supervisor, after which it works again, at least for the moment.

  • Is the message retried or put back into the queue?
    No

  • Is the message acknowledged or rejected?
    No

  • Is the message unacked?
    Yes

  • Is the message gone?
    I really don't know. I suppose that if the message has not been consumed and is still in the queue, it still exists and is not gone 😄

Expected behavior

Send a message today and the worker consumes it; then send a message 3 days, 1 week or 1 month later and it is consumed by the worker just the same. That's the point 😄

Thanks a lot, have a nice weekend ❤️ 🍹

str1k3r commented Sep 3, 2021

We are facing exactly the same problem.

Slauta commented Sep 3, 2021

Hi @camiloiglesias96, there is such a problem in AMQPLazyConnection.

Maybe try using AMQPStreamConnection. It helped me.

camiloiglesias96 commented Sep 11, 2021

Hi guys! @str1k3r @Slauta

Talking with a colleague at work, we found a probable reason why this is happening. The most reasonable workaround may be to recycle the worker process every 5 minutes or so; with Supervisor, the consumers will then be brought up again.

Let me test it and share my experience ❤️

@acosta-edgar

I'm facing similar problems, although we have not identified the causes yet.
This seems to explain how to properly handle the connection, and I don't see the exceptions being caught here.

Slauta commented Sep 17, 2021

@acosta-edgar I think the connection is dropped at the TCP level after 60 minutes. RabbitMQ returns the task to the queue after the connection is dropped, so we get endless processing of the task.

@acosta-edgar

@Slauta, no, that is not the case.
On the one hand, this problem happens at different times, sometimes after several days of running without issues.
On the other hand, the Lumen/Laravel worker is not terminated. So there is really no way to reconnect, or to just drop the process and let supervisord instantiate a new one.

In fact, I'm now seeing this issue while the worker is polling the queue, not while it is handling a job.

@camiloiglesias96
Author

@acosta-edgar @vyuldashev @Slauta Maybe the solution, at least for this package and its consumer command, could be to add a timeout parameter and kill the process once that time has elapsed, because this allows Supervisor to bring the process up again 💯
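
A minimal sketch of that idea, not the package's actual implementation: a loop that stops once a maximum lifetime has passed, so the process can exit and Supervisor can start a fresh one. The function name `consumeWithMaxLifetime`, the `$pollOnce` callback and the injectable clock are all hypothetical and exist only for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical consumer loop that returns after $maxSeconds so the process
 * can exit and a process manager such as Supervisor can restart it.
 *
 * @param callable $pollOnce   one polling iteration (stand-in for the real consumer)
 * @param float    $maxSeconds maximum lifetime of the loop, in seconds
 * @param callable $now        clock returning seconds as a float (injectable for testing)
 * @return int number of polling iterations performed
 */
function consumeWithMaxLifetime(callable $pollOnce, float $maxSeconds, callable $now): int
{
    $startedAt = $now();
    $iterations = 0;

    // Keep polling until the configured lifetime has elapsed.
    while ($now() - $startedAt < $maxSeconds) {
        $pollOnce();
        $iterations++;
    }

    return $iterations;
}

// Demo with a fake clock that advances one second per call,
// so the example finishes instantly instead of waiting 5 minutes.
$fakeTime = 0.0;
$clock = function () use (&$fakeTime): float {
    return $fakeTime++;
};

$iterations = consumeWithMaxLifetime(function (): void {}, 300.0, $clock);
echo $iterations, PHP_EOL;
```

In a real worker the clock would be something like `microtime(true)` and the script would exit right after the loop returns. Laravel's own queue:work command has a --max-time option along these lines; whether the consumer command in this package version honours it is not verified here.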

@violarium
Contributor

I've bumped into a similar problem with php artisan queue:work.

It happens because the worker only restarts when the exception is considered a "lost connection". Otherwise, it reports the error and just continues.

And the funny part: there is one hard-coded place that decides whether an exception is a "lost connection", namely Illuminate\Database\DetectsLostConnections. It's a trait used by Illuminate\Queue\Worker.

It belongs to the DB layer and just checks whether the error message includes one of a set of specific phrases, like:

  • server has gone away
  • SQLSTATE[HY000] [2002] Connection refused
  • ...

AMQP error messages are not in that list, and there is no proper way to extend it. Ideally, all the queue adapters should throw a special exception that the worker would handle. But we have what we have.
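
To illustrate the message matching described above, here is a simplified, hypothetical stand-in (not Laravel's actual code). The function name `looksLikeLostConnection` is invented, the phrase list is cut down to the two quoted above plus "Lost connection", and the presence of "Lost connection" on the real list is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Simplified, hypothetical stand-in for the phrase check described above.
 * The real trait checks a much longer list of phrases.
 */
function looksLikeLostConnection(Throwable $e): bool
{
    // Two phrases quoted in this thread, plus "Lost connection"
    // (assumed here to be on the real list as well).
    $phrases = [
        'server has gone away',
        'SQLSTATE[HY000] [2002] Connection refused',
        'Lost connection',
    ];

    foreach ($phrases as $phrase) {
        // Case-sensitive substring match on the exception message.
        if (strpos($e->getMessage(), $phrase) !== false) {
            return true;
        }
    }

    return false;
}

// The AMQP error text from this issue matches none of the phrases...
$amqpError = new RuntimeException('Broken pipe or closed connection');

// ...but the same error wrapped under a recognized prefix does.
$wrapped = new RuntimeException('Lost connection: ' . $amqpError->getMessage(), 0, $amqpError);

var_dump(looksLikeLostConnection($amqpError)); // bool(false)
var_dump(looksLikeLostConnection($wrapped));   // bool(true)
```

The point of the sketch is only that the decision depends on the message text, not on the exception class, which is why an AMQP exception slips through unless its message is rewritten.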

I found a workaround that worked for me:

  1. all the exceptions in my case were thrown in RabbitMQQueue::pop
  2. you can specify a custom worker in the configuration file
  3. so I just made my own custom worker, which catches the AMQP exception and throws another one that the Laravel worker handles properly

<?php

declare(strict_types=1);

namespace App\Services\RabbitMQ;

use PhpAmqpLib\Exception\AMQPExceptionInterface;
use VladimirYuldashev\LaravelQueueRabbitMQ\Queue\RabbitMQQueue;
use RuntimeException;

class FixedRabbitMQQueue extends RabbitMQQueue
{
    /**
     * Pop the next job, rethrowing AMQP failures under a message
     * that the Laravel worker treats as a lost connection.
     */
    public function pop($queue = null)
    {
        try {
            return parent::pop($queue);
        } catch (AMQPExceptionInterface $e) {
            // Keep the original exception as "previous" for debugging.
            throw new RuntimeException('Lost connection: ' . $e->getMessage(), 0, $e);
        }
    }
}

// Set to "horizon" if you wish to use Laravel Horizon.
'worker' => env('RABBITMQ_WORKER', App\Services\RabbitMQ\FixedRabbitMQQueue::class),

It's probably better to catch only specific exceptions such as AMQPChannelClosedException and AMQPConnectionClosedException.
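
A rough sketch of that narrower variant, kept free of the real AMQP classes so it runs on its own. The helper `rethrowAsLostConnection` and the class `StandInConnectionClosed` are hypothetical; in the worker above you would pass a closure that calls parent::pop($queue) together with the two php-amqplib class names.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: run $operation and rethrow only the listed exception
 * classes as a "Lost connection" RuntimeException. Anything else bubbles up
 * unchanged.
 *
 * @param callable $operation
 * @param string[] $connectionErrors fully qualified exception class names
 * @return mixed whatever $operation returns
 */
function rethrowAsLostConnection(callable $operation, array $connectionErrors)
{
    try {
        return $operation();
    } catch (Throwable $e) {
        foreach ($connectionErrors as $class) {
            if ($e instanceof $class) {
                // Keep the original exception as "previous" for debugging.
                throw new RuntimeException('Lost connection: ' . $e->getMessage(), 0, $e);
            }
        }

        // Not a connection error: leave it untouched.
        throw $e;
    }
}

// Stand-in for AMQPConnectionClosedException, for demonstration only.
final class StandInConnectionClosed extends Exception
{
}

try {
    rethrowAsLostConnection(
        function () {
            throw new StandInConnectionClosed('Broken pipe or closed connection');
        },
        [StandInConnectionClosed::class]
    );
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL; // Lost connection: Broken pipe or closed connection
}
```

Compared with catching AMQPExceptionInterface, this leaves unrelated AMQP errors visible under their original type, so the worker only restarts for the failures that a fresh connection can actually fix.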

@vyuldashev
Owner

Should be fixed by #457
