Skip to content

Single Active Consumer on a quorum queue runs into an exception after multiple basic.gets #2164

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
PaulMilla opened this issue Nov 15, 2019 · 6 comments · Fixed by #2167
Closed
Milestone

Comments

@PaulMilla
Copy link

As the title states, on a 3-node cluster if you define a quorum queue with the single active consumer option set then doing multiple BasicGet without ack/nacking the previous one will immediately cause the queue to go into a bad state. This doesn't happen in a normal quorum queue without single active consumer or if receiving messages by subscription. I haven't tested with a larger cluster yet, however I'm fairly confident this would continue to be an issue.

Environment:

Docker image: bitnami/rabbitmq:3.8.1-debian-9-r0

  • RabbitMQ 3.8.1
  • Erlang 22.1
  • Debian 9 "Stretch"

Reproducing

  1. Create a new quorum queue with the single active consumer flag set to true
  2. Publish 2+ messages onto the queue
  3. Attempt to do 2+ BasicGet operations without acking/nacking the previous one. Fails regardless of protocol
    a. HTTP API using the management plugin (500 Internal Server Error)
    b. AMQP using the C# RabbitMq.Client library
    • Results in the following stack trace:
Connected to RabbitMQ
 BasicGetReturned: message1

Unhandled Exception: RabbitMQ.Client.Exceptions.OperationInterruptedException: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=541, text='INTERNAL_ERROR', classId=0, methodId=0
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.BasicGet(String queue, Boolean autoAck)
   at Program.GetNext(IModel channel) in C:\Program.cs:line 55
   at Program.Main(String[] args) in C:\Program.cs:line 43

Afterwards 2 of the 3 nodes are removed from the 'Online' section, and since it can't reach a quorum with a single node then the queue becomes unusable. The nodes are still up and running fine with other quorum/classic-mirrored queues, however this queue never seems to recover.

image

@michaelklishin
Copy link
Collaborator

Server logs, please.

@michaelklishin michaelklishin changed the title Single Active Consumer in Quorum Queues crashes on multiple BasicGet Single Active Consumer on a quorum queue runs into an exception after multiple basic.gets Nov 15, 2019
@PaulMilla
Copy link
Author

PaulMilla commented Nov 15, 2019

@michaelklishin
Copy link
Collaborator

A couple of things I see in the logs

2019-11-15 16:13:33.206 [error] <0.1507.0> handle_leader err {'EXIT',{{badmatch,{inactive,{rabbit_fifo,{cfg,'%2F_qq-test',{resource,<<"/">>,queue,<<"qq-test">>},{64000,64000},{rabbit_quorum_queue,dead_letter_publish,[undefined,undefined,{resource,<<"/">>,queue,<<"qq-test">>}]},{rabbit_quorum_queue,become_leader,[{resource,<<"/">>,queue,<<"qq-test">>}]},undefined,undefined,single_active,undefined,undefined,undefined},#{2 => {4,{8,{basic_message,{resource,<<"/">>,exchange,<<>>},[<<"qq-test">>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<"message2">>]},<<7,85,142,66,154,135,116,51,137,7,68,128,253,46,71,141>>,true}}},3 => {6,{8,{basic_message,{resource,<<"/">>,exchange,<<>>},[<<"qq-test">>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<"message3">>]},<<200,7,154,14,94,11,150,210,17,73,202,142,185,44,240,238>>,true}}}},2,4,{0,{[],[]}},3,#{},{rabbit_fifo_index,#{2 => '',4 => '',6 => ''},2,6},{0,{[],[]}},#{{<<"1">>,<0.1864.0>} => {consumer,#{},#{0 => {1,{2,{8,{basic_message,{resource,<<"/">>,exchange,<<>>},[<<"qq-test">>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<"message1">>]},<<80,250,183,42,106,21,43,186,4,136,177,248,129,114,241,206>>,true}}}}},1,0,1,simple_prefetch,once,up}},{[],[]},{[],[]},16,8,[{{<<"2">>,<0.1864.0>},{consumer,#{},#{},0,1,0,simple_prefetch,once,up}}],16,2}}},[{rabbit_fifo,apply,3,[{file,"src/rabbit_fifo.erl"},{line,281}]},{ra_server,apply_with,2,[{file,"src/ra_server.erl"},{line,1948}]},{ra_server,'-apply_to/5-lists^foldl/2-0-',3,[{file,"src/ra_server.erl"},{line,1906}]},{ra_server,apply_to,5,[{file,"src/ra_server.erl"},{line,1906}]},{ra_server,handle_leader,2,[{file,"src/ra_server.erl"},{line,326}]},{ra_server_proc,handle_leader,2,[{file,"src/ra_server_proc.erl"},{line,906}]},{ra_server_proc,leader,3,[{file,"src/ra_server_proc.erl"},{line,434}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1159}]}]}}
2019-11-15 16:13:33.207 [info] <0.1507.0> queue 'qq-test' in vhost '/': terminating with {'EXIT',{{badmatch,{inactive,{rabbit_fifo,{cfg,'%2F_qq-test',{resource,<<47>>,queue,<<113,113,45,116,101,115,116>>},{64000,64000},{rabbit_quorum_queue,dead_letter_publish,[undefined,undefined,{resource,<<47>>,queue,<<113,113,45,116,101,115,116>>}]},{rabbit_quorum_queue,become_leader,[{resource,<<47>>,queue,<<113,113,45,116,101,115,116>>}]},undefined,undefined,single_active,undefined,undefined,undefined},#{2 => {4,{8,{basic_message,{resource,<<47>>,exchange,<<>>},[<<113,113,45,116,101,115,116>>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<109,101,115,115,97,103,101,50>>]},<<7,85,142,66,154,135,116,51,137,7,68,128,253,46,71,141>>,true}}},3 => {6,{8,{basic_message,{resource,<<47>>,exchange,<<>>},[<<113,113,45,116,101,115,116>>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<109,101,115,115,97,103,101,51>>]},<<200,7,154,14,94,11,150,210,17,73,202,142,185,44,240,238>>,true}}}},2,4,{0,{[],[]}},3,#{},{rabbit_fifo_index,#{2 => '',4 => '',6 => ''},2,6},{0,{[],[]}},#{{<<49>>,<0.1864.0>} => {consumer,#{},#{0 => {1,{2,{8,{basic_message,{resource,<<47>>,exchange,<<>>},[<<113,113,45,116,101,115,116>>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<109,101,115,115,97,103,101,49>>]},<<80,250,183,42,106,21,43,186,4,136,177,248,129,114,241,206>>,true}}}}},1,0,1,simple_prefetch,once,up}},{[],[]},{[],[]},16,8,[{{<<50>>,<0.1864.0>},{consumer,#{},#{},0,1,0,simple_prefetch,once,up}}],16,2}}},[{rabbit_fifo,apply,3,[{file,[115,114,99,47,114,97,98,98,105,116,95,102,105,102,111,46,101,114,108]},{line,281}]},{ra_server,apply_with,2,[{file,[115,114,99,47,114,97,95,115,101,114,118,101,114,46,101,114,108]},{line,1948}]},{ra_server,'-apply_to/5-lists^foldl/2-0-',...},...]}} in state leader
2019-11-15 16:13:33.207 [debug] <0.1507.0> queue 'qq-test' in vhost '/': terminating with reason '{'EXIT',{{badmatch,{inactive,{rabbit_fifo,{cfg,'%2F_qq-test',{resource,<<47>>,queue,<<113,113,45,116,101,115,116>>},{64000,64000},{rabbit_quorum_queue,dead_letter_publish,[undefined,undefined,{resource,<<47>>,queue,<<113,113,45,116,101,115,116>>}]},{rabbit_quorum_queue,become_leader,[{resource,<<47>>,queue,<<113,113,45,116,101,115,116>>}]},undefined,undefined,single_active,undefined,undefined,undefined},#{2 => {4,{8,{basic_message,{resource,<<47>>,exchange,<<>>},[<<113,113,45,116,101,115,116>>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<109,101,115,115,97,103,101,50>>]},<<7,85,142,66,154,135,116,51,137,7,68,128,253,46,71,141>>,true}}},3 => {6,{8,{basic_message,{resource,<<47>>,exchange,<<>>},[<<113,113,45,116,101,115,116>>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<109,101,115,115,97,103,101,51>>]},<<200,7,154,14,94,11,150,210,17,73,202,142,185,44,240,238>>,true}}}},2,4,{0,{[],[]}},3,#{},{rabbit_fifo_index,#{2 => '',4 => '',6 => ''},2,6},{0,{[],[]}},#{{<<49>>,<0.1864.0>} => {consumer,#{},#{0 => {1,{2,{8,{basic_message,{resource,<<47>>,exchange,<<>>},[<<113,113,45,116,101,115,116>>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<109,101,115,115,97,103,101,49>>]},<<80,250,183,42,106,21,43,186,4,136,177,248,129,114,241,206>>,true}}}}},1,0,1,simple_prefetch,once,up}},{[],[]},{[],[]},16,8,[{{<<50>>,<0.1864.0>},{consumer,#{},#{},0,1,0,simple_prefetch,once,up}}],16,2}}},[{rabbit_fifo,apply,3,[{file,[115,114,99,47,114,97,98,98,105,116,95,102,105,102,111,46,101,114,108]},{line,281}]},{ra_server,apply_with,2,[{file,[115,114,99,47,114,97,95,115,101,114,118,101,114,46,101,114,108]},{line,1948}]},{ra_server,'-apply_to/5-lists^foldl/2-0-',3,[{file,[115,114,99,47,114,97,95,115,101,114,118,101,114,46,101,114,108]},{line,1906}]},{ra_server,apply_to,5,[{file,[115,114,99,47,114,97,95,115,101,114,118,101,114,46,101,114,108]},{line,1906}]},{ra_server,handle_leader,2,[{file,[115,114,99,47,114,97,95,115,101,114,118,101,114,46,101,114,108]},{line,326}]},{ra_server_proc,handle_leader,2,[{file,[115,114,99,47,114,97,95,115,101,114,118,101,114,95,112,114,111,99,46,101,114,108]},{line,906}]},{ra_server_proc,leader,3,[{file,[115,114,99,47,114,97,95,115,101,114,118,101,114,95,112,114,111,99,46,101,114,108]},{line,434}]},{gen_statem,loop_state_callback,11,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1159}]}]}}'
2019-11-15 16:13:33.208 [error] <0.1507.0> ** State machine '%2F_qq-test' terminating
** Last event = {cast,{{'%2F_qq-test','rabbit@vociferous-peacock-rabbitmq-2.vociferous-peacock-rabbitmq-headless.bootstrap.svc.cluster.local'},{append_entries_reply,1,true,10,9,1}}}
** When server state  = [{id,{'%2F_qq-test','rabbit@vociferous-peacock-rabbitmq-0.vociferous-peacock-rabbitmq-headless.bootstrap.svc.cluster.local'}},{opt,terminate},{raft_state,leader},{leader_last_seen,undefined},{num_pending_commands,0},{num_delayed_commands,0},{election_timeout_set,false},{ra_server_state,#{cluster => #{{'%2F_qq-test','rabbit@vociferous-peacock-rabbitmq-0.vociferous-peacock-rabbitmq-headless.bootstrap.svc.cluster.local'} => #{commit_index_sent => 0,match_index => 0,next_index => 1,query_index => 0},{'%2F_qq-test','rabbit@vociferous-peacock-rabbitmq-1.vociferous-peacock-rabbitmq-headless.bootstrap.svc.cluster.local'} => ...,...},...}}]
** Reason for termination = exit:{'EXIT',{{badmatch,{inactive,{rabbit_fifo,{cfg,'%2F_qq-test',{resource,<<"/">>,queue,<<"qq-test">>},{64000,64000},{rabbit_quorum_queue,dead_letter_publish,[undefined,undefined,{resource,<<"/">>,queue,<<"qq-test">>}]},{rabbit_quorum_queue,become_leader,[{resource,<<"/">>,queue,<<"qq-test">>}]},undefined,undefined,single_active,undefined,undefined,undefined},#{2 => {4,{8,{basic_message,{resource,<<"/">>,exchange,<<>>},[<<"qq-test">>],{content,60,{'P_basic',undefined,undefined,[],2,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},none,none,[<<"message2">>]},<<7,85,142,66,154,135,116,51,137,7,68,128,253,46,71,141>>,true}}},3 => {6,{8,{basic_message,{resource,<<"/">>,exchange,<<>>},...}}}},...}}},...}}
** Callback mode = [state_functions,state_enter]
** Stacktrace =
**  [{ra_server_proc,handle_leader,2,[{file,"src/ra_server_proc.erl"},{line,913}]},{ra_server_proc,leader,3,[{file,"src/ra_server_proc.erl"},{line,434}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1159}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]
** Time-outs: {1,[{{timeout,tick},tick_timeout}]}

@kjnilsson @acogoluegnes

@kjnilsson
Copy link
Contributor

Right well this is a bug. It fails because of how Quorum Queues implement basic get (as a single message consumer). With SAC enabled the second basic.get request isn't assigned a message as there is already a "consumer" from the previous request. The question is how it should work and what semantics make sense when using SAC and basic.get together.

@michaelklishin
Copy link
Collaborator

I think basic.get violates the SAC nature as, while not a consumer in protocol terms, it is a consumer semantically. I suggest that QQs with SAC enabled reject basic.get requests with a sensible error.

@michaelklishin
Copy link
Collaborator

Some other team members agree that allowing basic.get next to a SAC queue would complicate the semantics too much so we will reject basic.gets for such queues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants