For 4.1.x, by @aaron-seo: introduce a command that would force QQs to take a checkpoint and truncate its segments #13548

Draft: wants to merge 7 commits into main

Conversation

michaelklishin
Collaborator

This is #13175 by @aaron-seo plus some changes from me to address the QA feedback from @kjnilsson.

@michaelklishin
Collaborator Author

@aaron-seo @kjnilsson note that due to how rabbit_fifo:do_checkpoints/4 works, I had to adapt this test to publish more data (which is consistent with how we expect this new command to be used) and wait for a minimum amount of time between checkpoints.

Otherwise the checkpoint is not actually taken in the test, even if the aux state is updated.

@kjnilsson I have rolled back the rabbit_fifo.hrl change that wasn't necessary. Anything else that you'd like to see changed?

Contributor

@kjnilsson kjnilsson left a comment

Should this functionality force a checkpoint on a given member or on all members of a queue? I'm not sure it works correctly in all cases, and I am somewhat unsure of the goal of this new command.

{QName, {error, Err}}
end
end
|| Q <- rabbit_db_queue:get_all_durable_by_type(?MODULE),
Contributor

This will include queues that do not have any members on the current node, but force_checkpoint_on_queue will only ask a local member to force a checkpoint.
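
A minimal sketch of the kind of filter this comment implies: keep only the queues that have a member on the local node before asking for a checkpoint. The `{QName, MemberNodes}` input shape and the module and function names are hypothetical simplifications for illustration; this is not code from the PR.

```erlang
-module(local_members_sketch).
-export([with_local_member/2]).

%% Queues is a list of {QName, MemberNodes} pairs. This shape is an
%% assumption made for the sketch, not the amqqueue record used in RabbitMQ.
-spec with_local_member([{term(), [node()]}], node()) -> [term()].
with_local_member(Queues, LocalNode) ->
    %% Keep a queue only if LocalNode hosts one of its members.
    [QName || {QName, Nodes} <- Queues, lists:member(LocalNode, Nodes)].
```

With such a filter in place, queues without a local member would be skipped instead of being reported as if a checkpoint had been requested.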

{ok, Q} when ?amqqueue_is_quorum(Q) ->
{RaName, _} = amqqueue:get_pid(Q),
rabbit_log:debug("Sending command to force ~ts to take a checkpoint", [QNameFmt]),
rpc:call(Node, ra, cast_aux_command, [{RaName, Node}, force_checkpoint], ?FORCE_CHECKPOINT_RPC_TIMEOUT);
Contributor

There is no need to do an rpc here, as this function always addresses the quorum queue member on the node it runs on. Is this the intent: to force a checkpoint on a single member if there is one locally? (There are no checks to ensure that.)

def help_section, do: :replication

def description,
do: "Forces checkpoints for all matching quorum queues"
Contributor

No, it will only force a checkpoint on the local member of each matching queue, if there is a local member.

@michaelklishin
Collaborator Author

When it comes to triggering a checkpoint for all replicas on the given node vs. all replicas, period, I guess the right thing to do is to do it on all replicas.

I'll look into it.
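
One possible shape for the "all replicas" variant, as a sketch only. It assumes the list of member nodes is already known and that the cast function can address a remote member directly by its `{RaName, Node}` id. `CastFun` stands in for a call such as the `ra:cast_aux_command/2` used in the diff, and the module and function names are hypothetical.

```erlang
-module(all_members_sketch).
-export([force_on_all_members/3]).

%% Sends force_checkpoint to every member of one queue and collects the
%% per-node results. CastFun is injected so the sketch does not depend on
%% ra; in a real change it might wrap ra:cast_aux_command/2 (assumption).
-spec force_on_all_members(atom(), [node()],
                           fun(({atom(), node()}, term()) -> term())) ->
    [{node(), term()}].
force_on_all_members(RaName, Nodes, CastFun) ->
    [{Node, CastFun({RaName, Node}, force_checkpoint)} || Node <- Nodes].
```

Because a cast is fire-and-forget, the collected results only show that the request was sent, not that each member actually took a checkpoint.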

@michaelklishin michaelklishin marked this pull request as draft March 17, 2025 15:28
@michaelklishin
Collaborator Author

We have agreed to finish this most likely after 4.1.0 ships. We are still interested in this feature but it's not a blocker for 4.1.0.

@michaelklishin michaelklishin changed the title from "By @aaron-seo: introduce a command that would force QQs to take a checkpoint and truncate its segments" to "For 4.1.x, by @aaron-seo: introduce a command that would force QQs to take a checkpoint and truncate its segments" Mar 21, 2025
3 participants