Validate key slots used with RedisCluster::Client#with #314


Closed

Conversation

KJTsanaktsidis
Contributor

The overall objective of this PR is to unskip the test which asserts that TransactionConsistencyError is raised when a key belonging to the wrong slot is used on a connection returned from #with; i.e. it makes this test pass:

        assert_raises(::RedisClient::Cluster::Transaction::ConsistencyError) do
          @client.with(hashtag: 'slot1') do |conn|
            conn.call('GET', '{slot2}')
          end
        end

Getting there involves a bit of a journey, though. I've broken it up into five separate commits, which I can turn into separate PRs; I did want to show you the whole picture before we drill down into the details, however.

Firstly, and sadly, I discovered that we do actually need to make a custom subclass of RedisClient, and pass that as client_implementation: in our custom config class. I had previously thought I could implement the slot validation just with middleware, but we need a way to communicate between #with, which just has access to the client object, and the middleware. By implementing a custom subclass, we can define a method Client#locked_to_key_slot, which essentially turns on and off the middleware by setting a flag on the client object which the middleware can see.

I do realise that we'd discussed doing this without the client subclass, but unfortunately I could not find another way.
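For illustration, here is a minimal stand-alone sketch of that flag pattern; the class and method names (PinnableClient, SlotValidationMiddleware, the slot_of proc) are hypothetical stand-ins, not the PR's actual code:

```ruby
# Hypothetical sketch: a client subclass exposes a flag that #with-style code
# can set for the duration of a block, and middleware consults it per command.
class PinnableClient
  attr_reader :locked_key_slot

  # Turn slot validation on for the duration of the block, then clear it.
  def locked_to_key_slot(slot)
    @locked_key_slot = slot
    yield self
  ensure
    @locked_key_slot = nil
  end
end

class SlotValidationMiddleware
  def initialize(client, slot_of)
    @client = client
    @slot_of = slot_of # stand-in for Command#extract_all_keys + KeySlotConverter
  end

  # Raises when the client is pinned and the command's key maps to another slot.
  def validate!(command)
    pinned = @client.locked_key_slot
    return if pinned.nil?

    key = command[1]
    actual = @slot_of.call(key)
    raise "ConsistencyError: #{key} is in slot #{actual}, not #{pinned}" if actual != pinned
  end
end
```

The point of the pattern is that the middleware never needs a reference from #with; it only reads a flag on the client object both sides can already see.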

The next phase of this PR involves getting access to the RedisCluster::Client::Command object from inside the middleware. There's a bit of circularity here: obviously we need clients to run COMMAND on, but we also need the Command object inside the client middleware (to be able to call #extract_all_keys on passed-in commands). I resolved this by turning Command into a long-lived object which gets passed into the per-node RedisClient config objects (Node::Config).

Once all that's done, we're finally in a position to actually implement the "lock to a single slot" behaviour. That turns out to be pretty simple, living in a single file, pinning.rb: we check whether the connection is currently supposed to be pinned and, if so, compare the pinned slot against Command#extract_all_keys(command).
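A self-contained sketch of that check follows; the slot function and key extraction here are toy stand-ins (the real code uses CRC16 via KeySlotConverter and COMMAND metadata via Command#extract_all_keys):

```ruby
# Minimal sketch of the pinning check: every key a command touches must hash
# to the pinned slot, otherwise raise a consistency error.
class ConsistencyError < StandardError; end

# Stand-in for Command#extract_all_keys: assumes the keys are every argument
# after the command name (true for MGET/DEL, but not for all commands; the
# real Command object consults COMMAND metadata to know key positions).
def extract_all_keys(command)
  command[1..]
end

def validate_pinned_slot!(command, pinned_slot, slot_of)
  extract_all_keys(command).each do |key|
    slot = slot_of.call(key)
    next if slot == pinned_slot

    raise ConsistencyError,
          "#{key} maps to slot #{slot}, but connection is pinned to #{pinned_slot}"
  end
end
```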

Let's debate the general direction of this PR first; once we're OK with that, I can cut it up into smaller pieces for more detailed review, perhaps?

KJ Tsanaktsidis added 5 commits January 23, 2024 17:21
I tried really hard to avoid doing this, but unfortunately I don't think
we can. In order for pinning support to be safe, the underlying
RedisCluster instances we hand out from the `#with` method need to
know that they're "locked" to a particular slot, so that they can reject
cross-slot accesses that might become invalid on a cluster scale out or
in.

I had thought it would be possible to do this with middleware, but, the
middleware can't easily be accessed from `#with` to set & unset the
pinning state.
This is required because the next commit is going to need to access this
object from inside a RedisClient middleware.
This is so we can pass it into the individual cluster connections, which
obviously need to exist before COMMAND is run.

The cluster connections themselves are going to need this in order to
implement slot number validation - because they need to know how to
extract the keys.
This is the final phase of pushing down the Command object. Now, it's
available on the Node::Config object, and thus from the connection
objects.
This is achieved by having the #with method "lock" the yielded
connection to a particular slot, and having middleware which implements
that locking.
@KJTsanaktsidis
Contributor Author

Poke @supercaracal wondering if you're able to have a look this week? 🙏 I anticipate this is going to need some feedback and reworking :)

@supercaracal
Member

Sorry for my delayed response. I'm going to review this pull request within a week.

end

attr_reader :cluster_commands
Member

What's the difference between cluster_commands and commands?

Member

Sorry, I misread it. Forget about that.

@@ -78,14 +79,25 @@ def []=(index, element)
end
end

class SingleNodeRedisClient < ::RedisClient
include Pinning::ClientMixin
@supercaracal (Member) Feb 5, 2024

I think the complexity you mentioned derives from this mixin-and-middleware pattern:

  • Config instance holds the command information.
  • Our cluster client uses our own client implementation.
  • Pinning module holds the state of the locking.

I think it would be better to use a simple wrapper object for the slot validation.

class RedisClient
  class Cluster
    class PinningClient
      def initialize(client, key, command, command_builder)
        @client = client
        @slot = ::RedisClient::Cluster::KeySlotConverter.convert(key)
        @command = command
        @command_builder = command_builder
      end

      def call(*args, **kwargs, &block)
        command = @command_builder.generate(args, kwargs)
        call_v(command, &block)
      end

      def call_v(command, &block)
        validate_slot!(command)
        @client.call_v(command, &block)
      end

      # etc...

      private

      def validate_slot!(command)
        # raise an error if command is invalid
      end
    end
  end
end
class RedisClient
  class Cluster
    def with(key: nil, hashtag: nil, write: true, retry_count: 0)
      key = process_with_arguments(key, hashtag)
      node_key = @router.find_node_key_by_key(key, primary: write)
      node = @router.find_node(node_key)
      @router.try_delegate(node, :with, retry_count: retry_count) do |conn|
        yield PinningClient.new(conn, key, @router.command, @command_builder)
      end
    end
  end
end

Contributor Author

We can do this, but it's a little awkward because of the need to re-wrap the client if pipelined or multi is called. It ends up looking like this: https://github.com/redis-rb/redis-cluster-client/pull/298/files#diff-426e610bbc89f748ae5f9d43790b07d10fed426fa431d6bea29b67f649a7eb72R1 which I remember we were not very happy with.

Most of the complexity here is coming from the fact that we have to change the lifecycle of the RedisClient::Cluster::Command object so that it's available for the pinning middleware. If we made our custom RedisClient subclass support some kind of temporary "validation procs" itself, then the RedisCluster::Client instance can construct the validation proc only when #with is called and @commands is already known.

I think that will be a lot simpler - let me try that first.

Member

I think we can omit #pipelined and #multi from the #with method. Nobody is likely to use it in that odd way, and we can never cover every use case completely. Our users should be responsible for using the method correctly.

Contributor Author

We definitely need #multi at least - the whole point is to enable single shard transactions!

I managed to write a much better implementation of the delegator in #298 today though - expect to see another PR tomorrow!

Member

I thought the #multi method was held by the cluster client.

@cluster_client = build_cluster_client

@cluster_client.with(...) do |raw_or_pinning_cli|
  raw_or_pinning_cli.call('CMD')

  @cluster_client.multi(...) do |raw_or_pinning_cli|
    # do something
  end
end

Anyway, I think we shouldn't over-engineer this.

Contributor Author

No, my plan was to delete the #multi implementation in RedisClient::Cluster (or, at least, re-implement it in terms of RedisClient::Cluster#with).

My understanding was that the original idea of adding this kind of “pinning” was to avoid needing to implement transaction support in RedisClient::Cluster at all - instead users can issue transactions directly against the correct node selected by #with

Member

I'm unsure about your decision. In a Redis cluster, I think users should keep their use of the transaction feature simple, because of the CAP theorem. I've thought of the #with method as just an optional way to use the transaction feature, with some complicated caveats, in cluster mode; it's not the main interface for transactions. That's a bit different from redis-client. Please give me time to think about it.

Contributor Author

Remember, the original problem I was solving was that RedisClient::Cluster#multi currently calls the transaction block twice - once itself to figure out what node to send the transaction to, and then RedisClient#multi calls it again to actually build up the transaction to execute. This interface is also different from what's in RedisClient!

From my pretty extensive survey of the options here over the last few months, there's a pretty fundamental tradeoff to make here:

  • If you want to implement transaction support inside redis-cluster-client in terms of RedisClient#multi, you must know the node ahead of time. You can't lazily call RedisClient#multi once you see the first command in the transaction, because you've already yielded control back to the caller, and can't give the caller the real RedisClient::Multi object without yielding again (which was the original bug I was trying to solve!).
  • If you absolutely want RedisClient::Cluster#multi to work without knowing the node ahead of time (and thus, make it fully compatible), the only choice is to implement a kind of 'lazy transaction' system like I did in Support watch & unwatch properly #295. That is, you wait for the first command in the transaction, then issue MULTI on the appropriate node (and, thus, the redis-cluster-client must also take responsibility for calling EXEC/DISCARD/UNWATCH when appropriate too). This will work when issuing client.call('MULTI') ... client.call('EXEC'), which is valid too with RedisClient.
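As a rough illustration of the second option, a lazy transaction defers node selection until the first command reveals a key; FakeNode and LazyTransaction below are hypothetical stand-ins, not the actual code from #295:

```ruby
# Hypothetical sketch of a "lazy MULTI": no node is chosen up front; the first
# command's key picks the node, MULTI is issued there, and later commands
# (and the final EXEC) follow on the same node.
class FakeNode
  attr_reader :log

  def initialize
    @log = []
  end

  def call(*command)
    @log << command
  end
end

class LazyTransaction
  def initialize(node_for_key)
    @node_for_key = node_for_key # proc: key -> node (stand-in for the router)
    @node = nil
  end

  def call(*command)
    if @node.nil?
      @node = @node_for_key.call(command[1]) # first key seen decides the node
      @node.call('MULTI')
    end
    @node.call(*command)
  end

  def exec
    @node&.call('EXEC')
  end
end
```

This shape also explains why the library must own EXEC/DISCARD/UNWATCH in that design: only it knows which node the transaction ended up on.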

My understanding is that we settled on the #with interface as the best way to implement "knowing the node ahead of time", because you can also make the same code still work with RedisClient (since RedisClient#with does nothing - like I wrote in the docs here:

Because `RedisClient` from the redis-client gem implements `#with` as simply `yield self` and ignores all of its
arguments, it's possible to write code which is compatible with both redis-client and redis-cluster-client; the `#with`
call will pin the connection to a slot when using clustering, or be a no-op when not.
).
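A toy example of why that stays portable; FakeRedisClient here is a hypothetical stand-in for plain RedisClient, whose #with likewise ignores its arguments and yields itself:

```ruby
# A non-cluster client can implement #with as "ignore arguments, yield self",
# so the same calling code works against both client types.
class FakeRedisClient
  def with(*_args, **_kwargs)
    yield self
  end

  def call(*command)
    command.join(' ') # echo the command back, for demonstration only
  end
end

def fetch_pair(client)
  # With redis-cluster-client, hashtag: pins the connection to one slot;
  # with a plain client, #with is a no-op and yields the client itself.
  client.with(hashtag: 'user1') do |conn|
    [conn.call('GET', '{user1}:a'), conn.call('GET', '{user1}:b')]
  end
end
```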

If we want to make RedisClient::Cluster#multi keep working how it is now, for backwards compatibility, we can keep its current implementation. However, going forward I think we should recommend people use RedisClient::Cluster#with, since it doesn't have the calls-block-twice behaviour.

For compatibility with RedisClient, an approach like #295 would be better - it would let us behave exactly like a drop-in replacement for RedisClient.

@supercaracal
Member

supercaracal commented Feb 5, 2024

One more thing: I think it would be better to be able to enable or disable the client-side validation feature via an option.

class RedisClient
  class Cluster
    def with(key: nil, hashtag: nil, write: true, retry_count: 0)
      key = process_with_arguments(key, hashtag)
      node_key = @router.find_node_key_by_key(key, primary: write)
      node = @router.find_node(node_key)
      @router.try_delegate(node, :with, retry_count: retry_count) do |conn|
        if @config.client_side_validation_enabled?
          yield PinningClient.new(conn, key, @router.command, @command_builder)
        else
          yield conn
        end
      end
    end
  end
end

@KJTsanaktsidis
Contributor Author

Thanks for your feedback @supercaracal - I opened #317 instead which makes the changes you suggested here (it's so different I figured it was better to open a separate PR rather than clutter this one).

@KJTsanaktsidis KJTsanaktsidis deleted the ktsanaktsidis/validate_keys branch February 19, 2024 01:13