Multi-key operations in cluster #838


Closed
danieljohannsen opened this issue May 27, 2018 · 16 comments

@danieljohannsen

In a Redis cluster, I get the following error message when retrieving a RedisValue array by passing a RedisKey array to StringGet():

Multi-key operations must involve a single slot; keys can use 'hash tags' to help this, i.e. '{/users/12345}/account' and '{/users/12345}/contacts' will always be in the same slot

at StackExchange.Redis.ServerSelectionStrategy.Select(Message message)
at StackExchange.Redis.ConnectionMultiplexer.TryPushMessageToBridge[T](Message message, ResultProcessor`1 processor, ResultBox`1 resultBox, ServerEndPoint& server)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)

What can I do to solve this?

@mgravell
Collaborator

mgravell commented May 27, 2018

Can I assume here that you aren't currently using "hash tags", and that these keys are just arbitrary keys?

Assuming that: then currently SE.Redis acts naively here; it assumes any single operation is routable to a single server. Theoretically we could make SE.Redis do the work of noting which keys belong to which servers and doing a scatter/gather between the various nodes (essentially returning a Task.WhenAll over all the strands), but that isn't something that exists today.

Options right now:

  1. use "hash tags" if the data is strongly related (this keeps related data in a single slot)
  2. use individual StringGet
  3. we implement a new API that does a scatter/gather (would need to wait for this)
  4. your code could manually group keys by slot to perform multiple variadic gets

Note that 2 doesn't necessarily have performance impact, as it can essentially be pipelined:

RedisKey[] keys = ...
RedisValue[] results = await Task.WhenAll(keys.Select(key => db.StringGetAsync(key)));

Note that the above does not pay round-trip latency per operation.

(edit: clarification; swapped Array.ConvertAll for Select)
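A sketch of what option 4 involves without any library support - plain C# implementing the hash-slot rule from the Redis cluster spec (CRC16/XMODEM over the key, or over the non-empty {hash tag} if present, mod 16384), then grouping keys by slot so each group can safely go to a single variadic StringGet. Everything below is illustrative app code, not SE.Redis API:

```csharp
using System;
using System.Linq;
using System.Text;

// CRC16/XMODEM (poly 0x1021, init 0) - the variant the Redis cluster
// spec defines for key-to-slot mapping.
static ushort Crc16(byte[] data)
{
    ushort crc = 0;
    foreach (byte b in data)
    {
        crc ^= (ushort)(b << 8);
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) != 0 ? (ushort)((crc << 1) ^ 0x1021) : (ushort)(crc << 1);
    }
    return crc;
}

static int GetSlot(string key)
{
    // If the key contains a non-empty {hash tag}, only the tag is hashed,
    // which is how related keys are forced into the same slot.
    int open = key.IndexOf('{');
    if (open >= 0)
    {
        int close = key.IndexOf('}', open + 1);
        if (close > open + 1) key = key.Substring(open + 1, close - open - 1);
    }
    return Crc16(Encoding.UTF8.GetBytes(key)) % 16384;
}

string[] keys = { "{user:1}:name", "{user:1}:email", "order:7" };
foreach (var group in keys.GroupBy(GetSlot))
{
    // Each group maps to one slot, so it could be passed as a single
    // variadic db.StringGet(...) call without the multi-slot error.
    Console.WriteLine($"slot {group.Key}: {string.Join(", ", group)}");
}
```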

@danieljohannsen
Author

danieljohannsen commented May 27, 2018

Thanks, this is very useful. Options 1 and 4 are not possible in the current use case. Option 3 would be great to have, but option 2, pipelined, also looks like a good solution for now. Thank you!

@dmwharris

dmwharris commented May 31, 2018

Just wanted to clarify what multi-key operations are. Does a single-key operation mean the operation returns a single key/key-value pair in one call, and multi-key that it returns multiple keys/key-value pairs in a single call? If so, of the following operations in my code, are any supported in default cluster mode without resorting to any of the options above?

SetMembers
SetCombine(SetOperation.Intersect...
HashGet
SetScan(scankey, scanpattern,...
SetCombine(SetOperation.Difference...
HashGetAll

@dmwharris

One other question. If there are current limitations on clusters to handle multi-key operations, then horizontal scaling (adding servers to a cluster) may be less desirable than going to a memory appliance hosting a single Redis instance. Do you have any info or case studies on massive memory installations using single Redis instances?

@NickCraver
Collaborator

@dmwharris Can you define "massive"? What sort of scale are we talking about here?

@mgravell
Collaborator

mgravell commented May 31, 2018 via email

SetMembers, HashGetAll, HashGet, and SetScan are all single-key operations and should be fine. Just count how many RedisKey parameters a method has: if it is more than one (including via arrays), then it is multi-key, and it will need to be routable to a single server, typically via "hash tags".

@dmwharris

dmwharris commented May 31, 2018

Sure, sorry, massive as in 500GB to multi-terabyte memory like
IBM Power795 (16TB memory)
HP ProLiant DL380p Gen8 Server (12 TB memory)

I'll take this to the Redis server team.

@dmwharris

So Intersection and Difference are multi-key operations and are not supported by the Redis server in default cluster mode, even with hash tags? For these operations, using one of the 4 options above may provide a solution?

@NickCraver
Collaborator

@dmwharris If you're using hash tags they should work; it's not that multi-key operations aren't supported, it's that multi-node operations aren't supported. If all the keys are on the same node, all of this should behave properly.
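To make that concrete, a hedged sketch (hypothetical key names; `db` is an IDatabase from a cluster connection, so this needs a live cluster to actually run): because both keys share the same hash tag, they land in the same slot, and the multi-key SetCombine is routable to one node.

```csharp
// Both keys contain the {user:42} hash tag, so Redis hashes only
// "user:42" and both sets map to the same slot - making the
// multi-key Difference legal in cluster mode.
db.SetAdd("{user:42}:follows", "alice");
db.SetAdd("{user:42}:muted", "bob");
RedisValue[] diff = db.SetCombine(SetOperation.Difference,
    "{user:42}:follows", "{user:42}:muted");
```

Without the shared tag (e.g. `user:42:follows` and `user:42:muted`), the same call would raise the multi-slot error from the top of this thread.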

Can Redis do instances that large? Yep. I haven't personally run at that scale though - we have some in the ~300GB range that run without problems, but you are talking about a single process and single replication (e.g. adding a replica copies all the data), so some operations scale up quite a bit time-wise as well. Other auxiliary things like the replication backlog (a circular change buffer) need to be increased so replicas can initialize and catch up.

It's hard to answer whether that's a good approach. How stable is the hardware? How much load? Do you need to scale out? Does rack space matter? How about patching and failovers? I'd say concerns not really about Redis, but about depending on a single server, factor in more at those kinds of scale than the limits of the process (unless you're consistently doing 200k+ ops/node, or large blocking transfers are a factor in themselves).

If cluster is more appealing (it usually is from an infrastructure and redundancy standpoint, if server count isn't a bigger issue), then the multi-key operations are solvable with the tags. Sorry that's not really an answer, but "it depends" with no explanation is crap... those are the factors I think "it depends" on. Happy to chat further on it.

@mgravell
Collaborator

mgravell commented Jun 1, 2018 via email

@sergii-s

sergii-s commented Jun 19, 2018

Hello @mgravell, I have one question about your point above:

> Options right now:
>
> 1. use "hash tags" if the data is strongly related (this keeps related data in a single slot)
> 2. use individual StringGet
> 3. we implement a new API that does a scatter/gather (would need to wait for this)
> 4. your code could manually group keys by slot to perform multiple variadic gets
>
> Note that 2 doesn't necessarily have performance impact, as it can essentially be pipelined

Unfortunately option 1 is not possible for me.
I'm currently testing options "2" and "4", but I get significant performance degradation when requesting 100+ keys in parallel, due to TPL saturation. Grouping by slot improves performance a little, but it's not enough.

The solution, I suppose, is to implement an explicit pipeline operation that regroups all the parallel requests into one. That way there would be only one TaskCompletionSource on the TPL, unlike the current implicit pipelining.

Is that what you meant by option 3? If not, do you think this solution is justifiable and could eventually be implemented?
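One existing knob worth noting here: SE.Redis already exposes IBatch via CreateBatch(), which queues operations client-side and releases them to the connection as a single contiguous unit. A sketch, assuming `db` is an IDatabase and `keys` a RedisKey[] (this still creates one Task per key, so it reduces interleaving rather than eliminating per-operation Task overhead):

```csharp
// Queue all the gets into a single batch; nothing is sent to the
// server until Execute() is called, at which point the whole batch
// goes out back-to-back on the connection.
IBatch batch = db.CreateBatch();
Task<RedisValue>[] pending = keys.Select(k => batch.StringGetAsync(k)).ToArray();
batch.Execute();
RedisValue[] results = await Task.WhenAll(pending);
```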

@mgravell
Collaborator

mgravell commented Jun 19, 2018 via email

mgravell added a commit that referenced this issue Jul 9, 2018
…the muxer's state; also avoid hash-slot overhead when not cluster
@mgravell
Collaborator

mgravell commented Jul 9, 2018

I've added a GetHashSlot method into the 2.0 code to make this possible for app-code without having to implement the algorithm, and tests that illustrate usage. After consideration, I do not propose to change the variadic multi-string get - IMO it should be obvious and explicit to callers that something unusual is happening, which this would mask. There is non-trivial cost associated with this kind of scatter/gather operation over multiple nodes. We might reconsider that as a separate issue in due course.
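Assuming GetHashSlot hangs off the multiplexer (the method name is confirmed above; its exact receiver is an assumption here), option 4 might then look roughly like this, with `muxer`, `db`, and `keys` supplied by the caller:

```csharp
// Group keys by their cluster hash slot, then issue one variadic
// StringGetAsync per slot - each call is now routable to a single
// node, and the calls themselves pipeline concurrently.
var tasks = keys
    .GroupBy(k => muxer.GetHashSlot(k))   // assumed receiver for GetHashSlot
    .Select(g => db.StringGetAsync(g.ToArray()))
    .ToArray();
RedisValue[][] chunks = await Task.WhenAll(tasks);
```

Note the results come back grouped by slot, not in the original key order, so callers that care about ordering need to re-associate values with their keys.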

@ilia-cy

ilia-cy commented Jun 6, 2022

@mgravell
How would you suggest dealing with this error during Lua script executions (in a cluster, of course)?

We have the following script:

local current
current = redis.call("incr", KEYS[1])
if current == 1 then
    redis.call("expire", KEYS[1], KEYS[2])
    return 1
else
    return current
end

We run it by calling _db.ScriptEvaluateAsync(Script, new RedisKey[] { key, ttl });

What can be done in order to avoid getting the multi key operation errors?

@mgravell
Collaborator

mgravell commented Jun 6, 2022

The second parameter is presumably the timeout value? In that case, it isn't a key - it is a value; so instead of KEYS[2], specify ARGV[1], and pass the value in the values array:

_db.ScriptEvaluateAsync(Script, new RedisKey[] { key }, new RedisValue[] { ttl });
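For completeness, the script above rewritten to match that call - a sketch of the same increment-with-expiry pattern, with the TTL arriving as a value rather than a key:

```lua
-- The TTL now arrives via ARGV[1], so the script declares exactly
-- one key and routes cleanly to a single slot in cluster mode.
local current = redis.call("incr", KEYS[1])
if current == 1 then
    redis.call("expire", KEYS[1], ARGV[1])
    return 1
else
    return current
end
```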
