Resiliency: Forbid index names over 100 characters in length #7252
I was just looking at this page: http://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits and 255 seems to be a common limit across main filesystems? It seems that the limit is in bytes though, so maybe we should also check that index names are in ASCII? |
But then 100 might not be enough. I just applied your patch, ran the following code, and was still able to reproduce the issue:

    final char[] chars = new char[110];
    int c = 6068;
    int len = 0;
    while (true) {
        final int count = Character.toChars(c, chars, len);
        if (len + count > 100) {
            break;
        } else {
            len += count;
        }
    }
    final String indexName = new String(chars, 0, len);
    createIndex(indexName);

(This weird String uses a code point that is one UTF-16 char but 3 UTF-8 bytes, so although its length does not exceed 100, the number of bytes is greater than 255, which is the limit on my filesystem.) So I think we either need to decrease the limit (at least for the case when the host encodes file names using UTF-8) or enforce that index names are in ASCII first? |
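To make the char-versus-byte gap above concrete, here is a minimal sketch that measures both lengths for a name built from the same code point (6068, i.e. U+17B4). The class and method names (`IndexNameBytes`, `repeat`, `utf8Length`) are hypothetical and are not part of the patch under discussion.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not taken from the patch.
class IndexNameBytes {

    /** Builds a string consisting of `count` repetitions of `codePoint`. */
    static String repeat(int codePoint, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }

    /** Number of bytes the name occupies once encoded as UTF-8. */
    static int utf8Length(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String name = repeat(6068, 100);
        System.out.println(name.length());    // 100 UTF-16 chars
        System.out.println(utf8Length(name)); // 300 UTF-8 bytes
    }
}
```

A name like this passes a 100-character check while needing 300 bytes on a filesystem that stores file names as UTF-8, which is over a 255-byte limit.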
Or maybe we could temporarily do something like:
|
What about enforcing that it is below 100 UTF-8 bytes? We could use |
That would work for me. |
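A byte-based check along the lines proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `IndexNameLimit`, the method `validate`, the exception type, the message wording, and the choice to treat the limit as inclusive are all hypothetical, and the actual patch may measure the length differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical validator for illustration only; names and message are assumptions.
class IndexNameLimit {

    /** The figure proposed in this thread; treated here as an inclusive upper bound. */
    static final int DEFAULT_MAX_BYTES = 100;

    /** Rejects names whose UTF-8 encoding is longer than `maxBytes`. */
    static void validate(String indexName, int maxBytes) {
        int byteLength = indexName.getBytes(StandardCharsets.UTF_8).length;
        if (byteLength > maxBytes) {
            throw new IllegalArgumentException(
                "index name is too long (" + byteLength + " bytes > " + maxBytes + " bytes)");
        }
    }

    public static void main(String[] args) {
        validate("logs-2014.08.13", DEFAULT_MAX_BYTES); // passes: 15 ASCII bytes
    }
}
```

Because the check counts encoded bytes rather than chars, it does not matter whether a name is ASCII; passing the limit as a parameter also leaves room for a different bound.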
Closing in favor of #6736 |
Reopening, this will be back-ported to 1.3. |
LGTM |
Quick sanity check: my longest index is 44 bytes and I encode a ton of |
@nik9000 @clintongormley I'm afraid just like 640k back in the day, 100 bytes isn't enough for everyone. We also encode a fair bit of prefix stuff into the index name, and due to the way our customers generate their "external" index names, it just isn't enough for all use cases. My suggestion would be to make this configurable (with a default of 100), and I'll submit a patch if this has a chance of being accepted. |
I suppose I should have surrounded my last sentence with sarcasm tags. It's certainly enough for me at this point and will probably stay that way for the foreseeable future, but I don't claim to speak for everyone. |
@klausbrunner out of curiosity, how long is your longest index name? |
@dakrone About 140 characters, which can be a bit more in UTF-8 bytes as we allow some non-ASCII characters. |
@dakrone any problem with making the limit 255 bytes? |
@clintongormley no, no problem with it, that works fine for me. |
ok good - i think we should get that into 1.4? |
@clintongormley sure, want to open another issue for it and assign me? I will work on it when I get a chance, shouldn't take too long. |
Fixes #4417
I picked 100 sort of arbitrarily; I'm open to any suggestions for a better limit.