Commit 251b4fc

Update numbers to reflect 4-byte UTF-8-encoded characters (#27083)
You need 4 bytes for characters outside the BMP, which includes many emoji and a bunch of less-common writing characters too.
1 parent e302d80 commit 251b4fc

1 file changed: +2 -2 lines changed
docs/reference/mapping/params/ignore-above.asciidoc

@@ -56,5 +56,5 @@ limit of `32766`.
 
 NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
 bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
-set the limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most
-3 bytes.
+set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
+4 bytes.
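
The arithmetic behind this change can be checked with a short sketch. It is illustrative only and not part of the commit: `utf8_len` and `conservative_ignore_above` are hypothetical helper names, and "character" is assumed here to mean one Unicode code point, which is how Python counts string length.

```python
# Upper bound on a single term's size in bytes, as quoted in the patched docs.
LUCENE_MAX_TERM_BYTES = 32766


def utf8_len(text: str) -> int:
    """Number of bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def conservative_ignore_above(max_bytes_per_char: int = 4) -> int:
    """Largest character count that cannot exceed the byte limit.

    Hypothetical helper: integer division rounds down, so the result
    multiplied by max_bytes_per_char never exceeds LUCENE_MAX_TERM_BYTES.
    """
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


if __name__ == "__main__":
    # ASCII, Latin-1 supplement, BMP symbol, and an emoji outside the BMP.
    for ch in ["a", "é", "€", "😀"]:
        print(repr(ch), utf8_len(ch))
    print(conservative_ignore_above(3))  # old figure from the docs
    print(conservative_ignore_above(4))  # new figure from the docs
```

The emoji is the case that motivates the patch: it lies outside the BMP and needs 4 bytes, so a limit derived from 3 bytes per character could still let an oversized term through.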
