add support for the char_group tokenizer #3427
…ses to do full blown endpoint testing (cherry picked from commit d6f1ae5)
(cherry picked from commit f5f0c437871589b1fb90b6c4c6f09f0dfc296d7e)
(cherry picked from commit c74ed51e2c30804ffc1d50f95a17893a93bfa6ea)
(cherry picked from commit f2da9f51b43b188cc1b2d09f616fbf87ca268344)
(cherry picked from commit 7ecbee5435df02810ede7f07985e7bb13f66b6f3)
…ch implements the bulk of the setup and tests (cherry picked from commit 8a6e99493a4174a87cc8680609afd0c482cf10d7)
I think the documentation should be updated.
{
/// <summary>
/// The maximum token length. If a token is seen that exceeds this length then it is discarded. Defaults to 255.
/// </summary>
I think the documentation should be:
A list containing a list of characters to tokenize the string on. Whenever a character from this list is encountered, a
new token is started. This accepts either single characters, e.g. `-`, or character groups: `whitespace`, `letter`, `digit`,
`punctuation`, `symbol`.
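To make the proposed wording concrete, here is a minimal sketch of the splitting rule it describes. `CharGroupSketch`, `IsSplitChar` and `Tokenize` are hypothetical names used only for illustration; they are not part of NEST or Elasticsearch. The mapping of the named groups onto .NET's `char.Is*` checks is an assumption and may differ from the server's actual character classes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of NEST or Elasticsearch: it emulates the
// documented splitting rule on a plain string, client side.
public static class CharGroupSketch
{
    // True when c matches a single-character entry or one of the named groups.
    // Assumption: the named groups correspond to .NET's char.Is* categories.
    private static bool IsSplitChar(char c, IEnumerable<string> tokenizeOnChars)
    {
        foreach (var entry in tokenizeOnChars)
        {
            switch (entry)
            {
                case "whitespace": if (char.IsWhiteSpace(c)) return true; break;
                case "letter": if (char.IsLetter(c)) return true; break;
                case "digit": if (char.IsDigit(c)) return true; break;
                case "punctuation": if (char.IsPunctuation(c)) return true; break;
                case "symbol": if (char.IsSymbol(c)) return true; break;
                default:
                    // Anything else is treated as a literal single character.
                    if (entry.Length == 1 && entry[0] == c) return true;
                    break;
            }
        }
        return false;
    }

    // Starts a new token whenever a split character is encountered.
    // Empty tokens (from consecutive split characters) are dropped.
    public static List<string> Tokenize(string text, IEnumerable<string> tokenizeOnChars)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (var c in text)
        {
            if (IsSplitChar(c, tokenizeOnChars))
            {
                if (current.Length > 0)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                }
            }
            else
            {
                current.Append(c);
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    public static void Main()
    {
        var tokens = Tokenize("The QUICK brown-fox", new[] { "whitespace", "-" });
        Console.WriteLine(string.Join(", ", tokens)); // The, QUICK, brown, fox
    }
}
```

The sketch takes `IEnumerable<string>` to mirror the shape of the `TokenizeOnCharacters` property in the diff, so a single list can mix literal characters and group names.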
/// The maximum token length. If a token is seen that exceeds this length then it is discarded. Defaults to 255.
/// </summary>
[JsonProperty("tokenize_on_chars")]
IEnumerable<string> TokenizeOnCharacters { get; set; }
Should this be a specialized type that takes a union of enum and char? `string` is no doubt easier to use.
(cherry picked from commit 9ab4384)
pending #3424