Commit 23b6847

Merge pull request #53 from Gasol/icu_transform_doc
Update documentation for ICU Transform
2 parents 97e6016 + 2aea018 commit 23b6847

1 file changed

+46
-0
lines changed

README.md

@@ -224,6 +224,52 @@ Here is a sample settings:
}
```

ICU Transform
-------------
Transforms are used to process Unicode text in many different ways, including case mapping, normalization,
transliteration, and bidirectional text handling.

You can define the transliterator identifier with the `id` property and set the direction to `forward` or `reverse`
with the `dir` property. The default values are `Null` for `id` and `forward` for `dir`.

For example:

```js
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "latin" : {
                    "tokenizer" : "keyword",
                    "filter" : ["myLatinTransform"]
                }
            },
            "filter" : {
                "myLatinTransform" : {
                    "type" : "icu_transform",
                    "id" : "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
                }
            }
        }
    }
}
```

This transform transliterates characters to Latin, separates accents from their base characters, removes the accents,
and then recomposes the remaining text into an unaccented form.

The results are:

`你好` to `ni hao`

`здравствуйте` to `zdravstvujte`

`こんにちは` to `kon'nichiha`

269+
Currently the filter only supports identifier and direction, custom rulesets are not yet supported.
270+
271+
For more documentation, Please see the [user guide of ICU Transform](http://userguide.icu-project.org/transforms/general).
272+
License
-------
