
Commit 71414d6

[ML] Update trained model docs for truncate parameter for bert tokenization (#79652) (#80010)
1 parent 49b348c commit 71414d6

File tree

3 files changed: +64 additions, −0 deletions

docs/reference/ml/df-analytics/apis/get-trained-models.asciidoc

Lines changed: 24 additions & 0 deletions
@@ -195,6 +195,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -249,6 +253,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -296,6 +304,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -366,6 +378,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -413,6 +429,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -475,6 +495,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
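
For orientation, the new setting surfaces under `tokenization.bert` in the configuration returned by the get trained models API. This is only a sketch: the model ID is hypothetical, the `ner` task is assumed, and the response is trimmed to the relevant fragment rather than being verbatim API output.

// hypothetical model ID; response abbreviated to the tokenization settings
GET _ml/trained_models/my-bert-ner-model

{
  "count": 1,
  "trained_model_configs": [
    {
      "model_id": "my-bert-ner-model",
      "inference_config": {
        "ner": {
          "tokenization": {
            "bert": {
              "do_lower_case": false,
              "with_special_tokens": true,
              "max_sequence_length": 512,
              "truncate": "first"
            }
          }
        }
      }
    }
  ]
}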

docs/reference/ml/df-analytics/apis/put-trained-models.asciidoc

Lines changed: 24 additions & 0 deletions
@@ -454,6 +454,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -496,6 +500,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -532,6 +540,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -591,6 +603,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -626,6 +642,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -677,6 +697,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
 (Optional, integer)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]
 
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+
 `with_special_tokens`::::
 (Optional, boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
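
As a rough illustration of how the documented option is set when creating a model, here is a minimal sketch of a put trained models request. Only the `truncate` field comes from this change; the model ID, `model_type`, `input` section, and the choice of the `ner` task are assumptions made for the example.

// sketch only: model ID, model_type, input, and the ner task are illustrative assumptions
PUT _ml/trained_models/my-bert-ner-model
{
  "model_type": "pytorch",
  "input": {
    "field_names": ["text_field"]
  },
  "inference_config": {
    "ner": {
      "tokenization": {
        "bert": {
          "do_lower_case": false,
          "with_special_tokens": true,
          "max_sequence_length": 512,
          "truncate": "first"
        }
      }
    }
  }
}

With `truncate` set to `none`, a request whose tokenized input exceeds `max_sequence_length` returns an error instead of being shortened.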

docs/reference/ml/ml-shared.asciidoc

Lines changed: 16 additions & 0 deletions
@@ -925,6 +925,22 @@ Specifies if the tokenization lower case the text sequence when building the
 tokens.
 end::inference-config-nlp-tokenization-bert-do-lower-case[]
 
+tag::inference-config-nlp-tokenization-bert-truncate[]
+Indicates how tokens are truncated when they exceed `max_sequence_length`.
+The default value is `first`.
++
+--
+* `none`: No truncation occurs; the inference request receives an error.
+* `first`: Only the first sequence is truncated.
+* `second`: Only the second sequence is truncated. If there is just one sequence,
+that sequence is truncated.
+--
+
+NOTE: For `zero_shot_classification`, the hypothesis sequence is always the second
+sequence. Therefore, do not use `second` in this case.
+
+end::inference-config-nlp-tokenization-bert-truncate[]
+
 tag::inference-config-nlp-tokenization-bert-with-special-tokens[]
 Tokenize with special tokens. The tokens typically included in BERT-style tokenization are:
 +
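
Following the NOTE above, a `zero_shot_classification` configuration should keep truncation on the first sequence, since the hypothesis occupies the second. An illustrative fragment, with all other zero-shot fields omitted:

"zero_shot_classification": {
  "tokenization": {
    "bert": {
      "truncate": "first",
      "max_sequence_length": 512
    }
  }
}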
