Commit 4ea2ec1

[DOCS] Adds docs to built-in and Eland model support in Inference API (elastic#105500) (elastic#105536)
Co-authored-by: Max Hniebergall <[email protected]>
1 parent e5ebcc1 commit 4ea2ec1

File tree

1 file changed (+91, -11 lines)

docs/reference/inference/put-inference.asciidoc
Lines changed: 91 additions & 11 deletions
@@ -6,10 +6,12 @@ experimental[]
 
 Creates a model to perform an {infer} task.
 
-IMPORTANT: The {infer} APIs enable you to use certain services, such as ELSER,
-OpenAI, or Hugging Face, in your cluster. This is not the same feature that you
-can use on an ML node with custom {ml} models. If you want to train and use your
-own model, use the <<ml-df-trained-models-apis>>.
+IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in
+{ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, or
+Hugging Face, in your cluster. For built-in models and models uploaded through
+Eland, the {infer} APIs offer an alternative way to use and manage trained
+models. However, if you do not plan to use the {infer} APIs to use these models
+or if you want to use non-NLP models, use the <<ml-df-trained-models-apis>>.
 
 
 [discrete]
@@ -39,6 +41,7 @@ The following services are available through the {infer} API:
 * ELSER
 * Hugging Face
 * OpenAI
+* text embedding (for built-in models and models uploaded through Eland)
 
 
 [discrete]
@@ -70,13 +73,15 @@ Available services:
 * `hugging_face`: specify the `text_embedding` task type to use the Hugging Face
 service.
 * `openai`: specify the `text_embedding` task type to use the OpenAI service.
+* `text_embedding`: specify the `text_embedding` task type to use the E5
+built-in model or text embedding models uploaded by Eland.
 
 `service_settings`::
 (Required, object)
 Settings used to install the {infer} model. These settings are specific to the
 `service` you specified.
 +
-.`service_settings` for `cohere`
+.`service_settings` for the `cohere` service
 [%collapsible%closed]
 =====
 `api_key`:::
@@ -106,19 +111,22 @@ https://docs.cohere.com/reference/embed[Cohere docs]. Defaults to
 `embed-english-v2.0`.
 =====
 +
-.`service_settings` for `elser`
+.`service_settings` for the `elser` service
 [%collapsible%closed]
 =====
 `num_allocations`:::
 (Required, integer)
-The number of model allocations to create.
+The number of model allocations to create. `num_allocations` must not exceed the
+number of available processors per node divided by the `num_threads`.
 
 `num_threads`:::
 (Required, integer)
-The number of threads to use by each model allocation.
+The number of threads to use by each model allocation. `num_threads` must not
+exceed the number of available processors per node divided by the number of
+allocations. Must be a power of 2. Max allowed value is 32.
 =====
 +
-.`service_settings` for `hugging_face`
+.`service_settings` for the `hugging_face` service
 [%collapsible%closed]
 =====
 `api_key`:::
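The allocation and thread limits added in the hunk above are simple arithmetic. The following is a minimal sketch of those checks; the helper name and the `processors_per_node` parameter are illustrative, not part of the Elasticsearch API:

```python
def valid_allocation_settings(num_allocations: int,
                              num_threads: int,
                              processors_per_node: int) -> bool:
    """Check the documented constraints on `num_allocations` and
    `num_threads`: neither value may exceed the available processors
    per node divided by the other, and `num_threads` must be a
    power of 2 with a maximum allowed value of 32.
    """
    if num_allocations < 1 or num_threads < 1:
        return False
    if num_threads > 32 or num_threads & (num_threads - 1) != 0:
        return False
    # Both "divided by the other" constraints reduce to the same
    # product bound: allocations * threads <= processors per node.
    return num_allocations * num_threads <= processors_per_node
```

For example, on a node with 4 available processors, `num_allocations=2` with `num_threads=2` satisfies the constraints, while `num_allocations=4` with `num_threads=2` does not.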
@@ -138,7 +146,7 @@ the same name and the updated API key.
 The URL endpoint to use for the requests.
 =====
 +
-.`service_settings` for `openai`
+.`service_settings` for the `openai` service
 [%collapsible%closed]
 =====
 `api_key`:::
@@ -164,13 +172,36 @@ https://platform.openai.com/account/organization[**Settings** > **Organizations**]
 The URL endpoint to use for the requests. Can be changed for testing purposes.
 Defaults to `https://api.openai.com/v1/embeddings`.
 =====
++
+.`service_settings` for the `text_embedding` service
+[%collapsible%closed]
+=====
+`model_id`:::
+(Required, string)
+The name of the text embedding model to use for the {infer} task. It can be the
+ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or
+a text embedding model already
+{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
+
+`num_allocations`:::
+(Required, integer)
+The number of model allocations to create. `num_allocations` must not exceed the
+number of available processors per node divided by the `num_threads`.
+
+`num_threads`:::
+(Required, integer)
+The number of threads to use by each model allocation. `num_threads` must not
+exceed the number of available processors per node divided by the number of
+allocations. Must be a power of 2. Max allowed value is 32.
+=====
+
 
 `task_settings`::
 (Optional, object)
 Settings to configure the {infer} task. These settings are specific to the
 `<task_type>` you specified.
 +
-.`task_settings` for `text_embedding`
+.`task_settings` for the `text_embedding` task type
 [%collapsible%closed]
 =====
 `input_type`:::
@@ -234,6 +265,31 @@ PUT _inference/text_embedding/cohere-embeddings
 // TEST[skip:TBD]
 
 
+[discrete]
+[[inference-example-e5]]
+===== E5 via the text embedding service
+
+The following example shows how to create an {infer} model called
+`my-e5-model` to perform a `text_embedding` task type.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/text_embedding/my-e5-model
+{
+  "service": "text_embedding",
+  "service_settings": {
+    "num_allocations": 1,
+    "num_threads": 1,
+    "model_id": ".multilingual-e5-small" <1>
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> The `model_id` must be the ID of one of the built-in E5 models. Valid values
+are `.multilingual-e5-small` and `.multilingual-e5-small_linux-x86_64`. For
+further details, refer to the {ml-docs}/ml-nlp-e5.html[E5 model documentation].
+
+
 [discrete]
 [[inference-example-elser]]
 ===== ELSER service
@@ -304,6 +360,30 @@ endpoint URL. Select the model you want to use on the new endpoint creation page
 task under the Advanced configuration section. Create the endpoint. Copy the URL
 after the endpoint initialization has been finished.
 
+[discrete]
+[[inference-example-eland]]
+===== Models uploaded by Eland via the text embedding service
+
+The following example shows how to create an {infer} model called
+`my-msmarco-minilm-model` to perform a `text_embedding` task type.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/text_embedding/my-msmarco-minilm-model
+{
+  "service": "text_embedding",
+  "service_settings": {
+    "num_allocations": 1,
+    "num_threads": 1,
+    "model_id": "msmarco-MiniLM-L12-cos-v5" <1>
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> The `model_id` must be the ID of a text embedding model which has already
+been
+{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
+
 
 [discrete]
 [[inference-example-openai]]
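The two `text_embedding` console examples added by this commit share the same request-body shape. A small Python sketch that builds that body (the helper name is hypothetical; actually sending the PUT request to a cluster is out of scope here):

```python
import json

def text_embedding_put_body(model_id: str,
                            num_allocations: int = 1,
                            num_threads: int = 1) -> str:
    """Build the JSON body for PUT _inference/text_embedding/<model_id>,
    matching the console examples for E5 and Eland-uploaded models."""
    body = {
        "service": "text_embedding",
        "service_settings": {
            "num_allocations": num_allocations,
            "num_threads": num_threads,
            "model_id": model_id,
        },
    }
    return json.dumps(body, indent=2)

# The same builder covers both examples:
e5_body = text_embedding_put_body(".multilingual-e5-small")
eland_body = text_embedding_put_body("msmarco-MiniLM-L12-cos-v5")
```

Only the `model_id` differs between the two examples; the service name and the allocation settings are identical.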
