@@ -6,10 +6,12 @@ experimental[]

Creates a model to perform an {infer} task.

- IMPORTANT: The {infer} APIs enable you to use certain services, such as ELSER,
- OpenAI, or Hugging Face, in your cluster. This is not the same feature that you
- can use on an ML node with custom {ml} models. If you want to train and use your
- own model, use the <<ml-df-trained-models-apis>>.
+ IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in
+ {ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, or
+ Hugging Face, in your cluster. For built-in models and models uploaded through
+ Eland, the {infer} APIs offer an alternative way to use and manage trained
+ models. However, if you do not plan to use the {infer} APIs to use these models
+ or if you want to use non-NLP models, use the <<ml-df-trained-models-apis>>.


[discrete]
@@ -39,6 +41,7 @@ The following services are available through the {infer} API:
* ELSER
* Hugging Face
* OpenAI
+ * text embedding (for built-in models and models uploaded through Eland)


[discrete]
@@ -70,13 +73,15 @@ Available services:
* `hugging_face`: specify the `text_embedding` task type to use the Hugging Face
service.
* `openai`: specify the `text_embedding` task type to use the OpenAI service.
+ * `text_embedding`: specify the `text_embedding` task type to use the E5
+ built-in model or text embedding models uploaded through Eland.


`service_settings`::
(Required, object)
Settings used to install the {infer} model. These settings are specific to the
`service` you specified.
+ +
- .`service_settings` for `cohere`
+ .`service_settings` for the `cohere` service
[%collapsible%closed]
=====
`api_key`:::
@@ -106,19 +111,22 @@ https://docs.cohere.com/reference/embed[Cohere docs]. Defaults to
`embed-english-v2.0`.
=====

+ +
- .`service_settings` for `elser`
+ .`service_settings` for the `elser` service
[%collapsible%closed]
=====
`num_allocations`:::
(Required, integer)
- The number of model allocations to create.
+ The number of model allocations to create. `num_allocations` must not exceed the
+ number of available processors per node divided by `num_threads`.

`num_threads`:::
(Required, integer)
- The number of threads to use by each model allocation.
+ The number of threads used by each model allocation. `num_threads` must not
+ exceed the number of available processors per node divided by `num_allocations`.
+ Must be a power of 2. Max allowed value is 32.
=====

+ +
- .`service_settings` for `hugging_face`
+ .`service_settings` for the `hugging_face` service
[%collapsible%closed]
=====
`api_key`:::
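The sizing rule documented for `num_allocations` and `num_threads` above is simple arithmetic, and a small sketch makes it concrete. The helper below is illustrative only (it is not part of the Elasticsearch API; the function name and parameters are invented for this example):

```python
def max_allocations(processors_per_node: int, num_threads: int) -> int:
    """Largest num_allocations one node can host for a given num_threads,
    per the documented rule: allocations <= processors / threads."""
    # num_threads must be a power of 2 and at most 32
    if num_threads <= 0 or num_threads > 32 or num_threads & (num_threads - 1):
        raise ValueError("num_threads must be a power of 2, at most 32")
    return processors_per_node // num_threads
```

For example, a 16-processor node with `num_threads: 4` supports at most 4 allocations, while `num_threads: 1` allows up to 16.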
@@ -138,7 +146,7 @@ the same name and the updated API key.
The URL endpoint to use for the requests.
=====

+ +
- .`service_settings` for `openai`
+ .`service_settings` for the `openai` service
[%collapsible%closed]
=====
`api_key`:::
@@ -164,13 +172,36 @@ https://platform.openai.com/account/organization[**Settings** > **Organizations*
The URL endpoint to use for the requests. Can be changed for testing purposes.
Defaults to `https://api.openai.com/v1/embeddings`.
=====
+ +
+ .`service_settings` for the `text_embedding` service
+ [%collapsible%closed]
+ =====
+ `model_id`:::
+ (Required, string)
+ The name of the text embedding model to use for the {infer} task. It can be the
+ ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or
+ a text embedding model already
+ {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
+
+ `num_allocations`:::
+ (Required, integer)
+ The number of model allocations to create. `num_allocations` must not exceed the
+ number of available processors per node divided by `num_threads`.
+
+ `num_threads`:::
+ (Required, integer)
+ The number of threads used by each model allocation. `num_threads` must not
+ exceed the number of available processors per node divided by `num_allocations`.
+ Must be a power of 2. Max allowed value is 32.
+ =====
+


`task_settings`::
(Optional, object)
Settings to configure the {infer} task. These settings are specific to the
`<task_type>` you specified.
+ +
- .`task_settings` for `text_embedding`
+ .`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`input_type`:::
@@ -234,6 +265,31 @@ PUT _inference/text_embedding/cohere-embeddings
// TEST[skip:TBD]


+ [discrete]
+ [[inference-example-e5]]
+ ===== E5 via the text embedding service
+
+ The following example shows how to create an {infer} model called
+ `my-e5-model` to perform a `text_embedding` task.
+
+ [source,console]
+ ------------------------------------------------------------
+ PUT _inference/text_embedding/my-e5-model
+ {
+   "service": "text_embedding",
+   "service_settings": {
+     "num_allocations": 1,
+     "num_threads": 1,
+     "model_id": ".multilingual-e5-small" <1>
+   }
+ }
+ ------------------------------------------------------------
+ // TEST[skip:TBD]
+ <1> The `model_id` must be the ID of one of the built-in E5 models. Valid values
+ are `.multilingual-e5-small` and `.multilingual-e5-small_linux-x86_64`. For
+ further details, refer to the {ml-docs}/ml-nlp-e5.html[E5 model documentation].
+
+
[discrete]
[[inference-example-elser]]
===== ELSER service
@@ -304,6 +360,30 @@ endpoint URL. Select the model you want to use on the new endpoint creation page
task under the Advanced configuration section. Create the endpoint. Copy the URL
after the endpoint initialization has been finished.

+ [discrete]
+ [[inference-example-eland]]
+ ===== Models uploaded by Eland via the text embedding service
+
+ The following example shows how to create an {infer} model called
+ `my-msmarco-minilm-model` to perform a `text_embedding` task.
+
+ [source,console]
+ ------------------------------------------------------------
+ PUT _inference/text_embedding/my-msmarco-minilm-model
+ {
+   "service": "text_embedding",
+   "service_settings": {
+     "num_allocations": 1,
+     "num_threads": 1,
+     "model_id": "msmarco-MiniLM-L12-cos-v5" <1>
+   }
+ }
+ ------------------------------------------------------------
+ // TEST[skip:TBD]
+ <1> The `model_id` must be the ID of a text embedding model that has already
+ been
+ {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
+

[discrete]
[[inference-example-openai]]