Commit 4ea2ec1

[DOCS] Adds docs to built-in and Eland model support in Inference API (elastic#105500) (elastic#105536)
Co-authored-by: Max Hniebergall <[email protected]>
1 parent e5ebcc1 commit 4ea2ec1

File tree

1 file changed (+91, -11 lines)

docs/reference/inference/put-inference.asciidoc
Lines changed: 91 additions & 11 deletions
@@ -6,10 +6,12 @@ experimental[]
 
 Creates a model to perform an {infer} task.
 
-IMPORTANT: The {infer} APIs enable you to use certain services, such as ELSER,
-OpenAI, or Hugging Face, in your cluster. This is not the same feature that you
-can use on an ML node with custom {ml} models. If you want to train and use your
-own model, use the <<ml-df-trained-models-apis>>.
+IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in
+{ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, or
+Hugging Face, in your cluster. For built-in models and models uploaded through
+Eland, the {infer} APIs offer an alternative way to use and manage trained
+models. However, if you do not plan to use the {infer} APIs to use these models
+or if you want to use non-NLP models, use the <<ml-df-trained-models-apis>>.
 
 
 [discrete]
@@ -39,6 +41,7 @@ The following services are available through the {infer} API:
 * ELSER
 * Hugging Face
 * OpenAI
+* text embedding (for built-in models and models uploaded through Eland)
 
 
 [discrete]
@@ -70,13 +73,15 @@ Available services:
 * `hugging_face`: specify the `text_embedding` task type to use the Hugging Face
 service.
 * `openai`: specify the `text_embedding` task type to use the OpenAI service.
+* `text_embedding`: specify the `text_embedding` task type to use the E5
+built-in model or text embedding models uploaded by Eland.
 
 `service_settings`::
 (Required, object)
 Settings used to install the {infer} model. These settings are specific to the
 `service` you specified.
 +
-.`service_settings` for `cohere`
+.`service_settings` for the `cohere` service
 [%collapsible%closed]
 =====
 `api_key`:::
@@ -106,19 +111,22 @@ https://docs.cohere.com/reference/embed[Cohere docs]. Defaults to
 `embed-english-v2.0`.
 =====
 +
-.`service_settings` for `elser`
+.`service_settings` for the `elser` service
 [%collapsible%closed]
 =====
 `num_allocations`:::
 (Required, integer)
-The number of model allocations to create.
+The number of model allocations to create. `num_allocations` must not exceed the
+number of available processors per node divided by the `num_threads`.
 
 `num_threads`:::
 (Required, integer)
-The number of threads to use by each model allocation.
+The number of threads to use by each model allocation. `num_threads` must not
+exceed the number of available processors per node divided by the number of
+allocations. Must be a power of 2. Max allowed value is 32.
 =====
 +
-.`service_settings` for `hugging_face`
+.`service_settings` for the `hugging_face` service
 [%collapsible%closed]
 =====
 `api_key`:::
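The allocation and thread limits added in the hunk above are simple arithmetic. The following is a minimal sketch of those checks; the helper name and the `processors_per_node` parameter are illustrative, not part of the Elasticsearch API:

```python
def valid_allocation_settings(num_allocations: int,
                              num_threads: int,
                              processors_per_node: int) -> bool:
    """Check the documented constraints on `num_allocations` and
    `num_threads`: neither value may exceed the available processors
    per node divided by the other, and `num_threads` must be a
    power of 2 with a maximum allowed value of 32.
    """
    if num_allocations < 1 or num_threads < 1:
        return False
    if num_threads > 32 or num_threads & (num_threads - 1) != 0:
        return False
    # Both "divided by the other" constraints reduce to the same
    # product bound: allocations * threads <= processors per node.
    return num_allocations * num_threads <= processors_per_node
```

For example, on a node with 4 available processors, `num_allocations=2` with `num_threads=2` satisfies the constraints, while `num_allocations=4` with `num_threads=2` does not.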
@@ -138,7 +146,7 @@ the same name and the updated API key.
 The URL endpoint to use for the requests.
 =====
 +
-.`service_settings` for `openai`
+.`service_settings` for the `openai` service
 [%collapsible%closed]
 =====
 `api_key`:::
@@ -164,13 +172,36 @@ https://platform.openai.com/account/organization[**Settings** > **Organizations**]
 The URL endpoint to use for the requests. Can be changed for testing purposes.
 Defaults to `https://api.openai.com/v1/embeddings`.
 =====
++
+.`service_settings` for the `text_embedding` service
+[%collapsible%closed]
+=====
+`model_id`:::
+(Required, string)
+The name of the text embedding model to use for the {infer} task. It can be the
+ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or
+a text embedding model already
+{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
+
+`num_allocations`:::
+(Required, integer)
+The number of model allocations to create. `num_allocations` must not exceed the
+number of available processors per node divided by the `num_threads`.
+
+`num_threads`:::
+(Required, integer)
+The number of threads to use by each model allocation. `num_threads` must not
+exceed the number of available processors per node divided by the number of
+allocations. Must be a power of 2. Max allowed value is 32.
+=====
+
 
 `task_settings`::
 (Optional, object)
 Settings to configure the {infer} task. These settings are specific to the
 `<task_type>` you specified.
 +
-.`task_settings` for `text_embedding`
+.`task_settings` for the `text_embedding` task type
 [%collapsible%closed]
 =====
 `input_type`:::
@@ -234,6 +265,31 @@ PUT _inference/text_embedding/cohere-embeddings
 // TEST[skip:TBD]
 
 
+[discrete]
+[[inference-example-e5]]
+===== E5 via the text embedding service
+
+The following example shows how to create an {infer} model called
+`my-e5-model` to perform a `text_embedding` task type.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/text_embedding/my-e5-model
+{
+  "service": "text_embedding",
+  "service_settings": {
+    "num_allocations": 1,
+    "num_threads": 1,
+    "model_id": ".multilingual-e5-small" <1>
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> The `model_id` must be the ID of one of the built-in E5 models. Valid values
+are `.multilingual-e5-small` and `.multilingual-e5-small_linux-x86_64`. For
+further details, refer to the {ml-docs}/ml-nlp-e5.html[E5 model documentation].
+
+
 [discrete]
 [[inference-example-elser]]
 ===== ELSER service
@@ -304,6 +360,30 @@ endpoint URL. Select the model you want to use on the new endpoint creation page
 task under the Advanced configuration section. Create the endpoint. Copy the URL
 after the endpoint initialization has been finished.
 
+[discrete]
+[[inference-example-eland]]
+===== Models uploaded by Eland via the text embedding service
+
+The following example shows how to create an {infer} model called
+`my-msmarco-minilm-model` to perform a `text_embedding` task type.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/text_embedding/my-msmarco-minilm-model
+{
+  "service": "text_embedding",
+  "service_settings": {
+    "num_allocations": 1,
+    "num_threads": 1,
+    "model_id": "msmarco-MiniLM-L12-cos-v5" <1>
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> The `model_id` must be the ID of a text embedding model which has already
+been
+{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
+
 
 [discrete]
 [[inference-example-openai]]
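The two `text_embedding` console examples added by this commit share the same request-body shape. A small Python sketch that builds that body (the helper name is hypothetical; actually sending the PUT request to a cluster is out of scope here):

```python
import json

def text_embedding_put_body(model_id: str,
                            num_allocations: int = 1,
                            num_threads: int = 1) -> str:
    """Build the JSON body for PUT _inference/text_embedding/<model_id>,
    matching the console examples for E5 and Eland-uploaded models."""
    body = {
        "service": "text_embedding",
        "service_settings": {
            "num_allocations": num_allocations,
            "num_threads": num_threads,
            "model_id": model_id,
        },
    }
    return json.dumps(body, indent=2)

# The same builder covers both examples:
e5_body = text_embedding_put_body(".multilingual-e5-small")
eland_body = text_embedding_put_body("msmarco-MiniLM-L12-cos-v5")
```

Only the `model_id` differs between the two examples; the service name and the allocation settings are identical.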
