docs: remove NeMo Service (nemollm) related documentation #1077

Merged 1 commit on Apr 10, 2025
1 change: 0 additions & 1 deletion docs/evaluation/README.md
@@ -248,7 +248,6 @@ These results are using the _Simple_ prompt defined in the LLM Self-Checking method
| gpt-3.5-turbo-instruct | 78 | 0 | 97 |
| gpt-3.5-turbo | 70 | 0 | 100 |
| text-davinci-003 | 80 | 0 | 97 |
| nemollm-43b | 88 | 0 | 84 |
| gemini-1.0-pro | 63 | 36<sup>*</sup> | 97 |

<sup>*</sup> Note that as of Mar 13, 2024, `gemini-1.0-pro` queried via the Vertex AI API occasionally produces [this error](https://github.com/GoogleCloudPlatform/generative-ai/issues/344). This occurs with a self-check prompt, that is, when the model is asked for a yes / no answer on whether it should respond to a particular input. We report these cases separately because the behavior is triggered by the self-check prompt itself, in which case it is debatable whether it should be treated as effective moderation or as a false positive.
40 changes: 1 addition & 39 deletions docs/user-guides/configuration-guide.md
@@ -91,7 +91,7 @@ To use any of the providers, you must install additional packages; when you first
```

```{important}
Although you can instantiate any of the previously mentioned LLM providers, depending on the capabilities of the model, the NeMo Guardrails toolkit works better with some providers than others. The toolkit includes prompts that have been optimized for certain types of models, such as `openai` and `nemollm`. For others, you can optimize the prompts yourself following the information in the [LLM Prompts](#llm-prompts) section.
Although you can instantiate any of the previously mentioned LLM providers, depending on the capabilities of the model, the NeMo Guardrails toolkit works better with some providers than others. The toolkit includes prompts that have been optimized for certain types of models, such as `openai` and `llama3` models. For others, you can optimize the prompts yourself following the information in the [LLM Prompts](#llm-prompts) section.
```
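For instance, a minimal `models` entry in `config.yml` that selects an OpenAI model as the main LLM follows the same pattern as the other engine examples in this guide (the model name below is illustrative):

```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
```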

#### Using LLMs with Reasoning Traces
Expand Down Expand Up @@ -197,44 +197,6 @@ models:
base_url: http://your_base_url
```

#### NeMo LLM Service

In addition to the LLM providers supported by LangChain, NeMo Guardrails also supports NeMo LLM Service. For example, to use the GPT-43B-905 model as the main LLM, you should use the following configuration:

```yaml
models:
- type: main
engine: nemollm
model: gpt-43b-905
```

You can also use customized NeMo LLM models for specific tasks, e.g., self-checking the user input or the bot output. For example:

```yaml
models:
# ...
- type: self_check_input
engine: nemollm
model: gpt-43b-002
parameters:
tokens_to_generate: 10
customization_id: 6e5361fa-f878-4f00-8bc6-d7fbaaada915
```

You can specify additional parameters when using NeMo LLM models using the `parameters` key. The supported parameters are:

- `temperature`: the temperature that should be used for making the calls;
- `api_host`: points to the NeMo LLM Service host (default: `https://api.llm.ngc.nvidia.com`);
- `api_key`: the NeMo LLM Service key that should be used;
- `organization_id`: the NeMo LLM Service organization ID that should be used;
- `tokens_to_generate`: the maximum number of tokens to generate;
- `stop`: the list of stop words that should be used;
- `customization_id`: if a customization is used, the id should be specified.

The `api_host`, `api_key`, and `organization_id` are fetched automatically from the environment variables `NGC_API_HOST`, `NGC_API_KEY`, and `NGC_ORGANIZATION_ID`, respectively.

For more details, please refer to the NeMo LLM Service documentation and check out the [NeMo LLM example configuration](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/configs/llm/nemollm/README.md).

#### TRT-LLM

NeMo Guardrails also supports connecting to a TRT-LLM server.
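A sketch of such a configuration, following the pattern of the other engines in this guide (the `trt_llm` engine name and the placeholder model name are assumptions, since the full section is truncated in this diff):

```yaml
models:
  - type: main
    engine: trt_llm  # assumed engine name for the TRT-LLM connector
    model: your-model-name  # placeholder; replace with the model served by your TRT-LLM server
```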
40 changes: 20 additions & 20 deletions docs/user-guides/llm-support.md
@@ -20,26 +20,26 @@ Any new LLM available in Guardrails should be evaluated using at least this set
The following tables summarize the LLM support for the main features of NeMo Guardrails, focusing on the different rails available out of the box.
If you want to use an LLM and you cannot see a prompt for it in the [prompts folder](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/llm/prompts), please also check the configuration defined in the [LLM example configurations](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/configs/llm/README.md).

| Feature | gpt-3.5-turbo-instruct | text-davinci-003 | nemollm-43b | llama-2-13b-chat | falcon-7b-instruct | gpt-3.5-turbo | gpt-4 | gpt4all-13b-snoozy | vicuna-7b-v1.3 | mpt-7b-instruct | dolly-v2-3b | HF Pipeline model |
|----------------------------------------------------|---------------------------|---------------------------|---------------------------|---------------------------|---------------------------|---------------------------|--------------------|----------------------|----------------------|----------------------|----------------------|------------------------------------|
| Dialog Rails | ✔ (0.74) | ✔ (0.83) | ✔ (0.82) | ✔ (0.77) | ✔ (0.76) | ❗ (0.45) | ❗ | ❗ (0.54) | ❗ (0.54) | ❗ (0.50) | ❗ (0.40) | ❗ _(DEPENDS ON MODEL)_ |
| • Single LLM call | ✔ (0.83) | ✔ (0.81) | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| • Multi-step flow generation | _EXPERIMENTAL_ | _EXPERIMENTAL_ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| Streaming | ✔ | ✔ | ✔ | - | - | ✔ | ✔ | - | - | - | - | ✔ |
| Hallucination detection (SelfCheckGPT with AskLLM) | ✔ | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| AskLLM rails | | | | | | | | | | | | |
| • Jailbreak detection | ✔ (0.88) | ✔ (0.88) | ✔ (0.86) | ✖ | ✖ | ✔ (0.85) | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| • Output moderation | ✔ | ✔ | ✔ | ✖ | ✖ | ✔ (0.85) | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| • Fact-checking | ✔ (0.81) | ✔ (0.82) | ✔ (0.81) | ✔ (0.80) | ✖ | ✔ (0.83) | ✖ | ✖ | ✖ | ✖ | ✖ | ❗ _(DEPENDS ON MODEL)_ |
| AlignScore fact-checking _(LLM independent)_ | ✔ (0.89) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| ActiveFence moderation _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Llama Guard moderation _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Got It AI RAG TruthChecker _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Patronus Lynx RAG Hallucination detection _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| GCP Text Moderation _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Patronus Evaluate API _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Fiddler Fast Faithfulness Hallucination Detection _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Fiddler Fast Safety & Jailbreak Detection _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Feature | gpt-3.5-turbo-instruct | text-davinci-003 | llama-2-13b-chat | falcon-7b-instruct | gpt-3.5-turbo | gpt-4 | gpt4all-13b-snoozy | vicuna-7b-v1.3 | mpt-7b-instruct | dolly-v2-3b | HF Pipeline model |
|----------------------------------------------------|---------------------------|---------------------------|---------------------------|---------------------------|---------------------------|--------------------|----------------------|----------------------|----------------------|----------------------|------------------------------------|
| Dialog Rails | ✔ (0.74) | ✔ (0.83) | ✔ (0.77) | ✔ (0.76) | ❗ (0.45) | ❗ | ❗ (0.54) | ❗ (0.54) | ❗ (0.50) | ❗ (0.40) | ❗ _(DEPENDS ON MODEL)_ |
| • Single LLM call | ✔ (0.83) | ✔ (0.81) | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| • Multi-step flow generation | _EXPERIMENTAL_ | _EXPERIMENTAL_ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| Streaming | ✔ | ✔ | - | - | ✔ | ✔ | - | - | - | - | ✔ |
| Hallucination detection (SelfCheckGPT with AskLLM) | ✔ | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| AskLLM rails | | | | | | | | | | | |
| • Jailbreak detection | ✔ (0.88) | ✔ (0.88) | ✖ | ✖ | ✔ (0.85) | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| • Output moderation | ✔ | ✔ | ✖ | ✖ | ✔ (0.85) | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ |
| • Fact-checking | ✔ (0.81) | ✔ (0.82) | ✔ (0.80) | ✖ | ✔ (0.83) | ✖ | ✖ | ✖ | ✖ | ✖ | ❗ _(DEPENDS ON MODEL)_ |
| AlignScore fact-checking _(LLM independent)_ | ✔ (0.89) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| ActiveFence moderation _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Llama Guard moderation _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Got It AI RAG TruthChecker _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Patronus Lynx RAG Hallucination detection _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| GCP Text Moderation _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Patronus Evaluate API _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Fiddler Fast Faithfulness Hallucination Detection _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Fiddler Fast Safety & Jailbreak Detection _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |

Table legend:
