
Commit 8443ed9

docs: multimodal rails support (#1061)
Signed-off-by: Mike McKiernan <[email protected]>
1 parent 07de0e7 commit 8443ed9

File tree

3 files changed: +190 -0 lines changed

docs/index.md

+1
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ user-guides/guardrails-library
3030
user-guides/guardrails-process
3131
user-guides/colang-language-syntax-guide
3232
user-guides/llm-support
33+
Multimodal Data <user-guides/multimodal>
3334
user-guides/python-api
3435
user-guides/cli
3536
user-guides/server-guide

docs/user-guides/multimodal.md

+115

@@ -0,0 +1,115 @@
<!--
  SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  SPDX-License-Identifier: Apache-2.0
-->

# Multimodal Data with NeMo Guardrails

## About Working with Multimodal Data

The NeMo Guardrails toolkit supports adding safety checks to multimodal content: images and text.
This support applies to input and output guardrails only.
Depending on the image reasoning model, you can provide the image to check as base64-encoded data or as a URL.

The safety check uses the image reasoning model as an LLM-as-a-judge to determine whether the content is safe.
The OpenAI, Llama Vision, and Llama Guard models can accept multimodal input and act as the judge model.

You must ensure that the image size and prompt length do not exceed the maximum context length of the model.
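Both options map onto the OpenAI-style multimodal message format used throughout this guide: a user message whose `content` is a list of text and `image_url` parts. The following sketch shows the two forms of the `image_url` part side by side; the URL and the raw bytes are placeholders, not real images.

```python
import base64

# An image passed by URL (placeholder address):
url_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/photo.jpg"},
}

# The same part with the image passed as base64-encoded data instead:
encoded = base64.b64encode(b"<raw image bytes>").decode()
data_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
}

# Either part is combined with a text part in a single user message:
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Is the content of this image safe?"},
        url_part,
    ],
}

print(data_part["image_url"]["url"][:23])
```

Which form to use depends on the judge model; the base64 form works with models that cannot fetch a URL, at the cost of a much longer prompt.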
## Sample Configuration

1. Create a directory, such as `configs/content_safety_vision`, and add a `config.yml` file with the following content:

   ```{literalinclude} ../../examples/configs/content_safety_vision/config.yml
   :language: yaml
   ```

1. Add a `configs/content_safety_vision/prompts.yml` file with the following content:

   ```{literalinclude} ../../examples/configs/content_safety_vision/prompts.yml
   :language: yaml
   ```
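The two files above are pulled in from the examples directory and are not reproduced inline here. As a rough, hypothetical sketch of the shape a `config.yml` like this can take (the engine, model names, and flow name below are illustrative assumptions following common NeMo Guardrails conventions, not the contents of the included file):

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
  - type: vision_rails
    engine: openai
    model: gpt-4o

rails:
  input:
    flows:
      - content safety check input $model=vision_rails
```

The `prompts.yml` file then supplies the judge prompt that the content safety flow renders for the vision model.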
## Example

The following sample code uses the preceding configuration and sends requests to OpenAI endpoints.
The sample image shows a handgun.

1. Set the OpenAI environment variable with your API key:

   ```console
   export OPENAI_API_KEY=<api-key>
   ```

1. Import the required libraries:

   ```{literalinclude} ../../examples/configs/content_safety_vision/demo.py
   :language: python
   :start-after: "# start-prerequisites"
   :end-before: "# end-prerequisites"
   ```

1. Load the vision content safety configuration:

   ```{literalinclude} ../../examples/configs/content_safety_vision/demo.py
   :language: python
   :start-after: "# start-config"
   :end-before: "# end-config"
   ```

1. Send an image reasoning request:

   ```{literalinclude} ../../examples/configs/content_safety_vision/demo.py
   :language: python
   :start-after: "# start-image-reasoning"
   :end-before: "# end-image-reasoning"
   ```

1. Send a potentially unsafe request:

   ```{literalinclude} ../../examples/configs/content_safety_vision/demo.py
   :language: python
   :start-after: "# start-potentially-unsafe"
   :end-before: "# end-potentially-unsafe"
   ```
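When the input rail blocks the second request, `rails.generate` returns an assistant message whose content is a refusal rather than an answer. A minimal, hypothetical way for calling code to branch on that result follows; the refusal wording depends on your flows and prompts, so the string check here is illustrative only, and both response dicts are made-up examples.

```python
# Hypothetical response shapes: rails.generate(messages=...) returns a dict
# with "role" and "content" keys for the assistant message.
blocked = {"role": "assistant", "content": "I'm sorry, I can't respond to that."}
answered = {"role": "assistant", "content": "The object is placed on a wooden table."}

def looks_blocked(response: dict) -> bool:
    # Illustrative check only: match the refusal phrasing your config emits.
    return response["content"].startswith("I'm sorry")

print(looks_blocked(blocked), looks_blocked(answered))
```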
## Tips for Base64-Encoded Images

Some models, such as the Llama Vision models, do not read an image from a URL.
For these models, encode the image in base64 and provide the encoded image to the model.

The following code sample shows the common Python statements.

```{code-block} python
:emphasize-lines: 11, 23

import base64
import json

from nemoguardrails import RailsConfig
from nemoguardrails.rails.llm.llmrails import LLMRails

config = RailsConfig.from_path("./content_safety_vision")
rails = LLMRails(config)

with open("<path-to-image>", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode()

messages = [{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What is the surface color that the object is placed on?",
        },
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}"
            },
        },
    ],
}]

response = rails.generate(messages=messages)
print(json.dumps(response, indent=2))
```
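The encoding steps above can be wrapped in a small helper. This is a convenience sketch rather than part of the toolkit; the function name and the default MIME type are assumptions.

```python
import base64

def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL for an image_url content part."""
    encoded = base64.b64encode(image_bytes).decode()
    return f"data:{mime_type};base64,{encoded}"

# Usage with bytes read from disk:
# with open("<path-to-image>", "rb") as f:
#     url = to_data_url(f.read())
print(to_data_url(b"abc"))  # data:image/jpeg;base64,YWJj
```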
examples/configs/content_safety_vision/demo.py

+74

@@ -0,0 +1,74 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# fmt: off
# isort: skip_file

# start-prerequisites
import base64
import io
import json
import urllib.request

import requests
# end-prerequisites

# start-config
from nemoguardrails import RailsConfig
from nemoguardrails.rails.llm.llmrails import LLMRails

config = RailsConfig.from_path("./config/")
rails = LLMRails(config, verbose=False)
# end-config

# start-image-reasoning
image_reasoning_message = [{
    "role": "user",
    "content": [{
        "type": "text",
        "text": "What is the surface color that the object is placed on?",
    },
    {
        "type": "image_url",
        "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/320px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
        },
    }],
}]

reasoning_response = rails.generate(messages=image_reasoning_message)
print(f"Image Reasoning Response: {reasoning_response}")
print(json.dumps(reasoning_response, indent=2))
# end-image-reasoning

# start-potentially-unsafe
potentially_unsafe_message = [{
    "role": "user",
    "content": [{
        "type": "text",
        "text": "How can I use the item in the photo to get a higher salary without working hard?",
    },
    {
        "type": "image_url",
        "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/320px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
        },
    }],
}]

potentially_unsafe_response = rails.generate(messages=potentially_unsafe_message)
print(f"Potentially Unsafe Response: {potentially_unsafe_response}")
print(json.dumps(potentially_unsafe_response, indent=2))
# end-potentially-unsafe
