Commit 17f68cc

ilias-t and boichee authored
feat: add clavata community integration (#1027)
* feat(clavata): add clavata integration
  - Users of the Clavata integration can now specify the exact labels that must match for the input/output to cause the rail to trigger and abort the flow.
  - Fixing some aspects of how the configuration is put together
  - Policy ID aliases make it easier to specify a policy by name instead of ID.
  - The new action `EvaluateUserInputWithClavataPolicy` allows you to evaluate the user input against a Clavata policy as part of a flow that a user has written.
  - Added the ability for a user to specify ANY/ALL logic for label matches.

  Co-authored-by: Brett Levenson <[email protected]>
  Signed-off-by: Ilias Tsangaris <[email protected]>

* Add tests

* Updated Clavata integration to support v1 and v2 colang
  - v1 colang requires that the policies for input and output rails be specified in the config because parameters cannot be passed to flows
  - v2 colang allows parameters in flow definitions, so it should be possible to simply pass the policy and label arguments at the time the flow is defined.

  Also improved the Clavata Client to handle rate limit and exponential backoff situations.

* Adding notebook demonstrating use of Clavata.ai with Guardrails
  - Fixed a small bug with UUID format when sent to server

* PR feedback

* Updated PR to address feedback
  - Separating examples for 1.0 and 2.x colang
  - Consolidating action so there's only 1 "action" function for both 1.0 and 2.x (with optional parameters; the action will figure out which approach to use based on what is passed to it).
  - Removed the extra action that was an example of returning something other than a boolean.

* Addressed all remaining review comments
  - Added test coverage for all pydantic models used by the integration
  - Added test coverage for the exp backoff decorator and the calculation of next retry time.
  - Updated the example rails configuration for Colang v2.x so it parses correctly on startup with `nemoguardrails chat`
  - Changed policy alias topology to use a dict to prevent accidental re-use of the same alias with a different ID. Updated example configs and documentation to show this difference. Notebook updated as well.
  - Consolidated code for the Clavata check action: It now uses two helpers to determine the correct policy ID and labels to use. Precedence is given to policy/label that are passed directly to the action, but if values are not passed, the config will then be checked to obtain the correct values. This should make migration from colang 1 to v2.x smoother.

* run pre-commit
* use str in favor of httpurl
* fix issue with dashes getting stripped from policy ids
* python 3.9 compat
* Use union operator rather than pipe for python 3.9

---------

Signed-off-by: Ilias Tsangaris <[email protected]>
Co-authored-by: Brett Levenson <[email protected]>
Co-authored-by: Brett Levenson <[email protected]>
1 parent 2a4f576 commit 17f68cc

20 files changed: +2353 -1 lines changed

.vscode/settings.json

+1
@@ -68,6 +68,7 @@
   // CSPELL
   "cSpell.words": [
     "appendleft",
+    "clavata",
     "colang",
     "elif",
     "interruptible",

docs/user-guides/community/clavata.md

+142
@@ -0,0 +1,142 @@
# Clavata Integration

[Clavata](https://clavata.ai) provides real-time moderation capabilities that let anyone detect and filter content. The exact rules for what to filter are up to you, but we also provide a number of rulesets for common issues.

This integration enables NeMo Guardrails to use Clavata for content moderation, topic moderation, and dialog moderation in both input and output flows.

## Getting Access

To sign up for Clavata or obtain an API key:

- [Request access](https://www.clavata.ai/) through the website
- Contact support at <[email protected]>

## Setup

1. Ensure you have access to the Clavata platform and have configured your content moderation policies. You'll need:
   - Your Clavata API key
   - Policy IDs for the content types you want to moderate
   - (Optional) A custom server endpoint if provided by Clavata.ai

2. Set the `CLAVATA_API_KEY` environment variable with your Clavata API key:

   ```bash
   export CLAVATA_API_KEY="your-api-key"
   ```

3. Configure your `config.yml` according to the following example:

   ```yaml
   rails:
     config:
       clavata:
         policies:
           Threats: 00000000-0000-0000-0000-000000000000
           Toxicity: 00000000-0000-0000-0000-000000000000
         label_match_logic: ALL # "ALL" | "ANY"
         input:
           # Reference an alias above in `policies`
           policy: Threats
         output:
           policy: Toxicity
           # Optional: Specify labels to require specific matches
           labels:
             - Hate Speech
             - Self-harm
         # Optional: Only provide this if you've been told to by Clavata.ai
         server_endpoint: "https://some-alt-endpoint.com"
     # Optional: reference the built-in flows
     input:
       flows:
         - clavata check input
     output:
       flows:
         - clavata check output
   ```
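
Before starting the app, it can help to sanity-check the `clavata` block of a configuration like the one above. The following is a minimal sketch, assuming the block has already been loaded into a plain Python dict; `validate_clavata_config` is a hypothetical helper written for illustration and is not part of the integration, which performs its own validation.

```python
import uuid


def validate_clavata_config(clavata: dict) -> list:
    """Return a list of problems found in a `clavata` config mapping.

    Hypothetical helper for illustration only; not the integration's code.
    """
    problems = []
    policies = clavata.get("policies", {})

    # Every alias should map to a string that parses as a UUID.
    for alias, policy_id in policies.items():
        try:
            uuid.UUID(str(policy_id))
        except ValueError:
            problems.append(f"policy '{alias}' has an invalid ID: {policy_id!r}")

    # If a rail is configured, its policy must be an alias defined in `policies`.
    for rail in ("input", "output"):
        rail_cfg = clavata.get(rail)
        if rail_cfg is None:
            continue
        alias = rail_cfg.get("policy")
        if alias not in policies:
            problems.append(f"{rail} rail references unknown policy alias: {alias!r}")

    # The match logic, when present, must be one of the two documented values.
    logic = clavata.get("label_match_logic", "ANY")
    if logic not in ("ALL", "ANY"):
        problems.append(f"label_match_logic must be 'ALL' or 'ANY', got {logic!r}")

    return problems


example = {
    "policies": {"Threats": "00000000-0000-0000-0000-000000000000"},
    "input": {"policy": "Threats"},
    "output": {"policy": "Toxicity"},  # alias not defined above
}
for problem in validate_clavata_config(example):
    print(problem)
```

An empty list means the sketch found nothing wrong; it says nothing about whether the policy IDs actually exist in your Clavata.ai account.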

## Configuration Details

- `server_endpoint`: (Optional) The Clavata API endpoint. Set this only if Clavata.ai has provided one.
- `policies`: Map of policy aliases to each policy's unique ID in your Clavata.ai account
- `label_match_logic`: (Optional) `ALL` requires all labels specified for a rail to match; `ANY` requires at least one match. Defaults to `ANY` if not set.
- `input` / `output`: Flow-specific configurations
  - `policy`: The policy alias to use for this flow
  - `labels`: (Optional) List of specific labels to check for
66+
## Usage
67+
68+
The Clavata integration provides two ways to implement content moderation:
69+
70+
### 1. Built-in Flows
71+
72+
#### For users of Colang 1.0
73+
74+
Add these flows to your configuration to automatically check content when using _Colang 1.0_:
75+
76+
```yaml
77+
rails:
78+
input:
79+
flows:
80+
- clavata check input # Check user input
81+
output:
82+
flows:
83+
- clavata check output # Check LLM output
84+
```

#### For users of Colang 2.x

If you're using Colang 2.x, there's no need to configure input and output rails in your `config.yml`; doing so is deprecated. Because Colang 2.x supports flows with parameters, you can specify which policy to use (and even which labels to match) inline in the definition of any of your rails (e.g., input, output, or dialog).

Here's an example of how to configure an input rail to check against a specific Clavata policy:

```colang
import guardrails
import nemoguardrails.library.clavata


# Check the input against the "Toxicity" policy
flow input rails $input_text
    clavata check for ($input_text, "Toxicity")

# To make the check stricter so it only matches particular labels in the policy, pass a list of labels as the last argument:
flow input rails $input_text
    clavata check for ($input_text, "Toxicity", ["Hate Speech","Harassment"])
```

> The same applies to `output` flows. See [our example](../../../examples/configs/clavata_v2/rails.co) for more.

### 2. Programmatic Usage

If you are using Colang 2.x, you can call the Clavata action in your own flows:

```colang
# Check content
$is_match = await ClavataCheckAction(text=$some_text, policy=$some_policy_alias)
```

The action returns `True` if the content matches the specified policy's criteria.
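
According to this commit's message, a policy passed directly to the action takes precedence, and the rail configuration is consulted only when no value is passed. The sketch below illustrates that precedence rule with plain dicts; `resolve_policy_id` and its parameters are hypothetical and do not mirror the integration's actual helpers.

```python
def resolve_policy_id(policies, rail_config=None, policy=None):
    """Resolve a policy alias to its ID, preferring the explicitly passed alias.

    Hypothetical helper for illustration only.
    """
    # 1. An alias passed directly to the action wins.
    # 2. Otherwise fall back to the alias configured for the rail, if any.
    alias = policy if policy is not None else (rail_config or {}).get("policy")
    if alias is None:
        raise ValueError("no policy was passed and none is configured for this rail")
    if alias not in policies:
        raise ValueError(f"unknown policy alias: {alias!r}")
    return policies[alias]


policies = {
    "Threats": "11111111-1111-1111-1111-111111111111",   # placeholder IDs
    "Toxicity": "22222222-2222-2222-2222-222222222222",
}
input_rail = {"policy": "Threats"}

# Colang 1.0 style: nothing passed, so the rail configuration is used.
print(resolve_policy_id(policies, rail_config=input_rail))
# Colang 2.x style: the alias passed to the action overrides the configuration.
print(resolve_policy_id(policies, rail_config=input_rail, policy="Toxicity"))
```

Keeping both paths behind one resolver is what lets the same action serve Colang 1.0 configurations and Colang 2.x flows.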

## Customization

You can customize the content moderation behavior by:

1. Configuring different policies for input and output flows
2. Specifying which labels must match within a policy
3. Setting the label match logic to either "ALL" (all specified labels must match) or "ANY" (at least one label must match)

## Error Handling

If a Clavata API request fails, the integration raises a `ClavataPluginAPIError`. It also raises a `ClavataPluginValueError` when there are configuration issues, such as:

- Invalid policy aliases
- Missing required configuration
- Invalid flow types
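
This commit's message also notes that the Clavata client handles rate limits with exponential backoff. The following is a minimal sketch of how such a retry delay could be computed; the function name, parameters, and default values are assumptions chosen for illustration and are not taken from the integration's code.

```python
import random


def next_retry_delay(attempt, base=1.0, factor=2.0, max_delay=30.0, jitter=0.0):
    """Return the number of seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper for illustration only.
    """
    # Grow the delay geometrically, but never beyond `max_delay`.
    delay = min(base * factor ** attempt, max_delay)
    if jitter:
        # Optionally add up to `jitter * delay` of random extra wait so that
        # many clients do not retry at exactly the same moment.
        delay += random.uniform(0, jitter * delay)
    return delay


# Delays for the first five retries with the default settings and no jitter.
for attempt in range(5):
    print(attempt, next_retry_delay(attempt))
```

Capping the delay keeps a long outage from producing unbounded waits, and jitter is a common addition when many clients share one rate limit.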

## Notes

- Ensure that your Clavata API key is properly set up and accessible
- The integration currently supports content moderation checks for input and output flows
- You can configure different policies and label requirements for input and output flows
- If no labels are specified for a policy, any label match will be considered a hit

For more information on Clavata and its capabilities, please refer to the [Clavata documentation](https://clavata.helpscoutdocs.com).

examples/configs/clavata/README.md

+19
@@ -0,0 +1,19 @@
# Clavata Example

This example demonstrates how to integrate with the [Clavata](https://clavata.ai) API for content moderation.

To test this configuration you can use the CLI Chat by running the following command from the `examples/configs/clavata` directory:

```bash
nemoguardrails chat
```

The structure of the config folder is the following:

- `config.yml` - The config file holding all the configuration options for Clavata integration.

Please see the docs for more details about:

- [Full Clavata integration guide](../../../docs/user-guides/community/clavata.md)
- [Configuration options and setup instructions](../../../docs/user-guides/community/clavata.md#setup)
- [Error handling and best practices](../../../docs/user-guides/community/clavata.md#error-handling)

examples/configs/clavata/config.yml

+28
@@ -0,0 +1,28 @@
# Example for colang 1.0
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

rails:
  config:
    clavata:
      policies:
        Threats: 00000000-0000-0000-0000-000000000000
        Toxicity: 00000000-0000-0000-0000-000000000000
      # With colang 1.0, we can't pass parameters to flows, so we need to specify the policy to use
      # in the input and output rails.
      input:
        policy: Threats
      output:
        policy: Toxicity
        # You can specify labels to match against as part of the input/output rail configuration
        labels:
          - Hate Speech
          - Self-Harm
  input:
    flows:
      - clavata check input
  output:
    flows:
      - clavata check output

examples/configs/clavata_v2/README.md

+34
@@ -0,0 +1,34 @@
# Clavata Example

This example demonstrates how to integrate with the [Clavata](https://clavata.ai) API for content moderation.

To test this configuration you can use the CLI Chat by running the following command from the `examples/configs/clavata_v2` directory:

```bash
nemoguardrails chat
```

The structure of the config folder is the following:

- `config.yml` - The config file holding all the configuration options for Clavata integration.
- `rails.co` - This file shows how to use Clavata's rails if you're using `colang v2.x`. In version `2.x`, configuring rails via the `config.yml` file is deprecated; instead, you include the approach shown in `rails.co` in your own flows.

Please see the docs for more details about:

- [Full Clavata integration guide](../../../docs/user-guides/community/clavata.md)
- [Configuration options and setup instructions](../../../docs/user-guides/community/clavata.md#setup)
- [Error handling and best practices](../../../docs/user-guides/community/clavata.md#error-handling)

## Policy Matching vs Label Matching

If you take a peek at the example [config.yml](config.yml), you'll notice that the integration no longer specifies `input` or `output` keys. Instead, the input and output rails are specified in a `rails.co` file as shown [here](./rails.co).

Additionally, because the `$input_text` or `$output_text` is passed directly to the rail, both input and output rails use the same Clavata action flow, `clavata check for`. Just pass the text to be evaluated, the policy to use, and, optionally, the specific labels to look for, and the flow will take care of the rest.

Whether specific label matches are necessary will depend on both the policy being used and what you're trying to accomplish. For most use cases, matching on the policy will likely be sufficient.

**As an example of when you might want to specify labels:**

Suppose you are using the same policy for input and output rails. In this case, the policy might include many labels, with some relevant to input, some relevant to output, and some relevant to both. You can specify the labels for input and output rails to ensure each rail only triggers when appropriate.

> Note: When setting specific labels to match, you can also control whether the logic is `ALL` (all specified labels must be found) or `ANY` (finding any one of the specified labels triggers the rail). The default is `ANY`, but you can change this by setting the `label_match_logic` key in your [rails config](./config.yml).
+15
@@ -0,0 +1,15 @@
colang_version: 2.x
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

# Rail flows are configured inline in the flows file. See ./rails.co for an example.
rails:
  config:
    clavata:
      policies:
        Threats: 00000000-0000-0000-0000-000000000000
        Toxicity: 00000000-0000-0000-0000-000000000000
      # Optional [ALL, ANY], defaults to ANY. When set to ALL, all labels must match for rail to trigger.
      label_match_logic: ANY

examples/configs/clavata_v2/main.co

+5
@@ -0,0 +1,5 @@
import core
import llm

flow main
    activate llm continuation

examples/configs/clavata_v2/rails.co

+14
@@ -0,0 +1,14 @@
import guardrails

import nemoguardrails.library.clavata

# Check the input against the "Toxicity" policy, but only match for the "Hate Speech" label
flow input rails $input_text
    # In order, the arguments below map to $text, $policy, $labels
    clavata check for ($input_text, "Toxicity", ["Hate Speech"])


# Check the output against the "Threats" policy. If any label in the policy matches, the rail will trigger.
flow output rails $output_text
    # In order, the arguments below map to $text, $policy (labels may be omitted if you don't care which labels trigger)
    clavata check for ($output_text, "Threats")
