# Clavata Integration

[Clavata](https://clavata.ai) provides real-time content moderation that lets anyone detect and filter content. The exact rules for what to filter are up to you, but we also provide a number of rulesets for common issues.

This integration lets NeMo Guardrails use Clavata for content moderation, topic moderation, and dialog moderation in both input and output flows.

## Getting Access

To sign up for Clavata or obtain an API key:

- [Request access](https://www.clavata.ai/) through the website
- Contact support at <[email protected]>

## Setup

1. Ensure you have access to the Clavata platform and have configured your content moderation policies. You'll need:
   - Your Clavata API key
   - Policy IDs for the content types you want to moderate
   - (Optional) A custom server endpoint, if Clavata.ai has provided one

2. Set the `CLAVATA_API_KEY` environment variable to your Clavata API key:

   ```bash
   export CLAVATA_API_KEY="your-api-key"
   ```

3. Configure your `config.yml` according to the following example:

```yaml
rails:
  config:
    clavata:
      policies:
        Threats: 00000000-0000-0000-0000-000000000000
        Toxicity: 00000000-0000-0000-0000-000000000000
      label_match_logic: ALL # "ALL" | "ANY"
      input:
        # Reference an alias defined above in `policies`
        policy: Threats
      output:
        policy: Toxicity
        # Optional: Specify labels to require specific matches
        labels:
          - Hate Speech
          - Self-harm
      # Optional: Only provide this if Clavata.ai has told you to
      server_endpoint: "https://some-alt-endpoint.com"
  # Optional: Reference the built-in flows
  input:
    flows:
      - clavata check input
  output:
    flows:
      - clavata check output
```

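Step 2 only exports the key in your shell session. If you'd like your application to fail fast when the key is missing, a startup check along these lines can help. This is a minimal Python sketch, not part of the integration. `require_clavata_api_key` is a hypothetical helper, and it only confirms that `CLAVATA_API_KEY` is set and non-blank. It does not check whether Clavata accepts the key.

```python
import os


def require_clavata_api_key() -> str:
    """Return CLAVATA_API_KEY, or raise if it is unset or blank.

    Hypothetical helper for illustration: it does not contact Clavata,
    so it cannot tell whether the key itself is valid.
    """
    key = os.environ.get("CLAVATA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CLAVATA_API_KEY is not set. Export it before starting NeMo Guardrails."
        )
    return key
```

You could call it once at startup, before loading your Guardrails configuration. A missing key then produces a clear error immediately, instead of surfacing later as an API failure.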

## Configuration Details

- `server_endpoint`: (Optional) A custom Clavata API endpoint. Set it only if Clavata.ai has provided one.
- `policies`: Map of policy aliases to each policy's unique ID in your Clavata.ai account
- `label_match_logic`: (Optional) `ALL` requires all labels specified for a rail to match; `ANY` requires at least one match. Defaults to `ANY` if not set.
- `input` / `output`: Flow-specific configuration
  - `policy`: The policy alias to use for this flow
  - `labels`: (Optional) List of specific labels to check for

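The following minimal Python sketch makes the interaction between `labels` and `label_match_logic` concrete by encoding the documented rules:

- With no labels configured for a rail, any label match is a hit.
- `ALL` requires every configured label to match.
- `ANY` (the default) requires at least one configured label to match.

For illustration only, it assumes that a Clavata check yields the set of policy labels that matched. `is_hit` is a hypothetical function and is not the integration's implementation.

```python
from typing import Iterable, Literal

LabelMatchLogic = Literal["ALL", "ANY"]


def is_hit(
    matched_labels: Iterable[str],
    required_labels: Iterable[str] = (),
    label_match_logic: LabelMatchLogic = "ANY",
) -> bool:
    """Decide whether a (hypothetical) Clavata result counts as a hit for one rail.

    matched_labels: labels assumed to be reported as matching the content.
    required_labels: the rail's optional `labels` list from config.yml.
    label_match_logic: "ALL" or "ANY"; defaults to "ANY" as documented.
    """
    matched = set(matched_labels)
    required = set(required_labels)
    if not required:
        # No labels configured for the rail: any label match is a hit.
        return bool(matched)
    if label_match_logic == "ALL":
        # Every configured label must have matched.
        return required <= matched
    if label_match_logic == "ANY":
        # At least one configured label must have matched.
        return bool(required & matched)
    raise ValueError(f"label_match_logic must be 'ALL' or 'ANY', got {label_match_logic!r}")


rail_labels = ["Hate Speech", "Self-harm"]
print(is_hit({"Hate Speech"}, rail_labels, "ANY"))  # True
print(is_hit({"Hate Speech"}, rail_labels, "ALL"))  # False: "Self-harm" did not match
print(is_hit({"Harassment"}))  # True: no labels configured for the rail
```

Because `ANY` is the default, adding `labels` to a rail without setting `label_match_logic` makes the rail count a hit as soon as any one configured label matches.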

## Usage

The Clavata integration provides two ways to implement content moderation:

### 1. Built-in Flows

#### For users of Colang 1.0

Add these flows to your configuration to check content automatically when using _Colang 1.0_:

```yaml
rails:
  input:
    flows:
      - clavata check input # Check user input
  output:
    flows:
      - clavata check output # Check LLM output
```

#### For users of Colang 2.0

If you're using Colang 2.0, you don't need to configure input and output rails in your `config.yml`; in fact, doing so is now deprecated. Because Colang 2.0 supports flows with parameters, you can instead specify which policy to use (and even which labels to match) inline in the definition of any of your rails (input, output, dialog, and so on).

Here's an example of how to configure an input rail to check against a specific Clavata policy:

```colang
import guardrails
import nemoguardrails.library.clavata

# Check the input against the "Toxicity" policy
flow input rails $input_text
  clavata check for ($input_text, Toxicity)

# Alternatively, to make the check stricter so it only matches particular labels in the policy, pass a list of labels as the third argument:
flow input rails $input_text
  clavata check for ($input_text, Toxicity, ["Hate Speech","Harassment"])
```

> The same approach works for `output` rails. See [our example](../../../examples/configs/clavata_v2/rails.co) for more.

### 2. Programmatic Usage

If you're using Colang 2.x, you can call the Clavata action directly in your own flows:

```colang
# Check content against a policy alias
$is_match = await ClavataCheckAction(text=$some_text, policy=$some_policy_alias)
```

The action returns `True` if the content matches the specified policy's criteria.

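The action returns a boolean, so the calling code decides what to do with a match. The Python sketch below mimics that contract with a hypothetical, offline stand-in (`fake_clavata_check`). It flags made-up trigger words instead of calling Clavata. `moderate` and the refusal message are likewise illustrative. In a real flow, the `await ClavataCheckAction(...)` call takes the stand-in's place.

```python
import asyncio

# Hypothetical, offline stand-in for ClavataCheckAction: it does NOT call Clavata.
# It "matches" a policy when the text contains one of that policy's made-up trigger words.
FAKE_POLICY_TRIGGERS = {"Threats": {"attack", "hurt"}}


async def fake_clavata_check(text: str, policy: str) -> bool:
    """Mimic the action's contract: True means the content matched the policy."""
    await asyncio.sleep(0)  # placeholder for the network round trip
    words = set(text.lower().split())
    return bool(words & FAKE_POLICY_TRIGGERS.get(policy, set()))


async def moderate(text: str, policy: str) -> str:
    """Replace the text with a refusal when the check reports a match."""
    is_match = await fake_clavata_check(text=text, policy=policy)
    return "Sorry, I can't help with that." if is_match else text


print(asyncio.run(moderate("How do I bake bread?", "Threats")))  # How do I bake bread?
print(asyncio.run(moderate("I will hurt you", "Threats")))  # Sorry, I can't help with that.
```

Only the boolean hand-off reflects the documented action. The keyword matching is purely a placeholder for Clavata's policy evaluation.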

## Customization

You can customize the content moderation behavior by:

1. Configuring different policies for input and output flows
2. Specifying which labels must match within a policy
3. Setting the label match logic to either `ALL` (all specified labels must match) or `ANY` (at least one label must match)

## Error Handling

If a Clavata API request fails, the integration raises a `ClavataPluginAPIError`. It also raises a `ClavataPluginValueError` for configuration issues, such as:

- Invalid policy aliases
- Missing required configuration
- Invalid flow types

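Resolving a rail's policy alias is one point where configuration errors like these could surface. The following minimal sketch shows one way such validation might work. It is built on several assumptions drawn from this page, not from the integration's code:

- It defines a local stand-in named after the documented `ClavataPluginValueError`, assumed here to subclass `ValueError`. The real class, its import path, and its messages may differ.
- `policy_id_for` is a hypothetical helper, and the policy IDs are placeholders.
- Only `input` and `output` flow types are accepted.

API failures (`ClavataPluginAPIError`) are not modeled.

```python
from typing import Any, Mapping


class ClavataPluginValueError(ValueError):
    """Local stand-in for the integration's configuration error (assumed ValueError subclass)."""


# Assumption: only input and output flows are supported, as stated in the Notes.
VALID_FLOW_TYPES = ("input", "output")


def policy_id_for(clavata_config: Mapping[str, Any], flow_type: str) -> str:
    """Resolve a rail's policy alias to its policy ID, raising on bad configuration."""
    if flow_type not in VALID_FLOW_TYPES:
        raise ClavataPluginValueError(f"Invalid flow type: {flow_type!r}")

    policies = clavata_config.get("policies")
    flow_config = clavata_config.get(flow_type)
    if not policies or not flow_config or "policy" not in flow_config:
        raise ClavataPluginValueError(
            f"Missing required Clavata configuration for {flow_type!r} flows"
        )

    alias = flow_config["policy"]
    if alias not in policies:
        raise ClavataPluginValueError(f"Invalid policy alias: {alias!r}")
    return policies[alias]


# Mirrors the `rails.config.clavata` section of config.yml, with placeholder IDs.
clavata_config = {
    "policies": {"Threats": "threats-policy-id", "Toxicity": "toxicity-policy-id"},
    "input": {"policy": "Threats"},
    "output": {"policy": "Profanity"},  # not defined under `policies`
}

print(policy_id_for(clavata_config, "input"))  # threats-policy-id
try:
    policy_id_for(clavata_config, "output")
except ClavataPluginValueError as err:
    print(err)  # Invalid policy alias: 'Profanity'
```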

## Notes

- Ensure that your Clavata API key is properly set and accessible
- The integration currently supports content moderation checks for input and output flows
- You can configure different policies and label requirements for input and output flows
- If no labels are specified for a rail, any label match within its policy is considered a hit

For more information on Clavata and its capabilities, see the [Clavata documentation](https://clavata.helpscoutdocs.com).