Adding ShieldGemma 2 notebook to Responsible AI Toolkit docs (#564)

RyanMullins · web-flow · commit c4bff62ae125 · 2025-03-24T08:48:42.000-07:00
* Adding ShieldGemma 2 notebook to Responsible AI Toolkit docs

* Addressing review feedback

* fixing linter errors
diff --git a/site/en/responsible/docs/safeguards/shieldgemma2_on_huggingface.ipynb b/site/en/responsible/docs/safeguards/shieldgemma2_on_huggingface.ipynb
@@ -0,0 +1,243 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "cLCmbOz_5tWH"
+      },
+      "source": [
+        "##### Copyright 2025 Google LLC"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "vdPaBz5y5LHW"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+        "# you may not use this file except in compliance with the License.\n",
+        "# You may obtain a copy of the License at\n",
+        "#\n",
+        "# https://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing, software\n",
+        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+        "# See the License for the specific language governing permissions and\n",
+        "# limitations under the License."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "3Zd1278P5wt_"
+      },
+      "source": [
+        "# Evaluating content safety with ShieldGemma 2 and Hugging Face Transformers"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "2b40722aa1a9"
+      },
+      "source": [
+        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://ai.google.dev/responsible/docs/safeguards/shieldgemma2_on_huggingface\"><img src=\"https://ai.google.dev/static/site-assets/images/docs/notebook-site-button.png\" height=\"32\" width=\"32\" />View on ai.google.dev</a>\n",
+        "  </td>\n",
+        "    <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/responsible/docs/safeguards/shieldgemma2_on_huggingface.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/google/generative-ai-docs/blob/main/site/en/responsible/docs/safeguards/shieldgemma2_on_huggingface.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "4IlgEYUj7xdW"
+      },
+      "source": [
+        "The **ShieldGemma 2** model is trained to detect key harms detailed in the [model card](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2). This guide demonstrates how to use Hugging Face Transformers to build robust data and models.\n",
+        "\n",
+        "Note that `ShieldGemma 2` is trained to classify only one harm type at a time, so you will need to make a separate call to `ShieldGemma 2` for each harm type you want to check against. You may have additional that you can use model tuning techniques on `ShieldGemma 2`."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "RhlnMQoK9fZG"
+      },
+      "source": [
+        "# Supported safety checks\n",
+        "\n",
+        "**ShieldGemma2** is a model trained on Gemma 3's 4B IT checkpoint and is trained to detect and predict violations of key harm types listed below:\n",
+        "\n",
+        "* **Dangerous Content**:  The image shall not contain content that facilitates or encourages activities that could cause real-world harm (e.g., building firearms and explosive devices, promotion of terrorism, instructions for suicide).\n",
+        "\n",
+        "* **Sexually Explicit**: The image shall not contain content that depicts explicit or graphic sexual acts (e.g., pornography, erotic nudity, depictions of rape or sexual assault).\n",
+        "\n",
+        "* **Violence/Gore**: The image shall not contain content that depicts shocking, sensational, or gratuitous violence (e.g., excessive blood and gore, gratuitous violence against animals, extreme injury or moment of death).\n",
+        "\n",
+        "This serves as a foundation, but users can provide customized safety policies as input to the model, allowing for fine-grained control and specific use-case requirements."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "t3aq-ToeAmRM"
+      },
+      "source": [
+        "# Supported Use Case\n",
+        "\n",
+        "ShieldGemma 2 is should be used as an input filter to vision language models or as an output filter of image generation systems or both.**  ShieldGemma 2 offers the following key advantages:\n",
+        "\n",
+        "* **Policy-Aware Classification**: ShieldGemma 2 accepts both a user-defined safety policy and an image as input, providing classifications for both real and generated images, tailored to the specific policy guidelines.\n",
+        "* **Probability-Based Output and Thresholding**: ShieldGemma 2 outputs a probability score for its predictions, allowing downstream users to flexibly tune the classification threshold based on their specific use cases and risk tolerance. This enables a more nuanced and adaptable approach to safety classification.\n",
+        "\n",
+        "The input/output format are as follows:\n",
+        "* **Input**: Image + Prompt Instruction with policy definition\n",
+        "* **Output**: Probability of 'Yes'/'No' tokens, 'Yes' meaning that the image violated the specific policy. The higher the score for the 'Yes' token, the higher the model's confidence that the image violates the specified policy."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "0WhRozADVJos"
+      },
+      "source": [
+        "# Usage example"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "K_XERopLUZhk"
+      },
+      "outputs": [],
+      "source": [
+        "# @title install Hugging Face Transformers v4.50+\n",
+        "! pip install -q 'transformers>=4.50.0'"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "Qg-Hy0ffbwvE"
+      },
+      "outputs": [],
+      "source": [
+        "# @title Authenticate with Hugging Face Hub\n",
+        "# @markdown ShieldGemma is a gated model. To access the weights, you must accept\n",
+        "# @markdown the license on Hugging Face Hub under your account and then provide\n",
+        "# @markdown an [Access Token](https://huggingface.co/docs/hub/en/security-tokens)\n",
+        "# @markdown to authenticate with the Hugging Face Hub API. If using Colab, the\n",
+        "# @markdown easiest way to do this is by creating a read-only token specifically\n",
+        "# @markdown for Colab and setting this as the value of the `HF_TOKEN` secret;\n",
+        "# @markdown this token will then be reusable across all Colab notebooks. Other\n",
+        "# @markdown Python notebook platforms may provide a similar mechanism. For those\n",
+        "# @markdown that do not, un-comment the lines in this cell to install the\n",
+        "# @markdown Hugging Face Hub CLI and log in interactively.\n",
+        "# ! pip install -q 'huggingface_hub[cli]'\n",
+        "# ! huggingface-cli login"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "40Rm46Xt7wqW"
+      },
+      "outputs": [],
+      "source": [
+        "from transformers import AutoProcessor, AutoModelForImageClassification\n",
+        "import torch\n",
+        "\n",
+        "model_id = \"google/shieldgemma-2-4b-it\"\n",
+        "\n",
+        "processor = AutoProcessor.from_pretrained(model_id)\n",
+        "model = AutoModelForImageClassification.from_pretrained(model_id)\n",
+        "model.to(torch.device(\"cuda\"))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "a436de5a4e95"
+      },
+      "outputs": [],
+      "source": [
+        "from PIL import Image\n",
+        "import requests\n",
+        "\n",
+        "# The image included in this Colab is benign and will not violate any of\n",
+        "# ShieldGemma's built-in content policies. Change this URL or otherwise update\n",
+        "# this code to use an image that may be violative.\n",
+        "url = \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg\"\n",
+        "image = Image.open(requests.get(url, stream=True).raw)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "AK1PrHnYz4fv"
+      },
+      "outputs": [],
+      "source": [
+        "inputs = processor(images=[image], return_tensors=\"pt\").to(torch.device(\"cuda\"))\n",
+        "\n",
+        "with torch.no_grad():\n",
+        "  scores = model(**inputs)\n",
+        "\n",
+        "# `scores` is a `ShieldGemma2ImageClassifierOutputWithNoAttention` instance\n",
+        "# continaing the logits and probabilities associated with the model predicting\n",
+        "# the `Yes` or `No` tokens as the response to the prompt batch, captured in the\n",
+        "# following properties.\n",
+        "#\n",
+        "#   *   `logits` (`torch.Tensor` of shape `(batch_size, 2)`): The first position\n",
+        "#       along dim=1 is the logits for the `Yes` token and the second position\n",
+        "#       along dim=1 is the logits for the `No` token.\n",
+        "#   *   `probabilities` (`torch.Tensor` of shape `(batch_size, 2)`): The first\n",
+        "#       position along dim=1 is the probability of predicting the `Yes` token\n",
+        "#       and the second position along dim=1 is the probability of predicting the\n",
+        "#       `No` token.\n",
+        "#\n",
+        "# When used with the `ShieldGemma2Processor`, the `batch_size` will be equal to\n",
+        "# `len(images) * len(policies)`, and the order within the batch will be\n",
+        "# img1_policy1, ... img1_policyN, ... imgM_policyN.\n",
+        "print(scores.logits)\n",
+        "print(scores.probabilities)\n",
+        "\n",
+        "# ShieldGemma prompts are constructed such that predicting the `Yes` token means\n",
+        "# the content violates the policy. If you are only interested in the violative\n",
+        "# condition, you can extract only that slice from the output tensors.\n",
+        "p_violated = scores.probabilities[:, 0]\n",
+        "print(p_violated)\n"
+      ]
+    }
+  ],
+  "metadata": {
+    "accelerator": "GPU",
+    "colab": {
+      "name": "shieldgemma2_on_huggingface.ipynb",
+      "toc_visible": true
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}