Multimodal (#157)

atroyn · web-flow · commit ab3d05b922b6 · 2023-11-06T21:33:17.000-08:00
* Docs and Ordering

* Migration notes, typo
diff --git a/docs/about.md b/docs/about.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 15
 ---
 
 # 👽 About
diff --git a/docs/api-reference.md b/docs/api-reference.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 title: "📖 API Cheatsheet"
 ---
 
diff --git a/docs/contributing.md b/docs/contributing.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 10
+sidebar_position: 14
 title: "🍻 Contributing"
 ---
 
diff --git a/docs/deployment.md b/docs/deployment.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 8
+sidebar_position: 9
 title: "☁️ Deployment"
 ---
 
diff --git a/docs/integrations.md b/docs/integrations.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 7
+sidebar_position: 8
 ---
 
 # 🔌 Integrations
diff --git a/docs/migration.md b/docs/migration.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 7
+sidebar_position: 10
 ---
 
 # ✈️ Migration
@@ -22,6 +22,38 @@ We will aim to provide:
 
 ## Migration Log
 
+### Migration to 0.4.16 - November 7, 2023
+
+This release adds support for multi-modal embeddings, with an accompanying change to the definitions of `EmbeddingFunction`.
+This change mainly affects users who have implemented their own `EmbeddingFunction` classes. If you are using Chroma's built-in embedding functions, you do not need to take any action.
+
+**EmbeddingFunction**
+
+Previously, `EmbeddingFunction`s were defined as:
+
+```python
+class EmbeddingFunction(Protocol):
+    def __call__(self, texts: Documents) -> Embeddings:
+        ...
+```
+
+After this update, `EmbeddingFunction`s are defined as:
+
+```python
+Embeddable = Union[Documents, Images]
+D = TypeVar("D", bound=Embeddable, contravariant=True)
+
+class EmbeddingFunction(Protocol[D]):
+    def __call__(self, input: D) -> Embeddings:
+        ...
+```
+
+The key differences are:
+- `EmbeddingFunction` is now generic, and takes a type parameter `D` which is a subtype of `Embeddable`. This allows us to define `EmbeddingFunction`s which can embed multiple modalities.
+- `__call__` now takes a single argument, `input`, to support data of any type `D`. The `texts` argument has been removed.
+
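As a rough illustration of the migration, the sketch below ports a custom text-only embedding function to the new signature. The type aliases are local stand-ins so the snippet runs without chromadb installed, and `ToyTextEmbeddingFunction` is a hypothetical class, not part of Chroma.

```python
from typing import Any, List, Protocol, TypeVar, Union

# Stand-in aliases (assumptions) so the sketch runs without chromadb;
# real code would use chromadb's own type definitions instead.
Documents = List[str]
Images = List[Any]
Embeddings = List[List[float]]

Embeddable = Union[Documents, Images]
D = TypeVar("D", bound=Embeddable, contravariant=True)


class EmbeddingFunction(Protocol[D]):
    def __call__(self, input: D) -> Embeddings:
        ...


class ToyTextEmbeddingFunction(EmbeddingFunction[Documents]):
    """Hypothetical text-only embedding function using the new signature."""

    # Before 0.4.16 the parameter would have been named `texts`.
    def __call__(self, input: Documents) -> Embeddings:
        # Toy 2-d embedding: character count and word count per document.
        return [[float(len(doc)), float(len(doc.split()))] for doc in input]


embed = ToyTextEmbeddingFunction()
vectors = embed(input=["hello world", "chroma"])
```

The only change a text-only implementation should need is renaming the `texts` parameter to `input`; parameterizing the base class with `Documents` is optional but documents which modality the function accepts.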
+
+
 ### Migration from >0.4.0 to 0.4.0 - July 17, 2023
 
 What's new in this version?
diff --git a/docs/multi-modal.md b/docs/multi-modal.md
@@ -0,0 +1,151 @@
+---
+sidebar_position: 5
+title: "🖼️ Multi-modal"
+---
+
+# 🖼️ Multi-modal
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<div class="select-language">Select a language</div>
+
+<Tabs queryString groupId="lang">
+<TabItem value="py" label="Python"></TabItem>
+<TabItem value="js" label="JavaScript"></TabItem>
+</Tabs>
+
+---
+
+<Tabs queryString groupId="lang" className="hideTabSwitcher">
+<TabItem value="py" label="Python">
+
+Chroma supports multi-modal collections: collections that can store, and be queried with, multiple modalities of data.
+
+## Multi-modal Embedding Functions
+
+Chroma supports multi-modal embedding functions, which can be used to embed data from multiple modalities into a single embedding space.
+
+Chroma has the OpenCLIP embedding function built in, which supports both text and images. 
+
+```python
+from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
+embedding_function = OpenCLIPEmbeddingFunction()
+```
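To make "a single embedding space" more concrete, the sketch below ranks two image vectors against a text-query vector by cosine similarity. The vectors and their names are invented for illustration and are not OpenCLIP output; the point is only that embeddings from different modalities can be compared directly once they share a space.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-d vectors standing in for image embeddings.
image_embeddings: Dict[str, List[float]] = {
    "cat_photo": [0.9, 0.1, 0.0],
    "car_photo": [0.0, 0.2, 0.9],
}
# Made-up vector standing in for the embedding of a text query.
text_query_embedding = [0.8, 0.2, 0.1]

# Rank images by similarity to the text query, most similar first.
ranked = sorted(
    image_embeddings,
    key=lambda name: cosine_similarity(text_query_embedding, image_embeddings[name]),
    reverse=True,
)
```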
+
+## Data Loaders
+
+Chroma supports data loaders for storing and querying data that lives outside Chroma itself, referenced by URI. Chroma does not store this data; it stores the URI and loads the data from that URI when needed.
+
+Chroma has a built-in data loader for loading images from a filesystem.
+
+```python
+from chromadb.utils.data_loaders import ImageLoader
+data_loader = ImageLoader()
+```
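Conceptually, a data loader takes a list of URIs and returns the data they point to. The sketch below shows that idea with a hypothetical `BytesFileLoader` that reads raw bytes from filesystem paths. The call shape (a list of URIs in, a list of loaded items out) is an assumption for illustration; this document does not show the exact protocol Chroma expects.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


class BytesFileLoader:
    """Hypothetical loader: maps each filesystem path to that file's bytes."""

    def __call__(self, uris: List[str]) -> List[Optional[bytes]]:
        loaded: List[Optional[bytes]] = []
        for uri in uris:
            path = Path(uri)
            # Missing files yield None instead of raising, so the output
            # stays aligned one-to-one with the input URIs.
            loaded.append(path.read_bytes() if path.is_file() else None)
        return loaded


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "a.bin"
    existing.write_bytes(b"\x00\x01")
    loader = BytesFileLoader()
    data = loader([str(existing), str(Path(tmp) / "missing.bin")])
```

Returning `None` for unreadable URIs is one possible design choice; raising an error is equally reasonable if missing data should stop the operation.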
+
+## Multi-modal Collections
+
+You can create a multi-modal collection by passing in a multi-modal embedding function. In order to load data from a URI, you must also pass in a data loader.
+
+```python
+import chromadb
+
+client = chromadb.Client()
+
+collection = client.create_collection(
+    name='multimodal_collection', 
+    embedding_function=embedding_function, 
+    data_loader=data_loader)
+
+```
+
+### Adding data 
+
+You can add data to a multi-modal collection by specifying the data modality. For now, images are supported:
+
+```python
+collection.add(
+    ids=['id1', 'id2', 'id3'],
+    images=[...] # A list of numpy arrays representing images
+)
+```
+
+Note that Chroma will not store the data for you, and you will have to maintain a mapping from IDs to data yourself. 
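One simple way to keep that mapping is a dictionary on the application side, keyed by the same IDs passed to `add`. The sketch below is a minimal illustration: `images_by_id` and `resolve_images` are hypothetical names, nested lists stand in for numpy arrays, and the example assumes query results expose one list of matching IDs per query.

```python
from typing import Dict, List

# Hypothetical application-side store; nested lists stand in for numpy arrays.
images_by_id: Dict[str, List[List[int]]] = {
    "id1": [[0, 0], [0, 0]],
    "id2": [[255, 255], [255, 255]],
    "id3": [[0, 255], [255, 0]],
}


def resolve_images(result_ids: List[List[str]]) -> List[List[List[List[int]]]]:
    """Look up the stored image for every returned ID, one inner list per query."""
    return [[images_by_id[item_id] for item_id in ids] for ids in result_ids]


# Assumed result shape: one list of matching IDs for a single query.
example_result_ids = [["id3", "id1"]]
matches = resolve_images(example_result_ids)
```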
+
+However, you can use Chroma in combination with data stored elsewhere, by adding it via URI. Note that this requires that you have specified a data loader when creating the collection.
+
+```python
+collection.add(
+    ids=['id1', 'id2', 'id3'],
+    uris=[...] #  A list of strings representing URIs to data
+)
+```
+
+Since the embedding function is multi-modal, you can also add text to the same collection:
+
+```python
+collection.add(
+    ids=['id4', 'id5', 'id6'],
+    documents=["This is a document", "This is another document", "This is a third document"]
+)
+```
+
+### Querying
+
+You can query a multi-modal collection with any of the modalities that it supports. For example, you can query with images:
+
+```python
+results = collection.query(
+    query_images=[...] # A list of numpy arrays representing images
+)
+```
+
+Or with text:
+
+```python
+results = collection.query(
+    query_texts=["This is a query document", "This is another query document"]
+)
+```
+
+If a data loader is set for the collection, you can also query with URIs that reference data of a supported modality stored elsewhere:
+
+```python
+results = collection.query(
+    query_uris=[...] # A list of strings representing URIs to data
+)
+```
+
+Additionally, if a data loader is set for the collection, and URIs are available, you can include the data in the results:
+
+```python
+results = collection.query(
+    query_images=[...], # A list of numpy arrays representing images
+    include=['data']
+)
+```
+
+This will automatically call the data loader for any available URIs, and include the data in the results. `uris` is also available as an `include` field.
+
+### Updating
+
+You can update a multi-modal collection by specifying the data modality, in the same way as `add`. For now, images are supported:
+
+```python
+collection.update(
+    ids=['id1', 'id2', 'id3'],
+    images=[...] # A list of numpy arrays representing images
+)
+```
+
+Note that an entry with a given ID can have only one associated modality at a time. Updates overwrite the existing modality: for example, an entry that originally has text and is then updated with an image will no longer have that text.
+
+</TabItem>
+<TabItem value="js" label="JavaScript">
+
+Support for multi-modal retrieval for Chroma's JavaScript client is coming soon! 
+
+</TabItem>
+
+</Tabs>
diff --git a/docs/observability.md b/docs/observability.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 8
+sidebar_position: 11
 title: "👀 Observability"
 ---
 
diff --git a/docs/roadmap.md b/docs/roadmap.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 9
+sidebar_position: 13
 title: "🛣️ Roadmap"
 ---
 
diff --git a/docs/telemetry.md b/docs/telemetry.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 8
+sidebar_position: 12
 title: "📏 Telemetry"
 ---
 
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 10
+sidebar_position: 7
 title: "🔍 Troubleshooting"
 ---