* initial refactor
* move BasePipeline to a new file
* test fix
* another test fix
* fix import
* revert
* initial refactor
* add tests for BasePipeline
* move BasePipeline to a new file
* initial refactor
* update test; finish off initial refactoring changes post local testing
* initial commit for clip zero-shot
* add basic structure for text branch and zeroshot
* add schema details
* update pipelines after running mock engine tests
* add zeroshot tests
* rebase fix
* clean-up comments; add note about onnx export issue
* move paths to fixtures
* rebase fix
* rebase fix
* refactor pipelines to separate visual, text, and zeroshot. also add pytest skips until model issues are resolved
* fix rebase
* initial refactor
* move BasePipeline to a new file
* initial refactor
* move BasePipeline to a new file
* initial refactor
* rebase fix
* move paths to fixtures
* initial refactor
* initial caption functionality
* debugging
* more debugging
* post debugging code
* fix imports
* cleanup post model fix
* fix variable names, some clean-up
* remove image embs loading
* update dimensions
* rebase
* remove extra param
* remove typo
* update README instructions; fix linalg import
* clean-up pipelines, update typing and descriptions
* rebase fix
* expose pipeline engine args
src/deepsparse/clip/README.md (+54 -5)
@@ -4,6 +4,7 @@ DeepSparse allows inference on [CLIP](https://github.com/mlfoundations/open_clip
 
 The CLIP integration currently supports the following task:
 -**Zero-shot Image Classification** - Classifying images given possible classes
+-**Caption Generation** - Generate a caption given an image
 
 ## Getting Started
 
@@ -13,24 +14,38 @@ Before you start your adventure with the DeepSparse Engine, make sure that your
 ```pip install deepsparse[clip]```
 
 ### Model Format
-By default, to deploy CLIP models using the DeepSparse Engine, it is required to supply the model in the ONNX format. This grants the engine the flexibility to serve any model in a framework-agnostic environment. To see examples of pulling CLIP models and exporting them to ONNX, please see the [sparseml documentation](https://github.com/neuralmagic/sparseml/tree/main/integrations/clip). For the Zero-shot image classification workflow, two ONNX models are required, a visual model for CLIP's visual branch, and a text model for CLIP's text branch. Both of these model should be produced through the sparseml integration linked above.
+By default, to deploy CLIP models using the DeepSparse Engine, it is required to supply the model in the ONNX format. This grants the engine the flexibility to serve any model in a framework-agnostic environment. To see examples of pulling CLIP models and exporting them to ONNX, please see the [sparseml documentation](https://github.com/neuralmagic/sparseml/tree/main/integrations/clip).
+
+For the Zero-shot image classification workflow, two ONNX models are required: a visual model for CLIP's visual branch and a text model for CLIP's text branch. Both of these models can be produced through the sparseml integration linked above. For caption generation, specific models called CoCa models are required; instructions on how to export CoCa models are also provided in the sparseml documentation above. The CoCa export pathway will generate one additional decoder model, along with the text and visual models.
 
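To make the model requirements above concrete, here is a rough sketch of how the exported ONNX files could be organized before being handed to a pipeline; the folder and file names below are illustrative assumptions, not names guaranteed by the export flow:

```python
# Hypothetical layout of the exported ONNX models (folder and file names are
# assumptions for illustration only).
zeroshot_model_paths = {
    "visual_model_path": "zeroshot_research/visual/model.onnx",  # CLIP visual branch
    "text_model_path": "zeroshot_research/text/model.onnx",      # CLIP text branch
}

caption_model_paths = {
    "visual_model_path": "caption_models/visual/model.onnx",    # CoCa visual branch
    "text_model_path": "caption_models/text/model.onnx",        # CoCa text branch
    "decoder_model_path": "caption_models/decoder/model.onnx",  # extra decoder produced by the CoCa export
}
```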
 ### Deployment examples:
-The following example uses pipelines to run the CLIP models for inference. As input, the pipeline ingests a list of images and a list of possible classes. A class is returned for each of the provided images.
+The following example uses pipelines to run the CLIP models for inference. For Zero-shot prediction, the pipeline ingests a list of images and a list of possible classes. A class is returned for each of the provided images. For caption generation, only an image file is required.
 
 If you don't have images ready, pull down the sample images using the following commands:
 This will pull down 3 images, a happy dog, St. Peter's basilica, and two elephants.
 
 #### Zero-shot Prediction
-Let's run an example to clasify the images. We'll provide the images in a list with their file names as well as a list of possible classes. We'll also provide paths to the exported ONNX models.
+Let's run an example to classify the images. We'll provide the images in a list with their file names as well as a list of possible classes. We'll also provide paths to the exported ONNX models under the `zeroshot_research` root folder.
 
 ```python
 import numpy as np
@@ -43,7 +58,7 @@ from deepsparse.clip import (
 )
 
 possible_classes = ["ice cream", "an elephant", "a dog", "a building", "a church"]
@@ -72,4 +87,38 @@ DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230727 C
 
 Image basilica.jpg is a picture of a church
 Image buddy.jpeg is a picture of a dog
+Image thailand.jpg is a picture of an elephant
+```
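The diff view elides the middle of the zero-shot snippet. Below is a hedged reconstruction of how the full example likely reads, matching the expected output shown above; the `clip_zeroshot` task name, the `CLIPTextInput`/`CLIPZeroShotInput` schemas, the keyword argument names, and the model paths under `zeroshot_research` are assumptions rather than verbatim file contents:

```python
# Hedged reconstruction of the zero-shot example; task name, schema names,
# keyword arguments, output field, and model paths are assumptions.
import numpy as np

from deepsparse import BasePipeline
from deepsparse.clip import (
    CLIPTextInput,
    CLIPVisualInput,
    CLIPZeroShotInput,
)

possible_classes = ["ice cream", "an elephant", "a dog", "a building", "a church"]
images = ["basilica.jpg", "buddy.jpeg", "thailand.jpg"]

# Paths to the exported ONNX models under the assumed `zeroshot_research` root folder.
pipeline = BasePipeline.create(
    task="clip_zeroshot",
    visual_model_path="zeroshot_research/visual/model.onnx",
    text_model_path="zeroshot_research/text/model.onnx",
)

pipeline_input = CLIPZeroShotInput(
    image=CLIPVisualInput(images=images),
    text=CLIPTextInput(text=possible_classes),
)

# One score vector per image; argmax picks the most likely class for each image.
output = pipeline(pipeline_input).text_scores
for i, scores in enumerate(output):
    prediction = possible_classes[np.argmax(scores)]
    print(f"Image {images[i]} is a picture of {prediction}")
```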
+#### Caption Generation
+Let's try a caption generation example. We'll leverage the `thailand.jpg` file that was pulled down earlier. We'll also provide the 3 exported CoCa ONNX models under the `caption_models` folder.
+
+```python
+from deepsparse import BasePipeline
+from deepsparse.clip import CLIPCaptionInput, CLIPVisualInput
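The caption snippet is cut off at its imports in this view. A minimal sketch of how it plausibly continues is shown below; the `caption` task name, the keyword argument names, the `.caption` output field, and the file names under `caption_models` are assumptions:

```python
# Hedged completion of the caption generation example; task name, keyword
# arguments, output field, and file names are assumptions.
from deepsparse import BasePipeline
from deepsparse.clip import CLIPCaptionInput, CLIPVisualInput

pipeline = BasePipeline.create(
    task="caption",
    visual_model_path="caption_models/visual/model.onnx",
    text_model_path="caption_models/text/model.onnx",
    decoder_model_path="caption_models/decoder/model.onnx",
)

# Generate a caption for the elephant image pulled down earlier.
pipeline_input = CLIPCaptionInput(image=CLIPVisualInput(images="thailand.jpg"))
caption = pipeline(pipeline_input).caption
print(caption[0])
```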