
Commit aedfb95

Merge pull request #926 from PrefectHQ/images

2 parents 8c0c083 + 30a4680

40 files changed: +875 −1446 lines

README.md (+4 −4)

````diff
@@ -236,13 +236,13 @@ marvin.paint("a simple cup of coffee, still warm")
 
 Learn more about image generation [here](https://askmarvin.ai/docs/images/generation).
 
-## 🔍 Classify images (beta)
+## 🔍 Converting images to data
 
-In addition to text, Marvin has beta support for captioning, classifying, transforming, and extracting entities from images using the GPT-4 vision model:
+In addition to text, Marvin has support for captioning, classifying, transforming, and extracting entities from images using the GPT-4 vision model:
 
 ```python
-marvin.beta.classify(
-    marvin.beta.Image("docs/images/coffee.png"),
+marvin.classify(
+    marvin.Image.from_path("docs/images/coffee.png"),
     labels=["drink", "food"],
 )
 
````
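As a runnable sanity check of the renamed API, the README snippet expands to roughly the following sketch. The image path and labels come from the README itself; a configured `OPENAI_API_KEY` and a marvin version including this PR are assumptions.

```python
import marvin

# Classify a local image with the now-stable, non-beta API.
# `docs/images/coffee.png` is the path used in the README example.
label = marvin.classify(
    marvin.Image.from_path("docs/images/coffee.png"),
    labels=["drink", "food"],
)
print(label)  # presumably "drink" for a picture of coffee
```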

cookbook/flows/insurance_claim.py (+2 −2)

````diff
@@ -52,8 +52,8 @@ def build_damage_report_model(damages: list[DamagedPart]) -> type[M]:
 
 @task(cache_key_fn=task_input_hash)
 def marvin_extract_damages_from_url(image_url: str) -> list[DamagedPart]:
-    return marvin.beta.extract(
-        data=marvin.beta.Image.from_url(image_url),
+    return marvin.extract(
+        data=marvin.Image.from_url(image_url),
         target=DamagedPart,
         instructions=(
             "Give extremely brief, high-level descriptions of the damage. Only include"
````

docs/api_reference/beta/vision.md (−4)

This file was deleted.
(Binary image files changed: 656 KB and −51.9 KB; contents not shown.)

docs/docs/video/recording.md (+1 −1)

````diff
@@ -26,7 +26,7 @@ counter = 0
 for image in recorder.stream():
     counter += 1
     # process each image
-    marvin.beta.caption(image)
+    marvin.caption(image)
 
 
     if counter == 3:
````

docs/docs/vision/captioning.md (+38 −9)

````diff
@@ -2,9 +2,6 @@
 
 Marvin can use OpenAI's vision API to process images as inputs.
 
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.
-
 <div class="admonition abstract">
 <p class="admonition-title">What it does</p>
 <p>
@@ -18,19 +15,18 @@ Marvin can use OpenAI's vision API to process images as inputs.
 
 Generate a description of the following image, hypothetically available at `/path/to/marvin.png`:
 
-![](/assets/images/docs/vision/marvin.webp)
+![](/assets/images/docs/vision/marvin.png)
 
 
 ```python
 import marvin
-from pathlib import Path
 
-caption = marvin.beta.caption(image=Path('/path/to/marvin.png'))
+caption = marvin.caption(marvin.Image.from_path('/path/to/marvin.png'))
 ```
 
 !!! success "Result"
 
-    "This is a digital illustration featuring a stylized, cute character resembling a Funko Pop vinyl figure with large, shiny eyes and a square-shaped head, sitting on abstract wavy shapes that simulate a landscape. The whimsical figure is set against a dark background with sparkling, colorful bokeh effects, giving it a magical, dreamy atmosphere."
+    "A cute, small robot with a square head and large, glowing eyes sits on a surface of wavy, colorful lines. The background is dark with scattered, glowing particles, creating a magical and futuristic atmosphere."
 
 
 <div class="admonition info">
@@ -41,6 +37,23 @@ Marvin can use OpenAI's vision API to process images as inputs.
 </div>
 
 
+## Providing instructions
+
+The `instructions` parameter offers an additional layer of control, enabling more nuanced caption generation, especially in ambiguous or complex scenarios.
+
+## Captions for multiple images
+
+To generate a single caption for multiple images, pass a list of `Image` objects to `caption`:
+
+```python
+marvin.caption(
+    [
+        marvin.Image.from_path('/path/to/img1.png'),
+        marvin.Image.from_path('/path/to/img2.png')
+    ],
+    instructions='...'
+)
+```
 
 
 ## Model parameters
@@ -53,5 +66,21 @@ You can pass parameters to the underlying API via the `model_kwargs` argument of
 If you are using Marvin in an async environment, you can use `caption_async`:
 
 ```python
-caption = await marvin.beta.caption_async(image=Path('/path/to/marvin.png'))
-```
+caption = await marvin.caption_async(image=Path('/path/to/marvin.png'))
+```
+## Mapping
+
+To generate individual captions for a list of inputs at once, use `.map`. Note that this is different than generating a single caption for multiple images, which is done by passing a list of `Image` objects to `caption`.
+
+```python
+inputs = [
+    marvin.Image.from_path('/path/to/img1.png'),
+    marvin.Image.from_path('/path/to/img2.png')
+]
+result = marvin.caption.map(inputs)
+assert len(result) == 2
+```
+
+(`marvin.cast_async.map` is also available for async environments.)
+
+Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
````
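The two sections this diff adds are easy to conflate, so here is a side-by-side sketch of the difference (paths and instructions are illustrative):

```python
import marvin

images = [
    marvin.Image.from_path("/path/to/img1.png"),
    marvin.Image.from_path("/path/to/img2.png"),
]

# One caption describing both images together:
combined = marvin.caption(images, instructions="Describe how the scenes differ.")

# One caption per image, with the requests issued in parallel:
separate = marvin.caption.map(images)
assert len(separate) == 2
```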

docs/docs/vision/classification.md (+7 −11)

````diff
@@ -2,10 +2,6 @@
 
 Marvin can use OpenAI's vision API to process images and classify them into categories.
 
-The `marvin.beta.classify` function is an enhanced version of `marvin.classify` that accepts images as well as text.
-
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.
 
 <div class="admonition abstract">
 <p class="admonition-title">What it does</p>
@@ -36,14 +32,14 @@ The `marvin.beta.classify` function is an enhanced version of `marvin.classify`
 ```python
 import marvin
 
-img = marvin.beta.Image('https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg')
+img = marvin.Image('https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg')
 
-animal = marvin.beta.classify(
+animal = marvin.classify(
     img,
     labels=['dog', 'cat', 'bird', 'fish', 'deer']
 )
 
-dry_or_wet = marvin.beta.classify(
+dry_or_wet = marvin.classify(
     img,
     labels=['dry', 'wet'],
     instructions='Is the animal wet?'
@@ -60,15 +56,15 @@
 
 
 ## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `classify`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `classify`. These parameters are passed directly to the API, so you can use any supported parameter.
 
 
 ## Async support
 
 If you are using Marvin in an async environment, you can use `classify_async`:
 
 ```python
-result = await marvin.beta.classify_async(
+result = await marvin.classify_async(
     "The app crashes when I try to upload a file.",
     labels=["bug", "feature request", "inquiry"]
 )
@@ -85,10 +81,10 @@ inputs = [
     "The app crashes when I try to upload a file.",
     "How do change my password?"
 ]
-result = marvin.beta.classify.map(inputs, ["bug", "feature request", "inquiry"])
+result = marvin.classify.map(inputs, ["bug", "feature request", "inquiry"])
 assert result == ["bug", "inquiry"]
 ```
 
-(`marvin.beta.classify_async.map` is also available for async environments.)
+(`marvin.classify_async.map` is also available for async environments.)
 
 Mapping automatically issues parallel requests to the API, making it a highly efficient way to classify multiple inputs at once. The result is a list of classifications in the same order as the inputs.
````
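Since the docs now route everything through a single `model_kwargs` argument, a guided image classification with an explicit sampling temperature might look like the sketch below. Treating `temperature` as a pass-through OpenAI parameter is an assumption, consistent with the docs' claim that `model_kwargs` is forwarded directly to the API:

```python
import marvin

img = marvin.Image(
    "https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg"
)

dry_or_wet = marvin.classify(
    img,
    labels=["dry", "wet"],
    instructions="Is the animal wet?",
    model_kwargs={"temperature": 0.0},  # assumed pass-through API parameter
)
# presumably "wet", given a retriever photographed in water
```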

docs/docs/vision/extraction.md (+6 −10)

````diff
@@ -2,12 +2,8 @@
 
 Marvin can use OpenAI's vision API to process images and convert them into structured data, transforming unstructured information into native types that are appropriate for a variety of programmatic use cases.
 
-The `marvin.beta.extract` function is an enhanced version of `marvin.extract` that accepts images as well as text.
 
 
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.
-
 <div class="admonition abstract">
 <p class="admonition-title">What it does</p>
 <p>
@@ -37,11 +33,11 @@ The `marvin.beta.extract` function is an enhanced version of `marvin.extract` th
 ```python
 import marvin
 
-img = marvin.beta.Image(
+img = marvin.Image(
     "https://images.unsplash.com/photo-1548199973-03cce0bbc87b?",
 )
 
-result = marvin.beta.extract(img, target=str, instructions="dog breeds")
+result = marvin.extract(img, target=str, instructions="dog breeds")
 ```
 
 !!! success "Result"
@@ -50,14 +46,14 @@ The `marvin.beta.extract` function is an enhanced version of `marvin.extract` th
 ```
 
 ## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `extract`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `extract`. These parameters are passed directly to the API, so you can use any supported parameter.
 
 
 ## Async support
 If you are using Marvin in an async environment, you can use `extract_async`:
 
 ```python
-result = await marvin.beta.extract_async(
+result = await marvin.extract_async(
     "I drove from New York to California.",
     target=str,
     instructions="2-letter state codes",
@@ -75,10 +71,10 @@ inputs = [
     "I drove from New York to California.",
     "I took a flight from NYC to BOS."
 ]
-result = marvin.beta.extract.map(inputs, target=str, instructions="2-letter state codes")
+result = marvin.extract.map(inputs, target=str, instructions="2-letter state codes")
 assert result == [["NY", "CA"], ["NY", "MA"]]
 ```
 
-(`marvin.beta.extract_async.map` is also available for async environments.)
+(`marvin.extract_async.map` is also available for async environments.)
 
 Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
````
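For completeness, the async example from this diff runs as-is inside an event loop; the `asyncio.run` harness below is the only addition:

```python
import asyncio

import marvin


async def main() -> None:
    # Same call as in the updated docs, awaited inside an event loop.
    result = await marvin.extract_async(
        "I drove from New York to California.",
        target=str,
        instructions="2-letter state codes",
    )
    assert result == ["NY", "CA"]


asyncio.run(main())
```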

docs/docs/vision/transformation.md (+10 −15)

````diff
@@ -2,10 +2,6 @@
 
 Marvin can use OpenAI's vision API to process images and convert them into structured data, transforming unstructured information into native types that are appropriate for a variety of programmatic use cases.
 
-The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that accepts images as well as text.
-
-!!! tip "Beta"
-    Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.
 
 <div class="admonition abstract">
 <p class="admonition-title">What it does</p>
@@ -41,10 +37,10 @@ The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that acc
     state: str = Field(description="2-letter state abbreviation")
 
 
-img = marvin.beta.Image(
+img = marvin.Image(
     "https://images.unsplash.com/photo-1568515387631-8b650bbcdb90",
 )
-result = marvin.beta.cast(img, target=Location)
+result = marvin.cast(img, target=Location)
 ```
 
 !!! success "Result"
@@ -70,10 +66,10 @@ The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that acc
     authors: list[str]
 
 
-img = marvin.beta.Image(
+img = marvin.Image(
     "https://hastie.su.domains/ElemStatLearn/CoverII_small.jpg",
 )
-result = marvin.beta.cast(img, target=Book)
+result = marvin.cast(img, target=Book)
 ```
 
 !!! success "Result"
@@ -101,8 +97,8 @@ If the target type isn't self-documenting, or you want to provide additional gui
 
 shopping_list = ["bagels", "cabbage", "eggs", "apples", "oranges"]
 
-missing_items = marvin.beta.cast(
-    marvin.beta.Image("https://images.unsplash.com/photo-1588964895597-cfccd6e2dbf9"),
+missing_items = marvin.cast(
+    marvin.Image("https://images.unsplash.com/photo-1588964895597-cfccd6e2dbf9"),
     target=list[str],
     instructions=f"Did I forget anything on my list: {shopping_list}?",
 )
@@ -113,15 +109,14 @@ If the target type isn't self-documenting, or you want to provide additional gui
 ```python
 assert missing_items == ["eggs", "oranges"]
 ```
-
 ## Model parameters
-You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `cast`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
+You can pass parameters to the underlying API via the `model_kwargs` argument of `cast`. These parameters are passed directly to the API, so you can use any supported parameter.
 
 ## Async support
 If you are using `marvin` in an async environment, you can use `cast_async`:
 
 ```python
-result = await marvin.beta.cast_async("one", int)
+result = await marvin.cast_async("one", int)
 
 assert result == 1
 ```
@@ -135,10 +130,10 @@ inputs = [
     "I bought two donuts.",
     "I bought six hot dogs."
 ]
-result = marvin.beta.cast.map(inputs, int)
+result = marvin.cast.map(inputs, int)
 assert result == [2, 6]
 ```
 
-(`marvin.beta.cast_async.map` is also available for async environments.)
+(`marvin.cast_async.map` is also available for async environments.)
 
 Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.
````
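Reassembled from the hunk context, the image-to-`Location` example reads roughly as follows. The `city` field is an assumption (only the `state` field is visible in the diff), and the exact output depends on the model:

```python
import marvin
from pydantic import BaseModel, Field


class Location(BaseModel):
    city: str  # assumed field; not visible in the hunk
    state: str = Field(description="2-letter state abbreviation")


img = marvin.Image(
    "https://images.unsplash.com/photo-1568515387631-8b650bbcdb90",
)
result = marvin.cast(img, target=Location)
# e.g. Location(city='New York', state='NY')
```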

docs/examples/webcam_narration.md (+1 −1)

````diff
@@ -31,7 +31,7 @@ By combining a few Marvin tools, you can quickly create a live narration of your
 
     # if there are no more frames to process, generate a caption from the most recent 5
     if len(recorder) == 0:
-        caption = marvin.beta.caption(
+        caption = marvin.caption(
             frames[-5:],
             instructions=f"""
                 You are a parody of a nature documentary narrator, creating an
````

docs/examples/xkcd_bird.md (+2 −2)

````diff
@@ -8,11 +8,11 @@
 ```python
 import marvin
 
-photo = marvin.beta.Image(
+photo = marvin.Image(
     "https://images.unsplash.com/photo-1613891188927-14c2774fb8d7",
 )
 
-result = marvin.beta.classify(
+result = marvin.classify(
     photo,
     labels=["bird", "not bird"]
 )
````

docs/static/css/tailwind.css (+14 −4)

````diff
@@ -1,5 +1,5 @@
 /*
-! tailwindcss v3.4.1 | MIT License | https://tailwindcss.com
+! tailwindcss v3.4.3 | MIT License | https://tailwindcss.com
 */
 
 /*
@@ -211,6 +211,8 @@ textarea {
   /* 1 */
   line-height: inherit;
   /* 1 */
+  letter-spacing: inherit;
+  /* 1 */
   color: inherit;
   /* 1 */
   margin: 0;
@@ -234,9 +236,9 @@ select {
 */
 
 button,
-[type='button'],
-[type='reset'],
-[type='submit'] {
+input:where([type='button']),
+input:where([type='reset']),
+input:where([type='submit']) {
   -webkit-appearance: button;
   /* 1 */
   background-color: transparent;
@@ -492,6 +494,10 @@ video {
   --tw-backdrop-opacity: ;
   --tw-backdrop-saturate: ;
   --tw-backdrop-sepia: ;
+  --tw-contain-size: ;
+  --tw-contain-layout: ;
+  --tw-contain-paint: ;
+  --tw-contain-style: ;
 }
 
 ::backdrop {
@@ -542,6 +548,10 @@ video {
   --tw-backdrop-opacity: ;
   --tw-backdrop-saturate: ;
   --tw-backdrop-sepia: ;
+  --tw-contain-size: ;
+  --tw-contain-layout: ;
+  --tw-contain-paint: ;
+  --tw-contain-style: ;
 }
 
 .absolute {
````
