
Automatic image blob creation doesn't handle RGBA images with JPEG. #160


Closed
FrostyTheSouthernSnowman opened this issue Jan 2, 2024 · 7 comments · Fixed by #374
Labels
component:python sdk (Issue/PR related to Python SDK), status:awaiting user response (Awaiting a response from the author), status:stale (Issue/PR will be closed automatically if there's no further activity), type:bug (Something isn't working)

Comments

@FrostyTheSouthernSnowman

Description of the bug:

Calling generate_content on a Gemini Pro Vision model with a PNG image fails with KeyError: 'RGBA', which in turn raises OSError: cannot write mode RGBA as JPEG. This seems to indicate that PNG is not supported, but according to the Gemini API docs, PNG is a supported MIME type. Note that the PNG example from that docs page doesn't seem to work either: it uses a contents kwarg to generate_content, but that argument doesn't exist. Modifying the code to use the right arguments gives the error google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.

Actual vs expected behavior:

The expected behavior is for this code:

screenshot = get_screen_data()

prompt = "What are your thoughts on this screenshot? I think"

response = model.generate_content(
    [prompt, screenshot], stream=True
)

response.resolve()

print(response.text)

to work successfully. The code was adapted from the "text from image and text" example in the quickstart. Instead, it raises the KeyError and OSError above. Changing the code to:

screenshot = get_screen_data()

screenshot_data = {
    'mime_type': 'image/png',
    'data': screenshot.tobytes()
}

prompt = "What are your thoughts on this screenshot? I think"

response = model.generate_content(
    [prompt, screenshot_data], stream=True
)

response.resolve()
print(response.text)

raises the 400 error described above. This code was adapted from the Gemini API Overview.
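
One possible reason for the 400 error (just a guess on my part) is that PIL's Image.tobytes() returns raw, unencoded pixel data, so the data field doesn't actually contain a PNG file matching the declared image/png MIME type. A minimal sketch of encoding the screenshot explicitly, assuming screenshot is a PIL Image:

import io

# Encode the PIL image as an actual PNG file in memory, not raw pixel data.
buffer = io.BytesIO()
screenshot.save(buffer, format='PNG')

screenshot_data = {
    'mime_type': 'image/png',
    'data': buffer.getvalue()
}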

Any other information you'd like to share?

#112 is related to this. Specifically, it deals with my second attempt at solving this problem. This issue is about the fact that generate_content doesn't handle PNG by default even though it is supposedly supported.

@FrostyTheSouthernSnowman added the component:python sdk and type:bug labels on Jan 2, 2024
@Andy963
Contributor

Andy963 commented Mar 14, 2024

It seems that the code in the Gemini API Overview is not correct:

model = genai.GenerativeModel('gemini-pro-vision')

cookie_picture = [{
    'mime_type': 'image/png',
    'data': Path('cookie.png').read_bytes()
}]
prompt = "Do these look store-bought or homemade?"

response = model.generate_content(
    model="gemini-pro-vision", # parameter model is no need here
    content=[prompt, cookie_picture]
)
print(response.text)
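
For reference, a minimal sketch of what a working call might look like, assuming the generate_content(contents, ...) signature shown in the traceback later in this thread (the model name goes to the GenerativeModel constructor, and the blob dict is passed directly as a content part):

import google.generativeai as genai
from pathlib import Path

# Assumes genai.configure(api_key=...) has already been called.
model = genai.GenerativeModel('gemini-pro-vision')

# A single blob dict as a content part, not wrapped in an extra list.
cookie_blob = {
    'mime_type': 'image/png',
    'data': Path('cookie.png').read_bytes()
}
prompt = "Do these look store-bought or homemade?"

response = model.generate_content([prompt, cookie_blob])
print(response.text)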

@FrostyTheSouthernSnowman
Author

Definitely seems to be the case

@MarkDaoust
Collaborator

In my tests PNG is working fine.

IDK what your screenshot = get_screen_data() function is.

Can you share a colab that reproduces the problem?

> it seems that the code in Gemini API Overview is not correct,

Thanks, I'm sending a fix for this.

@ya-stack

Hi, I am trying to read an image from an https: URL, but it doesn't seem to work; it shows the error below:
ChatGoogleGenerativeAIError: Invalid argument provided to Gemini: 400 Add an image to use models/gemini-pro-vision, or switch your model to a text model.
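
A sketch of one way to send an image fetched from an https: URL with this SDK directly (not ChatGoogleGenerativeAI), assuming requests is available and the URL and prompt here are placeholders: download the bytes first, since the API needs encoded image data rather than a URL.

import requests
import google.generativeai as genai

# Download the image bytes; the API needs encoded image data, not a URL.
image_bytes = requests.get('https://example.com/image.png').content

model = genai.GenerativeModel('gemini-pro-vision')
response = model.generate_content([
    "Describe this image.",
    {'mime_type': 'image/png', 'data': image_bytes},
])
print(response.text)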

@FrostyTheSouthernSnowman
Author

> In my tests PNG is working fine.
>
> IDK what your screenshot = get_screen_data() function is.
>
> Can you share a colab that reproduces the problem?
>
> > it seems that the code in Gemini API Overview is not correct,
>
> Thanks, I'm sending a fix for this.

Here's the get_screen_data():

from PIL import ImageGrab

def get_screen_data():
    # primary_monitor_dimensions, draw_mouse, and save_screenshot are
    # defined elsewhere in my code.
    screen = ImageGrab.grab(bbox=(0, 0, *primary_monitor_dimensions))

    screen = draw_mouse(screen)

    # Downscale the screenshot to half size before sending it.
    screen = screen.resize((int(screen.size[0] / 2), int(screen.size[1] / 2)))

    if save_screenshot:
        screen.save('screen.png')

    return screen

ImageGrab comes from PIL.


github-actions bot commented Jun 2, 2024

Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.

@github-actions bot added the status:stale label on Jun 2, 2024
@MarkDaoust changed the title from "Gemini Pro Vision generate_content doesn't handle PNG by default" to "Automatic image blob creation doesn't handle RGBA images with JPEG." on Jun 3, 2024
@MarkDaoust
Collaborator

This happens because the code that generates the bytes to send tries to create a JPEG file, but the image is RGBA.
Adding a .convert('RGB') before saving it fixes this.

In [13]: model = genai.GenerativeModel(model_name='gemini-pro-vision')

In [14]: model.generate_content([img2, "what's this"])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/Projects/venv3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:650, in _save(im, fp, filename)
    649 try:
--> 650     rawmode = RAWMODE[im.mode]
    651 except KeyError as e:

KeyError: 'RGBA'

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
Cell In[14], line 1
----> 1 model.generate_content([img2, "what's this"])

File ~/Projects/generative-ai-python/google/generativeai/generative_models.py:236, in GenerativeModel.generate_content(self, contents, generation_config, safety_settings, stream, tools, tool_config, request_options)
    233 if not contents:
    234     raise TypeError("contents must not be empty")
--> 236 request = self._prepare_request(
    237     contents=contents,
    238     generation_config=generation_config,
    239     safety_settings=safety_settings,
    240     tools=tools,
    241     tool_config=tool_config,
    242 )
    243 if self._client is None:
    244     self._client = client.get_default_generative_client()

File ~/Projects/generative-ai-python/google/generativeai/generative_models.py:139, in GenerativeModel._prepare_request(self, contents, generation_config, safety_settings, tools, tool_config)
    136 else:
    137     tool_config = content_types.to_tool_config(tool_config)
--> 139 contents = content_types.to_contents(contents)
    141 generation_config = generation_types.to_generation_config_dict(generation_config)
    142 merged_gc = self._generation_config.copy()

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:293, in to_contents(contents)
    288     except TypeError:
    289         # If you get a TypeError here it's probably because that was a list
    290         # of parts, not a list of contents, so fall back to `to_content`.
    291         pass
--> 293 contents = [to_content(contents)]
    294 return contents

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:256, in to_content(content)
    254     return content
    255 elif isinstance(content, Iterable) and not isinstance(content, str):
--> 256     return protos.Content(parts=[to_part(part) for part in content])
    257 else:
    258     # Maybe this is a Part?
    259     return protos.Content(parts=[to_part(content)])

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:256, in <listcomp>(.0)
    254     return content
    255 elif isinstance(content, Iterable) and not isinstance(content, str):
--> 256     return protos.Content(parts=[to_part(part) for part in content])
    257 else:
    258     # Maybe this is a Part?
    259     return protos.Content(parts=[to_part(content)])

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:224, in to_part(part)
    220     return protos.Part(function_response=part)
    222 else:
    223     # Maybe it can be turned into a blob?
--> 224     return protos.Part(inline_data=to_blob(part))

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:164, in to_blob(blob)
    162     return blob
    163 elif isinstance(blob, IMAGE_TYPES):
--> 164     return image_to_blob(blob)
    165 else:
    166     if isinstance(blob, Mapping):

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:89, in image_to_blob(image)
     87 if PIL is not None:
     88     if isinstance(image, PIL.Image.Image):
---> 89         return pil_to_blob(image)
     91 if IPython is not None:
     92     if isinstance(image, IPython.display.Image):

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:79, in pil_to_blob(img)
     77     mime_type = "image/png"
     78 else:
---> 79     img.save(bytesio, format="JPEG")
     80     mime_type = "image/jpeg"
     81 bytesio.seek(0)

File ~/Projects/venv3/lib/python3.11/site-packages/PIL/Image.py:2439, in Image.save(self, fp, format, **params)
   2436         fp = builtins.open(filename, "w+b")
   2438 try:
-> 2439     save_handler(self, fp, filename)
   2440 except Exception:
   2441     if open_fp:

File ~/Projects/venv3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:653, in _save(im, fp, filename)
    651 except KeyError as e:
    652     msg = f"cannot write mode {im.mode} as JPEG"
--> 653     raise OSError(msg) from e
    655 info = im.encoderinfo
    657 dpi = [round(x) for x in info.get("dpi", (0, 0))]
