Jhakulin/image input for assistants #40410

Merged
merged 17 commits into from
Apr 15, 2025
7 changes: 4 additions & 3 deletions sdk/ai/azure-ai-projects/CHANGELOG.md
@@ -4,13 +4,14 @@

### Features added
* Utilities to load prompt template strings and Prompty file content
* Add BingCustomSearchTool class with sample
* Add list_threads API to agents namespace
* Added BingCustomSearchTool class with sample
* Added list_threads API to agents namespace
* Added image input support for agents create_message

### Sample updates
* Added `project_client.agents.enable_auto_function_calls(toolset=toolset)` to all samples that have `toolcalls` executed by the `azure-ai-project` SDK
* New BingCustomSearchTool sample
* Add list_threads usage to agent basics sample
* New samples added for image input from url, file and base64

### Bugs Fixed

84 changes: 83 additions & 1 deletion sdk/ai/azure-ai-projects/README.md
@@ -52,6 +52,7 @@ To report an issue with the client library, or request additional features, plea
- [Create message](#create-message) with:
- [File search attachment](#create-message-with-file-search-attachment)
- [Code interpreter attachment](#create-message-with-code-interpreter-attachment)
- [Create Message with Image Inputs](#create-message-with-image-inputs)
- [Execute Run, Run_and_Process, or Stream](#create-run-run_and_process-or-stream)
- [Retrieve message](#retrieve-message)
- [Retrieve file](#retrieve-file)
@@ -609,7 +610,6 @@ agent = project_client.agents.create_agent(

Currently, the Azure Function integration for the AI Agent has the following limitations:

- Azure Functions integration is available **only for non-streaming scenarios**.
- Supported trigger for Azure Function is currently limited to **Queue triggers** only.
HTTP or other trigger types and streaming responses are not supported at this time.

@@ -985,6 +985,88 @@ message = project_client.agents.create_message(

<!-- END SNIPPET -->

#### Create Message with Image Inputs

You can send messages with image inputs to Azure agents in the following ways:

- **Using an image stored as an uploaded file**
- **Using a public image accessible via URL**
- **Using a base64 encoded image string**

The following examples demonstrate each method:

##### Create message using uploaded image file

```python
from azure.ai.projects.models import (
    MessageImageFileParam,
    MessageInputImageFileBlock,
    MessageInputTextBlock,
)

# Upload the local image file
image_file = project_client.agents.upload_file_and_poll(file_path="image_file.png", purpose="assistants")

# Construct content using the uploaded image
file_param = MessageImageFileParam(file_id=image_file.id, detail="high")
content_blocks = [
    MessageInputTextBlock(text="Hello, what is in the image?"),
    MessageInputImageFileBlock(image_file=file_param),
]

# Create the message
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=content_blocks,
)
```

##### Create message with an image URL input

```python
from azure.ai.projects.models import (
    MessageImageUrlParam,
    MessageInputImageUrlBlock,
    MessageInputTextBlock,
)

# Specify the public image URL
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

# Create content that references the image URL directly
url_param = MessageImageUrlParam(url=image_url, detail="high")
content_blocks = [
    MessageInputTextBlock(text="Hello, what is in the image?"),
    MessageInputImageUrlBlock(image_url=url_param),
]

# Create the message
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=content_blocks,
)
```

##### Create message with base64-encoded image input

```python
import base64

from azure.ai.projects.models import (
    MessageImageUrlParam,
    MessageInputImageUrlBlock,
    MessageInputTextBlock,
)


def image_file_to_base64(path: str) -> str:
    """Read a file and return its contents as a base64-encoded string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# Convert your image file to base64 format
image_base64 = image_file_to_base64("image_file.png")

# Prepare the data URL
img_data_url = f"data:image/png;base64,{image_base64}"

# Use the base64-encoded data URL as the image URL parameter
url_param = MessageImageUrlParam(url=img_data_url, detail="high")
content_blocks = [
    MessageInputTextBlock(text="Hello, what is in the image?"),
    MessageInputImageUrlBlock(image_url=url_param),
]

# Create the message
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=content_blocks,
)
```
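
The base64 example above hardcodes `image/png` in the data URL. For other image formats, the MIME type could instead be inferred from the file name. The following is a minimal sketch using only the Python standard library; `to_data_url` is a hypothetical helper that is not part of the SDK, and it assumes the file extension reflects the actual image format.

```python
import base64
import mimetypes


def to_data_url(path: str, data: bytes) -> str:
    """Build a data URL, guessing the MIME type from the file name.

    Hypothetical helper for illustration; not part of azure-ai-projects.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Example: three placeholder bytes standing in for real JPEG file contents
print(to_data_url("photo.jpg", b"\xff\xd8\xff"))
```

The resulting string can be passed as the `url` argument of `MessageImageUrlParam` in the same way as `img_data_url` above.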

#### Create Run, Run_and_Process, or Stream

To process your message, you can use `create_run`, `create_and_process_run`, or `create_stream`.
12 changes: 10 additions & 2 deletions sdk/ai/azure-ai-projects/apiview-properties.json
@@ -23,6 +23,8 @@
"azure.ai.projects.models.BingGroundingToolDefinition": "Azure.AI.Projects.Agents.BingGroundingToolDefinition",
"azure.ai.projects.models.CodeInterpreterToolDefinition": "Azure.AI.Projects.Agents.CodeInterpreterToolDefinition",
"azure.ai.projects.models.CodeInterpreterToolResource": "Azure.AI.Projects.Agents.CodeInterpreterToolResource",
"azure.ai.projects.models.ConnectedAgentDetails": "Azure.AI.Projects.Agents.ConnectedAgentDetails",
"azure.ai.projects.models.ConnectedAgentToolDefinition": "Azure.AI.Projects.Agents.ConnectedAgentToolDefinition",
"azure.ai.projects.models.Trigger": "Azure.AI.Projects.Trigger",
"azure.ai.projects.models.CronTrigger": "Azure.AI.Projects.CronTrigger",
"azure.ai.projects.models.Dataset": "Azure.AI.Projects.Dataset",
@@ -60,7 +62,13 @@
"azure.ai.projects.models.MessageDeltaTextUrlCitationDetails": "Azure.AI.Projects.Agents.MessageDeltaTextUrlCitationDetails",
"azure.ai.projects.models.MessageImageFileContent": "Azure.AI.Projects.Agents.MessageImageFileContent",
"azure.ai.projects.models.MessageImageFileDetails": "Azure.AI.Projects.Agents.MessageImageFileDetails",
"azure.ai.projects.models.MessageImageFileParam": "Azure.AI.Projects.Agents.MessageImageFileParam",
"azure.ai.projects.models.MessageImageUrlParam": "Azure.AI.Projects.Agents.MessageImageUrlParam",
"azure.ai.projects.models.MessageIncompleteDetails": "Azure.AI.Projects.Agents.MessageIncompleteDetails",
"azure.ai.projects.models.MessageInputContentBlock": "Azure.AI.Projects.Agents.MessageInputContentBlock",
"azure.ai.projects.models.MessageInputImageFileBlock": "Azure.AI.Projects.Agents.MessageInputImageFileBlock",
"azure.ai.projects.models.MessageInputImageUrlBlock": "Azure.AI.Projects.Agents.MessageInputImageUrlBlock",
"azure.ai.projects.models.MessageInputTextBlock": "Azure.AI.Projects.Agents.MessageInputTextBlock",
"azure.ai.projects.models.MessageTextAnnotation": "Azure.AI.Projects.Agents.MessageTextAnnotation",
"azure.ai.projects.models.MessageTextContent": "Azure.AI.Projects.Agents.MessageTextContent",
"azure.ai.projects.models.MessageTextDetails": "Azure.AI.Projects.Agents.MessageTextDetails",
@@ -156,7 +164,6 @@
"azure.ai.projects.models.UpdateCodeInterpreterToolResourceOptions": "Azure.AI.Projects.Agents.UpdateCodeInterpreterToolResourceOptions",
"azure.ai.projects.models.UpdateFileSearchToolResourceOptions": "Azure.AI.Projects.Agents.UpdateFileSearchToolResourceOptions",
"azure.ai.projects.models.UpdateToolResourcesOptions": "Azure.AI.Projects.Agents.UpdateToolResourcesOptions",
"azure.ai.projects.models.UploadFileRequest": "Azure.AI.Projects.Agents.uploadFile.Request.anonymous",
"azure.ai.projects.models.VectorStore": "Azure.AI.Projects.Agents.VectorStore",
"azure.ai.projects.models.VectorStoreChunkingStrategyRequest": "Azure.AI.Projects.Agents.VectorStoreChunkingStrategyRequest",
"azure.ai.projects.models.VectorStoreAutoChunkingStrategyRequest": "Azure.AI.Projects.Agents.VectorStoreAutoChunkingStrategyRequest",
@@ -182,6 +189,8 @@
"azure.ai.projects.models.ResponseFormat": "Azure.AI.Projects.Agents.ResponseFormat",
"azure.ai.projects.models.ListSortOrder": "Azure.AI.Projects.Agents.ListSortOrder",
"azure.ai.projects.models.MessageRole": "Azure.AI.Projects.Agents.MessageRole",
"azure.ai.projects.models.MessageBlockType": "Azure.AI.Projects.Agents.MessageBlockType",
"azure.ai.projects.models.ImageDetailLevel": "Azure.AI.Projects.Agents.ImageDetailLevel",
"azure.ai.projects.models.MessageStatus": "Azure.AI.Projects.Agents.MessageStatus",
"azure.ai.projects.models.MessageIncompleteDetailsReason": "Azure.AI.Projects.Agents.MessageIncompleteDetailsReason",
"azure.ai.projects.models.RunStatus": "Azure.AI.Projects.Agents.RunStatus",
@@ -238,7 +247,6 @@
"azure.ai.projects.AIProjectClient.agents.get_run_step": "Azure.AI.Projects.Agents.getRunStep",
"azure.ai.projects.AIProjectClient.agents.list_run_steps": "Azure.AI.Projects.Agents.listRunSteps",
"azure.ai.projects.AIProjectClient.agents.list_files": "Azure.AI.Projects.Agents.listFiles",
"azure.ai.projects.AIProjectClient.agents.upload_file": "Azure.AI.Projects.Agents.uploadFile",
"azure.ai.projects.AIProjectClient.agents.delete_file": "Azure.AI.Projects.Agents.deleteFile",
"azure.ai.projects.AIProjectClient.agents.get_file": "Azure.AI.Projects.Agents.getFile",
"azure.ai.projects.AIProjectClient.agents.list_vector_stores": "Azure.AI.Projects.Agents.listVectorStores",
3 changes: 2 additions & 1 deletion sdk/ai/azure-ai-projects/azure/ai/projects/_types.py
@@ -6,7 +6,7 @@
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

from typing import TYPE_CHECKING, Union
from typing import List, TYPE_CHECKING, Union

if TYPE_CHECKING:
from . import models as _models
@@ -17,5 +17,6 @@
"_models.AgentsApiResponseFormat",
"_models.ResponseFormatJsonSchemaType",
]
MessageInputContent = Union[str, List["_models.MessageInputContentBlock"]]
MessageAttachmentToolDefinition = Union["_models.CodeInterpreterToolDefinition", "_models.FileSearchToolDefinition"]
AgentsApiToolChoiceOption = Union[str, str, "_models.AgentsApiToolChoiceOptionMode", "_models.AgentsNamedToolChoice"]
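
The new `MessageInputContent` alias means a caller may pass either a plain string or a list of content blocks. The sketch below illustrates that union shape with a hypothetical stand-in class (`TextBlock` is not the SDK's `MessageInputTextBlock`), and `normalize` is an illustrative helper; whether the SDK performs any such normalization internally is not shown in this diff.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextBlock:
    """Hypothetical stand-in for MessageInputTextBlock."""

    text: str


# Mirrors the shape of MessageInputContent: a plain string or a list of blocks.
InputContent = Union[str, List[TextBlock]]


def normalize(content: InputContent) -> List[TextBlock]:
    """Wrap a bare string in a single text block; copy lists through unchanged."""
    if isinstance(content, str):
        return [TextBlock(text=content)]
    return list(content)


print(normalize("Hello"))
print(normalize([TextBlock("Hello"), TextBlock("again")]))
```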
@@ -46,7 +46,7 @@
from ... import _model_base, models as _models
from ..._model_base import SdkJSONEncoder, _deserialize
from ..._serialization import Deserializer, Serializer
from ..._vendor import FileType, prepare_multipart_form_data
from ..._vendor import prepare_multipart_form_data
from ...operations._operations import (
build_agents_cancel_run_request,
build_agents_cancel_vector_store_file_batch_request,
@@ -1443,7 +1443,7 @@ async def create_message(
thread_id: str,
*,
role: Union[str, _models.MessageRole],
content: str,
content: "_types.MessageInputContent",
content_type: str = "application/json",
attachments: Optional[List[_models.MessageAttachment]] = None,
metadata: Optional[Dict[str, str]] = None,
@@ -1454,18 +1454,16 @@ async def create_message(
:param thread_id: Identifier of the thread. Required.
:type thread_id: str
:keyword role: The role of the entity that is creating the message. Allowed values include:


* ``user``\\ : Indicates the message is sent by an actual user and should be used in most
cases to represent user-generated messages.
* ``assistant``\\ : Indicates the message is generated by the agent. Use this value to insert
messages from the agent into the
conversation. Known values are: "user" and "assistant". Required.
``user``, which indicates the message is sent by an actual user (and should be
used in most cases to represent user-generated messages), and ``assistant``,
which indicates the message is generated by the agent (use this value to insert
messages from the agent into the conversation). Known values are: "user" and "assistant".
Required.
:paramtype role: str or ~azure.ai.projects.models.MessageRole
:keyword content: The textual content of the initial message. Currently, robust input including
images and annotated text may only be provided via
a separate call to the create message API. Required.
:paramtype content: str
:keyword content: The content of the initial message. This may be a basic string (if you only
need text) or an array of typed content blocks (for example, text, image_file,
image_url, and so on). Is either a str type or a [MessageInputContentBlock] type. Required.
:paramtype content: str or list[~azure.ai.projects.models.MessageInputContentBlock]
:keyword content_type: Body Parameter content-type. Content type parameter for JSON body.
Default value is "application/json".
:paramtype content_type: str
@@ -1525,7 +1523,7 @@ async def create_message(
body: Union[JSON, IO[bytes]] = _Unset,
*,
role: Union[str, _models.MessageRole] = _Unset,
content: str = _Unset,
content: "_types.MessageInputContent" = _Unset,
attachments: Optional[List[_models.MessageAttachment]] = None,
metadata: Optional[Dict[str, str]] = None,
**kwargs: Any
@@ -1537,17 +1535,16 @@ async def create_message(
:param body: Is either a JSON type or a IO[bytes] type. Required.
:type body: JSON or IO[bytes]
:keyword role: The role of the entity that is creating the message. Allowed values include:

* ``user``\\ : Indicates the message is sent by an actual user and should be used in most
cases to represent user-generated messages.
* ``assistant``\\ : Indicates the message is generated by the agent. Use this value to insert
messages from the agent into the
conversation. Known values are: "user" and "assistant". Required.
``user``, which indicates the message is sent by an actual user (and should be
used in most cases to represent user-generated messages), and ``assistant``,
which indicates the message is generated by the agent (use this value to insert
messages from the agent into the conversation). Known values are: "user" and "assistant".
Required.
:paramtype role: str or ~azure.ai.projects.models.MessageRole
:keyword content: The textual content of the initial message. Currently, robust input including
images and annotated text may only be provided via
a separate call to the create message API. Required.
:paramtype content: str
:keyword content: The content of the initial message. This may be a basic string (if you only
need text) or an array of typed content blocks (for example, text, image_file,
image_url, and so on). Is either a str type or a [MessageInputContentBlock] type. Required.
:paramtype content: str or list[~azure.ai.projects.models.MessageInputContentBlock]
:keyword attachments: A list of files attached to the message, and the tools they should be
added to. Default value is None.
:paramtype attachments: list[~azure.ai.projects.models.MessageAttachment]
@@ -3546,59 +3543,18 @@ async def list_files(
return deserialized # type: ignore

@overload
async def upload_file(
self, *, file: FileType, purpose: Union[str, _models.FilePurpose], filename: Optional[str] = None, **kwargs: Any
) -> _models.OpenAIFile:
"""Uploads a file for use by other operations.

:keyword file: The file data, in bytes. Required.
:paramtype file: ~azure.ai.projects._vendor.FileType
:keyword purpose: The intended purpose of the uploaded file. Use ``assistants`` for Agents and
Message files, ``vision`` for Agents image file inputs, ``batch`` for Batch API, and
``fine-tune`` for Fine-tuning. Known values are: "fine-tune", "fine-tune-results",
"assistants", "assistants_output", "batch", "batch_output", and "vision". Required.
:paramtype purpose: str or ~azure.ai.projects.models.FilePurpose
:keyword filename: The name of the file. Default value is None.
:paramtype filename: str
:return: OpenAIFile. The OpenAIFile is compatible with MutableMapping
:rtype: ~azure.ai.projects.models.OpenAIFile
:raises ~azure.core.exceptions.HttpResponseError:
"""

async def _upload_file(self, body: _models._models.UploadFileRequest, **kwargs: Any) -> _models.OpenAIFile: ...
@overload
async def upload_file(self, body: JSON, **kwargs: Any) -> _models.OpenAIFile:
"""Uploads a file for use by other operations.

:param body: Required.
:type body: JSON
:return: OpenAIFile. The OpenAIFile is compatible with MutableMapping
:rtype: ~azure.ai.projects.models.OpenAIFile
:raises ~azure.core.exceptions.HttpResponseError:
"""
async def _upload_file(self, body: JSON, **kwargs: Any) -> _models.OpenAIFile: ...

@distributed_trace_async
async def upload_file(
self,
body: JSON = _Unset,
*,
file: FileType = _Unset,
purpose: Union[str, _models.FilePurpose] = _Unset,
filename: Optional[str] = None,
**kwargs: Any
async def _upload_file(
self, body: Union[_models._models.UploadFileRequest, JSON], **kwargs: Any
) -> _models.OpenAIFile:
"""Uploads a file for use by other operations.

:param body: Is one of the following types: JSON Required.
:type body: JSON
:keyword file: The file data, in bytes. Required.
:paramtype file: ~azure.ai.projects._vendor.FileType
:keyword purpose: The intended purpose of the uploaded file. Use ``assistants`` for Agents and
Message files, ``vision`` for Agents image file inputs, ``batch`` for Batch API, and
``fine-tune`` for Fine-tuning. Known values are: "fine-tune", "fine-tune-results",
"assistants", "assistants_output", "batch", "batch_output", and "vision". Required.
:paramtype purpose: str or ~azure.ai.projects.models.FilePurpose
:keyword filename: The name of the file. Default value is None.
:paramtype filename: str
:param body: Multipart body. Is either a UploadFileRequest type or a JSON type. Required.
:type body: ~azure.ai.projects.models._models.UploadFileRequest or JSON
:return: OpenAIFile. The OpenAIFile is compatible with MutableMapping
:rtype: ~azure.ai.projects.models.OpenAIFile
:raises ~azure.core.exceptions.HttpResponseError:
@@ -3616,13 +3572,6 @@ async def upload_file(

cls: ClsType[_models.OpenAIFile] = kwargs.pop("cls", None)

if body is _Unset:
if file is _Unset:
raise TypeError("missing required argument: file")
if purpose is _Unset:
raise TypeError("missing required argument: purpose")
body = {"file": file, "filename": filename, "purpose": purpose}
body = {k: v for k, v in body.items() if v is not None}
_body = body.as_dict() if isinstance(body, _model_base.Model) else body
_file_fields: List[str] = ["file"]
_data_fields: List[str] = ["purpose", "filename"]
@@ -3636,12 +3585,16 @@ async def upload_file(
params=_params,
)
path_format_arguments = {
"endpoint": self._serialize.url("self._config.endpoint", self._config.endpoint, "str"),
"subscriptionId": self._serialize.url("self._config.subscription_id", self._config.subscription_id, "str"),
"endpoint": self._serialize.url("self._config.endpoint", self._config.endpoint, "str", skip_quote=True),
"subscriptionId": self._serialize.url(
"self._config.subscription_id", self._config.subscription_id, "str", skip_quote=True
),
"resourceGroupName": self._serialize.url(
"self._config.resource_group_name", self._config.resource_group_name, "str"
"self._config.resource_group_name", self._config.resource_group_name, "str", skip_quote=True
),
"projectName": self._serialize.url(
"self._config.project_name", self._config.project_name, "str", skip_quote=True
),
"projectName": self._serialize.url("self._config.project_name", self._config.project_name, "str"),
}
_request.url = self._client.format_url(_request.url, **path_format_arguments)
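
The hunk above adds `skip_quote=True` to each path argument. As a rough illustration of what that flag is assumed to change, the sketch below contrasts percent-encoding a value with passing it through untouched. `serialize_url_value` is a hypothetical stand-in written with `urllib.parse.quote`, not the SDK's serializer, and the endpoint is a placeholder.

```python
from urllib.parse import quote


def serialize_url_value(value: str, skip_quote: bool = False) -> str:
    """Hypothetical stand-in: percent-encode the value unless skip_quote is set."""
    return value if skip_quote else quote(value, safe="")


endpoint = "https://example.invalid/api"  # placeholder endpoint

# Default behavior in this sketch: reserved characters are percent-encoded
print(serialize_url_value(endpoint))

# With skip_quote=True the value is left exactly as supplied
print(serialize_url_value(endpoint, skip_quote=True))
```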
