Commit 4203530

[Docs Agent] Docs Agent release version 0.4.0 (#533)
* [Docs Agent] Docs Agent release version 0.4.0

  - **Multi-modal support:** The Docs Agent CLI supports image, audio, and video files as part of a prompt to the Gemini model.
  - **Formatted output:** Select the format of Docs Agent CLI's responses with the `--response_type json` and `--plaintext` options.
  - **Autocomplete script:** The `autocomplete.sh` script is added to include Docs Agent CLI commands and options, making it easier and faster to use the Docs Agent CLI on a terminal.

* [Docs Agent] Docs Agent release version 0.4.0 (Files missed in the previous commit)

  - **Multi-modal support:** The Docs Agent CLI supports image, audio, and video files as part of a prompt to the Gemini model.
  - **Formatted output:** Select the format of Docs Agent CLI's responses with the `--response_type json` and `--plaintext` options.
  - **Autocomplete script:** The `autocomplete.sh` script is added to include Docs Agent CLI commands and options, making it easier and faster to use the Docs Agent CLI on a terminal.

1 parent 213c6ce commit 4203530

20 files changed: +2819 -2049 lines changed

examples/gemini/python/docs-agent/README.md

+21 -3
@@ -26,10 +26,10 @@ check out the [Set up Docs Agent][set-up-docs-agent] section below.
 
 Docs Agent's `agent runtask` command allows you to run pre-defined chains of prompts,
 which are referred to as **tasks**. These tasks simplify complex interactions by defining
-a series of steps that the Docs Agent will execute. The tasks are defined in `.yaml` files
-stored in the [`tasks`][tasks-dir] directory of your Docs Agent project. The tasks are
+a series of steps that the Docs Agent CLI will execute. The tasks are defined in `.yaml`
+files stored in the [`tasks`][tasks-dir] directory of your Docs Agent project. The tasks are
 designed to be reusable and can be used to automate common workflows, such as generating
-release notes, updating documentation, or analyzing complex information.
+release notes, drafting overview pages, or analyzing complex information.
 
 A task file example:
 
@@ -101,6 +101,16 @@ The list below summarizes the tasks and features supported by Docs Agent:
   agent runtask --task DraftReleaseNotes
   ```
 
+- **Multi-modal support**: Docs Agent's `agent helpme` command can include image,
+  audio, and video files as part of a prompt to the Gemini 1.5 model, for example:
+
+  ```sh
+  agent helpme Provide a concise, descriptive alt text for this PNG image --file ./my_image_example.png
+  ```
+
+  You can use this feature for creating tasks as well. For example, see the
+  [DescribeImages][describe-images] task.
+
 For more information on Docs Agent's architecture and features,
 see the [Docs Agent concepts][docs-agent-concepts] page.
 
@@ -241,6 +251,13 @@ Clone the Docs Agent project and install dependencies:
    **Important**: From this point, all `agent` command lines below need to
    run in this `poetry shell` environment.
 
+5. (**Optional**) To enable autocomplete commands and flags related to
+   Docs Agent in your shell environment, run the following command:
+
+   ```
+   source scripts/autocomplete.sh
+   ```
+
 ### 5. Edit the Docs Agent configuration file
 
 This guide uses the [open source Flutter documents][flutter-docs-src] as an example dataset,
@@ -458,3 +475,4 @@ Meggin Kearney (`@Meggin`), and Kyo Lee (`@kyolee415`).
 [chunking-process]: docs/chunking-process.md
 [new-15-mode]: docs/config-reference.md#app_mode
 [tasks-dir]: tasks/
+[describe-images]: tasks/describe-images-for-alt-text-task.yaml
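
The README text in this file describes tasks as pre-defined chains of prompts that `agent runtask` executes step by step. As a rough illustration only, the sketch below runs a list of prompt steps in order and feeds each response into the next step as context. The names `run_task`, `steps`, and `ask_model` are hypothetical and are not part of Docs Agent, and passing the previous response forward is an assumption about how a chain might work; the real task schema lives in the `.yaml` files under `tasks/`.

```python
from typing import Callable, List


def run_task(steps: List[str], ask_model: Callable[[str], str]) -> List[str]:
    """Run prompt steps in order, passing each response to the next step.

    Hypothetical helper for illustration; not the Docs Agent implementation.
    """
    responses: List[str] = []
    context = ""
    for step in steps:
        # Prepend the previous response so later steps can build on it.
        prompt = f"{context}\n\n{step}" if context else step
        context = ask_model(prompt)
        responses.append(context)
    return responses


if __name__ == "__main__":
    # Stand-in for a real model call: report the prompt length.
    def fake_model(prompt: str) -> str:
        return f"response to {len(prompt)} chars"

    for reply in run_task(["Summarize the changes.", "Draft release notes."], fake_model):
        print(reply)
```

Taking the model call as a parameter keeps the chaining logic testable without network access.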

examples/gemini/python/docs-agent/apps_script/drive_to_markdown.gs

+1 -1
@@ -235,6 +235,6 @@ function convertDriveFolder(folderName, outputFolderName="", indexFile="") {
 insertRichText(sheet, md_chip, "E", row_number);
 insertRichText(sheet, folder_chip, "I", row_number);
 }
+return gdoc_count, pdf_count, new_file_count, updated_file_count, unchanged_file_count
 }
-return gdoc_count, pdf_count, new_file_count, updated_file_count, unchanged_file_count
 }

examples/gemini/python/docs-agent/docs/cli-reference.md

+30
@@ -258,6 +258,20 @@ For example:
 agent helpme write a concept doc covering all features in this project? --allfiles ~/my-project --new
 ```
 
+### Ask the model to print the output in JSON
+
+The command below prints the output from the model in JSON format:
+
+```sh
+agent helpme <REQUEST> --response_type json
+```
+
+For example:
+
+```sh
+agent helpme how do I cook pasta? --response_type json
+```
+
 ### Ask the model to run a pre-defined chain of prompts
 
 The command below runs a task (a sequence of prompts) defined in
@@ -297,6 +311,22 @@ For example:
 agent runtask --task IndexPageGenerator --custom_input ~/my_example/docs/development/
 ```
 
+### Ask the model to print the output in plain text
+
+By default, the `agent runtask` command uses Python's Rich console
+to format its output. You can disable it by using the `--plaintext`
+flag:
+
+```sh
+agent runtask --task <TASK> --plaintext
+```
+
+For example:
+
+```sh
+agent runtask --task DraftReleaseNotes --plaintext
+```
+
 ## Managing online corpora
 
 ### List all existing online corpora
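
The two flags documented in this file can also be passed when scripting the CLI. The sketch below only assembles argument lists for `agent helpme <REQUEST> --response_type json` and `agent runtask --task <TASK> --plaintext`. The helper names `helpme_args` and `runtask_args` are hypothetical and not part of Docs Agent. Running the resulting lists (for example with `subprocess.run`) assumes the `agent` command is available in the active `poetry shell` environment.

```python
from typing import List, Optional


def helpme_args(request: str, response_type: Optional[str] = None) -> List[str]:
    """Build an argument list for `agent helpme` (hypothetical helper)."""
    # The documented examples pass the request as unquoted words.
    args = ["agent", "helpme", *request.split()]
    if response_type is not None:
        args += ["--response_type", response_type]
    return args


def runtask_args(task: str, plaintext: bool = False) -> List[str]:
    """Build an argument list for `agent runtask` (hypothetical helper)."""
    args = ["agent", "runtask", "--task", task]
    if plaintext:
        # Disables the default Rich console formatting, per the docs above.
        args.append("--plaintext")
    return args


if __name__ == "__main__":
    print(" ".join(helpme_args("how do I cook pasta?", response_type="json")))
    print(" ".join(runtask_args("DraftReleaseNotes", plaintext=True)))
```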

examples/gemini/python/docs-agent/docs_agent/agents/docs_agent.py

+68
@@ -17,6 +17,7 @@
 """Docs Agent"""
 
 import typing
+import os, pathlib
 
 from absl import logging
 import google.api_core
@@ -573,6 +574,73 @@ def ask_content_model_to_fact_check_prompt(self, context: str, prev_response: st
     def generate_embedding(self, text, task_type: str = "SEMANTIC_SIMILARITY"):
         return self.gemini.embed(text, task_type)[0]
 
+    # Generate a response to an image
+    def ask_model_about_image(self, prompt: str, image):
+        if not prompt:
+            prompt = "Describe this image:"
+        if self.context_model.startswith("models/gemini-1.5"):
+            try:
+                # Adding prompt in the beginning allows long contextual
+                # information to be added.
+                response = self.gemini.generate_content([prompt, image])
+            except google.api_core.exceptions.InvalidArgument:
+                return self.config.conditions.model_error_message
+        else:
+            logging.error(f"The {self.context_model} can't read an image.")
+            response = None
+            exit(1)
+        return response
+
+    # Generate a response to audio
+    def ask_model_about_audio(self, prompt: str, audio):
+        if not prompt:
+            prompt = "Describe this audio clip:"
+        audio_size = os.path.getsize(audio)
+        # Limit is 20MB
+        if audio_size > 20000000:
+            logging.error(f"The audio clip {audio} is too large: {audio_size} bytes.")
+            exit(1)
+        # Get the mime type of the audio file and trim the . from the extension.
+        mime_type = "audio/" + pathlib.Path(audio).suffix[1:]
+        audio_clip = {
+            "mime_type": mime_type,
+            "data": pathlib.Path(audio).read_bytes()
+        }
+        if self.context_model.startswith("models/gemini-1.5"):
+            try:
+                response = self.gemini.generate_content([prompt, audio_clip])
+            except google.api_core.exceptions.InvalidArgument:
+                return self.config.conditions.model_error_message
+        else:
+            logging.error(f"The {self.context_model} can't read an audio clip.")
+            exit(1)
+        return response
+
+    # Generate a response to video
+    def ask_model_about_video(self, prompt: str, video):
+        if not prompt:
+            prompt = "Describe this video clip:"
+        video_size = os.path.getsize(video)
+        # Limit is 2GB
+        if video_size > 2147483648:
+            logging.error(f"The video clip {video} is too large: {video_size} bytes.")
+            exit(1)
+        request_options = {
+            "timeout": 600
+        }
+        mime_type = "video/" + pathlib.Path(video).suffix[1:]
+        video_clip_uploaded = self.gemini.upload_file(video)
+        video_clip = self.gemini.get_file(video_clip_uploaded)
+        if self.context_model.startswith("models/gemini-1.5"):
+            try:
+                response = self.gemini.generate_content([prompt, video_clip],
+                                                        request_options=request_options)
+            except google.api_core.exceptions.InvalidArgument:
+                return self.config.conditions.model_error_message
+        else:
+            logging.error(f"The {self.context_model} can't see video clips.")
+            exit(1)
+        return response
 
 # Function to give an embedding function for gemini using an API key
 def embedding_function_gemini_retrieval(api_key, embedding_model: str):
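
The audio and video helpers in this hunk share two pre-checks: a file-size limit (20 MB for audio and 2 GB for video, per the code comments) and a MIME type built from the file extension. A minimal sketch of those checks as standalone functions follows. `media_mime_type` and `check_size` are hypothetical names, the limits are copied from the hunk, and lowercasing the extension is an added assumption. Note that `suffix[1:]` drops the leading dot, whereas `suffix[:1]` would keep only the dot.

```python
import os
import pathlib

AUDIO_LIMIT_BYTES = 20_000_000      # "Limit is 20MB" in the hunk
VIDEO_LIMIT_BYTES = 2_147_483_648   # "Limit is 2GB" in the hunk


def media_mime_type(kind: str, path: str) -> str:
    """Build a MIME-style string such as "audio/mp3" from a file extension."""
    # Path.suffix includes the leading dot (".mp3"); [1:] trims it.
    extension = pathlib.Path(path).suffix[1:].lower()
    return f"{kind}/{extension}"


def check_size(path: str, limit_bytes: int) -> bool:
    """Return True if the file at `path` is no larger than `limit_bytes`."""
    return os.path.getsize(path) <= limit_bytes


if __name__ == "__main__":
    print(media_mime_type("audio", "clip.mp3"))
    print(media_mime_type("video", "/tmp/demo.MP4"))
```

A file extension is not always the registered MIME subtype, so Python's standard `mimetypes.guess_type` may be a more robust choice where that matters.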

examples/gemini/python/docs-agent/docs_agent/interfaces/README.md

+11 -5
@@ -101,10 +101,16 @@ from your `$HOME` directory.
    poetry shell
    ```
 
-   Entering the `poetry shell` environment is **required** for
-   running the `agent` command.
+   **Important**: You must always enter the `poetry shell` environment
+   to run the `agent` command.
 
-2. Run the `agent helpme` command, for example:
+2. Enable autocomplete for Docs Agent CLI options in your environment:
+
+   ```
+   source scripts/autocomplete.sh
+   ```
+
+3. Run the `agent helpme` command, for example:
 
    ```
    agent helpme how do I cook pasta?
@@ -113,7 +119,7 @@ from your `$HOME` directory.
    This command returns the Gemini model's response of your input prompt
    `how do I cook pasta?`.
 
-3. View the list of Docs Agent tasks available in your setup:
+4. View the list of Docs Agent tasks available in your setup:
 
    ```
    agent runtask
@@ -122,7 +128,7 @@ from your `$HOME` directory.
    This command prints a list of Docs Agent tasks that you can run.
    (See the `tasks` directory in your local Docs Agent setup.)
 
-4. Run the `agent runtask` command, for example:
+5. Run the `agent runtask` command, for example:
 
    ```
    agent runtask --task IndexPageGenerator
