Commit 8111648

Fix issues with images and tool use, add integration tests, docs (#100)
1 parent: 213964f

11 files changed: +156 -107 lines

.elpaignore (+1)

@@ -1,3 +1,4 @@
 .github
 *test.el
+animal.jpeg
 utilities/

NEWS.org (+1)

@@ -1,4 +1,5 @@
 * Version 0.18.0
+- Add media handling, for images, videos, and audio.
 - Add batch embeddings capability (currently for just Open AI and Ollama).
 - Add Microsoft Azure's Open AI
 - Remove testing and other development files from ELPA packaging.

README.org (+18 -5)

@@ -3,9 +3,14 @@
 * Introduction
 This library provides an interface for interacting with Large Language Models (LLMs). It allows elisp code to use LLMs while also giving end-users the choice to select their preferred LLM. This is particularly beneficial when working with LLMs since various high-quality models exist, some of which have paid API access, while others are locally installed and free but offer medium quality. Applications using LLMs can utilize this library to ensure compatibility regardless of whether the user has a local LLM or is paying for API access.
 
-LLMs exhibit varying functionalities and APIs. This library aims to abstract functionality to a higher level, as some high-level concepts might be supported by an API while others require more low-level implementations. An example of such a concept is "examples," where the client offers example interactions to demonstrate a pattern for the LLM. While the GCloud Vertex API has an explicit API for examples, OpenAI's API requires specifying examples by modifying the system prompt. OpenAI also introduces the concept of a system prompt, which does not exist in the Vertex API. Our library aims to conceal these API variations by providing higher-level concepts in our API.
-
-Certain functionalities might not be available in some LLMs. Any such unsupported functionality will raise a ~'not-implemented~ signal.
+This library abstracts several kinds of features:
+- Chat functionality: the ability to query the LLM and get a response, and continue to take turns writing to the LLM and receiving responses. The library supports synchronous, asynchronous, and streaming responses.
+- Chat with images and other kinds of media inputs is also supported, so that the user can input images and discuss them with the LLM.
+- Function calling (aka "tool use") is supported, for having the LLM call elisp functions that it chooses, with arguments it provides.
+- Embeddings: Send text and receive a vector that encodes the semantic meaning of the underlying text. Can be used in a search system to find similar passages.
+- Prompt construction: Create a prompt to give to an LLM from one or more sources of data.
+
+Certain functionalities might not be available in some LLMs. Any such unsupported functionality will raise a ~'not-implemented~ signal, or it may fail in some other way. Clients are recommended to check =llm-capabilities= when trying to do something beyond basic text chat.
 * Setting up providers
 Users of an application that uses this package should not need to install it themselves. The llm package should be installed as a dependency when you install the package that uses it. However, you do need to require the llm module and set up the provider you will be using. Typically, applications will have a variable you can set. For example, let's say there's a package called "llm-refactoring", which has a variable ~llm-refactoring-provider~. You would set it up like so:
 
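The setup described above amounts to two lines of elisp. A minimal sketch, assuming the hypothetical llm-refactoring package from the text and a variable my-openai-key holding your key:

    (require 'llm-openai)
    (setq llm-refactoring-provider (make-llm-openai :key my-openai-key))
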
@@ -28,7 +33,7 @@ For embedding users. if you store the embeddings, you *must* set the embedding m
 ** Open AI
 You can set up with ~make-llm-openai~, with the following parameters:
 - ~:key~, the Open AI key that you get when you sign up to use Open AI's APIs. Remember to keep this private. This is non-optional.
-- ~:chat-model~: A model name from the [[https://platform.openai.com/docs/models/gpt-4][list of Open AI's model names.]] Keep in mind some of these are not available to everyone. This is optional, and will default to a reasonable 3.5 model.
+- ~:chat-model~: A model name from the [[https://platform.openai.com/docs/models/gpt-4][list of Open AI's model names.]] Keep in mind some of these are not available to everyone. This is optional, and will default to a reasonable model.
 - ~:embedding-model~: A model name from [[https://platform.openai.com/docs/guides/embeddings/embedding-models][list of Open AI's embedding model names.]] This is optional, and will default to a reasonable model.
 ** Open AI Compatible
 There are many Open AI compatible APIs and proxies of Open AI. You can set up one with ~make-llm-openai-compatible~, with the following parameter:
@@ -151,7 +156,7 @@ Conversations can take place by repeatedly calling ~llm-chat~ and its variants.
 ** Caution about ~llm-chat-prompt-interactions~
 The interactions in a prompt may be modified by conversation or by the conversion of the context and examples to what the LLM understands. Different providers require different things from the interactions. Some can handle system prompts, some cannot. Some require alternating user and assistant chat interactions, others can handle anything. It's important that clients keep to behaviors that work on all providers. Do not attempt to read or manipulate ~llm-chat-prompt-interactions~ after initially setting it up for the first time, because you are likely to make changes that only work for some providers. Similarly, don't directly create a prompt with ~make-llm-chat-prompt~, because it is easy to create something that wouldn't work for all providers.
 ** Function calling
-*Note: function calling functionality is currently alpha quality. If you want to use function calling, please watch the =llm= [[https://github.com/ahyatt/llm/discussions][discussions]] for any announcements about changes.*
+*Note: function calling functionality is currently beta quality. If you want to use function calling, please watch the =llm= [[https://github.com/ahyatt/llm/discussions][discussions]] for any announcements about changes.*
 
 Function calling is a way to give the LLM a list of functions it can call, and have it call the functions for you. The standard interaction has the following steps:
 1. The client sends the LLM a prompt with functions it can call.
@@ -199,6 +204,14 @@ for a function than "write-email".
 Examples can be found in =llm-tester=. There is also a function call to generate
 function calls from existing elisp functions in
 =utilities/elisp-to-function-call.el=.
+** Media input
+*Note: media input functionality is currently alpha quality. If you want to use it, please watch the =llm= [[https://github.com/ahyatt/llm/discussions][discussions]] for any announcements about changes.*
+
+Media can be used in =llm-chat= and related functions. To use media, you can use
+=llm-multipart= in =llm-make-chat-prompt=, and pass it an Emacs image or an
+=llm-media= object for other kinds of media. Besides images, some models support
+video and audio. Not all providers or models support these, with images being
+the most frequently supported media type, and video and audio rarer.
 ** Advanced prompt creation
 The =llm-prompt= module provides helper functions to create prompts that can
 incorporate data from your application. In particular, this should be very
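The media API above is exercised by this commit's integration test. A minimal sketch of client usage, assuming an already-configured provider in the variable provider and a local animal.jpeg (mirroring the test added below):

    (llm-chat provider
              (llm-make-chat-prompt
               (llm-make-multipart
                "What is this animal? Please answer in one word."
                (create-image "animal.jpeg"))))
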

animal.jpeg (binary file, 11 KB)

llm-gemini.el (+1 -1)

@@ -100,7 +100,7 @@ If STREAMING-P is non-nil, use the streaming endpoint.
   (append
    (list 'streaming 'embeddings)
    (when-let ((model (llm-models-match (llm-gemini-chat-model provider)))
-        (capabilities (llm-model-capabilities model)))
+              (capabilities (llm-model-capabilities model)))
     (append
      (when (member 'tool-use capabilities) '(function-calls))
      (seq-intersection capabilities '(image-input audio-input video-input))))))
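Clients can consult the computed capability list before attempting media input. A sketch, assuming a hypothetical my-gemini-key variable and a chat model that llm-models recognizes; the returned list is illustrative:

    (llm-capabilities (make-llm-gemini :key my-gemini-key))
    ;; => e.g. (streaming embeddings function-calls image-input audio-input video-input)
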

llm-integration-test.el (+25 -4)

@@ -97,7 +97,16 @@
 (defun llm-integration-test-rate-limit (provider)
   (cond ((eq (type-of provider) 'llm-azure)
          ;; The free Azure tier has extremely restrictive rate limiting.
-         (sleep-for (string-to-number (or (getenv "AZURE_SLEEP") "60"))))))
+         (sleep-for (string-to-number (or (getenv "AZURE_SLEEP") "60"))))
+        ((member (type-of provider) '(llm-gemini llm-vertex))
+         (sleep-for 15))))
+
+(defun llm-integration-test-string-eq (target actual)
+  "Test that TARGET approximately equals ACTUAL.
+This is a very approximate test because LLMs that aren't that great
+often mess up and put punctuation, or repeat the word, or something
+else. We really just want to see if it's in the right ballpark."
+  (string-match-p (regexp-quote (downcase target)) (downcase actual)))

 (defun llm-integration-test-providers ()
   "Return a list of providers to test."
@@ -214,7 +223,7 @@
     (while (not (or result err-result))
       (sleep-for 0.1))
     (if err-result (error err-result))
-    (should (equal (string-trim result) llm-integration-test-chat-answer))))
+    (should (llm-integration-test-string-eq llm-integration-test-chat-answer (string-trim result)))))

 (llm-def-integration-test llm-chat-streaming (provider)
   (when (member 'streaming (llm-capabilities provider))
@@ -240,8 +249,8 @@
             (time-less-p (time-subtract (current-time) start-time) 10))
       (sleep-for 0.1))
     (if err-result (error err-result))
-    (should (equal (string-trim returned-result) llm-integration-test-chat-answer))
-    (should (equal (string-trim streamed-result) llm-integration-test-chat-answer)))))
+    (should (llm-integration-test-string-eq llm-integration-test-chat-answer (string-trim returned-result)))
+    (should (llm-integration-test-string-eq llm-integration-test-chat-answer (string-trim streamed-result))))))

 (llm-def-integration-test llm-function-call (provider)
   (when (member 'function-calls (llm-capabilities provider))
@@ -261,6 +270,18 @@
     ;; Test that we can send the function back to the provider without error.
     (llm-chat provider prompt))))

+(llm-def-integration-test llm-image-chat (provider)
+  (when (member 'image-input (llm-capabilities provider))
+    (let* ((image-load-path (append image-load-path (list default-directory)))
+           (result (llm-chat
+                    provider
+                    (llm-make-chat-prompt
+                     (llm-make-multipart
+                      "What is this animal? Please answer in one word, without punctuation or whitespace."
+                      (create-image "animal.jpeg"))))))
+      (should (stringp result))
+      (should (llm-integration-test-string-eq "owl" (string-trim (downcase result)))))))
+
 (llm-def-integration-test llm-count-tokens (provider)
   (let ((result (llm-count-tokens provider "What is the capital of France?")))
     (should (integerp result))
llm-ollama.el (+23 -23)

@@ -112,25 +112,25 @@ PROVIDER is the llm-ollama provider.
   (let (request-alist messages options)
     (setq messages
           (mapcar (lambda (interaction)
-                 (let* ((role (llm-chat-prompt-interaction-role interaction))
-                        (content (llm-chat-prompt-interaction-content interaction))
-                        (content-text "")
-                        (images nil))
-                   (if (stringp content)
-                       (setq content-text content)
-                     (if (eq 'user role)
-                         (dolist (part (llm-multipart-parts content))
-                           (if (llm-media-p part)
-                               (setq images (append images (list part)))
-                             (setq content-text (concat content-text part))))
-                       (setq content-text (json-encode content))))
-                   (append
-                    `(("role" . ,(symbol-name role)))
-                    `(("content" . ,content-text))
-                    (when images
-                      `(("images" .
-                         ,(mapcar (lambda (img) (base64-encode-string (llm-media-data img) t))
-                                  images)))))))
+                    (let* ((role (llm-chat-prompt-interaction-role interaction))
+                           (content (llm-chat-prompt-interaction-content interaction))
+                           (content-text "")
+                           (images nil))
+                      (if (stringp content)
+                          (setq content-text content)
+                        (if (eq 'user role)
+                            (dolist (part (llm-multipart-parts content))
+                              (if (llm-media-p part)
+                                  (setq images (append images (list part)))
+                                (setq content-text (concat content-text part))))
+                          (setq content-text (json-encode content))))
+                      (append
+                       `(("role" . ,(symbol-name role)))
+                       `(("content" . ,content-text))
+                       (when images
+                         `(("images" .
+                            ,(mapcar (lambda (img) (base64-encode-string (llm-media-data img) t))
+                                     images)))))))
                   (llm-chat-prompt-interactions prompt)))
     (when (llm-chat-prompt-context prompt)
       (push `(("role" . "system")
@@ -196,10 +196,10 @@ PROVIDER is the llm-ollama provider.
           '(embeddings embeddings-batch))
   (when-let ((chat-model (llm-models-match
                           (llm-ollama-chat-model provider)))
-            (capabilities (llm-model-capabilities chat-model)))
-   (append
-    (when (member 'tool-use capabilities) '(function-calls))
-    (seq-intersection capabilities '(image-input))))))
+             (capabilities (llm-model-capabilities chat-model)))
+    (append
+     (when (member 'tool-use capabilities) '(function-calls))
+     (seq-intersection capabilities '(image-input))))))

 (provide 'llm-ollama)

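For a user interaction carrying text plus one image, the request builder above produces a message alist that json-encode turns into Ollama's chat format. An illustrative shape, with the base64 payload elided:

    (("role" . "user")
     ("content" . "What is this animal?")
     ("images" . ("<base64-encoded image bytes>")))
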
llm-openai.el (+27 -26)

@@ -51,9 +51,11 @@ will use a reasonable default.

 EMBEDDING-MODEL is the model to use for embeddings. If unset, it
 will use a reasonable default."
-  key chat-model embedding-model)
+  key (chat-model "gpt-4o") (embedding-model "text-embedding-3-small"))

-(cl-defstruct (llm-openai-compatible (:include llm-openai))
+(cl-defstruct (llm-openai-compatible (:include llm-openai
+                                               (chat-model nil)
+                                               (embedding-model nil)))
   "A structure for other APIs that use the Open AI's API.

 URL is the URL to use for the API, up to the command. So, for
@@ -70,8 +72,7 @@ https://api.example.com/v1/chat, then URL should be
   "Return the request to the server for the embedding of STRING-OR-LIST.
 PROVIDER is the Open AI provider struct."
   `(("input" . ,string-or-list)
-    ("model" . ,(or (llm-openai-embedding-model provider)
-                    "text-embedding-3-small"))))
+    ("model" . ,(llm-openai-embedding-model provider))))

 (cl-defmethod llm-provider-batch-embeddings-request ((provider llm-openai) batch)
   (llm-provider-embedding-request provider batch))
@@ -173,27 +174,27 @@ STREAMING if non-nil, turn on response streaming.
             (append
              `(("role" . ,(llm-chat-prompt-interaction-role i)))
              (when-let ((content (llm-chat-prompt-interaction-content i)))
-               `(("content"
-                  . ,(pcase content
-                       ((pred llm-multipart-p)
-                        (mapcar (lambda (part)
-                                  (if (llm-media-p part)
-                                      `(("type" . "image_url")
-                                        ("image_url"
-                                         . (("url"
-                                             . ,(concat
-                                                 "data:"
-                                                 (llm-media-mime-type part)
-                                                 ";base64,"
-                                                 (base64-encode-string (llm-media-data part)))))))
-                                    `(("type" . "text")
-                                      ("text" . ,part))))
-                                (llm-multipart-parts content)))
-                       ((pred listp) (llm-openai-function-call-to-response content))
-                       (_ content)))))))))
+               (cond
+                ((listp content)
+                 (llm-openai-function-call-to-response content))
+                ((llm-multipart-p content)
+                 `(("content" . ,(mapcar (lambda (part)
+                                           (if (llm-media-p part)
+                                               `(("type" . "image_url")
+                                                 ("image_url"
+                                                  . (("url"
+                                                      . ,(concat
+                                                          "data:"
+                                                          (llm-media-mime-type part)
+                                                          ";base64,"
+                                                          (base64-encode-string (llm-media-data part)))))))
+                                             `(("type" . "text")
+                                               ("text" . ,part))))
+                                         (llm-multipart-parts content)))))
+                (t `(("content" . ,content)))))))))
          (llm-chat-prompt-interactions prompt)))
   request-alist)
-  (push `("model" . ,(or (llm-openai-chat-model provider) "gpt-4o")) request-alist)
+  (push `("model" . ,(llm-openai-chat-model provider)) request-alist)
   (when (llm-chat-prompt-temperature prompt)
     (push `("temperature" . ,(* (llm-chat-prompt-temperature prompt) 2.0)) request-alist))
   (when (llm-chat-prompt-max-tokens prompt)
@@ -294,9 +295,9 @@ RESPONSE can be nil if the response is complete.

 (cl-defmethod llm-capabilities ((provider llm-openai))
   (append '(streaming embeddings function-calls)
-      (when-let ((model (llm-models-match (llm-openai-chat-model provider))))
-        (seq-intersection (llm-model-capabilities model)
-                          '(image-input)))))
+          (when-let ((model (llm-models-match (llm-openai-chat-model provider))))
+            (seq-intersection (llm-model-capabilities model)
+                              '(image-input)))))

 (cl-defmethod llm-capabilities ((provider llm-openai-compatible))
   (append '(streaming)

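The multipart branch in the chat request builder above produces OpenAI-style content parts. For comparison with the Ollama format, the same text-plus-image user interaction becomes an alist roughly like this (data URL elided):

    (("role" . user)
     ("content" . ((("type" . "text")
                    ("text" . "What is this animal?"))
                   (("type" . "image_url")
                    ("image_url" . (("url" . "data:image/jpeg;base64,...")))))))
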
llm-provider-utils.el (+13 -7)

@@ -430,13 +430,19 @@ EXAMPLE-PRELUDE is the text to introduce any examples with.
 This should be used for providers that do not have a notion of a system prompt.

 EXAMPLE-PRELUDE is the text to introduce any examples with."
-  (when-let ((system-content (llm-provider-utils-get-system-prompt prompt example-prelude)))
-    (setf (llm-chat-prompt-interaction-content (car (llm-chat-prompt-interactions prompt)))
-          (concat system-content
-                  "\n"
-                  (llm-chat-prompt-interaction-content (car (llm-chat-prompt-interactions prompt))))
-          (llm-chat-prompt-context prompt) nil
-          (llm-chat-prompt-examples prompt) nil)))
+  (let ((system-content (llm-provider-utils-get-system-prompt prompt example-prelude)))
+    (when (> (length system-content) 0)
+      (setf (llm-chat-prompt-interaction-content (car (llm-chat-prompt-interactions prompt)))
+            (let ((initial-content (llm-chat-prompt-interaction-content (car (llm-chat-prompt-interactions prompt)))))
+              (if (llm-multipart-p initial-content)
+                  (make-llm-multipart
+                   :parts (cons system-content
+                                (llm-multipart-parts initial-content)))
+                (concat system-content
+                        "\n"
+                        initial-content)))
+            (llm-chat-prompt-context prompt) nil
+            (llm-chat-prompt-examples prompt) nil))))

 (defun llm-provider-utils-collapse-history (prompt &optional history-prelude)
   "Collapse history to a single PROMPT.
