chat example (command line) #277

Open · wants to merge 2 commits into main

Conversation

@davidkoski (Collaborator) commented Apr 21, 2025

@ibrahimcetin here is the command line chat

./mlx-run llm-tool chat --model mlx-community/Qwen2-VL-2B-Instruct-4bit --image /Users/dkoski/Desktop/IMG_0912.jpeg --resize 512 --system "You are a helpful assistant who answers questions in English."

> what animal is in the picture?
The animal in the picture is a dog.

> what is behind the dog?
Behind the dog is a Christmas tree.

> what else?
Yes, there is also a fireplace with a Christmas tree and decorations behind the dog.

queries = rotaryEmbedding(queries, offset: offset)
keys = rotaryEmbedding(keys, offset: offset)

if let cache {
    (keys, values) = cache.update(keys: keys, values: values)
}

let mask = mask?[.ellipsis, 0 ..< keys.dim(-2)]

I found that the dimensions were mismatched on the second message -- the keys dimension needs to be taken after the KVCache update. In mlx-vlm it works out ok because:

  1. it looks like the KVCache isn't used persistently
  2. the KVCache implementation doesn't window on the cache
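A minimal sketch of the point, with a hypothetical helper (not the PR's actual attention code): once the cache concatenates prior keys, the key sequence length exceeds the incoming query length, so the mask has to be sliced against the post-cache key dimension, not the new-token count.

```swift
import MLX

// Sketch only -- hypothetical helper illustrating the ordering issue.
// After the cache update, `keys` includes all prior tokens, so the mask
// must be sliced to keys.dim(-2); slicing before the update would use
// the wrong (shorter) length on the second message.
func applyCache(
    keys: MLXArray, values: MLXArray,
    cached: (keys: MLXArray, values: MLXArray)?, mask: MLXArray?
) -> (keys: MLXArray, values: MLXArray, mask: MLXArray?) {
    var keys = keys
    var values = values
    if let cached {
        // concatenate along the sequence axis
        keys = concatenated([cached.keys, keys], axis: -2)
        values = concatenated([cached.values, values], axis: -2)
    }
    return (keys, values, mask?[.ellipsis, 0 ..< keys.dim(-2)])
}
```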

/// Command line arguments for controlling generation of text.
struct GenerateArguments: ParsableArguments, Sendable {

struct PromptArguments: ParsableArguments, Sendable {

I split this out because chat doesn't need a prompt.
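With swift-argument-parser this kind of split composes via `@OptionGroup`. A hedged sketch with stub fields (the real structs carry many more options):

```swift
import ArgumentParser

// Stubs for illustration -- not the actual option lists.
struct GenerateArguments: ParsableArguments, Sendable {
    @Option var system: String = ""
}
struct PromptArguments: ParsableArguments, Sendable {
    @Option var prompt: String = ""
}

// eval-style commands compose both groups...
struct EvalCommand: ParsableCommand {
    @OptionGroup var generate: GenerateArguments
    @OptionGroup var prompt: PromptArguments
    func run() throws { /* one-shot generation */ }
}

// ...while chat omits PromptArguments and reads turns interactively.
struct ChatCommand: ParsableCommand {
    @OptionGroup var generate: GenerateArguments
    func run() throws { /* interactive loop */ }
}
```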

size = CGSize(width: v0, height: v1)
}
userInput.processing.resize = size
}

The processing part is the only interesting piece, I think -- this should probably go into reusable arguments anyway
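For context, a sketch of what a reusable resize argument might parse into (`parseResize` is a hypothetical helper; the real parsing lives in the PR's argument structs):

```swift
import CoreGraphics

// Hypothetical helper: "--resize 512" -> 512x512, "--resize 512,384" -> 512x384.
func parseResize(_ spec: String) -> CGSize? {
    let parts = spec.split(separator: ",").compactMap { Int($0) }
    switch parts.count {
    case 1: return CGSize(width: parts[0], height: parts[0])
    case 2: return CGSize(width: parts[0], height: parts[1])
    default: return nil
    }
}
```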

let modelContainer = try await memory.start { [args] in
    do {
        return try await args.load(
            defaultModel: defaultModel.name, modelFactory: LLMModelFactory.shared)

So how do we know VLM vs. LLM? We could probe the registry, but what if the id isn't registered? We can't rely on an image in the parameters because the user may load one at chat time.
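One possible shape for the probe, as a sketch only -- the set of known VLM ids is passed in here precisely because the registry lookup is the open question, and the sketch exposes the failure mode described above (an unregistered VLM id falls through to the LLM factory):

```swift
import MLXLLM
import MLXLMCommon
import MLXVLM

// Hypothetical: check against known VLM ids, fall back to the LLM factory.
// A VLM whose id isn't registered would be loaded with the wrong factory.
func chooseFactory(modelId: String, knownVLMIds: Set<String>) -> ModelFactory {
    knownVLMIds.contains(modelId) ? VLMModelFactory.shared : LLMModelFactory.shared
}
```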


// TODO: need to figure out the proper ownership for this -- maybe the loop
// below needs to go inside the context?
var cache: [KVCache]?

This still needs work, but I think a chat application should show how to use a KVCache
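The natural loop shape looks something like this sketch (names follow MLXLMCommon, but treat the exact signatures as assumptions; eos/stop handling and streaming output are omitted). The key point is creating the cache once, outside the loop, so each turn only processes its own new tokens:

```swift
import MLXLMCommon

// Sketch of a chat loop with a persistent KVCache.
func chatLoop(context: ModelContext, parameters: GenerateParameters) async throws {
    // one cache for the whole conversation; prior keys/values are reused
    let cache = context.model.newCache(parameters: parameters)
    while true {
        print("> ", terminator: "")
        guard let line = readLine(), !line.isEmpty else { break }
        let input = try await context.processor.prepare(input: UserInput(prompt: line))
        // TokenIterator gains a cache parameter in this PR
        var iterator = try TokenIterator(
            input: input, model: context.model, cache: cache,
            parameters: parameters)
        var tokens: [Int] = []
        while let token = iterator.next() { tokens.append(token) }
        print(context.tokenizer.decode(tokens: tokens))
    }
}
```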

if chat.count >= 2 {
    chatWithMedia[1].images = images
    chatWithMedia[1].videos = videos
}

@ibrahimcetin

Still working on this -- it works great with a single set of media, but the model / image processing code doesn't expect two messages with independent sets of media.

On the swift side:

public struct LMInput {
    public let text: Text
    public let image: ProcessedImage?
    public let video: ProcessedVideo?

Tokens + one set of media. The python code in mlx-vlm has a similar setup. I think maybe that should be:

public struct LMInput {
    public let text: Text
    public let image: [ProcessedImage]
    public let video: [ProcessedVideo]

and then the model can inject the appropriate piece.

The Hashable conformance in the UserInput was me thinking that perhaps we should cache the ProcessedImage / Video so it doesn't need to be reprocessed.

Another option is to use the KVCache -- it will capture the image embedding and we could then omit the image from subsequent generations. Except we need to represent it space-wise so the size of the context remains consistent (we can't just drop the image).

FWIW @Blaizzy mlx-vlm's chat implementation has a problem here I think:

The code only presents the image for the first message and then discards it. It doesn't use a KVCache so it actually "forgets" what the image is, though the model is pretty good at faking it:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
what animal is in the picture.  answer in english<|im_end|>
<|im_start|>assistant
The animal in the picture is a dog.<|im_end|>
<|im_start|>user
what is the dog wearing?<|im_end|>
<|im_start|>assistant
The dog is wearing a collar.<|im_end|>
<|im_start|>user
what is behind the dog?<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant
There is a person behind the dog.<|im_end|>

There are maybe two problems here:

  • dropping the image data
  • the fact that the image is attached to the last message -- I found that if the image data is present it needs to be associated with the first message that refers to it

Anyway, with just a single image I think this is pretty close. It needs to be cleaned up, but it works. Maybe we only deal with one image set in the thread for now.


@ibrahimcetin an example of the multi-image in chat issue:

[screenshot]

so perhaps we document that as a known issue (or disable the ability to add multiple media sets) until we figure it out. I don't think it needs to block the chat examples.

// TODO: figure out ownership here
if cache == nil {
    cache = context.model.newCache(parameters: generate.generateParameters)
}

The cache is mutable and I am playing fast and loose with ownership here. To be fixed.

// it should be a parameter to the higher level generate?
var iterator = try TokenIterator(
    input: input, model: context.model, cache: cache,
    parameters: generate.generateParameters)

I don't care about the iterator -- I think the generate() call should probably take an optional cache.

@davidkoski requested a review from awni on April 23, 2025
@@ -729,10 +730,10 @@ public func generate(
 /// }
 /// ```
 public func generate(
-    input: LMInput, parameters: GenerateParameters, context: ModelContext
+    input: LMInput, cache: [KVCache]? = nil, parameters: GenerateParameters, context: ModelContext

Easily pass the optional cache in to the iterator
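In sketch form (a hypothetical `generateSketch` standing in for the real `generate()`, which also drives the token loop and reporting), the point is the `?? newCache` fallback: a nil cache preserves the pre-PR single-shot behavior.

```swift
import MLXLMCommon

// Sketch only -- shows how the optional cache threads into the iterator.
public func generateSketch(
    input: LMInput, cache: [KVCache]? = nil,
    parameters: GenerateParameters, context: ModelContext
) throws -> TokenIterator {
    try TokenIterator(
        input: input, model: context.model,
        // nil -> fresh cache, matching the behavior before this PR
        cache: cache ?? context.model.newCache(parameters: parameters),
        parameters: parameters)
}
```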

@@ -97,4 +97,7 @@ public class KVCacheSimple: KVCache, Evaluatable {
         )
     }
 
+    public var debugDescription: String {
+        "\(String(describing: Self.self)) \(Unmanaged.passUnretained(self).toOpaque()), offset: \(offset), step: \(step), keys: \(keys?.shape.description ?? "-"), values: \(values?.shape.description ?? "-")"

Just a convenience I needed while debugging Qwen2

processing: media.processing,
images: media.images, videos: media.videos,
chat: [.system(generate.system)],
cache: context.model.newCache(parameters: parameters))

We may want a follow up PR for #276 to add the KVCache, but without the change to generate() it is hard to use.

}

/// Argument package for supplying media files
struct MediaArguments: ParsableArguments, Sendable {

Also split this out because we use it in eval and chat
