Implement Structured Chat Messages #257

ibrahimcetin · 2025-03-27T10:43:22Z

This PR introduces structured chat messages, addressing issue #234.

I initially made these changes a few days ago but held off on opening a PR because I wanted to create an example app to test them. However, I haven't had time recently, so I’m submitting this now to continue the discussion.

For the implementation, I chose to add a new chat case to the Prompt enum, as I believe it is the simplest and most compatible approach.

I plan to create a project to demonstrate these changes when I have time.

Looking forward to feedback!

davidkoski · 2025-04-01T17:54:16Z

Libraries/MLXLMCommon/Chat.swift

+        public let images: [UserInput.Image]
+
+        /// Array of video data associated with the message.
+        public let videos: [UserInput.Video]


Just a thought -- this seems like a better structure than UserInput (which is too flat). I think UserInput was largely a reflection of what mlx-vlm (python) had for the command line tool, and while it was OK for that it is too simplistic.

We don't have to solve it all in one PR, but if we adopt this I think we should start deprecating pieces in UserInput that overlap with this. Possibly we can cover these values (perhaps referring to the first or last message), but we shouldn't have two ways of representing images in a conversation.

At the very least, I think part of this PR should mark the images etc. properties in UserInput as deprecated.

It might also be a good idea to try and integrate this into the examples (specifically as a multi-turn chat) -- it will help us make sure this covers the use cases. Again, that doesn't have to happen in this PR, I think we can iterate on it a bit before we cut tags for a release.

davidkoski · 2025-04-15T16:55:16Z

@ibrahimcetin sorry to take so long to get back to this! I have some suggested changes based on what I was saying / we were discussing about the structure of UserInput -- helping it convert fully to Chat.Message.

How would you like to proceed?

- I can push commits to this PR and you can look at them / revert / whatever
- I can comment with suggested changes, but I think this may be hard to deal with in the PR view
- we can work to get this merged and then I can make a followup PR with my changes

I think the first option is easiest and will let us land this in a more complete state, but please let me know what you think!

ibrahimcetin · 2025-04-15T17:09:25Z

@davidkoski That works for me too, please feel free to proceed in the way that’s easiest for you.

I'm also sorry for the delay. I didn’t expect it to take this long. Starting this weekend I’ll be available again and I’m working on a project that uses these changes.

Thanks a lot for your patience and for following up!

davidkoski · 2025-04-16T15:37:48Z

I was thinking that we should somehow force the prompt to be in the new format, but after playing with that I think that isn't quite right. I think what you have right now is the right approach. We have:

- simple text prompt
- raw prompt in the format required by the model
- structured prompt that can be converted into the raw prompt

Getting rid of the raw prompt isn't right, but we can convert the examples to use the structured prompt -- it is easier to read and portable across models.
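
The three prompt forms listed above could be modeled roughly as follows. This is a minimal, self-contained sketch rather than the library's actual definitions: the case names `.text`, `.messages`, and `.chat` appear elsewhere in this thread, but `Role`, `ChatMessage`, their fields, and the `asRawMessages()` helper are simplified stand-ins invented for illustration (the real message type also carries images and videos).

```swift
// Hypothetical stand-ins for the PR's structured message type.
enum Role: String {
    case system, user, assistant
}

struct ChatMessage {
    var role: Role
    var content: String
}

// Sketch of the three prompt forms discussed above.
enum Prompt {
    /// Simple text prompt.
    case text(String)
    /// Raw prompt, already in the format a specific model requires.
    case messages([[String: String]])
    /// Structured prompt that can be converted into the raw form.
    case chat([ChatMessage])

    /// Hypothetical helper: lower any prompt form to raw role/content dictionaries.
    func asRawMessages() -> [[String: String]] {
        switch self {
        case .text(let text):
            return [["role": Role.user.rawValue, "content": text]]
        case .messages(let raw):
            return raw
        case .chat(let messages):
            return messages.map { ["role": $0.role.rawValue, "content": $0.content] }
        }
    }
}

// A short structured conversation, lowered to the raw form.
let conversation = Prompt.chat([
    ChatMessage(role: .system, content: "You are a helpful assistant."),
    ChatMessage(role: .user, content: "Describe the image."),
])
let raw = conversation.asRawMessages()
```

Keeping the raw case alongside the structured one means callers that already build model-specific dictionaries keep working, while new code can stay model-agnostic.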

ibrahimcetin · 2025-04-16T18:07:37Z

Yes, I agree with you and I believe these changes are mostly ready. We should also add some unit tests to ensure everything works as expected.

I also think the examples should be updated, but I’d suggest handling that in a separate PR.

Let me know if you have any other thoughts or suggestions.

davidkoski · 2025-04-16T20:25:50Z

I am concerned that the UserInput.images and prompt can get out of sync -- I think we can make that work using the same thing you did in the init for the Chat (that is what I am playing with now).

davidkoski · 2025-04-17T19:23:54Z

Trying to set up tests, but the way the project is set up is making it difficult. I think we want tests to be defined in the Package.swift, but those are not callable from the enclosing xcodeproj. I tried making a shadow unit test target that referenced the same files, but it is unable to link against the local Package.swift (e.g. MLXLMCommon).

davidkoski · 2025-04-17T20:10:43Z

OK, got the tests building -- just something trivial to start with.

davidkoski · 2025-04-17T20:45:26Z

Libraries/MLXLMCommon/Chat.swift

+/// public func prepare(input: UserInput) async throws -> LMInput {
+///     let messages = Qwen2VLMessageGenerator().generate(from: input)
+///     ...
+/// ```


I added some documentation

davidkoski · 2025-04-17T20:46:57Z

Libraries/MLXLMCommon/UserInput.swift

+    ///
+    /// If the ``prompt-swift.property`` is a ``Prompt-swift.enum/chat(_:)`` this will
+    /// collect the images from the chat messages, otherwise these are the stored images with the ``UserInput``.
+    public var images: [Image] {


@ibrahimcetin I made two changes of note, see what you think of them. The first is to make images and video reflect the chat messages OR store the values. This way you can mutate the prompt and these properties will still contain the right contents.
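
A minimal sketch of the "reflect the chat messages OR store the values" behavior described above. The types are simplified stand-ins: an image is a plain string here, and `SketchInput` is a hypothetical name, not the real `UserInput`. Only the switching logic is the point.

```swift
// Hypothetical, simplified types: an "image" is just a String identifier here.
struct Message {
    var content: String
    var images: [String] = []
}

enum Prompt {
    case text(String)
    case chat([Message])
}

struct SketchInput {
    var prompt: Prompt
    // Backing storage, used only when the prompt is not a chat.
    private var storedImages: [String] = []

    init(prompt: Prompt, images: [String] = []) {
        self.prompt = prompt
        self.storedImages = images
    }

    /// For a chat prompt, collect images from the messages on every access;
    /// otherwise return the stored images.
    var images: [String] {
        get {
            switch prompt {
            case .text: return storedImages
            case .chat(let messages): return messages.flatMap { $0.images }
            }
        }
        set {
            // Assumption: writes are ignored for chat prompts, since the messages own the images.
            if case .text = prompt { storedImages = newValue }
        }
    }
}

var input = SketchInput(prompt: .chat([Message(content: "What is this?", images: ["a.png"])]))
let before = input.images
// Mutating the prompt is reflected immediately, with no separate bookkeeping.
input.prompt = .chat([Message(content: "Compare these.", images: ["b.png", "c.png"])])
let after = input.images
```

The trade-off is that the images are gathered again on each read, which is cheap for short conversations but repeated work for long ones.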

The computed property recomputes the images on every access. To avoid this, we could use didSet on the prompt property like this:

    public var prompt: Prompt {
        didSet {
            switch prompt {
            case .text, .messages:
                break
            case .chat(let messages):
                _images = messages.reduce(into: []) { result, message in
                    result.append(contentsOf: message.images)
                }
                _videos = messages.reduce(into: []) { result, message in
                    result.append(contentsOf: message.videos)
                }
            }
        }
    }

And we can update the computed properties like this:

    public var images: [Image] {
        get { _images }
        set {
            switch prompt {
            case .text, .messages: _images = newValue
            case .chat: break
            }
        }
    }

    public var videos: [Video] {
        get { _videos }
        set {
            switch prompt {
            case .text, .messages: _videos = newValue
            case .chat: break
            }
        }
    }

Or maybe we can just use the didSet and keep the images and videos as they were.

This is just a thought. What do you think?

yes, good idea -- I think that would simplify things
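
A minimal, self-contained sketch of the didSet variant, using simplified stand-in types (`CachedInput` and the string-based images are hypothetical, and the public setter from the proposal is omitted for brevity). One detail worth keeping in mind: Swift does not run property observers for assignments made inside the type's own initializer, so the sketch fills the cache explicitly in `init`.

```swift
// Hypothetical, simplified types: an "image" is just a String identifier here.
struct Message {
    var content: String
    var images: [String] = []
}

enum Prompt {
    case text(String)
    case chat([Message])
}

struct CachedInput {
    var prompt: Prompt {
        // Refresh the cache whenever the prompt is replaced after initialization.
        didSet { syncImages() }
    }
    private(set) var images: [String] = []

    init(prompt: Prompt, images: [String] = []) {
        self.prompt = prompt
        self.images = images
        // Property observers do not fire for assignments made inside init,
        // so the cache is filled explicitly here.
        syncImages()
    }

    private mutating func syncImages() {
        // Only chat prompts own their images; other prompts keep the stored ones.
        if case .chat(let messages) = prompt {
            images = messages.flatMap { $0.images }
        }
    }
}

var input = CachedInput(prompt: .chat([Message(content: "hi", images: ["a.png"])]))
let initial = input.images
input.prompt = .chat([Message(content: "again", images: ["b.png"])])
let updated = input.images
```

Compared with a computed property, this does the collection once per prompt change instead of once per read.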

davidkoski · 2025-04-17T20:56:43Z

I am going to rebase on main and then write some tests

davidkoski · 2025-04-17T21:53:35Z

@ibrahimcetin I really like what you have built! Take a look at the couple of changes I made and see what you think. I also added tests and some documentation.

I think in a follow-on PR(s) we can:

- convert the rest of the examples to use the structured format
- consider this code https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/prompt_utils.py#L23
  - we might want to create generic forms like this so that they can be reused across several models
- convert the other VLMs to use MessageGenerators

and then perhaps:

- implement a chat style example -- see what kind of interaction/API is useful there

ibrahimcetin · 2025-04-18T20:49:29Z

@davidkoski Thank you for these changes, and most of the changes look good to me. Please review my comments on the code.

I agree with you on the follow-on PRs, and I can update the examples as soon as this PR is merged. I am also considering updating ToolSpec to be similar to the ollama-swift tool.

davidkoski

Awesome addition, thank you!

davidkoski · 2025-04-18T21:16:28Z

I didn't see any new comments -- anything left before we merge?

ibrahimcetin · 2025-04-18T20:24:55Z

Libraries/MLXVLM/Models/SmolVLM2.swift

@@ -221,7 +221,7 @@ public class SmolVLMProcessor: UserInputProcessor {
    }

    public func prepare(input: UserInput) async throws -> LMInput {
-        let messages = input.prompt.asMessages()
+        let messages = Qwen2VLMessageGenerator().generate(from: input)  // TODO: Create SmolVLM2MessageGenerator


I temporarily used Qwen2VLMessageGenerator, but I’m not certain if this is correct. Therefore, we might need to consider adding SmolVLM2MessageGenerator.

I guess it is roughly equivalent to what we had before with the raw messages. Perhaps we address it with the followup:

consider this code https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/prompt_utils.py#L23
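
For context, a per-model message generator might look roughly like the sketch below. Everything here is an assumption made for illustration: the protocol shape, the names (`SketchMessageGenerator`, `TextThenImagesGenerator`, `ImagesThenTextGenerator`, `RawMessage`), and the content layouts are not taken from the library or from any particular model's template. The idea is only that each model describes how one structured message is laid out, and the rest is shared.

```swift
// Hypothetical structured message; the real type also carries videos.
struct Message {
    var role: String
    var content: String
    var images: [String] = []
}

// One entry of a model-specific "content" array, e.g. ["type": "image"].
typealias ContentPart = [String: String]

struct RawMessage: Equatable {
    var role: String
    var content: [ContentPart]
}

protocol SketchMessageGenerator {
    /// Lay out a single structured message in the form a model's template expects.
    func generate(message: Message) -> RawMessage
}

extension SketchMessageGenerator {
    // Shared default so each model only describes a single message.
    func generate(messages: [Message]) -> [RawMessage] {
        messages.map { generate(message: $0) }
    }
}

// Assumed layout: a text part followed by one placeholder part per image.
struct TextThenImagesGenerator: SketchMessageGenerator {
    func generate(message: Message) -> RawMessage {
        let text: ContentPart = ["type": "text", "text": message.content]
        let images = message.images.map { _ in ["type": "image"] as ContentPart }
        return RawMessage(role: message.role, content: [text] + images)
    }
}

// Assumed alternative layout for a model that expects image placeholders first.
struct ImagesThenTextGenerator: SketchMessageGenerator {
    func generate(message: Message) -> RawMessage {
        let text: ContentPart = ["type": "text", "text": message.content]
        let images = message.images.map { _ in ["type": "image"] as ContentPart }
        return RawMessage(role: message.role, content: images + [text])
    }
}

let chat = [Message(role: "user", content: "Describe", images: ["a.png"])]
let textFirst = TextThenImagesGenerator().generate(messages: chat)
let imagesFirst = ImagesThenTextGenerator().generate(messages: chat)
```

With this shape, adding a generator for another model is a matter of writing one small conforming type rather than changing the processor.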

ibrahimcetin · 2025-04-18T20:40:29Z

Libraries/MLXLMCommon/UserInput.swift

+    ///
+    /// If the ``prompt-swift.property`` is a ``Prompt-swift.enum/chat(_:)`` this will
+    /// collect the images from the chat messages, otherwise these are the stored images with the ``UserInput``.
+    public var images: [Image] {


The computed property recomputes the images on every access. To avoid this, we could use didSet on the prompt property like this:

    public var prompt: Prompt {
        didSet {
            switch prompt {
            case .text, .messages:
                break
            case .chat(let messages):
                _images = messages.reduce(into: []) { result, message in
                    result.append(contentsOf: message.images)
                }
                _videos = messages.reduce(into: []) { result, message in
                    result.append(contentsOf: message.videos)
                }
            }
        }
    }

And we can update the computed properties like this:

    public var images: [Image] {
        get { _images }
        set {
            switch prompt {
            case .text, .messages: _images = newValue
            case .chat: break
            }
        }
    }

    public var videos: [Video] {
        get { _videos }
        set {
            switch prompt {
            case .text, .messages: _videos = newValue
            case .chat: break
            }
        }
    }

Or maybe we can just use the didSet and keep the images and videos as they were.

ibrahimcetin · 2025-04-18T21:28:57Z

I didn't see any new comments -- anything left before we merge?

My mistake, sorry. You should be able to see them now.

davidkoski · 2025-04-18T23:14:28Z

Good feedback! I made the changes to the images/videos properties as you suggested -- ready to merge?

ibrahimcetin · 2025-04-18T23:16:38Z

Everything looks good to me.

davidkoski

Excellent addition, thank you!

davidkoski reviewed Apr 1, 2025

Libraries/MLXLMCommon/Chat.swift (outdated)

davidkoski reviewed Apr 1, 2025

Libraries/MLXLMCommon/UserInput.swift

ibrahimcetin force-pushed the structured-messages branch from b70781e to 9cf39e7 Compare April 10, 2025 18:26

davidkoski mentioned this pull request Apr 15, 2025

llm-tool / VLMEval does not build prompt correctly when used with images/video #270

Closed

davidkoski reviewed Apr 17, 2025

Libraries/MLXLMCommon/UserInput.swift

davidkoski reviewed Apr 17, 2025

Tools/llm-tool/LLMTool.swift

ibrahimcetin and others added 12 commits April 17, 2025 13:56

Initial implementation for structured chat messages

016af9f

Move Qwen2VLMessageGenerator

52c568e

Do not require images and videos in init(messages:)

0b0f267

Refactor message generation to use unified generate(from:) method

47ea79e

Use Qwen2VLMessageGenerator temporarily

c723f56

swift-format

f8357cf

add stub tests

6826c78

add documentation, images/video are live

f08fbee

convert this to the new structured format

d3518e4

fix ml-explore#270

e450998

update qwen2.5 with structured message change

3b49a8d

implement tests

5f439e9

davidkoski force-pushed the structured-messages branch from 01b974f to 5f439e9 Compare April 17, 2025 21:46

davidkoski added 3 commits April 17, 2025 14:54

add scheme

a7e98dc

correct scheme name

f55502c

adjust deployment targets to run on CI

a2dd872

davidkoski approved these changes Apr 18, 2025

ibrahimcetin commented Apr 18, 2025

ibrahimcetin requested a review from davidkoski April 18, 2025 22:34

address PR feedback

0015016

davidkoski approved these changes Apr 18, 2025

davidkoski merged commit 2b78ff9 into ml-explore:main Apr 18, 2025
3 checks passed

adrgrondin mentioned this pull request Apr 30, 2025

Fix UserInput init(messages:images:videos:tools:additionalContext:) broken for VLM #298

Merged
