Skip to content

Implement Structured Chat Messages #257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Apr 18, 2025

Conversation

ibrahimcetin
Copy link
Contributor

This PR introduces structured chat messages, addressing issue #234.

I initially made these changes a few days ago but held off on opening a PR because I wanted to create an example app to test them. However, I haven't had time recently, so I’m submitting this now to continue the discussion.

For the implementation, I chose to add a new chat case to the Prompt enum, as I believe it is the simplest and most compatible approach.

I plan to create a project to demonstrate these changes when I have time.

Looking forward to feedback!

public let images: [UserInput.Image]

/// Array of video data associated with the message.
public let videos: [UserInput.Video]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought -- this seems like a better structure than UserInput (which is too flat). I think UserInput was largely a reflection of what mlx-vlm (python) had for the command line tool, and while it was OK for that it is too simplistic.

We don't have to solve it all in one PR, but if we adopt this I think we should start deprecating pieces in UserInput that overlap with this. Possibly we can cover these values (perhaps referring to the first or last message), but we shouldn't have two ways of representing images in a conversation.

At the very least, I think part of this PR should mark the images etc. properties in UserInput as deprecated.

Copy link
Collaborator

@davidkoski davidkoski Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be a good idea to try and integrate this into the examples (specifically as a multi-turn chat) -- it will help us make sure this covers the use cases. Again, that doesn't have to happen in this PR, I think we can iterate on it a bit before we cut tags for a release.

@davidkoski
Copy link
Collaborator

@ibrahimcetin sorry to take so long to get back to this! I have some suggested changes based on what I was saying / we were discussing about the structure of UserInput -- helping it convert fully to Chat.Message.

How would you like to proceed?

  • I can push commits to this PR and you can look at them / revert / whatever
  • I can comment with suggested changes, but I think this may be hard to deal with in the PR view
  • we can work to get this merged and then I can make a followup PR with my changes

I think the first option is easiest and will let us land this in a more complete state, but please let me know what you think!

@ibrahimcetin
Copy link
Contributor Author

@davidkoski That works for me too, please feel free to proceed in the way that’s easiest for you.

I'm also sorry for the delay. I didn’t expect it to take this long. Starting this weekend I’ll be available again and I’m working on a project that uses these changes.

Thanks a lot for your patience and for following up!

@davidkoski
Copy link
Collaborator

I was thinking that we should somehow force the prompt to be in the new format, but after playing with that I think that isn't quite right. I think what you have right now is the right approach. We have:

  • simple text prompt
  • raw prompt in the format required by the model
  • structured prompt that can be converted into the raw prompt

Getting rid of the raw prompt isn't right, but we can convert the examples to use the structured prompt -- it is easier to read and portable across models.

@ibrahimcetin
Copy link
Contributor Author

Yes, I agree with you and I believe these changes are mostly ready. We should also add some unit tests to ensure everything works as expected.

I also think the examples should be updated, so I’d suggest handling that in a separate PR.

Let me know if you have any other thoughts or suggestions.

@davidkoski
Copy link
Collaborator

I am concerned that the UserInput.images and prompt can get out of sync -- I think we can make that work using the same thing you did in the init for the Chat (that is what I am playing with now).

@davidkoski
Copy link
Collaborator

Trying to set up tests but the way the project is set up is making it difficult. I think we want tests to be define in the Package.swift, but those are not callable from the enclosing xcodeproj. I tried making a shadow unit test target that referenced the same files but it is unable to link against the local Package.swift (e.g. MLXLMCommon).

@davidkoski
Copy link
Collaborator

OK, got the tests building -- just something trivial to start with.

/// public func prepare(input: UserInput) async throws -> LMInput {
/// let messages = Qwen2VLMessageGenerator().generate(from: input)
/// ...
/// ```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some documentation

///
/// If the ``prompt-swift.property`` is a ``Prompt-swift.enum/chat(_:)`` this will
/// collect the images from the chat messages, otherwise these are the stored images with the ``UserInput``.
public var images: [Image] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibrahimcetin I made two changes of note, see what you think of them. The first is to make images and video reflect the chat messages OR store the values. This way you can mutate the prompt and these properties will still contain the right contents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The computed property compute images every time. To avoid this, we may use didSet on prompt property like this:

public var prompt: Prompt {
        didSet {
            switch prompt {
            case .text, .messages:
               break
            case .chat(let messages):
                _images = messages.reduce(into: []) { result, message in
                    result.append(contentsOf: message.images)
                }

                _videos = messages.reduce(into: []) { result, message in
                    result.append(contentsOf: message.videos)
                }
            }
        }
    }

public var images: [Image] {
        get {
            _images
        }
        set {
            switch prompt {
            case .text, .messages:
                _images = newValue
            case .chat:
                break
            }
        }
    }

public var videos: [Video] {
        get {
            _videos
        }
        set {
            switch prompt {
            case .text, .messages:
                _videos = newValue
            case .chat:
                break
            }
        }
    }

And we can update the computed properties like this.

Even, maybe we can just use the didSet and keep the images and videos as they were.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a thought. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good idea -- I think that would simplify things

@davidkoski
Copy link
Collaborator

I am going to rebase on main and then write some tests

@davidkoski davidkoski force-pushed the structured-messages branch from 01b974f to 5f439e9 Compare April 17, 2025 21:46
@davidkoski
Copy link
Collaborator

davidkoski commented Apr 17, 2025

@ibrahimcetin I really like what you have built! Take a look at the couple of changes I made and see what you think. I also added tests and some documentation.

I think in a follow-on PR(s) we can:

and then perhaps:

  • implement a chat style example -- see what kind of interaction/API is useful there

@ibrahimcetin
Copy link
Contributor Author

ibrahimcetin commented Apr 18, 2025

@davidkoski Thank you for these changes, and most of the changes look good to me. Please review my comments on the code.

I agree with you on following-on PRs and I can update the examples as soon as this PR is merged. I also consider to update ToolSpec similar to the ollama-swift tool.

Copy link
Collaborator

@davidkoski davidkoski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome addition, thank you!

@davidkoski
Copy link
Collaborator

I didn't see any new comments -- anything left before we merge?

@@ -221,7 +221,7 @@ public class SmolVLMProcessor: UserInputProcessor {
}

public func prepare(input: UserInput) async throws -> LMInput {
let messages = input.prompt.asMessages()
let messages = Qwen2VLMessageGenerator().generate(from: input) // TODO: Create SmolVLM2MessageGenerator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I temporarily used Qwen2VLMessageGenerator, but I’m not certain if this is correct. Therefore, we might need to consider adding SmolVLM2MessageGenerator.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is roughly equivalent to what we had before with the raw messages. Perhaps we address it with the followup:

///
/// If the ``prompt-swift.property`` is a ``Prompt-swift.enum/chat(_:)`` this will
/// collect the images from the chat messages, otherwise these are the stored images with the ``UserInput``.
public var images: [Image] {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The computed property compute images every time. To avoid this, we may use didSet on prompt property like this:

public var prompt: Prompt {
        didSet {
            switch prompt {
            case .text, .messages:
               break
            case .chat(let messages):
                _images = messages.reduce(into: []) { result, message in
                    result.append(contentsOf: message.images)
                }

                _videos = messages.reduce(into: []) { result, message in
                    result.append(contentsOf: message.videos)
                }
            }
        }
    }

public var images: [Image] {
        get {
            _images
        }
        set {
            switch prompt {
            case .text, .messages:
                _images = newValue
            case .chat:
                break
            }
        }
    }

public var videos: [Video] {
        get {
            _videos
        }
        set {
            switch prompt {
            case .text, .messages:
                _videos = newValue
            case .chat:
                break
            }
        }
    }

And we can update the computed properties like this.

Even, maybe we can just use the didSet and keep the images and videos as they were.

@ibrahimcetin
Copy link
Contributor Author

I didn't see any new comments -- anything left before we merge?

My mistake, sorry. Now you are able to see.

@ibrahimcetin ibrahimcetin requested a review from davidkoski April 18, 2025 22:34
@davidkoski
Copy link
Collaborator

Good feedback! I made the changes to the images/videos properties as you suggested -- ready to merge?

@ibrahimcetin
Copy link
Contributor Author

Everything looks good to me.

Copy link
Collaborator

@davidkoski davidkoski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent addition, thank you!

@davidkoski davidkoski merged commit 2b78ff9 into ml-explore:main Apr 18, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants