llm-tool / VLMEval does not build prompt correctly when used with images/video #270

davidkoski · 2025-04-15T17:38:51Z

LLMTool.swift does not build the prompt correctly when images or videos are supplied. The user message is populated with generate.system instead of the user's prompt:

        let messages: [[String: Any]] =
            if !images.isEmpty || !videos.isEmpty {
                [
                    [
                        "role": "user",
                        "content": [
                            [
                                "type": "text",
                                "text": generate.system,
                            ]
                        ]

It looks like this was inadvertently introduced in #206. The message needs to include the user prompt. I think this should probably generate two dictionaries, and the first one should be role: system.
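
A minimal sketch of the two-dictionary shape suggested above, not the actual change made in #257. The helper name buildMessages and its parameters are hypothetical, and the media placeholder entries (["type": "image"], ["type": "video"]) and their placement before the text are assumptions about what the model's chat template expects:

```swift
// Hypothetical helper: `buildMessages` and its parameter names are
// illustrative only and are not part of LLMTool.swift.
func buildMessages(
    systemPrompt: String,
    userPrompt: String,
    imageCount: Int,
    videoCount: Int
) -> [[String: Any]] {
    // Assumption: the chat template wants one placeholder entry per
    // attached image or video, followed by the user's text.
    let imageEntry: [String: Any] = ["type": "image"]
    let videoEntry: [String: Any] = ["type": "video"]

    var userContent = [[String: Any]](repeating: imageEntry, count: imageCount)
    userContent += [[String: Any]](repeating: videoEntry, count: videoCount)
    userContent.append(["type": "text", "text": userPrompt])

    return [
        // First dictionary: the system prompt, with role "system".
        [
            "role": "system",
            "content": [["type": "text", "text": systemPrompt]],
        ],
        // Second dictionary: media placeholders plus the user prompt.
        [
            "role": "user",
            "content": userContent,
        ],
    ]
}
```

With one image attached, this yields a system message followed by a user message whose content holds the image placeholder and then the prompt text, so the system text no longer stands in for the user's prompt.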

I think we can fix this in #257

davidkoski self-assigned this Apr 15, 2025

davidkoski mentioned this issue Apr 17, 2025

Implement Structured Chat Messages #257

Merged

davidkoski added a commit to ibrahimcetin/mlx-swift-examples that referenced this issue Apr 17, 2025

fix ml-explore#270

01b974f

davidkoski added a commit to ibrahimcetin/mlx-swift-examples that referenced this issue Apr 17, 2025

fix ml-explore#270

e450998

davidkoski closed this as completed in 2b78ff9 Apr 18, 2025
