chat example (command line) #277
Conversation
```swift
queries = rotaryEmbedding(queries, offset: offset)
keys = rotaryEmbedding(keys, offset: offset)

if let cache {
    (keys, values) = cache.update(keys: keys, values: values)
}

let mask = mask?[.ellipsis, 0 ..< keys.dim(-2)]
```
I found that the dimensions were mismatched on the second message -- the keys dimension needs to be read after the KVCache update (see the sketch below). In mlx-vlm it works out OK because:
- it looks like KVCache isn't used persistently
- the KVCache implementation doesn't window on the cache
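For illustration, the two orderings side by side (same names as the diff above; the comments and the "before" reconstruction are mine, not the PR's exact prior code):

```swift
// Mismatched on turn two: the mask is sliced to the key length *before*
// the cache prepends the earlier turns' keys.
// let mask = mask?[.ellipsis, 0 ..< keys.dim(-2)]
// (keys, values) = cache.update(keys: keys, values: values)

// Correct: update the cache first, then slice the mask to the combined
// (cached + new) key length.
(keys, values) = cache.update(keys: keys, values: values)
let mask = mask?[.ellipsis, 0 ..< keys.dim(-2)]
```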
```swift
/// Command line arguments for controlling generation of text.
struct GenerateArguments: ParsableArguments, Sendable {

struct PromptArguments: ParsableArguments, Sendable {
```
I split this out because `chat` doesn't need a prompt.
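For context, a hedged sketch of what the split-out struct might hold; the option name and help text are assumptions, not the PR's exact code:

```swift
import ArgumentParser

/// Prompt-specific arguments, kept separate from GenerateArguments so
/// the chat command can omit them. Sketch only; details are assumptions.
struct PromptArguments: ParsableArguments, Sendable {
    @Option(name: .shortAndLong, help: "The prompt to evaluate")
    var prompt: String?
}
```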
Tools/llm-tool/LLMTool.swift (Outdated)
```swift
        size = CGSize(width: v0, height: v1)
    }
    userInput.processing.resize = size
}
```
The processing part is the only interesting piece here, I think -- it should probably go into reusable arguments anyway.
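One possible shape for those reusable arguments, assuming swift-argument-parser; the struct name, option name, and helper are hypothetical, not the PR's code:

```swift
import ArgumentParser
import Foundation
import MLXLMCommon

/// Hypothetical reusable argument package for image processing options.
struct ProcessingArguments: ParsableArguments, Sendable {
    @Option(parsing: .upToNextOption, help: "Resize images to <width> <height>")
    var resize: [Int] = []

    /// Apply the processing options to a UserInput before generation.
    func apply(to userInput: inout UserInput) {
        if resize.count == 2 {
            userInput.processing.resize = CGSize(
                width: resize[0], height: resize[1])
        }
    }
}
```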
Tools/llm-tool/LLMTool.swift (Outdated)
```swift
let modelContainer = try await memory.start { [args] in
    do {
        return try await args.load(
            defaultModel: defaultModel.name, modelFactory: LLMModelFactory.shared)
```
So how do we know VLM vs LLM? It could probe the registry, but what if the id isn't registered? We can't rely on an image in the parameters because the user may load one at chat time.
Tools/llm-tool/LLMTool.swift (Outdated)
```swift
// TODO: need to figure out the proper ownership for this -- maybe the loop
// below needs to go inside the context?
var cache: [KVCache]?
```
This still needs work, but I think a chat application should show how to use a KVCache.
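A minimal sketch of a chat loop that carries a KVCache across turns, using the `TokenIterator(input:model:cache:parameters:)` initializer this PR adds; `context` and `parameters` are assumed to come from the surrounding tool code:

```swift
// Sketch only: one cache for the whole conversation.
let cache = context.model.newCache(parameters: parameters)

while let line = readLine() {
    let input = try await context.processor.prepare(
        input: UserInput(prompt: line))
    // Reusing the same cache across turns means each turn only pays for
    // its new tokens; earlier turns are already in the key/value cache.
    var iterator = try TokenIterator(
        input: input, model: context.model, cache: cache,
        parameters: parameters)
    // ... consume tokens from `iterator` and print the detokenized text ...
}
```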
Tools/llm-tool/LLMTool.swift (Outdated)
```swift
if chat.count >= 2 {
    chatWithMedia[1].images = images
    chatWithMedia[1].videos = videos
}
```
Still working on this -- it works great with a single set of media, but the model / image processing code doesn't expect two messages with independent sets of media.

On the Swift side:

```swift
public struct LMInput {
    public let text: Text
    public let image: ProcessedImage?
    public let video: ProcessedVideo?
```

Tokens plus one set of media. The Python code in mlx-vlm has a similar setup. I think maybe that should be:

```swift
public struct LMInput {
    public let text: Text
    public let image: [ProcessedImage]
    public let video: [ProcessedVideo]
```

and then the model can inject the appropriate piece.

The `Hashable` conformance on `UserInput` was me thinking that perhaps we should cache the `ProcessedImage` / `Video` so it doesn't need to be reprocessed.

Another option is to use the `KVCache` -- it will capture the image embedding and we could then omit the image from subsequent generations. Except we need to represent it space-wise so the size of the context remains consistent (we can't just drop the image).

FWIW @Blaizzy, I think mlx-vlm's chat implementation has a problem here: the code only presents the image for the first message and then discards it. It doesn't use a KVCache, so it actually "forgets" what the image is, though the model is pretty good at faking it:
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
what animal is in the picture. answer in english<|im_end|>
<|im_start|>assistant
The animal in the picture is a dog.<|im_end|>
<|im_start|>user
what is the dog wearing?<|im_end|>
<|im_start|>assistant
The dog is wearing a collar.<|im_end|>
<|im_start|>user
what is behind the dog?<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant
There is a person behind the dog.<|im_end|>
```
There are maybe two problems here:
- dropping the image data
- attaching the image to the last message -- I found that if the image data is present it needs to be associated with the first message that refers to it

Anyway, with just a single image I think this is pretty close. It needs to be cleaned up, but it works. Maybe we only deal with one image set in the thread for now.
@ibrahimcetin here is an example of the multi-image in chat issue:



so perhaps we document that as an issue (or disable the ability to add multiple media sets) until we figure it out. I don't think it needs to block the chat examples.
Tools/llm-tool/LLMTool.swift (Outdated)
```swift
// TODO: figure out ownership here
if cache == nil {
    cache = context.model.newCache(parameters: generate.generateParameters)
}
```
The cache is mutable and I am playing fast and loose with ownership here. To be fixed.
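One hedged way to pin the ownership down, using the existing `ModelContainer.perform` API so the mutable cache never escapes the model context; a sketch, not a settled design:

```swift
// Sketch only: running the whole chat loop inside perform confines the
// cache to the model context.
try await modelContainer.perform { context in
    let cache = context.model.newCache(parameters: generate.generateParameters)
    // ... chat loop: prepare input, build a TokenIterator with `cache`,
    // and stream tokens; the cache is mutated in place each turn ...
}
```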
Tools/llm-tool/LLMTool.swift (Outdated)
```swift
// it should be a parameter to the higher level generate?
var iterator = try TokenIterator(
    input: input, model: context.model, cache: cache,
    parameters: generate.generateParameters)
```
I don't care about the iterator -- I think the `generate()` call should probably take an optional cache.
```diff
@@ -729,10 +730,10 @@ public func generate(
 /// }
 /// ```
 public func generate(
-    input: LMInput, parameters: GenerateParameters, context: ModelContext
+    input: LMInput, cache: [KVCache]? = nil, parameters: GenerateParameters, context: ModelContext
```
This makes it easy to pass the optional cache through to the iterator.
```diff
@@ -97,4 +97,7 @@ public class KVCacheSimple: KVCache, Evaluatable {
         )
     }
 
+    public var debugDescription: String {
+        "\(String(describing: Self.self)) \(Unmanaged.passUnretained(self).toOpaque()), offset: \(offset), step: \(step), keys: \(keys?.shape.description ?? "-"), values: \(values?.shape.description ?? "-")"
```
Just a convenience I needed while debugging Qwen2
```swift
processing: media.processing,
images: media.images, videos: media.videos,
chat: [.system(generate.system)],
cache: context.model.newCache(parameters: parameters))
```
We may want a follow-up PR for #276 to add the KVCache, but without the change to `generate()` it is hard to use.
```swift
}

/// Argument package for supplying media files
struct MediaArguments: ParsableArguments, Sendable {
```
Also split this out because we use it in `eval` and `chat`.
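For illustration, one possible shape for the shared struct; the option names and types here are assumptions, not the PR's exact code:

```swift
import ArgumentParser

/// Argument package for supplying media files (hypothetical sketch).
struct MediaArguments: ParsableArguments, Sendable {
    @Option(parsing: .upToNextOption, help: "Paths or URLs of images to attach")
    var image: [String] = []

    @Option(parsing: .upToNextOption, help: "Paths or URLs of videos to attach")
    var video: [String] = []
}
```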
@ibrahimcetin here is the command line chat