clip : refactor clip_init, add tests #12757
Conversation
Very useful!
Does the Qwen2-VL test fail for you too? It segfaults on my mac:
...
0.01.596.379 I llama_context: CPU output buffer size = 0.58 MiB
0.01.596.383 I init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
0.01.607.493 I init: Metal KV buffer size = 112.00 MiB
0.01.607.497 I llama_context: KV self size = 112.00 MiB, K (f16): 56.00 MiB, V (f16): 56.00 MiB
0.01.620.985 I llama_context: Metal compute buffer size = 299.75 MiB
0.01.620.986 I llama_context: CPU compute buffer size = 11.51 MiB
0.01.620.986 I llama_context: graph nodes = 1042
0.01.620.987 I llama_context: graph splits = 114
Segmentation fault: 11
examples/llava/clip.h (outdated)
enum clip_log_level {
    CLIP_LOG_NONE = 0,
    CLIP_LOG_ERROR = 1,
    CLIP_LOG_WARNING = 2,
    CLIP_LOG_INFO = 3,
    CLIP_LOG_DEBUG = 4,
};
Suggested change:

enum clip_log_level {
    CLIP_LOG_LEVEL_NONE = 0,
    CLIP_LOG_LEVEL_ERROR = 1,
    CLIP_LOG_LEVEL_WARNING = 2,
    CLIP_LOG_LEVEL_INFO = 3,
    CLIP_LOG_LEVEL_DEBUG = 4,
};
Also align the values with the existing ggml_log_level enum, or even use it directly.
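A minimal sketch of the second option, reusing the ggml enum instead of defining a parallel one (the clip_context_params struct and its fields here are illustrative, not the actual API):

// ggml.h already defines ggml_log_level with ERROR/WARN/INFO/DEBUG values,
// so clip could accept it directly in its init parameters.
#include "ggml.h"

// hypothetical parameter struct, for illustration only
struct clip_context_params {
    bool                use_gpu;
    enum ggml_log_level verbosity;   // e.g. GGML_LOG_LEVEL_INFO
};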
I also refactored the logging logic in 88aec68 (most of the code is copied from common/log.h).
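For readers not familiar with common/log.h, the general pattern is a printf-style helper gated by a verbosity threshold. A simplified sketch follows; the names and details are illustrative rather than the exact code from the commit, and it assumes the current ggml ordering where DEBUG < INFO < WARN < ERROR:

#include <cstdarg>
#include <cstdio>

#include "ggml.h"

// current verbosity threshold; anything below it is suppressed
static enum ggml_log_level g_clip_log_threshold = GGML_LOG_LEVEL_INFO;

static void clip_log(enum ggml_log_level level, const char * fmt, ...) {
    if (level < g_clip_log_threshold) {
        return; // e.g. DEBUG messages are dropped at the default INFO threshold
    }
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
}

// thin printf-style wrappers, similar in spirit to LOG_ERR/LOG_INF/LOG_DBG
#define LOG_ERR(...) clip_log(GGML_LOG_LEVEL_ERROR, __VA_ARGS__)
#define LOG_INF(...) clip_log(GGML_LOG_LEVEL_INFO,  __VA_ARGS__)
#define LOG_DBG(...) clip_log(GGML_LOG_LEVEL_DEBUG, __VA_ARGS__)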
examples/llava/tests.sh (outdated)
add_test "llama-gemma3-cli" "ggml-org/gemma-3-4b-it-GGUF" | ||
add_test "llama-llava-cli" "guinmoon/MobileVLM-3B-GGUF" | ||
add_test "llama-llava-cli" "THUDM/glm-edge-v-5b-gguf" | ||
add_test "llama-llava-cli" "second-state/Llava-v1.5-7B-GGUF:Q2_K" | ||
add_test "llama-llava-cli" "cjpais/llava-1.6-mistral-7b-gguf:Q3_K" | ||
add_test "llama-llava-cli" "ibm-research/granite-vision-3.2-2b-GGUF" | ||
add_test "llama-minicpmv-cli" "second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K" # model from openbmb is corrupted | ||
add_test "llama-minicpmv-cli" "openbmb/MiniCPM-V-2_6-gguf:Q2_K" | ||
add_test "llama-qwen2vl-cli" "bartowski/Qwen2-VL-2B-Instruct-GGUF" |
At some point we have to source all of these models from ggml-org, for 2 main reasons:
- Stability (i.e. we know they won't disappear)
- Safety (i.e. they cannot be replaced with malicious versions)
Yes, I completely agree with this.
Also FYI, I ran this test script on an A10G space on HF and all tests passed. My space was an ipynb, but I think it would be nice if we could have a Gradio space where we can simply enter the PR number or commit SHA to be tested.
No it doesn't. The command that I used is:
If it still fails, could you try gdb or lldb to see the stack trace?
The problem is for some reason the Metal crashes in the
I think the Metal is not happy to have 2 different buffers point to the same data? It crashes M1 Pro, M2 Ultra and M4 Max. Which chip do you have? In any case, this patch fixes it:

diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp
index 1399a29b6..dd9afc6b0 100644
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -465,7 +465,7 @@ static ggml_cgraph * clip_image_build_graph_siglip(clip_ctx * ctx, const clip_im
V = ggml_cont(ctx0, ggml_permute(ctx0, V, 1, 2, 0, 3));
struct ggml_tensor * KQ = ggml_mul_mat(ctx0, K, Q);
- KQ = ggml_scale_inplace(ctx0, KQ, 1.0f / sqrtf((float)d_head));
+ KQ = ggml_scale(ctx0, KQ, 1.0f / sqrtf((float)d_head));
KQ = ggml_soft_max_inplace(ctx0, KQ);
struct ggml_tensor * KQV = ggml_mul_mat(ctx0, V, KQ);
@@ -721,7 +721,7 @@ static ggml_cgraph * clip_image_build_graph_legacy(clip_ctx * ctx, const clip_im
ctx0, Q, positions, nullptr,
d_head/2, mrope_sections, GGML_ROPE_TYPE_VISION, 32768, 10000, 1, 0, 1, 32, 1);
}
- Q = ggml_scale_inplace(ctx0, Q, 1.0f / sqrt((float)d_head));
+ Q = ggml_scale(ctx0, Q, 1.0f / sqrt((float)d_head));
Q = ggml_cont(ctx0, ggml_permute(ctx0, Q, 0, 2, 1, 3));
Q = ggml_reshape_3d(ctx0, Q, d_head, num_positions, n_head * batch_size);
@@ -1033,7 +1033,7 @@ static ggml_cgraph * clip_image_build_graph_legacy(clip_ctx * ctx, const clip_im
}
struct ggml_tensor * Q = ggml_add(ctx0, ggml_mul_mat(ctx0, model.mm_model_attn_q_w, q), model.mm_model_attn_q_b);
- Q = ggml_scale_inplace(ctx0, Q, 1.0f / sqrt((float)d_head));
+ Q = ggml_scale(ctx0, Q, 1.0f / sqrt((float)d_head));
struct ggml_tensor * K = ggml_add(ctx0, ggml_mul_mat(ctx0, model.mm_model_attn_k_w, k), model.mm_model_attn_k_b);
struct ggml_tensor * V = ggml_add(ctx0, ggml_mul_mat(ctx0, model.mm_model_attn_v_w, v), model.mm_model_attn_v_b);
     // permute
I'm using M3 Max (a bit funny, but how do you have 1, 2, 4 but skip 3 😂). Can you also give a try with:

KQ = ggml_soft_max_ext(ctx0, KQ, nullptr, 1.0f / sqrtf((float)d_head), 0.0f);
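For context, a sketch of what that fused call replaces in the attention code from the diff above (reusing ctx0, KQ and d_head from that context; illustrative only):

// before: separate scale op followed by soft-max
//   KQ = ggml_scale_inplace(ctx0, KQ, 1.0f / sqrtf((float)d_head));
//   KQ = ggml_soft_max_inplace(ctx0, KQ);
// after: soft-max with the scale folded in (no mask, no ALiBi bias)
KQ = ggml_soft_max_ext(ctx0, KQ, nullptr, 1.0f / sqrtf((float)d_head), 0.0f);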
I did some more debugging - it's not related to Metal, there is actually a legitimate bug somewhere. The reason is that the multi-rope is not supported by the Metal backend, so it is offloaded to the CPU. When the next op is:

llama.cpp/examples/llava/clip.cpp, lines 712 to 727 in 376f80a
Here is the problematic part of the generated graph. The node that crashes is # 42:
Running with a debugger, the problem is that the buffer of the
Simply changing the operation in the code above from
@slaren Do you have any guess what could be the root cause for this?
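To illustrate the difference being discussed, here is a sketch reusing ctx0, Q and d_head from the diff above; the buffer-sharing explanation is a hypothesis from this thread, not a confirmed root cause:

// ggml_scale allocates a new tensor with its own data, while ggml_scale_inplace
// returns a view that shares the data of its input. When part of the graph is
// offloaded to a different backend, that shared buffer is what appears to cause
// trouble here.
struct ggml_tensor * scaled     = ggml_scale        (ctx0, Q, 1.0f / sqrtf((float)d_head)); // separate tensor
struct ggml_tensor * scaled_inp = ggml_scale_inplace(ctx0, Q, 1.0f / sqrtf((float)d_head)); // view of Q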
Got it. @ngxson I think this is good to merge.
M3 seemed like too minor an upgrade. Not that M4 was really that significant either, but my M1 laptop was getting old and I needed a new one.
Thanks for reviewing and testing this. I'll merge once CI is green.

In the last commit, I also fixed an issue with the Yi-VL model. Although the model passes the "NY times" image test, it doesn't seem to be able to describe more complex scenes. I think the model is quite old anyway, and judging by the number of downloads, I doubt anyone is actually using it: https://huggingface.co/cmp-nct/Yi-VL-6B-GGUF (also leaving a link to the original PR here, for reference: #5093)

Anyway, it's truly a surprise to see how many models are supported by this clip/llava infrastructure. We currently have 11 different model archs in the
add_test "llama-gemma3-cli" "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M" | ||
add_test "llama-llava-cli" "cmp-nct/Yi-VL-6B-GGUF:Q5_K" | ||
add_test "llama-llava-cli" "guinmoon/MobileVLM-3B-GGUF:Q4_K_M" | ||
add_test "llama-llava-cli" "THUDM/glm-edge-v-5b-gguf:Q4_K_M" | ||
add_test "llama-llava-cli" "second-state/Llava-v1.5-7B-GGUF:Q2_K" | ||
add_test "llama-llava-cli" "cjpais/llava-1.6-mistral-7b-gguf:Q3_K" | ||
add_test "llama-llava-cli" "ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M" | ||
add_test "llama-minicpmv-cli" "second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K" # model from openbmb is corrupted | ||
add_test "llama-minicpmv-cli" "openbmb/MiniCPM-V-2_6-gguf:Q2_K" | ||
add_test "llama-minicpmv-cli" "openbmb/MiniCPM-o-2_6-gguf:Q4_0" | ||
add_test "llama-qwen2vl-cli" "bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M" |
Btw @bartowski1182 do you have any other models to add to the list?
I don't think so, I can't think of any other vision models off the top of my head, but I can take a closer look.
@ngxson I think this PR might have broken clip quantization; https://github.com/ggml-org/llama.cpp/blob/master/examples/llava/clip-quantize-cli.cpp no longer works after this (determined by bisecting).
* refactor clip_init
* fix loading file
* fix style
* test ok
* better test with report
* add missing headers
* clarify
* add KEY_MM_PATCH_MERGE_TYPE
* remove bool has_* pattern
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* Update examples/llava/clip.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* use ggml_soft_max_ext
* refactor logging system
* add minicpm-v-o 2.6 for testing
* use nullptr everywhere
* fix Yi-VL model

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Well, I found out why clip-quantize-cli was broken, since in #12869
A very ugly hack to keep
Don't know if such a band-aid fix would be accepted here, but I'd be happy to PR it if desired.
Tbh I don't really like the code of
And also, the quantization code can be completely outside of
Cont #12322
In this PR:
- clip_model_loader
- llava/tests.sh script, which allows testing multiple models in one go

Smaller changes:
- patch_merge_type, so that we no longer need to do strcmp(const char)
- bool has_(tensor name) pattern

Tests can be run via the ./examples/llava/tests.sh script; you may need ~20GB to download the model weights.

Result: