qwen 1.5 Beta 1.8B output incoherently #5459


Closed
sorasoras opened this issue Feb 12, 2024 · 10 comments

Comments

@sorasoras

The latest llama.cpp produces incoherent output compared to Transformers.

transformers/vLLM work fine, but the llama.cpp GGUF does not.
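One quick way to separate a conversion bug from a sampling bug is to compare the GGUF's perplexity against what Transformers reports for the same weights on the same text. A minimal sketch using llama.cpp's perplexity tool, assuming an f16 GGUF and the usual wikitext-2 test file (both paths are placeholders):

```
# A perplexity far above the Transformers number for the same weights
# points at the conversion/inference path rather than the sampler.
./perplexity -m qwen1.5-1.8b-chat-f16.gguf -f wikitext-2-raw/wiki.test.raw -c 512
```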

@zhengxingmao

zhengxingmao commented Feb 20, 2024

+1. Both Qwen1.5-72B-Chat and Qwen-72B-Chat output incoherently. An old llama.cpp build from around Dec 2023 worked normally.

@sorasoras
Author

> +1. Both Qwen1.5-72B-Chat and Qwen-72B-Chat output incoherently. An old llama.cpp build from around Dec 2023 worked normally.

That's great info to know. Can you pinpoint the last version that works? If we can pinpoint which change caused the incoherence, it might get us close to solving the problem.
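If an old build really does work, `git bisect` can find the offending commit mechanically. A rough sketch; the `b1662` tag as "last known good" is only an assumption, so substitute whatever late-2023 build you actually verified:

```
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
git bisect start
git bisect bad master        # current build produces incoherent output
git bisect good b1662        # assumed-good late-Dec-2023 build; adjust as needed
# at each step: rebuild, rerun the model, judge the output, mark the commit
make -j main
./main -m /path/to/qwen1.5.gguf -p "Hello" -n 64
git bisect good              # or: git bisect bad
```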

@riverzhou

Same problem.

@zhengxingmao

zhengxingmao commented Mar 1, 2024

There are some mistakes in the model config files. I used the Qwen1.5 GGUF from Hugging Face, which runs successfully. It may relate to this PR: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/commit/bc11a298a0c6a5cd737064db62c6ad20ec6331be
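If the upstream config fix is what matters, re-pulling the repo and reconverting should pick it up. A sketch with placeholder paths, assuming the stock conversion script in the llama.cpp checkout:

```
# Re-download the repo (now including the fixed config) and reconvert to GGUF.
huggingface-cli download Qwen/Qwen1.5-72B-Chat --local-dir Qwen1.5-72B-Chat
python convert-hf-to-gguf.py Qwen1.5-72B-Chat --outtype f16 \
    --outfile qwen1.5-72b-chat-f16.gguf
```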

@RonanKMcGovern

RonanKMcGovern commented Mar 1, 2024 via email

Hmm, I'm unsure that's the only issue. I chat fine-tuned and have tried to quantize since then.

On Fri, Mar 1, 2024 at 1:36 AM weimy wrote:

> There are some mistakes in the model config files. I used the Qwen1.5 GGUF from Hugging Face, which runs successfully. It may relate to this PR: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/commit/bc11a298a0c6a5cd737064db62c6ad20ec6331be

@sorasoras
Author

Mostly, but there might be some config that needs adjusting: the EOS token in the original model's config.
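To check whether the EOS id baked into a GGUF matches the model's `<|im_end|>` (id 151645 on Qwen1.5 chat models), you can dump the metadata, or override the key at load time instead of reconverting. A minimal sketch; the file names are placeholders:

```
# Inspect tokenizer metadata stored in the GGUF (script ships with llama.cpp)
python gguf-py/scripts/gguf-dump.py --no-tensors qwen1.5-chat-f16.gguf | grep token_id

# Override the EOS id at load time without rebuilding the file
./main -m qwen1.5-chat-f16.gguf \
       --override-kv tokenizer.ggml.eos_token_id=int:151645 -p "Hello"
```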

github-actions bot added the stale label Apr 3, 2024
@NineMeowICT

So, is this problem solved?

@sorasoras
Author

> So, is this problem solved?

Not in the official repo.

github-actions bot removed the stale label Apr 13, 2024
@ChaoII

ChaoII commented Apr 25, 2024

I have the same problem.

```
(llama) D:\llama.cpp\build\install\bin>main.exe -m D:/Qwen1.5-0.5B-Chat/ggml-model-f16.gguf -p "What's your name?"
Log start
main: build = 2725 (784e11de)
main: built with MSVC 19.35.32215.0 for
main: seed  = 1714032293
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from D:/Qwen1.5-0.5B-Chat/ggml-model-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.name str              = Qwen1.5-0.5B-Chat
llama_model_loader: - kv   2:                          qwen2.block_count u32              = 24
llama_model_loader: - kv   3:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   4:                     qwen2.embedding_length u32              = 1024
llama_model_loader: - kv   5:                  qwen2.feed_forward_length u32              = 2816
llama_model_loader: - kv   6:                 qwen2.attention.head_count u32              = 16
llama_model_loader: - kv   7:              qwen2.attention.head_count_kv u32              = 16
llama_model_loader: - kv   8:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv   9:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                          general.file_type u32              = 1
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  13:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  14:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  15:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  16:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  18:                    tokenizer.chat_template str              = {% for message in messages %}{% if lo...
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type  f16:  170 tensors
llm_load_vocab: special tokens definition check successful ( 293/151936 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 151936
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 1024
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 2816
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 0.5B
llm_load_print_meta: model ftype      = F16
llm_load_print_meta: model params     = 619.57 M
llm_load_print_meta: model size       = 1.15 GiB (16.00 BPW)
llm_load_print_meta: general.name     = Qwen1.5-0.5B-Chat
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size =    0.14 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/25 layers to GPU
llm_load_tensors:        CPU buffer size =  1181.97 MiB
....................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =    48.00 MiB
llama_new_context_with_model: KV self size  =   48.00 MiB, K (f16):   24.00 MiB, V (f16):   24.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.58 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   595.50 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =     5.01 MiB
llama_new_context_with_model: graph nodes  = 846
llama_new_context_with_model: graph splits = 340

system_info: n_threads = 10 / 20 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 0


What's your name?<|im_end|> [end of text]

llama_print_timings:        load time =     361.57 ms
llama_print_timings:      sample time =       0.08 ms /     1 runs   (    0.08 ms per token, 12195.12 tokens per second)
llama_print_timings: prompt eval time =      37.07 ms /     5 tokens (    7.41 ms per token,   134.88 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =      38.76 ms /     6 tokens
Log end
```
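Note that this particular run is not necessarily the conversion bug: a chat-tuned Qwen model given a bare prompt will often emit `<|im_end|>` immediately, exactly as it does here. Wrapping the prompt in ChatML (or using main's ChatML mode) gives the model input it was trained on. A sketch, with the model path as a placeholder:

```
# ChatML-wrapped prompt; -e makes main interpret the \n escapes
./main -m Qwen1.5-0.5B-Chat/ggml-model-f16.gguf -e \
  -p "<|im_start|>user\nWhat's your name?<|im_end|>\n<|im_start|>assistant\n"

# or let main apply ChatML itself in interactive mode
./main -m Qwen1.5-0.5B-Chat/ggml-model-f16.gguf --chatml
```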

github-actions bot added the stale label May 26, 2024

github-actions bot commented Jun 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Jun 9, 2024