qwen 1.5 Beta 1.8B output incoherently #5459


Closed
sorasoras opened this issue Feb 12, 2024 · 10 comments

Comments

@sorasoras

The latest llama.cpp produces incoherent output compared to Transformers.

transformers/vLLM work fine, but the llama.cpp GGUF does not.
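One quick way to separate a conversion bug from a sampling bug is to compare the GGUF's perplexity against what Transformers reports for the same weights on the same text. A minimal sketch using llama.cpp's perplexity tool, assuming an f16 GGUF and the usual wikitext-2 test file (both paths are placeholders):

```
# A perplexity far above the Transformers number for the same weights
# points at the conversion/inference path rather than the sampler.
./perplexity -m qwen1.5-1.8b-chat-f16.gguf -f wikitext-2-raw/wiki.test.raw -c 512
```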

@zhengxingmao

zhengxingmao commented Feb 20, 2024

+1. Both Qwen1.5-72B-Chat and Qwen-72B-Chat output incoherently. An old llama.cpp build from around Dec 2023 worked normally.

@sorasoras
Author

> +1. Both Qwen1.5-72B-Chat and Qwen-72B-Chat output incoherently. An old llama.cpp build from around Dec 2023 worked normally.

That's great info to know. Can you pinpoint the last version that works? If we can pinpoint which change caused the incoherence, it might get us close to solving the problem.
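If an old build really does work, `git bisect` can find the offending commit mechanically. A rough sketch; the `b1662` tag as "last known good" is only an assumption, so substitute whatever late-2023 build you actually verified:

```
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
git bisect start
git bisect bad master        # current build produces incoherent output
git bisect good b1662        # assumed-good late-Dec-2023 build; adjust as needed
# at each step: rebuild, rerun the model, judge the output, mark the commit
make -j main
./main -m /path/to/qwen1.5.gguf -p "Hello" -n 64
git bisect good              # or: git bisect bad
```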

@riverzhou

Same problem.

@zhengxingmao

zhengxingmao commented Mar 1, 2024

There are some mistakes in the model config files. I used the Qwen1.5 GGUF from Hugging Face, which runs successfully. It may relate to this PR: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/commit/bc11a298a0c6a5cd737064db62c6ad20ec6331be
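If the upstream config fix is what matters, re-pulling the repo and reconverting should pick it up. A sketch with placeholder paths, assuming the stock conversion script in the llama.cpp checkout:

```
# Re-download the repo (now including the fixed config) and reconvert to GGUF.
huggingface-cli download Qwen/Qwen1.5-72B-Chat --local-dir Qwen1.5-72B-Chat
python convert-hf-to-gguf.py Qwen1.5-72B-Chat --outtype f16 \
    --outfile qwen1.5-72b-chat-f16.gguf
```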

@RonanKMcGovern

RonanKMcGovern commented Mar 1, 2024 via email

Hmm, I'm unsure that's the only issue. I chat fine-tuned and have tried to quantize since then.

On Fri, Mar 1, 2024 at 1:36 AM weimy wrote:

> There are some mistakes in the model config files. I used the Qwen1.5 GGUF from Hugging Face, which runs successfully. It may relate to this PR: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/commit/bc11a298a0c6a5cd737064db62c6ad20ec6331be

@sorasoras
Author

Mostly, but there might be some config that needs adjusting: the EOS token in the original model's config.
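To check whether the EOS id baked into a GGUF matches the model's `<|im_end|>` (id 151645 on Qwen1.5 chat models), you can dump the metadata, or override the key at load time instead of reconverting. A minimal sketch; the file names are placeholders:

```
# Inspect tokenizer metadata stored in the GGUF (script ships with llama.cpp)
python gguf-py/scripts/gguf-dump.py --no-tensors qwen1.5-chat-f16.gguf | grep token_id

# Override the EOS id at load time without rebuilding the file
./main -m qwen1.5-chat-f16.gguf \
       --override-kv tokenizer.ggml.eos_token_id=int:151645 -p "Hello"
```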

github-actions bot added the stale label Apr 3, 2024
@NineMeowICT

So, is this problem solved?

@sorasoras
Author

> So, is this problem solved?

Not in the official repo.

github-actions bot removed the stale label Apr 13, 2024
@ChaoII

ChaoII commented Apr 25, 2024

I have the same problem.

```
(llama) D:\llama.cpp\build\install\bin>main.exe -m D:/Qwen1.5-0.5B-Chat/ggml-model-f16.gguf -p "What's your name?"
Log start
main: build = 2725 (784e11de)
main: built with MSVC 19.35.32215.0 for
main: seed  = 1714032293
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from D:/Qwen1.5-0.5B-Chat/ggml-model-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.name str              = Qwen1.5-0.5B-Chat
llama_model_loader: - kv   2:                          qwen2.block_count u32              = 24
llama_model_loader: - kv   3:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   4:                     qwen2.embedding_length u32              = 1024
llama_model_loader: - kv   5:                  qwen2.feed_forward_length u32              = 2816
llama_model_loader: - kv   6:                 qwen2.attention.head_count u32              = 16
llama_model_loader: - kv   7:              qwen2.attention.head_count_kv u32              = 16
llama_model_loader: - kv   8:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv   9:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                          general.file_type u32              = 1
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  13:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  14:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  15:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  16:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  18:                    tokenizer.chat_template str              = {% for message in messages %}{% if lo...
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type  f16:  170 tensors
llm_load_vocab: special tokens definition check successful ( 293/151936 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 151936
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 1024
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 2816
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 0.5B
llm_load_print_meta: model ftype      = F16
llm_load_print_meta: model params     = 619.57 M
llm_load_print_meta: model size       = 1.15 GiB (16.00 BPW)
llm_load_print_meta: general.name     = Qwen1.5-0.5B-Chat
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size =    0.14 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/25 layers to GPU
llm_load_tensors:        CPU buffer size =  1181.97 MiB
....................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =    48.00 MiB
llama_new_context_with_model: KV self size  =   48.00 MiB, K (f16):   24.00 MiB, V (f16):   24.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.58 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   595.50 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =     5.01 MiB
llama_new_context_with_model: graph nodes  = 846
llama_new_context_with_model: graph splits = 340

system_info: n_threads = 10 / 20 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 0


What's your name?<|im_end|> [end of text]

llama_print_timings:        load time =     361.57 ms
llama_print_timings:      sample time =       0.08 ms /     1 runs   (    0.08 ms per token, 12195.12 tokens per second)
llama_print_timings: prompt eval time =      37.07 ms /     5 tokens (    7.41 ms per token,   134.88 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =      38.76 ms /     6 tokens
Log end
```
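Note that this particular run is not necessarily the conversion bug: a chat-tuned Qwen model given a bare prompt will often emit `<|im_end|>` immediately, exactly as it does here. Wrapping the prompt in ChatML (or using main's ChatML mode) gives the model input it was trained on. A sketch, with the model path as a placeholder:

```
# ChatML-wrapped prompt; -e makes main interpret the \n escapes
./main -m Qwen1.5-0.5B-Chat/ggml-model-f16.gguf -e \
  -p "<|im_start|>user\nWhat's your name?<|im_end|>\n<|im_start|>assistant\n"

# or let main apply ChatML itself in interactive mode
./main -m Qwen1.5-0.5B-Chat/ggml-model-f16.gguf --chatml
```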

github-actions bot added the stale label May 26, 2024

github-actions bot commented Jun 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Jun 9, 2024