Large model starts to repeat itself / gets stuck on a phrase #924
Possibly related to: openai/whisper#1253
This can be easily reproduced with the sample:

./main -m ./models/ggml-large-v3-q5_0.bin -f samples/gb1.wav

whisper_init_from_file_with_params_no_state: loading model from './models/ggml-large-v3-q5_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 8
whisper_model_load: qntvr = 2
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: CPU buffer size = 1080.97 MB
whisper_model_load: model size = 1080.47 MB
whisper_init_state: kv self size = 220.20 MB
whisper_init_state: kv cross size = 245.76 MB
whisper_init_state: compute buffer (conv) = 32.42 MB
whisper_init_state: compute buffer (encode) = 212.42 MB
whisper_init_state: compute buffer (cross) = 9.38 MB
whisper_init_state: compute buffer (decode) = 99.24 MB
system_info: n_threads = 1 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |
main: processing 'samples/gb1.wav' (3179927 samples, 198.7 sec), 1 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.980 --> 00:00:08.720] My fellow Americans, this day has brought terrible news and great sadness to our country.
[00:00:08.720 --> 00:00:17.280] At 9:00 this morning, Mission Control in Houston lost contact with our space shuttle Columbia.
[00:00:17.280 --> 00:00:24.640] A short time later, debris was seen falling from the skies above Texas.
[00:00:24.640 --> 00:00:27.200] The Columbia is lost.
[00:00:27.200 --> 00:00:29.860] There are no survivors.
[00:00:29.860 --> 00:00:32.920] On board was a crew of seven.
[00:00:32.920 --> 00:00:39.760] Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain
[00:00:39.760 --> 00:00:50.120] David Brown, Commander William McCool, Dr. Kulpna Shavla, and Ilan Ramon, a colonel in
[00:00:50.120 --> 00:00:52.780] the Israeli Air Force.
[00:00:52.780 --> 00:00:59.720] These men and women assumed great risk in the service to all humanity in an age when
[00:00:59.720 --> 00:01:03.100] flight has come to seem almost routine.
[00:01:03.100 --> 00:01:08.720] It is easy to overlook the dangers of travel by rocket and the difficulties of navigating
[00:01:08.720 --> 00:01:12.580] the fierce outer atmosphere of the Earth.
[00:01:12.580 --> 00:01:19.220] These astronauts knew the dangers, and they faced them willingly, knowing they had a high
[00:01:19.220 --> 00:01:22.940] and noble purpose in life.
[00:01:22.940 --> 00:01:29.580] Because of their courage and daring and idealism, we will miss them all the more.
[00:01:29.580 --> 00:01:36.360] All Americans today are thinking as well of the families of these men and women who
[00:01:36.360 --> 00:01:40.440] have been given this sudden shock and grief.
[00:01:40.440 --> 00:01:42.340] You're not alone.
[00:01:42.340 --> 00:01:45.420] Our entire nation grieves with you.
[00:01:45.420 --> 00:01:52.340] And those you loved will always have the respect and gratitude of this country.
[00:01:52.340 --> 00:01:57.060] The cause in which they died will continue.
[00:01:57.060 --> 00:01:59.440] Mankind is led into the darkness.
[00:01:59.440 --> 00:02:02.200] But we will not be left behind.
[00:02:02.200 --> 00:02:04.200] We will be led into the darkness.
[00:02:04.200 --> 00:02:06.200] We will be led into the darkness.
[00:02:06.200 --> 00:02:08.200] We will be led into the darkness.
[00:02:08.200 --> 00:02:10.200] We will be led into the darkness.
[00:02:10.200 --> 00:02:12.200] We will be led into the darkness.
[00:02:12.200 --> 00:02:14.200] We will be led into the darkness.
[00:02:14.200 --> 00:02:16.200] We will be led into the darkness.
[00:02:16.200 --> 00:02:18.200] We will be led into the darkness.
[00:02:18.200 --> 00:02:20.200] We will be led into the darkness.
[00:02:20.200 --> 00:02:22.200] We will be led into the darkness.
[00:02:22.200 --> 00:02:24.200] We will be led into the darkness.
[00:02:24.200 --> 00:02:26.200] We will be led into the darkness.
[00:02:26.200 --> 00:02:28.200] We will be led into the darkness.
[00:02:28.200 --> 00:02:29.300] We will be led into the darkness.
Any updates on this? I had the same problem using the large-v3 model.
Try the -mc 0 flag. It keeps the previously decoded text from being added as a prompt for the next segment.
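For example, rerunning the reproduction command from above with context carry-over disabled:

./main -m ./models/ggml-large-v3-q5_0.bin -f samples/gb1.wav -mc 0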
Great solution. I thought I'd post a Python function that removes sequentially repeated lines in case you would like to keep your token history. It's worked successfully on a media library with ~14,000 videos:
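A minimal sketch of such a filter (the function name, the timestamp-stripping regex, and the file handling here are assumptions, not necessarily the original code):

```python
import re

# Matches the "[hh:mm:ss.mmm --> hh:mm:ss.mmm]" prefix that main prints
# in front of each transcript line (absent in plain --output-txt files).
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")

def drop_sequential_repeats(lines):
    """Drop lines whose text (ignoring any timestamp prefix) is identical
    to the text of the line immediately before them."""
    result = []
    prev = None
    for line in lines:
        text = TIMESTAMP.sub("", line).strip()
        if text and text == prev:
            continue  # sequential repeat: skip it
        result.append(line)
        prev = text
    return result

# Usage: print a cleaned copy of a saved transcript.
with open("transcript.txt", encoding="utf-8") as f:
    cleaned = drop_sequential_repeats(f.read().splitlines())
print("\n".join(cleaned))
```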
Hi there. Using the large model, it sometimes starts to loop and just repeats one sentence for the rest of the transcript. I run it with the following command:

./main -m models/ggml-large.bin "$output_file" -t 11 -l nl --output-txt --print-colors --best-of 3

The audio file is 20 minutes long, but I have seen this with other files as well.