[Bug]: Expected there to be 4 prompt updates corresponding to 4 image items, but instead found 3 prompt updates! Either the prompt text has missing/incorrect tokens for multi-modal inputs #15338
Comments
Can you pull the latest code? It should be fixed by #14980.
Still getting the error:
Can you show your code? How did you send the prompt to the model?
@DarkLight1337 I am also seeing a similar issue:
I'm using the code here for prompting gemma3:
Are you directly running the example script or did you adapt it into your own code? If it's the latter case, can you show your code? In particular, what does
I figured out the issue: it was breaking for me because of how the prompt was written in my code.
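For anyone else who lands here: below is a minimal sketch of a multi-image prompt whose placeholder count matches the number of images, using vLLM's offline `LLM` API and the `<start_of_image>` placeholder from vLLM's Gemma 3 examples. The checkpoint name and file paths are placeholders, not taken from this issue.

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Placeholder paths and checkpoint -- adjust to your setup.
image_paths = ["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]
images = [Image.open(p) for p in image_paths]

# One <start_of_image> placeholder per image item. If the prompt contains
# fewer placeholders than the images passed in multi_modal_data, vLLM raises
# the "Expected there to be N prompt updates ..." error from this issue.
prompt = (
    "<bos><start_of_turn>user\n"
    + "<start_of_image>" * len(images)
    + "Describe these images.<end_of_turn>\n"
    + "<start_of_turn>model\n"
)

llm = LLM(
    model="google/gemma-3-27b-it",                 # assumption: any Gemma 3 checkpoint
    limit_mm_per_prompt={"image": len(images)},
    max_model_len=16384,
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": images}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

The same rule holds if you build the prompt through the HF processor's chat template: each image entry in the message content has to correspond to exactly one image actually sent with the request.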
I hit the same problem in verl when calling vLLM.
The row of data looks like:
The shell command is like:
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
CUDA_VISIBLE_DEVICES=0,1 vllm serve abhishekchohan/gemma-3-27b-it-quantized-W4A16 --limit-mm-per-prompt 'image=4' --max-model-len 16384 --port 11455 --tensor-parallel-size 2 --disable-frontend-multiprocessing
With 'image=3' it works fine, but with image>3 the following error occurs:
/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/transformers/utils/hub.py:106: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
INFO 03-23 00:51:57 [__init__.py:256] Automatically detected platform cuda.
INFO 03-23 00:51:58 [api_server.py:977] vLLM API server version 0.8.1
INFO 03-23 00:51:58 [api_server.py:978] args: Namespace(subparser='serve', model_tag='abhishekchohan/gemma-3-27b-it-quantized-W4A16', config='', host=None, port=11455, uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=True, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='abhishekchohan/gemma-3-27b-it-quantized-W4A16', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=16384, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=2, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt={'image': 4}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', 
worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False, dispatch_function=<function ServeSubcommand.cmd at 0x7f1180b35120>)
INFO 03-23 00:52:05 [config.py:583] This model supports multiple tasks: {'embed', 'generate', 'score', 'reward', 'classify'}. Defaulting to 'generate'.
INFO 03-23 00:52:06 [config.py:1515] Defaulting to use mp for distributed inference
INFO 03-23 00:52:06 [config.py:1693] Chunked prefill is enabled with max_num_batched_tokens=2048.
WARNING 03-23 00:52:07 [api_server.py:166] V1 is enabled, but got --disable-frontend-multiprocessing. To disable frontend multiprocessing, set VLLM_USE_V1=0.
/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/transformers/utils/hub.py:106: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
INFO 03-23 00:52:12 [__init__.py:256] Automatically detected platform cuda.
INFO 03-23 00:52:15 [core.py:53] Initializing a V1 LLM engine (v0.8.1) with config: model='abhishekchohan/gemma-3-27b-it-quantized-W4A16', speculative_config=None, tokenizer='abhishekchohan/gemma-3-27b-it-quantized-W4A16', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=abhishekchohan/gemma-3-27b-it-quantized-W4A16, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
WARNING 03-23 00:52:15 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 64 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 03-23 00:52:15 [shm_broadcast.py:258] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1], buffer_handle=(2, 10485760, 10, 'psm_fb3fe6ca'), local_subscribe_addr='ipc:///tmp/16744176-fc2a-4c3b-b41d-d61384e41cc5', remote_subscribe_addr=None, remote_addr_ipv6=False)
/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/transformers/utils/hub.py:106: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
INFO 03-23 00:52:19 [__init__.py:256] Automatically detected platform cuda.
WARNING 03-23 00:52:21 [logger.py:202] VLLM_TRACE_FUNCTION is enabled. It will record every function executed by Python. This will slow down the code. It is suggested to be used for debugging hang or crashes only.
INFO 03-23 00:52:21 [logger.py:206] Trace frame log is saved to /tmp/root/vllm/vllm-instance-8a254/VLLM_TRACE_FUNCTION_for_process_1044842_thread_140250240029696_at_2025-03-23_00:52:21.843371.log
WARNING 03-23 00:52:24 [utils.py:2282] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f8d621d54e0>
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:24 [shm_broadcast.py:258] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_6488a66e'), local_subscribe_addr='ipc:///tmp/b32a810d-c827-4a21-b4ee-ec4a4256fc43', remote_subscribe_addr=None, remote_addr_ipv6=False)
/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/transformers/utils/hub.py:106: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
INFO 03-23 00:52:28 [__init__.py:256] Automatically detected platform cuda.
WARNING 03-23 00:52:31 [logger.py:202] VLLM_TRACE_FUNCTION is enabled. It will record every function executed by Python. This will slow down the code. It is suggested to be used for debugging hang or crashes only.
INFO 03-23 00:52:31 [logger.py:206] Trace frame log is saved to /tmp/root/vllm/vllm-instance-8a254/VLLM_TRACE_FUNCTION_for_process_1045201_thread_140466852819968_at_2025-03-23_00:52:31.020890.log
WARNING 03-23 00:52:33 [utils.py:2282] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7fbfd13ed210>
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:33 [shm_broadcast.py:258] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_879ea3f8'), local_subscribe_addr='ipc:///tmp/78c07136-1218-4cb4-9b82-8731982b2ce7', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:34 [utils.py:925] Found nccl from library libnccl.so.2
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:34 [pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:34 [utils.py:925] Found nccl from library libnccl.so.2
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:34 [pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:34 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:34 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
(VllmWorker rank=1 pid=1045201) WARNING 03-23 00:52:34 [custom_all_reduce.py:146] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorker rank=0 pid=1044842) WARNING 03-23 00:52:34 [custom_all_reduce.py:146] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:34 [shm_broadcast.py:258] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_0f0d6ce7'), local_subscribe_addr='ipc:///tmp/f1147929-3b1b-44ba-afcd-fb71098476b8', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:34 [parallel_state.py:967] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:34 [cuda.py:215] Using Flash Attention backend on V1 engine.
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:34 [parallel_state.py:967] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:34 [cuda.py:215] Using Flash Attention backend on V1 engine.
(VllmWorker rank=1 pid=1045201) Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
(VllmWorker rank=0 pid=1044842) Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:41 [gpu_model_runner.py:1164] Starting to load model abhishekchohan/gemma-3-27b-it-quantized-W4A16...
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:41 [gpu_model_runner.py:1164] Starting to load model abhishekchohan/gemma-3-27b-it-quantized-W4A16...
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:42 [config.py:3222] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, 344, 352, 360, 368, 376, 384, 392, 400, 408, 416, 424, 432, 440, 448, 456, 464, 472, 480, 488, 496, 504, 512] is overridden by config [512, 384, 256, 128, 4, 2, 1, 392, 264, 136, 8, 400, 272, 144, 16, 408, 280, 152, 24, 416, 288, 160, 32, 424, 296, 168, 40, 432, 304, 176, 48, 440, 312, 184, 56, 448, 320, 192, 64, 456, 328, 200, 72, 464, 336, 208, 80, 472, 344, 216, 88, 120, 480, 352, 248, 224, 96, 488, 504, 360, 232, 104, 496, 368, 240, 112, 376]
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:42 [config.py:3222] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, 344, 352, 360, 368, 376, 384, 392, 400, 408, 416, 424, 432, 440, 448, 456, 464, 472, 480, 488, 496, 504, 512] is overridden by config [512, 384, 256, 128, 4, 2, 1, 392, 264, 136, 8, 400, 272, 144, 16, 408, 280, 152, 24, 416, 288, 160, 32, 424, 296, 168, 40, 432, 304, 176, 48, 440, 312, 184, 56, 448, 320, 192, 64, 456, 328, 200, 72, 464, 336, 208, 80, 472, 344, 216, 88, 120, 480, 352, 248, 224, 96, 488, 504, 360, 232, 104, 496, 368, 240, 112, 376]
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:42 [compressed_tensors_wNa16.py:85] Using MarlinLinearKernel for CompressedTensorsWNA16
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:42 [compressed_tensors_wNa16.py:85] Using MarlinLinearKernel for CompressedTensorsWNA16
(VllmWorker rank=0 pid=1044842) 2025-03-23 00:52:48,586 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
(VllmWorker rank=1 pid=1045201) 2025-03-23 00:52:48,591 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:48 [topk_topp_sampler.py:38] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling.
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:48 [topk_topp_sampler.py:38] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling.
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:02, 1.03it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:02<00:02, 1.06s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:03<00:01, 1.32s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:04<00:00, 1.26s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:04<00:00, 1.23s/it]
(VllmWorker rank=0 pid=1044842)
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:54 [loader.py:429] Loading weights took 5.12 seconds
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:54 [loader.py:429] Loading weights took 5.15 seconds
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:56 [gpu_model_runner.py:1176] Model loading took 8.2413 GB and 14.465126 seconds
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:56 [gpu_model_runner.py:1176] Model loading took 8.2413 GB and 14.544127 seconds
(VllmWorker rank=0 pid=1044842) INFO 03-23 00:52:56 [gpu_model_runner.py:1421] Encoder cache will be initialized with a budget of 2048 tokens, and profiled with 8 image items of the maximum feature size.
(VllmWorker rank=1 pid=1045201) INFO 03-23 00:52:56 [gpu_model_runner.py:1421] Encoder cache will be initialized with a budget of 2048 tokens, and profiled with 8 image items of the maximum feature size.
(VllmWorker rank=1 pid=1045201) Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
(VllmWorker rank=0 pid=1044842) Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
ERROR 03-23 00:52:57 [core.py:340] EngineCore hit an exception: Traceback (most recent call last):
ERROR 03-23 00:52:57 [core.py:340] File "/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 332, in run_engine_core
ERROR 03-23 00:52:57 [core.py:340] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 03-23 00:52:57 [core.py:340] File "/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 287, in __init__
ERROR 03-23 00:52:57 [core.py:340] super().__init__(vllm_config, executor_class, log_stats)
ERROR 03-23 00:52:57 [core.py:340] File "/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 62, in __init__
ERROR 03-23 00:52:57 [core.py:340] num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches(
ERROR 03-23 00:52:57 [core.py:340] File "/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 121, in _initialize_kv_caches
ERROR 03-23 00:52:57 [core.py:340] available_gpu_memory = self.model_executor.determine_available_memory()
ERROR 03-23 00:52:57 [core.py:340] File "/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 66, in determine_available_memory
ERROR 03-23 00:52:57 [core.py:340] output = self.collective_rpc("determine_available_memory")
ERROR 03-23 00:52:57 [core.py:340] File "/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 133, in collective_rpc
ERROR 03-23 00:52:57 [core.py:340] raise e
ERROR 03-23 00:52:57 [core.py:340] File "/home/anaconda3/envs/xinference14/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 122, in collective_rpc
ERROR 03-23 00:52:57 [core.py:340] raise result
ERROR 03-23 00:52:57 [core.py:340] RuntimeError: Expected there to be 4 prompt updates corresponding to 4 image items, but instead found 3 prompt updates! Either the prompt text has missing/incorrect tokens for multi-modal inputs, or there is a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_prompt_updates`).
ERROR 03-23 00:52:57 [core.py:340]
CRITICAL 03-23 00:52:57 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed