
Dockerfile.ppc64le changes to move to UBI #15402

Status: Merged (3 commits merged into vllm-project:main on Mar 25, 2025)

Conversation

@Shafi-Hussain (Contributor) commented Mar 24, 2025

What was changed?

  1. The torch, torchvision, and torchaudio dependencies for ppc64le have been updated to stay in sync with x86.
  2. Dockerfile.ppc64le has been updated to use UBI9 as the base image, with dependencies built from source.

Build & Test Instructions

Build

# podman build -t vllmups -f Dockerfile.ppc64le . --jobs=0

# podman images
REPOSITORY                                   TAG             IMAGE ID      CREATED       SIZE
localhost/vllmups                            latest          941a39050cd7  43 hours ago  1.82 GB
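
The target architecture of the resulting image can be double-checked with podman inspect, which should report ppc64le for this build (an illustrative extra step, not part of the PR; image name as used above):

# podman image inspect --format '{{.Architecture}}' localhost/vllmups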

Test

# podman run -idt --name=vllm --entrypoint=/bin/bash localhost/vllmups
# podman exec -it vllm bash
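
Before starting the server, the from-source torch build inside the container can be sanity-checked (illustrative command, assuming the container name above; exact versions will vary):

# podman exec -it vllm python -c 'import torch, platform; print(torch.__version__, platform.machine())'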

Verify the OpenAI-compatible endpoint inside the running container

# python -m vllm.entrypoints.openai.api_server
INFO 03-24 14:28:18 [__init__.py:256] Automatically detected platform cpu.
INFO 03-24 14:28:20 [api_server.py:981] vLLM API server version 0.8.2.dev49+gda6ea29f.d20250322
INFO 03-24 14:28:20 [api_server.py:982] args: Namespace(host=None, port=8000, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='facebook/opt-125m', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, 
enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False)
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 651/651 [00:00<00:00, 6.69MB/s]
INFO 03-24 14:28:20 [config.py:2549] For POWERPC, we cast models to bfloat16 instead of using float16 by default. Float16 is not currently supported for POWERPC.
WARNING 03-24 14:28:20 [config.py:2593] Casting torch.float16 to torch.bfloat16.
INFO 03-24 14:28:27 [config.py:585] This model supports multiple tasks: {'score', 'embed', 'classify', 'reward', 'generate'}. Defaulting to 'generate'.
WARNING 03-24 14:28:27 [arg_utils.py:1783] device type=cpu is not supported by the V1 Engine. Falling back to V0.
WARNING 03-24 14:28:27 [cpu.py:94] Environment variable VLLM_CPU_KVCACHE_SPACE (GB) for CPU backend is not set, using 4 by default.
WARNING 03-24 14:28:27 [cpu.py:107] uni is not supported on CPU, fallback to mp distributed executor backend.
INFO 03-24 14:28:27 [api_server.py:241] Started engine process with PID 33
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████| 685/685 [00:00<00:00, 6.68MB/s]
vocab.json: 100%|█████████████████████████████████████████████████████████████████████████████████| 899k/899k [00:00<00:00, 11.7MB/s]
merges.txt: 100%|█████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 37.0MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████| 441/441 [00:00<00:00, 6.38MB/s]
INFO 03-24 14:28:30 [__init__.py:256] Automatically detected platform cpu.
INFO 03-24 14:28:31 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev49+gda6ea29f.d20250322) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=facebook/opt-125m, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████| 137/137 [00:00<00:00, 1.39MB/s]
INFO 03-24 14:28:32 [cpu.py:40] Using Torch SDPA backend.
INFO 03-24 14:28:32 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 03-24 14:28:32 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 03-24 14:28:33 [weight_utils.py:257] Using model weights format ['*.bin']
pytorch_model.bin: 100%|███████████████████████████████████████████████████████████████████████████| 251M/251M [00:01<00:00, 230MB/s]
INFO 03-24 14:28:34 [weight_utils.py:273] Time spent downloading weights for facebook/opt-125m: 1.617761 seconds
Loading pt checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.61it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.60it/s]

INFO 03-24 14:28:34 [loader.py:429] Loading weights took 0.18 seconds
INFO 03-24 14:28:34 [executor_base.py:111] # cpu blocks: 7281, # CPU blocks: 0
INFO 03-24 14:28:34 [executor_base.py:116] Maximum concurrency for 2048 tokens per request: 56.88x
INFO 03-24 14:28:34 [llm_engine.py:447] init engine (profile, create kv cache, warmup model) took 0.14 seconds
INFO 03-24 14:28:35 [api_server.py:1028] Starting vLLM API server on http://0.0.0.0:8000
INFO 03-24 14:28:35 [launcher.py:26] Available routes are:
INFO 03-24 14:28:35 [launcher.py:34] Route: /openapi.json, Methods: HEAD, GET
INFO 03-24 14:28:35 [launcher.py:34] Route: /docs, Methods: HEAD, GET
INFO 03-24 14:28:35 [launcher.py:34] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 03-24 14:28:35 [launcher.py:34] Route: /redoc, Methods: HEAD, GET
INFO 03-24 14:28:35 [launcher.py:34] Route: /health, Methods: GET
INFO 03-24 14:28:35 [launcher.py:34] Route: /load, Methods: GET
INFO 03-24 14:28:35 [launcher.py:34] Route: /ping, Methods: GET, POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /tokenize, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /detokenize, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /v1/models, Methods: GET
INFO 03-24 14:28:35 [launcher.py:34] Route: /version, Methods: GET
INFO 03-24 14:28:35 [launcher.py:34] Route: /v1/chat/completions, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /v1/completions, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /v1/embeddings, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /pooling, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /score, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /v1/score, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /v1/audio/transcriptions, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /rerank, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /v1/rerank, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /v2/rerank, Methods: POST
INFO 03-24 14:28:35 [launcher.py:34] Route: /invocations, Methods: POST
INFO:     Started server process [25]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
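
Once startup is complete, the OpenAI-compatible endpoint can be exercised with a simple completions request against the default facebook/opt-125m model (illustrative check, assuming curl is available in the container or port 8000 is published to the host; output omitted):

# curl -s http://localhost:8000/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "facebook/opt-125m", "prompt": "Hello, my name is", "max_tokens": 16}'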


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Mar 24, 2025
@Shafi-Hussain Shafi-Hussain force-pushed the vllm-dockerfile-ppc64le branch from 334b352 to 328128e Compare March 24, 2025 15:07
@DarkLight1337 (Member) commented:

Please fix the commit errors

@Shafi-Hussain Shafi-Hussain marked this pull request as draft March 25, 2025 05:15
@Shafi-Hussain Shafi-Hussain force-pushed the vllm-dockerfile-ppc64le branch from 67c1530 to 328128e Compare March 25, 2025 06:08
@Shafi-Hussain Shafi-Hussain marked this pull request as ready for review March 25, 2025 06:43
@Shafi-Hussain (Contributor, Author) commented:

> Please fix the commit errors

@DarkLight1337 The failed Buildkite jobs are failing on a different Dockerfile that uses the NVIDIA base image, not on the changes made in this PR.

@DarkLight1337 (Member) commented:

Can you push a commit to trigger a new build?

@mkumatag commented:

> Can you push a commit to trigger a new build?

Maybe a dummy commit.

Signed-off-by: Md. Shafi Hussain <[email protected]>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 25, 2025 07:06
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 25, 2025
@DarkLight1337 DarkLight1337 merged commit 3e2f37a into vllm-project:main Mar 25, 2025
67 checks passed
erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025
wrmedford pushed a commit to wrmedford/vllm that referenced this pull request Mar 26, 2025
lengrongfu pushed a commit to lengrongfu/vllm that referenced this pull request Apr 2, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025